# cat /etc/sysctl.conf
fs.aio-max-nr=99999999
fs.file-max=99999999
kernel.pid_max=4194304
kernel.threads-max=99999999
kernel.sem=32768 1073741824 2000 32768
kernel.shmmni=32768
kernel.msgmni=32768
kernel.msgmax=65536
kernel.msgmnb=65536
vm.max_map_count=1048576
# cat /etc/security/limits.conf
* soft core unlimited
* hard core unlimited
* soft data unlimited
* hard data unlimited
* soft fsize unlimited
* hard fsize unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 1048576
* hard nofile 1048576
* soft rss unlimited
* hard rss unlimited
* soft stack unlimited
* hard stack unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft nproc unlimited
* hard nproc unlimited
* soft as unlimited
* hard as unlimited
* soft maxlogins unlimited
* hard maxlogins unlimited
* soft maxsyslogins unlimited
* hard maxsyslogins unlimited
* soft locks unlimited
* hard locks unlimited
* soft sigpending unlimited
* hard sigpending unlimited
* soft msgqueue unlimited
* hard msgqueue unlimited
# cat /etc/systemd/logind.conf
[Login]
UserTasksMax=infinity
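For reference, the sysctl values above can be applied and verified without a reboot (standard sysctl usage; the UserTasksMax change additionally needs a systemd-logind restart):

# sysctl -p /etc/sysctl.conf          # load the file into the running kernel
# sysctl kernel.pid_max               # verify an individual value
kernel.pid_max = 4194304
# systemctl restart systemd-logind    # pick up UserTasksMax=infinity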
# free -g
               total        used        free      shared  buff/cache   available
Mem:             117           5          44          62          67          48
Swap:             15           8           7
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 194G 121G 74G 63% /
# cat /proc/meminfo
MemTotal: 123665416 kB
MemFree: 90979152 kB
MemAvailable: 95376636 kB
Buffers: 72260 kB
Cached: 25964076 kB
SwapCached: 0 kB
Active: 8706568 kB
Inactive: 22983044 kB
Active(anon): 7568968 kB
Inactive(anon): 18871224 kB
Active(file): 1137600 kB
Inactive(file): 4111820 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 5653128 kB
Mapped: 185100 kB
Shmem: 20786924 kB
KReclaimable: 281732 kB
Slab: 541000 kB
SReclaimable: 281732 kB
SUnreclaim: 259268 kB
KernelStack: 34384 kB
PageTables: 93216 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 78609920 kB
Committed_AS: 63750908 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 46584 kB
VmallocChunk: 0 kB
Percpu: 18944 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 183484 kB
DirectMap2M: 5058560 kB
DirectMap1G: 122683392 kB
And for the user account used to run the scripts:
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Yet:
./somescript.sh: fork: retry: Resource temporarily unavailable
The server has a moderate load (~20 load average at the moment) and runs many scripts which do extensive forking (i.e. $(somecode) substitutions inside many scripts). The server (a Google Cloud instance) has 16 cores and 128 GB RAM, with a 100 GB tmpfs drive and 16 GB of swap. The message appears even when the CPU, the memory, and the swap are all under 50% use.
It is hard to believe it would be hitting any of these already high upper limits. I suspect there is some other setting that affects this.
What else can be tuned to avoid this fork: retry: Resource temporarily unavailable issue?
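(For reference, the kernel-wide thread/PID ceilings can be watched directly while the errors occur, via the standard /proc interfaces; the live counts here stay far below the configured maxima:)

# threads currently in existence vs. the kernel-wide maxima
ps -eL --no-headers | wc -l
cat /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max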
1 Answer
After more debugging I finally found the answer. It seems worth sharing, since others may run into the same thing; it may also be a bug in Ubuntu (TBD).
My scripts made the following change (in-script) in various places:
ulimit -u 20000 2>/dev/null
The 20000 number would vary from 2000 to 40000 depending on the script/situation.
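Worth noting (standard bash behaviour, not specific to these scripts): without -H or -S, ulimit sets both the soft and the hard limit, so once a script has lowered -u, neither it nor any of its children can raise it again without root privileges:

$ ulimit -u 2000 2>/dev/null    # lowers soft AND hard RLIMIT_NPROC
$ ulimit -u 40000
bash: ulimit: max user processes: cannot modify limit: Operation not permitted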
What thus seems to happen is that ulimit -u does not act per script: it lowers RLIMIT_NPROC for the calling process and everything it forks, and the kernel checks that cap against the total number of processes and threads owned by the user. So as soon as the user-wide total reached the lowest in-script limit (which is easy to do with only a limited number of heavily forking scripts), every fork from a process running under that limit failed. The result was that at most about 2000-2200 threads would ever be started.
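One way to confirm this mechanism on a live system (a diagnostic sketch; $PID is a placeholder for the PID of one of the affected scripts):

# the effective process cap of the forking script
grep 'Max processes' /proc/$PID/limits
# total processes+threads owned by the same user; fork() fails with
# EAGAIN as soon as this total reaches the cap above
ps -L --no-headers -u "$USER" | wc -l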
I removed all the ulimit -u settings, and now do not get any fork: retry: resource temporarily unavailable anymore, nor any other related fork errors.
htop now also shows much more than 2000-2200 threads:
Tasks: 2349, 22334 thr, 318 kthr; 32 running
Now my machine becomes overloaded/unresponsive, but that is another problem (the server is likely swapping), and at that a much more enjoyable one than the fork issue :)
(As an interesting side note and reference, it is also possible to increase the maximum number of open files to an amount greater than 1048576.)
It should be easy to set up a test for this (a bash nested-fork script with a ulimit -u ${some_value} set inside each forked thread); a sketch follows below.
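A minimal sketch of such a test, assuming an otherwise mostly idle user account (the 200/300 values are arbitrary; pick a -u value below the user's expected process total plus the loop count):

#!/bin/bash
# Reproduce "fork: retry: Resource temporarily unavailable":
# lower RLIMIT_NPROC in-script, then try to fork past it.
ulimit -u 200 2>/dev/null    # the kind of in-script limit described above
for i in $(seq 1 300); do
    sleep 60 &               # each background job counts against the user-wide total
done
wait

Once the user's total process count reaches 200, every further fork from this shell fails with EAGAIN and bash prints the retry message.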