
Kms copyimage memory exhausted

We upgraded SLURM from 2.3.3 to 17.02.7 on some of our HPC clusters, and we now have the machines which use PSM2, i.e. machines that have hfi1/OPA cards, crashing at a rate of about one per node-week of uptime. Because of the size of the clusters, this ends up being several crashes per day.

Control groups are implicated, and memcgs in particular are circumstantially implicated, but possibly it has to do with the layout of the control groups or the way that they are used. The new Slurm, 17.02.7, is more aggressive in how it uses cgroups. This is not containers and the way that systemd creates slices, scopes, and services; these are HPC compute nodes, and the layout of the control groups is along the lines of:

/sys/fs/cgroup/memory/slurm/uid_#/job_#/step_#

They also use CPU and freezer cgroups on the cluster where the problem first appeared.

The 4 new cgroup parameters the newer Slurm on rzgenie is now setting:

> slurm_cgroup_conf->constrain_kmem_space = false
> slurm_cgroup_conf->allowed_kmem_space = -1
> slurm_cgroup_conf->max_kmem_percent = 100
> slurm_cgroup_conf->min_kmem_space = XCGROUP_DEFAULT_MIN_RAM

What we see is that memory.kmem.limit_in_bytes and memory.limit_in_bytes are being set in /sys/fs/cgroup/memory/slurm/uid_#/job_#, whereas the kmem limit was not among the files the older Slurm set. We're working on figuring out a minimal reproducer, but we haven't pinned it down fully.

Interestingly, it also seems to leak slab caches on the PSM machines, which have qib cards, but there it may be more harmless; I don't think that we've seen one of those crash yet. It is possible that the leak is just slower on PSM machines. One thing that is kind of unusual about PSM & PSM2 is that they create a new slab cache for each and every job and then supposedly clean it up at the end of the job. I personally don't know the association, but from what I gather so far PSM uses this in some way for its intra-node MPI traffic.

As evidence that we are leaking slabs on a 7.3.z cluster with the old Slurm:

Quartz18722: uptime; ls /sys/kernel/slab | wc -l; symlinks /sys/kernel/slab/ | grep dangling | wc -l
14:25:02 up 2 days, 7 min, 2 users, load average: 0.08, 0.03, 0.92

On a test machine with 7.4.z and a new uptime:

uptime; ls /sys/kernel/slab | wc -l; symlinks /sys/kernel/slab/ | grep dangling | wc -l
14:17:44 up 23 days, 6:10, 1 user, load average: 0.00, 0.01, 1.35

We see messages like:

14:00:52 kmem_cache_destroy qib-user-sdma-pkts-0-02.00(353:step_397): Slab cache still has objects

But on the PSM2 machines we see a different error, against per-memcg caches such as kmalloc-64(8685:step_2503). Eventually, when there are excessive numbers of slab caches, the node crashes.

Attachments:

0001-memcg-slab-kmem-cache-create-memcg-fix-memleak-on-fail-path.patch
0002-memcg-slab-clean-up-memcg-cache-initialization-destruction.patch
0003-mm-memcg-slab-fix-races-in-per-memcg-cache-creation-destruction.patch
0004-mm-memcg-slab-do-not-destroy-children-caches-if-parent-has-aliases.patch
Output of /proc/slabinfo on the machine showing the issue
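
To watch the leak grow on a live node, the one-liner quoted above can be wrapped in a small loop. This is only a rough sketch of the idea: it assumes a SLUB kernel (so /sys/kernel/slab exists) and that the symlinks utility is installed; the sampling interval is an arbitrary choice of mine, not something from the report.

#!/bin/bash
# Sample the number of slab caches and the number of dangling
# /sys/kernel/slab symlinks so the growth of leaked per-memcg caches
# can be lined up with job churn. Assumes SLUB and the 'symlinks' tool.
interval="${1:-600}"    # seconds between samples; arbitrary default
while true; do
    caches=$(ls /sys/kernel/slab | wc -l)
    dangling=$(symlinks /sys/kernel/slab/ | grep -c dangling)
    printf '%s caches=%s dangling=%s\n' "$(date '+%F %T')" "$caches" "$dangling"
    sleep "$interval"
done

Run it in the background and both counters should climb steadily as jobs come and go on an affected node.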
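
For anyone who wants to poke at the mechanism by hand, here is a rough sketch that mimics the kind of memory cgroup the newer Slurm creates and enables kernel-memory accounting on it. The uid/job/step numbers and the 1 GiB limits are placeholders made up for illustration; only the directory layout and the two limit file names come from the report. On the RHEL 7 kernels in question, writing memory.kmem.limit_in_bytes is what switches on kmem accounting for the cgroup, and that in turn is what creates the per-memcg copies of the slab caches (the step_NNN-suffixed caches seen above).

#!/bin/bash
# Rough illustration only; run as root on a scratch machine.
# Mirrors the /sys/fs/cgroup/memory/slurm/uid_#/job_#/step_# layout
# with made-up IDs and limits.
cg=/sys/fs/cgroup/memory/slurm/uid_1000/job_42/step_0
mkdir -p "$cg"
echo $((1024*1024*1024)) > "$cg/memory.limit_in_bytes"       # RAM limit
echo $((1024*1024*1024)) > "$cg/memory.kmem.limit_in_bytes"  # turns on kmem accounting
echo $$ > "$cg/tasks"                    # move this shell into the step cgroup
ls -R /usr > /dev/null                   # generate kernel-memory (dentry/inode) charges
echo $$ > /sys/fs/cgroup/memory/tasks    # move back out
rmdir "$cg"                              # tear the step cgroup down, as Slurm would
grep step_0 /proc/slabinfo               # on affected kernels the per-memcg caches linger

On a fixed kernel the per-memcg caches should eventually go away once their objects are freed; on the affected ones they pile up, which is what the growing /sys/kernel/slab counts above are showing.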












