NetBSD 7.0/xen scheduling mystery, and how to fix it with processor sets

Today I needed to do some number crunching using a home-brewed C program. To do some manual load balancing, I fired up some Amazon AWS instances (which run on Xen) with NetBSD 7.0. In this case, the system was assigned two CPUs, from dmesg:

    # dmesg | grep cpu
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4

I started two instances of my program, with the intent to have each one use one CPU. Which is not what happened! Here is what I observed, and how I fixed things for now.

I was looking at top(1) to see that everything was running fine, and noticed funny WCPU and CPU values:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
     2791 root      25    0  8816K  964K RUN/0     16:10 54.20% 54.20% myprog
     2845 root      26    0  8816K  964K RUN/0     17:10 47.90% 47.90% myprog

I expected WCPU and CPU to be around 100% for each process, assuming that each one was bound to its own CPU. The values I actually saw (listed above) suggested that both programs were fighting for the same CPU. Huh?! top's CPU state shows:

    load averages: 2.15, 2.07, 1.82; up 0+00:45:19 18:00:55
    27 processes: 2 runnable, 23 sleeping, 2 on CPU
    CPU states: 50.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 50.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free

Which is not too useful. Typing "1" in top(1) lists the actual per-CPU usage instead:

    load averages: 2.14, 2.08, 1.83; up 0+00:45:56 18:01:32
    27 processes: 4 runnable, 21 sleeping, 2 on CPU
    CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
    CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free

This confirmed my suspicion that both processes were bound to one CPU, and that the other one was idling. Bad! But how to fix it?
One option is to kick your operating system out of the window, but I still like NetBSD, so here's another solution: NetBSD lets you create "processor sets", assign CPU(s) to them, and then assign processes to the processor sets. Let's have a look!

Processor sets are manipulated using the psrset(8) utility. By default all CPUs are in the same (system) processor set:

    # psrset
    system processor set 0: processor(s) 0 1

The first step is to create a new processor set:

    # psrset -c 1
    # psrset
    system processor set 0: processor(s) 0 1
    user processor set 1: empty

Next, assign one CPU to the new set:

    # psrset -a 1 1
    # psrset
    system processor set 0: processor(s) 0
    user processor set 1: processor(s) 1

Last, find out what the process IDs of my two (running) processes are, and assign them to the two processor sets:

    # ps -u
    USER  PID %CPU %MEM  VSZ RSS TTY   STAT STARTED     TIME COMMAND
    root 2791 52.0  0.0 8816 964 pts/4 R+    5:28PM 22:57.80 myprog
    root 2845 50.0  0.0 8816 964 pts/2 R+    5:26PM 23:33.97 myprog
    #
    # psrset -b 0 2791
    # psrset -b 1 2845

Note that this was done with the two processes running; there is no need to stop and restart them! The effect of the commands is immediate, as can be seen in top(1):

    load averages: 2.02, 2.05, 1.94; up 0+00:59:32 18:15:08
    27 processes: 1 runnable, 24 sleeping, 2 on CPU
    CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
    CPU1 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
    Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
    Swap:

      PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
     2845 root      25    0  8816K  964K CPU/1     26:14   100%   100% myprog
     2791 root      25    0  8816K  964K RUN/0     25:40   100%   100% myprog

Things are as expected now, with each program bound to its own CPU.

Now why this didn't happen by default is left as an exercise to the reader. Hints that may help:

    # uname -a
    NetBSD foo.eu-west-1.compute.internal 7.0 NetBSD 7.0 (XEN3_DOMU.201509250726Z) amd64
    # dmesg
    ...
    hypervisor0 at mainbus0: Xen version 4.2.amazon
    VIRQ_DEBUG interrupt using event channel 3
    vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
    vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4

AWS Instance type: c3.large

AMI ID: NetBSD-x86_64-7.0-201511211930Z-20151121-1142 (ami-ac983ddf)
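
For repeating the fix on another two-CPU instance, the psrset(8) steps from the session above can be wrapped in a small dry-run helper. This is only a sketch: the function name plan_pinning is made up for illustration, and it assumes the same situation as above (CPUs 0 and 1, and a newly created user set that ends up with ID 1). It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: print the psrset(8) commands used in the session above
# for two already-running PIDs, instead of executing them.
# Assumptions: the guest has CPUs 0 and 1, and the new user processor set
# gets ID 1, as it did in the session above. plan_pinning is a
# hypothetical helper name, not part of NetBSD.

plan_pinning() {
    # $1 = PID that stays in system set 0 (left with CPU 0 only)
    # $2 = PID that is moved to user set 1 (CPU 1)
    if [ "$#" -ne 2 ]; then
        echo "usage: plan_pinning pid0 pid1" >&2
        return 1
    fi
    for pid in "$@"; do
        case "$pid" in
            ''|*[!0-9]*)
                echo "not a numeric PID: $pid" >&2
                return 1
                ;;
        esac
    done
    echo "psrset -c 1"      # create a new user set (showed up as set 1 above)
    echo "psrset -a 1 1"    # assign CPU 1 to set 1; set 0 keeps CPU 0
    echo "psrset -b 0 $1"   # bind the first PID to system set 0
    echo "psrset -b 1 $2"   # bind the second PID to user set 1
}

# Example with the PIDs from the session above.
plan_pinning 2791 2845
```

If the printed commands look right, they can be run by hand as root on the NetBSD guest. Checking the output of a plain psrset after the create step is a good idea, since the set ID is an assumption here.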



[Tags: amazon, aws, psrset, scheduler, smp, xen]





