Update on recent SMP contention work (2)

Another veritable ton of SMP performance work has gone into master.

* The entire [v]fork/exec/exit/wait path has been streamlined and
  essentially no longer has any SMP contention.

* The pid/process-group/session mechanics have been rewritten and
  essentially no longer have any SMP contention.

* The entire VM fault path, particularly for initial COWs on binaries
  (to support concurrent execs), is now able to use shared fine-grained
  locks end-to-end and has no SMP contention.  As in zero.  Running 8
  threads doing fork/exec/wait of an ELF binary on one of the Haswell
  blades went from 10.60 seconds for 80000 total execs to 3.8 seconds.
  That's roughly a 2.8x improvement in performance.

* tmpfs performance has been radically improved.  It turns out that
  most of the code was fine-grained locked but still had coarse-grained
  per-mount locks wrapped around most of the VNOP operations.  I took a
  pass on it and removed most of the coarse-grained locks.

The previous block of work got rid of 90% of the contention on the
smaller systems (the 4-core/8-thread blades), but was lacking on the
bigger system (monster's 48-core Opteron).  This most recent set of work
has gotten rid of 98% of the contention on the smaller systems and
probably 90%+ of the contention on monster.  The only system paths which
still have noticeable contention are the filesystem write paths.

--

Bulk package builds (dports) on monster are under test now.  There are
no results yet, but the last week or two of work has brought the full
from-scratch build of 20,000+ packages down to around 15 hours.  The
current tests should be able to beat that.

As with prior work, there may be some instability.  I will continue to
work through whatever bugs show up and exercise various subsystems such
as swapcache and paging under heavy loads to locate and fix them.

-Matt
Matthew Dillon
<dillon at backplane.com>
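
For anyone who wants to try a fork/exec/wait load of the kind measured above, the following is a minimal sketch and not the actual test program used for the numbers in this post. It assumes worker processes rather than threads, and the names exec_loop and run_benchmark, along with the /bin/true stand-in binary, are hypothetical choices made for illustration.

```python
import os
import sys
import time


def exec_loop(argv, count):
    """Run `count` fork/exec/wait cycles of argv[0]; return how many exited 0."""
    ok = 0
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: replace the process image. execv only returns by
            # raising, in which case we leave immediately with status 127.
            try:
                os.execv(argv[0], argv)
            finally:
                os._exit(127)
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            ok += 1
    return ok


def run_benchmark(argv, workers, total_execs):
    """Split `total_execs` across `workers` concurrent processes.

    Returns (elapsed_seconds, workers_whose_execs_all_succeeded).
    """
    per_worker = total_execs // workers
    start = time.monotonic()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Worker: run its share of the execs, report success via exit code.
            ok = exec_loop(argv, per_worker)
            os._exit(0 if ok == per_worker else 1)
        pids.append(pid)
    good = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            good += 1
    return time.monotonic() - start, good


if __name__ == "__main__":
    # Assumption: /bin/true is a small ELF binary on the test machine.
    binary = sys.argv[1] if len(sys.argv) > 1 else "/bin/true"
    elapsed, good = run_benchmark([binary], workers=8, total_execs=80000)
    print(f"{good}/8 workers succeeded in {elapsed:.2f}s")
```

The defaults mirror the 8-way, 80000-exec run described above. Interpreter overhead in the parent will inflate the absolute times, so the result is only useful for comparing the same script before and after a kernel change.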