Overview

We are currently doing a scrub on a ZFS pool with 12 RAID-Z1 vdevs, and each vdev has 12 drives. Each vdev corresponds to an enclosure. The hardware is a Dell PowerEdge 730xd with two Dell 12Gbps SAS (LSI SAS3008) controllers, and 12 Dell MD1400 enclosures. The operating system is CentOS 7.6.1810.

We have not been able to complete a scrub of the pool, because after some time drives become FAULTED in ZFS, and we must run zpool clear to continue. The drives that become FAULTED are seemingly random, and smartctl reports their SMART status as OK.

The only commonality is that before the drives are marked FAULTED, the error message mpt3sas_scsih_issue_tm: timeout appears in dmesg, followed by the controller resetting and a flood of ZED errors and read errors.

I am currently stuck on the following:

Is this a software or hardware issue?

If it's software, is there a configuration change or patch that can prevent the error?

If it's hardware, how can I narrow the issue down?

What We've Tried

We've tried the following:

Increasing the timeout values for each disk at /sys/block/*/device/timeout

Replacing all of the SAS cables

Upgrading all of the firmware

Running the SMART background long test on the FAULTED disk

Rebooting (3 times so far)
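
For anyone who wants to repeat the timeout change from the list above, it can be scripted roughly as follows. This is only a minimal sketch: the function name set_scsi_timeouts, the sd* pattern, and the 180-second value in the usage note are illustrative assumptions, not the exact values we used.

```python
from pathlib import Path


def set_scsi_timeouts(seconds, sys_block="/sys/block", pattern="sd*"):
    """Write `seconds` to every <sys_block>/<pattern>/device/timeout file.

    Returns the sorted list of block device names that were updated.
    `sys_block` is a parameter so the function can be pointed at a test tree.
    """
    updated = []
    for timeout_file in sorted(Path(sys_block).glob(f"{pattern}/device/timeout")):
        # The kernel expects a plain integer number of seconds.
        timeout_file.write_text(f"{seconds}\n")
        # .../sdX/device/timeout -> "sdX"
        updated.append(timeout_file.parent.parent.name)
    return updated
```

Calling set_scsi_timeouts(180) as root would apply the value to every sd* device. The setting does not survive a reboot, so it has to be reapplied (for example from a udev rule or a boot script).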

I also looked at this answer but it didn't help.

Details

Here's journalctl from when the event begins:

Apr 12 04:42:07 kernel: sd 5:0:18:0: attempting task abort! scmd(ffff8d36c295a4c0)
Apr 12 04:42:07 kernel: sd 5:0:4:0: attempting task abort! scmd(ffff8d3745b20540)
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB: Read(32)
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:07 kernel: sd 5:0:4:0: [sdac] CDB[10]: 60 2a b8 c8 60 2a b8 c8 00 00 00 00 00 00 00 08
Apr 12 04:42:07 kernel: scsi target5:0:4: handle(0x000e), sas_address(0x5000c500a6bb846e), phy(4)
Apr 12 04:42:07 kernel: scsi target5:0:4: enclosure logical id(0x5204747299f56500), slot(4)
Apr 12 04:42:07 kernel: scsi target5:0:4: enclosure level(0x0000), connector name( 1 )
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB: Read(32)
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:07 kernel: sd 5:0:18:0: [sdap] CDB[10]: 60 2b f7 f8 60 2b f7 f8 00 00 00 00 00 00 00 08
Apr 12 04:42:07 kernel: scsi target5:0:18: handle(0x001d), sas_address(0x5000c500a6bb68ce), phy(5)
Apr 12 04:42:07 kernel: scsi target5:0:18: enclosure logical id(0x5204747299f5dd00), slot(0)
Apr 12 04:42:07 kernel: scsi target5:0:18: enclosure level(0x0001), connector name( 1 )
Apr 12 04:42:37 kernel: mpt3sas_cm1: mpt3sas_scsih_issue_tm: timeout
Apr 12 04:42:37 kernel: mf:
Apr 12 04:42:37 kernel: 0100000e
Apr 12 04:42:37 kernel: 00000100
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel:
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 00000000
Apr 12 04:42:37 kernel: 000000b6
Apr 12 04:42:37 kernel:
Apr 12 04:42:47 kernel: mpt3sas_cm1: sending diag reset !!
Apr 12 04:42:48 kernel: mpt3sas_cm1: diag reset: SUCCESS
Apr 12 04:42:48 kernel: mpt3sas_cm1: LSISAS3008: FWVersion(16.00.04.00), ChipRevision(0x02), BiosVersion(18.00.00.00)
Apr 12 04:42:48 kernel: mpt3sas_cm1: Protocol=(
Apr 12 04:42:48 kernel: Initiator
Apr 12 04:42:48 kernel: ,Target
Apr 12 04:42:48 kernel: ),
Apr 12 04:42:48 kernel: Capabilities=(
Apr 12 04:42:48 kernel: TLR
Apr 12 04:42:48 kernel: ,EEDP
Apr 12 04:42:48 kernel: ,Snapshot Buffer
Apr 12 04:42:48 kernel: ,Diag Trace Buffer
Apr 12 04:42:48 kernel: ,Task Set Full
Apr 12 04:42:48 kernel: ,NCQ
Apr 12 04:42:48 kernel: )
Apr 12 04:42:48 kernel: mpt3sas_cm1: sending port enable !!
Apr 12 04:42:55 kernel: mpt3sas_cm1: port enable: SUCCESS
Apr 12 04:42:55 kernel: mpt3sas_cm1: search for end-devices: start
Apr 12 04:42:55 kernel: scsi target5:0:0: handle(0x000a), sas_addr(0x5000c500a6bc5ef6)
Apr 12 04:42:55 kernel: scsi target5:0:0: enclosure logical id(0x5204747299f56500), slot(9)
Apr 12 04:42:55 kernel: scsi target5:0:1: handle(0x000b), sas_addr(0x5000c500a6bc6e66)
Apr 12 04:42:55 kernel: scsi target5:0:1: enclosure logical id(0x5204747299f56500), slot(5)
Apr 12 04:42:55 kernel: scsi target5:0:2: handle(0x000c), sas_addr(0x5000c500a6bbd86e)
Apr 12 04:42:55 kernel: scsi target5:0:2: enclosure logical id(0x5204747299f56500), slot(1)

The handle and enclosure lines are repeated for every drive attached to the controller.
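
To check whether the aborted commands cluster on one enclosure (which would point toward hardware), the abort and enclosure lines above can be tallied with something like the sketch below. The function name aborts_per_enclosure is made up for illustration, and it assumes the journal is fed in one message per line and that target numbers stay stable within the excerpt being analyzed.

```python
import re
from collections import Counter

# "sd 5:0:18:0: attempting task abort!" -> host:channel:target "5:0:18"
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+):\d+: attempting task abort!")
# "scsi target5:0:18: enclosure logical id(0x...), slot(0)"
ENCLOSURE_RE = re.compile(
    r"scsi target(\d+:\d+:\d+): enclosure logical id\((0x[0-9a-fA-F]+)\), slot\((\d+)\)"
)


def aborts_per_enclosure(lines):
    """Count 'attempting task abort!' messages per enclosure logical id."""
    lines = list(lines)

    # First pass: learn which enclosure each SCSI target sits in.
    target_to_enclosure = {}
    for line in lines:
        match = ENCLOSURE_RE.search(line)
        if match:
            target_to_enclosure[match.group(1)] = match.group(2)

    # Second pass: attribute each abort to its enclosure.
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[target_to_enclosure.get(match.group(1), "unknown")] += 1
    return counts
```

Feeding it the output of journalctl -k for one incident gives a per-enclosure count; a heavily skewed count would suggest a specific enclosure, expander, or cable path, while an even spread would point more toward the controller or driver.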

Then, it's followed by:

Apr 12 04:42:57 kernel: mpt3sas_cm1: search for end-devices: complete
Apr 12 04:42:57 kernel: mpt3sas_cm1: search for expanders: start
Apr 12 04:42:57 kernel: expander present: handle(0x0009), sas_addr(0x5204747299f565ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0016), sas_addr(0x5204747299f5ddff)
Apr 12 04:42:57 kernel: expander present: handle(0x0024), sas_addr(0x520474729a0a68ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0032), sas_addr(0x520474729a0b61ff)
Apr 12 04:42:57 kernel: expander present: handle(0x0040), sas_addr(0x520474729a09f1ff)
Apr 12 04:42:57 kernel: mpt3sas_cm1: search for expanders: complete
Apr 12 04:42:57 kernel: sd 5:0:4:0: task abort: SUCCESS scmd(ffff8d3745b20540)
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: start
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: end-devices
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: expanders
Apr 12 04:42:57 kernel: mpt3sas_cm1: removing unresponding devices: complete
Apr 12 04:42:57 kernel: mpt3sas_cm1: scan devices: start
Apr 12 04:42:57 kernel: sd 5:0:18:0: task abort: SUCCESS scmd(ffff8d36c295a4c0)
Apr 12 04:42:57 kernel: scsi_io_completion: 13 callbacks suppressed
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB: Read(32)
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00
Apr 12 04:42:57 kernel: sd 5:0:18:0: [sdap] CDB[10]: 60 2b f7 f8 60 2b f7 f8 00 00 00 00 00 00 00 08
Apr 12 04:42:57 kernel: blk_update_request: 13 callbacks suppressed
Apr 12 04:42:57 kernel: blk_update_request: I/O error, dev sdap, sector 1613494264
Apr 12 04:42:57 kernel: sd 5:0:21:0: attempting task abort! scmd(ffff8d3acfef0540)
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB: Read(32)
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 03
Apr 12 04:42:57 kernel: sd 5:0:21:0: [sdas] CDB[10]: 01 af 8c b0 01 af 8c b0 00 00 00 00 00 00 00 08
Apr 12 04:42:57 kernel: scsi target5:0:21: handle(0x0020), sas_address(0x5000c500a6bc5f82), phy(8)

This is followed by many more read timeouts. Then we see a flood of ZED events:

Apr 12 04:42:57 zed[137074]: eid=2425 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137076]: eid=2426 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137078]: eid=2427 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137080]: eid=2428 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc59bb-part1
Apr 12 04:42:57 zed[137082]: eid=2429 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137084]: eid=2430 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137086]: eid=2431 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137088]: eid=2432 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc4337-part1
Apr 12 04:42:57 zed[137090]: eid=2433 class=io pool_guid=0x3317CEBDDE480DA0
Apr 12 04:42:57 zed[137092]: eid=2434 class=io pool_guid=0x3317CEBDDE480DA0
Apr 12 04:42:57 zed[137094]: eid=2435 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137096]: eid=2436 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137098]: eid=2437 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137100]: eid=2438 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bc5f83-part1
Apr 12 04:42:57 zed[137102]: eid=2439 class=delay pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bb68cf-part1
Apr 12 04:42:57 zed[137104]: eid=2440 class=io pool_guid=0x3317CEBDDE480DA0 vdev_path=/dev/disk/by-id/scsi-35000c500a6bb68cf-part1
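
Because the ZED output is so noisy, it may help to summarize it per device and event class before comparing incidents. The following is a rough sketch only; the function name zed_events_by_vdev and the "(no vdev_path)" label for events that carry no vdev_path are illustrative choices, and it assumes one ZED message per input line in the format shown above.

```python
import re
from collections import Counter

# Matches e.g. "zed[137074]: eid=2425 class=delay pool_guid=0x... vdev_path=/dev/..."
# The vdev_path field is optional; some events only carry the pool GUID.
ZED_RE = re.compile(
    r"zed\[\d+\]: eid=\d+ class=(\S+) pool_guid=\S+(?: vdev_path=(\S+))?"
)


def zed_events_by_vdev(lines):
    """Count ZED events keyed by (vdev_path, event class)."""
    counts = Counter()
    for line in lines:
        match = ZED_RE.search(line)
        if match:
            event_class = match.group(1)
            vdev_path = match.group(2) or "(no vdev_path)"
            counts[(vdev_path, event_class)] += 1
    return counts
```

Sorting the result with counts.most_common() shows which devices accumulate the most delay and io events in a given incident.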

After that, the drives are marked DEGRADED or FAULTED. I'll also include some more information that might be helpful.

Here is the output of zpool status for the two vdevs with FAULTED devices:

raidz1-4                  DEGRADED     0     0     0
  scsi-35000cca2513f78b8  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca25157bfd0  ONLINE       0     0     0  (repairing)
  scsi-35000cca251597aa4  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca2515de7b0  FAULTED      0     0     0  too many errors
  scsi-35000cca2516278c8  DEGRADED     0     0     0  too many errors
  scsi-35000cca25163ea64  ONLINE       0     0     0  (repairing)
  scsi-35000cca251644664  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca2516576a0  DEGRADED     0     0     0  too many errors
  scsi-35000cca251699f68  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca25169bd10  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca25169be5c  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca25169c09c  DEGRADED     0     0     0  too many errors  (repairing)
raidz1-5                  DEGRADED     0     0     0
  scsi-35000cca2516bc234  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca2516bc26c  ONLINE       0     0     0
  scsi-35000cca2516c8e78  ONLINE       0     0     0
  scsi-35000cca2516ca244  ONLINE       0     0     0
  scsi-35000cca2516ca334  ONLINE       0     0     0  (repairing)
  scsi-35000cca2516ca848  ONLINE       0     0     0  (repairing)
  scsi-35000cca2516cb3e0  ONLINE       0     0     0  (repairing)
  scsi-35000cca2516cb420  DEGRADED     0     0     0  too many errors  (repairing)
  scsi-35000cca2516cc210  ONLINE       0     0     0
  scsi-35000cca2516ce390  FAULTED      0     0     0  too many errors  (repairing)
  scsi-35000cca2516ce8e4  ONLINE       0     0     0
  scsi-35000cca2516cf224  ONLINE       0     0     0

Here is the output of smartctl -a for the FAULTED drive in raidz1-4:

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721010AL5200
Revision:             LS15
Compliance:           SPC-4
User Capacity:        9,796,820,402,176 bytes [9.79 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 2 protection
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2515de7b0
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Apr 12 13:40:57 2019 CDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        50 C

Manufactured in week 02 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  889
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 30677043943309312

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       40         0       294   10394513     118610.223           0
write:         0        0         0         0     239773      43528.082           0
verify:        0        0         0         0      18403        101.563           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                  96   18243                 - [-   -    -]
# 2  Background short  Completed                  96   16753                 - [-   -    -]
# 3  Reserved(7)       Completed                  64       2                 - [-   -    -]

Long (extended) Self Test duration: 64033 seconds [1067.2 minutes]

Here is the output of sysctl -a | grep -v 'net.' | grep -v 'kernel.sched_domain.':

abi.vsyscall32 = 1
crypto.fips_enabled = 0
debug.exception-trace = 1
debug.kprobes-optimization = 1
debug.panic_on_rcu_stall = 0
dev.hpet.max-user-freq = 64
dev.mac_hid.mouse_button2_keycode = 97
dev.mac_hid.mouse_button3_keycode = 100
dev.mac_hid.mouse_button_emulation = 0
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
dev.scsi.logging_level = 0
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.binfmt_misc.status = enabled
fs.dentry-state = 235028 190450 45 0 0 0
fs.dir-notify-enable = 1
fs.epoll.max_user_watches = 108185722
fs.file-max = 52384239
fs.file-nr = 2080 0 52384239
fs.inode-nr = 102807 662
fs.inode-state = 102807 662 0 0 0 0 0
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
fs.lease-break-time = 45
fs.leases-enable = 1
fs.may_detach_mounts = 0
fs.mount-max = 100000
fs.mqueue.msg_default = 10
fs.mqueue.msg_max = 10
fs.mqueue.msgsize_default = 8192
fs.mqueue.msgsize_max = 8192
fs.mqueue.queues_max = 256
fs.nfs.nlm_grace_period = 0
fs.nfs.nlm_tcpport = 0
fs.nfs.nlm_timeout = 10
fs.nfs.nlm_udpport = 0
fs.nfs.nsm_local_state = 3
fs.nfs.nsm_use_hostnames = 0
fs.nr_open = 1048576
fs.overflowgid = 65534
fs.overflowuid = 65534
fs.pipe-max-size = 1048576
fs.pipe-user-pages-hard = 0
fs.pipe-user-pages-soft = 16384
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
fs.quota.allocated_dquots = 0
fs.quota.cache_hits = 0
fs.quota.drops = 0
fs.quota.free_dquots = 0
fs.quota.lookups = 0
fs.quota.reads = 0
fs.quota.syncs = 0
fs.quota.warnings = 1
fs.quota.writes = 0
fs.suid_dumpable = 0
fs.xfs.age_buffer_centisecs = 1500
fs.xfs.error_level = 3
fs.xfs.filestream_centisecs = 3000
fs.xfs.inherit_noatime = 1
fs.xfs.inherit_nodefrag = 1
fs.xfs.inherit_nodump = 1
fs.xfs.inherit_nosymlinks = 0
fs.xfs.inherit_sync = 1
fs.xfs.irix_sgid_inherit = 0
fs.xfs.irix_symlink_mode = 0
fs.xfs.panic_mask = 0
fs.xfs.rotorstep = 1
fs.xfs.speculative_prealloc_lifetime = 300
fs.xfs.stats_clear = 0
fs.xfs.xfsbufd_centisecs = 100
fs.xfs.xfssyncd_centisecs = 3000
kernel.acct = 4 2 30
kernel.acpi_video_flags = 0
kernel.auto_msgmni = 1
kernel.bootloader_type = 114
kernel.bootloader_version = 2
kernel.cad_pid = 1
kernel.cap_last_cap = 36
kernel.compat-log = 1
kernel.core_pattern = core
kernel.core_pipe_limit = 0
kernel.core_uses_pid = 1
kernel.ctrl-alt-del = 0
kernel.dmesg_restrict = 0
kernel.domainname = (none)
kernel.ftrace_dump_on_oops = 0
kernel.ftrace_enabled = 1
kernel.hardlockup_all_cpu_backtrace = 0
kernel.hardlockup_panic = 1
kernel.hostname = htc-sblock-node197
kernel.hotplug =
kernel.hung_task_check_count = 4194304
kernel.hung_task_panic = 0
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 0
kernel.io_delay_type = 0
kernel.kexec_load_disabled = 0
kernel.keys.gc_delay = 300
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 200
kernel.keys.persistent_keyring_expiry = 259200
kernel.keys.root_maxbytes = 25000000
kernel.keys.root_maxkeys = 1000000
kernel.kptr_restrict = 0
kernel.max_lock_depth = 1024
kernel.modprobe = /sbin/modprobe
kernel.modules_disabled = 0
kernel.msg_next_id = -1
kernel.msgmax = 8192
kernel.msgmnb = 16384
kernel.msgmni = 32768
kernel.ngroups_max = 65536
kernel.nmi_watchdog = 1
kernel.ns_last_pid = 176562
kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
kernel.numa_balancing_settle_count = 4
kernel.osrelease = 3.10.0-957.5.1.el7.x86_64
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.panic = 0
kernel.panic_on_io_nmi = 0
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_warn = 0
kernel.perf_cpu_time_max_percent = 25
kernel.perf_event_max_sample_rate = 32000
kernel.perf_event_mlock_kb = 516
kernel.perf_event_paranoid = 2
kernel.pid_max = 196608
kernel.poweroff_cmd = /sbin/poweroff
kernel.print-fatal-signals = 0
kernel.printk = 7 4 1 7
kernel.printk_delay = 0
kernel.printk_ratelimit = 5
kernel.printk_ratelimit_burst = 10
kernel.pty.max = 4096
kernel.pty.nr = 4
kernel.pty.reserve = 1024
kernel.random.boot_id = 5bd2b4ab-221e-4157-98ad-fe4a81da7784
kernel.random.entropy_avail = 4034
kernel.random.poolsize = 4096
kernel.random.read_wakeup_threshold = 64
kernel.random.urandom_min_reseed_secs = 60
kernel.random.uuid = 4f4a6d22-d974-452d-b550-0e19b7a3c74e
kernel.random.write_wakeup_threshold = 896
kernel.randomize_va_space = 2
kernel.real-root-dev = 0
kernel.sched_autogroup_enabled = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 24000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 3000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_schedstats = 0
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 4000000
kernel.seccomp.actions_avail = kill trap errno trace allow
kernel.seccomp.actions_logged = kill trap errno trace
kernel.sem = 250 32000 32 128
kernel.sem_next_id = -1
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 0
kernel.spl.hostid = 0
kernel.spl.kmem.slab_kmem_alloc = 0
kernel.spl.kmem.slab_kmem_max = 0
kernel.spl.kmem.slab_kmem_total = 0
kernel.spl.kmem.slab_vmem_alloc = 305947392
kernel.spl.kmem.slab_vmem_max = 732324608
kernel.spl.kmem.slab_vmem_total = 347979264
kernel.spl.version = SPL v0.7.12-1
kernel.stack_tracer_enabled = 0
kernel.sysctl_writes_strict = 1
kernel.sysrq = 16
kernel.tainted = 12289
kernel.threads-max = 4126958
kernel.timer_migration = 1
kernel.traceoff_on_warning = 0
kernel.unknown_nmi_panic = 0
kernel.usermodehelper.bset = 4294967295 31
kernel.usermodehelper.inheritable = 4294967295 31
kernel.version = #1 SMP Fri Feb 1 14:54:57 UTC 2019
kernel.watchdog = 1
kernel.watchdog_cpumask = 0-191
kernel.watchdog_thresh = 10
kernel.yama.ptrace_scope = 0
sunrpc.max_resvport = 1023
sunrpc.min_resvport = 665
sunrpc.nfs_debug = 0x0000
sunrpc.nfsd_debug = 0x0000
sunrpc.nlm_debug = 0x0000
sunrpc.rpc_debug = 0x0000
sunrpc.tcp_fin_timeout = 15
sunrpc.tcp_max_slot_table_entries = 65536
sunrpc.tcp_slot_table_entries = 2
sunrpc.transports = tcp 1048576
sunrpc.transports = udp 32768
sunrpc.transports = tcp-bc 1048576
sunrpc.udp_slot_table_entries = 16
user.max_ipc_namespaces = 2063479
user.max_mnt_namespaces = 2063479
user.max_pid_namespaces = 2063479
user.max_user_namespaces = 0
user.max_uts_namespaces = 2063479
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 90112
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.mmap_rnd_bits = 28
vm.mmap_rnd_compat_bits = 8
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

Let me know if I can include anything else that would be helpful.