Although I've browsed some of the questions here, I think every situation is different and may require a totally different solution.

What I have now:

Linux software RAID5 on 4x4TB enterprise HDD

LVM on top with a few volumes

The most important is the storage volume, a 10TB XFS filesystem

All set up with default parameters on Debian Wheezy

The volume is mounted with options 'noatime,nodiratime,allocsize=2m'

About 8GB of RAM is free (and used for caching, I guess); the quad-core Intel CPU with HT is not heavily used
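Since everything is at defaults, one thing worth checking is whether the XFS stripe geometry matches the RAID5 layout. A sketch of the arithmetic, assuming the 512K chunk size that mdadm defaults to (the real chunk size must be read from mdadm --detail; device names below are examples):

```shell
# Read the real values first (example device names, adjust to your setup):
#   mdadm --detail /dev/md0    # reports "Chunk Size"
#   xfs_info /volume           # reports sunit/swidth
#
# For a 4-disk RAID5: stripe unit = chunk size,
# stripe width = chunk size * number of data disks (4 - 1 = 3).
chunk_kb=512     # assumed mdadm default chunk size; verify with mdadm --detail
data_disks=3     # 4 drives in RAID5 -> 3 data disks per stripe
echo "expected: su=${chunk_kb}k, sw=${data_disks} (swidth = $((chunk_kb * data_disks))K)"
```

If the geometry XFS reports doesn't match the array, as far as I know it can be corrected without reformatting via the sunit/swidth mount options.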

This volume mostly stores about 10 million files (at most 20M in the future) between 100K and 2M. Here is a more precise distribution: each row is a file size bucket (in K) and the number of files in that bucket:

Size (K)    Files
4           6162
8           32
32          55
64          11577
128         7700
256         7610
512         555
1024        5876
2048        1841
4096        12251
8192        4981
16384       8255
32768       20068
65536       35464
131072      591115
262144      3411530
524288      4818746
1048576     413779
2097152     20333
4194304     72
8388608     43
16777216    21
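For reference, these pairs can be regenerated with something like the following (a sketch, assuming GNU find's -printf and power-of-two buckets starting at 4K):

```shell
# Hypothetical helper: bucket every file's size into the nearest
# power-of-two KiB bucket (smallest bucket 4K) and print
# "bucket_in_K file_count" pairs.
size_histogram() {
  find "$1" -type f -printf '%s\n' \
    | awk '{ b = 4096; while (b < $1) b *= 2; count[b / 1024]++ }
           END { for (k in count) print k, count[k] }' \
    | sort -n
}

# size_histogram /volume/data/customer
```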

The files are mostly stored at level 7 on the volume, something like:

/volume/data/customer/year/month/day/variant/file

There are usually ~1-2K files inside those folders, sometimes less, other times up to 5-10K (rare cases).

I/O load isn't heavy, but I experience hangs when pushing it a little harder. For example:

The application that performs most of the I/O is NGINX, for both reading and writing

There are some random reads of 1-2MB/s TOTAL

In some folders, data is continuously written at a rate of 1-2MB/s TOTAL, and all files older than 1h must be periodically removed from those folders

Running the following cron job once per hour hangs the entire server for a good few seconds, many times over, and may even disrupt the service (the writing of new data) as I/O timeouts are generated:

find /volume/data/customer/ -type f -iname "*.ext" -mmin +60 -delete
find /volume/data/customer -type d -empty -delete
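A gentler variant of this cleanup might look like the following (an untested sketch: ionice's idle class only has an effect under the CFQ scheduler, and the batch size of 100 is an arbitrary guess):

```shell
# Hypothetical cleanup that yields to other I/O: run the scan and the
# deletes at idle I/O class and lowest CPU priority, removing files in
# small batches so foreground writes can interleave between batches.
cleanup_old_files() {
  ionice -c3 nice -n19 find "$1" -type f -iname "*.ext" -mmin +60 -print0 \
    | ionice -c3 nice -n19 xargs -0 -r -n 100 rm -f
  ionice -c3 nice -n19 find "$1" -type d -empty -delete
}

# cleanup_old_files /volume/data/customer/
```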

I also observe slow write speeds (a few MB/s) when writing files in the ranges above. With larger files, writing goes fine until the write cache fills (obviously), then speed drops and the server starts hanging in waves.

Now I am searching for a way to optimize my storage performance, as I am sure the defaults are not optimal and many things could be improved. Although LVM is not that useful to me, I wouldn't drop it unless doing so provides a significant gain, because dropping it would mean reinstalling the whole server.

I've read a lot about XFS vs. ReiserFS vs. ext4, but I am quite puzzled: another of my servers, with a much smaller 2TB RAID1 volume but exactly the same setup and a significantly heavier workload, performs quite flawlessly.

Any ideas?

How should I debug/experiment?

Thanks.