I came across Taco Bell Programming recently, and think this article is worthy to read for every software engineer. The post mentions a scenario which you may consider to use Hadoop to solve but actually xargs may be a simpler and better choice. This reminds me a similar experience: last year a client wanted me to process a data file which has 5 million records. After some investigations, no novel technologies, a concise awk script (less than 10 lines) worked like a charm! What surprised me more is that awk is just a single-thread program, no nifty concurrency involved.

The IT field never lacks “new” technologies: cloud computing, big data, high concurrency, etc. However, the thinkings behind these “fancy” words may date back to the era when Unix arose. Unix command line tools are invaluable treasure. In many cases, picking the right components and using pipeline to glue them can satisfy your requirement perfectly. So spending some time in reviewing Unix command line manual instead of chasing state-of-the-art techniques exhaustedly, you may gain more.