This is the first draft of the table of contents of a book that I have been writing.

It’s worth noting that this entire program can be worked through without spending a penny on proprietary software (with the optional exception of Charles). It also does not require a powerful computer – you could do all this on a five-year-old MacBook.

Use PHP to put up your own Web site. Spend a couple of weeks making it a reasonably decent looking site. It doesn’t matter what the site is or whether it has any transactional functionality. Just some pages that display photos and content is fine.

Use CSS, JavaScript and the Chrome Web Inspector while building your site. You don’t have to do anything fancy; just try to use the inspector and write some CSS and JavaScript every time you work on your site.

Use lynx --dump to retrieve the contents of your Web site. Just hardcode all the page URLs. Redirect all the content to flat files, then use grep to look for patterns in your content. Start by looking for mistakes you commonly make. Save your greps in a file.
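A minimal sketch of the dump-then-grep loop, with example.com and the mistake patterns standing in for your own URLs and your own common errors (the lynx call is shown in a comment and a fake dump is written instead, so the sketch runs without the network):

```shell
#!/bin/sh
# Sketch of the "dump, then grep" workflow. The URL, the dump and the
# mistake patterns are all placeholders -- substitute your own site.
set -e
mkdir -p dumps

# Real use would be one line per hardcoded page URL:
#   lynx --dump "https://example.com/about.php" > dumps/about.txt
# For this sketch we fake a dump so the greps have input to chew on:
printf 'Welcome to my site.\nPlease dont hesitate to recieve my mail.\n' > dumps/about.txt

# Keep your greps in a file so the list can grow with you.
cat > mistake-patterns.txt <<'EOF'
recieve
teh
dont
EOF

# -n prints line numbers; -f reads one pattern per line.
grep -n -f mistake-patterns.txt dumps/*.txt
```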

Did your greps actually do what you wanted? Try using find, egrep and/or perl -e. You can look up the full Perl regex syntax online, and there are both REPL-based and online regex testers available.

Do your greps do what you want now? You might want to stop and read the Mastering Regular Expressions book. You will never once in your long future career ever regret taking the time to read this book.

Use wget to retrieve source files from your site. Save the files in a directory. Then write shell scripts to apply jshint, HTML Tidy and csslint respectively. What did you learn?
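One shape such a script could take. The file names are placeholders, and because jshint, tidy and csslint have to be installed separately, the sketch echoes each linter command instead of running it; set LINT to empty to execute for real:

```shell
#!/bin/sh
# lint-all.sh -- run the right static analyzer on each retrieved file.
# LINT defaults to echo so this sketch runs without the linters
# installed; set LINT= (empty) to actually execute them.
LINT=${LINT-echo}

mkdir -p src
# Real use would populate src/ first, e.g.: wget -p -P src https://example.com/
touch src/app.js src/index.html src/style.css   # placeholder files

for f in src/*; do
  case "$f" in
    *.js)   $LINT jshint "$f" ;;
    *.html) $LINT tidy -errors -q "$f" ;;
    *.css)  $LINT csslint "$f" ;;
  esac
done
```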

Write a grep wrapper for the static analysis performed in the previous step. Filter out boring lines and/or highlight interesting lines in red. Save it all and check it in to source control. Seriously, if you are still not using source control at this point, STOP NOW and figure it out. Otherwise, congratulations: you are just about to lose several hours of valuable work. Prepare to go back to step 3 and start over! Sorry to be harsh, but you WILL lose work. CHECK YOUR SITE INTO SOURCE CONTROL TOO.

Use curl to retrieve source instead of wget. What is different? Which one do you like better? Check it into source control.

Now go back to your wget script and use your text editor to batch-replace the end of every wget line with &. Now all your wgets run at once. What happens? Which strategy do you like better? Create a branch called “concurrent,” check in your new code and push it to GitHub.
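The timing difference is easy to see with a stand-in for wget (sleep here, so the sketch runs without touching the network):

```shell
#!/bin/sh
# Sketch: sequential vs. "&"-backgrounded fetches. sleep stands in
# for wget so this runs offline; the URLs are placeholders.
fetch() { sleep 1; }    # pretend this is: wget -q "$1"

start=$(date +%s)
for url in page1 page2 page3; do
  fetch "$url" &        # the trailing & backgrounds each job
done
wait                    # block until every background job finishes
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"   # roughly 1s instead of 3s
```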

Now use GNU Parallel to do the same thing. Spend some time squelching wget log noise if necessary.
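A sketch of the Parallel version. The URL list is a placeholder, wget is echoed rather than run so the sketch stays off the network, and GNU Parallel has to be installed for real use:

```shell
#!/bin/sh
# Sketch: the same fetch fan-out via GNU Parallel. wget is echoed
# instead of executed; substitute your own URL list.
cat > urls.txt <<'EOF'
https://example.com/
https://example.com/about.php
EOF

if command -v parallel >/dev/null 2>&1; then
  # -j4 caps concurrency at 4 jobs; --joblog gives you one tidy line
  # per URL, which helps when squelching wget's own log noise (-q/-nv).
  parallel -j4 --joblog fetch.log 'echo wget -q -P dumps {}' < urls.txt
else
  echo "GNU Parallel not installed; skipping" >&2
fi
```

The -j flag is also the knob you will reach for later when the fetches start saturating your CPU.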

Now start your script, open a separate window and open htop. What does it all mean? What did you learn once you understood it? Now repeat with iotop and iftop. Check your bash history and see if there are any lines in there worth remembering (spoiler alert: yes, there are). Drop those lines in a new lib/snippets.sh and check it in to source control.

Is your wget script overpowering your CPU? You can go back and tweak GNU Parallel to run fewer concurrent jobs. Check your config change into source control.

Implement the same script with curl. What is different? What is the same?

Extract the GNU Parallel runner into its own file, so that there is no duplicated code between the curl and wget versions. Learn how to use source <file>.
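One way the extraction could look. The file names and the runner's internals are illustrative; a plain while loop that echoes the fetch command stands in for the real Parallel invocation so the sketch is self-contained:

```shell
#!/bin/sh
# Sketch: shared runner in lib/runner.sh, sourced by both fetch scripts.
mkdir -p lib

cat > lib/runner.sh <<'EOF'
# run_all FETCH_CMD URL_FILE -- the shared runner. In real use this
# would hand the list to GNU Parallel; echo keeps the sketch inert.
run_all() {
  while read -r url; do
    echo "$1 $url"
  done < "$2"
}
EOF

printf 'https://example.com/\n' > urls.txt

. ./lib/runner.sh            # "." is the POSIX spelling of source
run_all "curl -sO" urls.txt  # what fetch-curl.sh would do
run_all "wget -q"  urls.txt  # what fetch-wget.sh would do
```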

Now merge your “concurrent” branch back into master. You can always go back to just before the merge commit if you want to play with the old single-threaded version. Tag the revision if you want, and/or just put in a comment you can remember to search for later.

Now use wget, lwp-request and curl to examine HTTP headers. These won’t make sense unless you go and read Wikipedia and StackOverflow and ServerFault and get a good understanding for yourself of what happens during the life of a Web request. You will never once regret having spent the time to learn this information.

Now use wget/curl to examine the headers for your Web site. Does it return the headers you would expect? What about for pages that don’t exist or have moved? Fix your site if the headers aren’t working properly.
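A sketch of a status-code check built on curl. example.com and the expected codes are placeholders for your own pages, and the demo calls are commented out so nothing hits the network here:

```shell
#!/bin/sh
# check URL EXPECTED -- compare a URL's HTTP status to what you expect.
# -s silences progress, -o /dev/null discards the body, and
# -w '%{http_code}' prints just the status code.
check() {
  status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  if [ "$status" = "$2" ]; then
    echo "OK   $1 -> $status"
  else
    echo "FAIL $1 -> $status (wanted $2)"
  fi
}

# Real use, with your own pages:
#   check https://example.com/           200
#   check https://example.com/no-such    404
#   check https://example.com/old-page   301
```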

Save your favorite header checks in SCM.

Wrap your header checks in greps that filter out boring lines.

Write a for loop that wraps the wrapper you wrote in the previous step.
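The loop around the wrapper can be as small as this. check_headers here is a stub standing in for the script from the previous steps, and the grep drops the lines you have decided are boring:

```shell
#!/bin/sh
# Sketch: loop the filtered header checks. check_headers is a stub
# for your real wrapper; grep -v filters the boring lines.
check_headers() {
  echo "OK   https://example.com/ -> 200"
  echo "Date: Tue, 01 Jan 2030"     # boring, filtered below
}

runs=0
while [ "$runs" -lt 3 ]; do   # production would be: while true; do
  check_headers | grep -v '^Date:'
  runs=$((runs + 1))
  sleep 1                     # the poll interval
done
```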

Go to your Web server and start a GNU screen session. You will have to spend some time reading up on what screen is and how to control it. Once you have this information you can pretty much always set up a CI server of your own on a Linux box, regardless of the other parameters of the environment.

Start your for-loop-wrapper inside the screen session. Congratulations! You are now running a production monitoring daemon.

You could use cron instead of a for loop, and then you would get email notifications for free (assuming your ops person has configured your host with a mail account). Go and learn how cron works; it’s simple, and you will use this knowledge about five times an hour once you start dealing with software systems at Web scale. Cron runs basically everything on the Web.
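A crontab entry for this could look like the following; the path is hypothetical, and cron will mail any output of the job to the crontab’s owner if mail is configured:

```
# m    h  dom mon dow  command
*/5    *  *   *   *    /home/you/bin/check-headers.sh
```

Edit it with crontab -e; the */5 in the minutes field means “every five minutes.”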

Now use the Chrome Web Inspector Network tab to look at HTTP traffic on your site. It is assumed you learned the basics of Chrome inspector during the initial 2 weeks you spent building your site.

Use Charles to view the same traffic (this step is optional if you don’t want to pay for Charles).

Use tcpdump to view network traffic. You just have to learn to filter tcpdump down to your Web server traffic, then you can stop. tcpdump is huge; you should not try to learn all of it now. It’s OK if the commands that work for you seem somewhat obtuse and magic. You can come back and fill in those gaps in your knowledge later. For now you need to be thinking about TCP packets.
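The filters below are the sort of thing that ends up working; the interface name and ports are assumptions about your host, and since tcpdump usually needs root they are shown rather than run:

```
# List capture interfaces, then pick yours (eth0 is an assumption):
tcpdump -D

# Only your web server's traffic: port 80, skipping your own ssh session.
# -n skips DNS lookups; -A prints packet payloads as ASCII.
tcpdump -i eth0 -n -A 'tcp port 80 and not port 22'
```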

Go learn how TCP conversations work during the life of a Web request. You will use this information every day for the rest of your career as a Web developer.

Now go back and look at tcpdump again. Use Wireshark to visualize and filter.

Save your tcpdump scripts in SCM.

Go and read your web server logs and PHP logs. What is interesting? What needs to be fixed? Write and save greps for the log lines that are interesting.

Use cut and histo to graph log messages in real time. Save it in SCM and leave it running inside screen. Congrats, you have built a log monitoring service.
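A portable sketch of the pipeline, with sort | uniq -c standing in for histo and a fabricated sample log; point it at your real error log and wrap it in watch for the real-time part:

```shell
#!/bin/sh
# Sketch: histogram of log-message severities. sample.log is fabricated;
# in real use you would tail your web server or PHP error log.
cat > sample.log <<'EOF'
2024-01-01 ERROR db timeout
2024-01-01 WARN slow query
2024-01-01 ERROR db timeout
EOF

# cut out the severity field (field 2), then count and rank occurrences.
cut -d' ' -f2 sample.log | sort | uniq -c | sort -rn

# Real-time-ish version to leave running inside screen:
#   watch -n5 "tail -1000 error.log | cut -d' ' -f2 | sort | uniq -c"
```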

Take a day to fix your worst log boo-boos. Watch the graphs of “bad” messages go down. Take screenshots, celebrate! Now you have built a “refactoring dashboard” like the one Ross Snyder talked about at Surge 2011.

Now you are probably ready to get your head around the Selenium stack, one of the most complex application stacks in the Web industry.

First install wd (pronounced “would”) and open a wd shell. Follow the steps in the tutorial and get them to actually work. This will probably take a couple of days, and that’s OK. Go slowly and bookmark all the helpful Selenium articles you find. You will come back to these again and again.

Now go back and implement all of the wget, curl, lynx projects above with Selenium. This will take a month or so, probably. You don’t have to use wd, use whatever driver/language makes sense to you.

As you go, install the Selenium headless environment on your server and get your new scripts running headlessly inside your for loop (or your cron if you went that way). You can leave your old scripts running too; in fact this is recommended, so you can cross-check output between different versions of the same script.

Congratulations, you now have an incredibly powerful Web testing and monitoring tool at your disposal.

Take a week to pay down your technical debt. This is the only time you will stop all production for a week like this, but it’s worth it. After this initial debt is paid, you can just fix technical debt as part of your normal work day, occasionally making a project out of a big task.

Now select a login form whose action you would like to automate. Get it working. This will take a while. Use your local Selenium meetup, #selenium and StackOverflow. Bookmark everything and blog what you learn. This is a huge opportunity for you to demonstrate that you have achieved beginner status in one of the most complex and least-understood technologies around. Don’t skimp on the time required to make your first good impression as a Selenium hacker! And welcome to the community!

Add your login form automation script to your CI cron. Watch it for a couple of days. What did you learn?

Pick another login form or similar single-page form. Automate it. Put it in your CI cron.

Read the Selenium API documentation carefully. Think about what kinds of capabilities exist that you could use to automate a slightly more complex flow (think: 3 or 4 page form such as credit card and shipping flow).

Automate that slightly more complex flow. You will have to read the Selenium/WebDriver source to fully understand how the APIs work. That is normal — you cannot master Selenium without reading the source, because the project evolves so fast that the documentation lags behind as a matter of course. Use your list of Selenium resources to back you up as you read the code. Ask whatever questions you need to, and try to do it on StackOverflow. If you are wondering about it and asking, then a lot of other people are probably out there getting stuck on it but not asking.

Get it working.

Put it in your CI cron.