Capybara is a tool that Ruby on Rails developers mostly use for testing their web applications. However, it can also be used to automate boring, repetitive or long-running tasks on the web, or to scrape information from web sites that were not kind enough to provide an API.

Use cases

I am lazy and I do not like work that involves repeating actions that won’t stimulate my brain. This is why I automate as much as I can. Ruby and Capybara have come in handy in a few situations. For example, when I learned that I could not attach multiple files on a local printing service provider’s web site, and I had hundreds of files to print, I hacked together a quick script that did the clicking and waiting for me. I have also used the technique to extract information and images from web sites that are heavy on JavaScript. I also have a handy script that opens a Hangouts link and removes all the unnecessary DOM elements around the video element, so it does not waste so much space. You can also use Capybara to write neat monitors that log into your web site as a user, perform some actions and verify that basic functionality on your live production system is not broken. Let’s do some Ruby hacking!

How does it work?

Capybara starts a browser. This can be a real browser (Firefox via selenium-webdriver) or a headless browser (like PhantomJS). The good thing is that you can use the Selenium backend while you work on your script and switch over to PhantomJS when it’s done. This way, you see what is going on while you do the hacking, and the finished script runs without opening any extra windows.

Selenium/Webdriver

Selenium has nice Ruby bindings for controlling real browsers, and Capybara uses them to interact with the browser. To get started, make sure you have “firefox” and “java” in your $PATH, otherwise it will not work as expected. Of course, you need a Ruby installation too.
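If you are not sure whether those commands are reachable, a quick check from Ruby can save some head scratching. This is only a sketch: the which helper below is a made-up name for illustration, not part of Capybara or Selenium, and it simply walks the directories listed in $PATH.

```ruby
# Returns the full path of an executable found on PATH, or nil if there is none.
# `which` is a hypothetical helper written for this example.
def which(cmd, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where each required command lives, or that it is missing.
%w[firefox java].each do |cmd|
  puts "#{cmd}: #{which(cmd) || 'not found on PATH'}"
end
```

The second argument exists only so that the lookup can be pointed at a different list of directories; in normal use you would call which with just the command name.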

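You need to install the “capybara” gem. Selenium comes in as a dependency, and you don’t even have to require it directly. (If your version of Capybara does not pull in selenium-webdriver automatically, install that gem as well.) Here’s a script that checks our main page for the tagline text: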

    $ gem install capybara

    # check_amberbit.rb
    require 'capybara'

    session = Capybara::Session.new(:selenium)
    session.visit "http://www.amberbit.com"

    if session.has_content?("Ruby on Rails web development")
      puts "All shiny, captain!"
    else
      puts ":( no tagline found, possibly something's broken"
      exit(-1)
    end

    $ ruby check_amberbit.rb
    All shiny, captain!

Poltergeist/Phantomjs

Poltergeist is a Capybara driver that runs your script in PhantomJS, a headless browser. You will need to install PhantomJS first and make sure that the ‘phantomjs’ command is in your path.

Let’s change the script above to use PhantomJS when “phantomjs” is passed as the first argument:

    $ gem install poltergeist

    # check_amberbit.rb
    require 'capybara'

    # Use the headless driver only when asked to; default to a real browser.
    session = if ARGV[0] != 'phantomjs'
      Capybara::Session.new(:selenium)
    else
      require 'capybara/poltergeist'
      Capybara::Session.new(:poltergeist)
    end

    session.visit "http://www.amberbit.com"

    if session.has_content?("Ruby on Rails web development")
      puts "All shiny, captain!"
    else
      puts ":( no tagline found, possibly something's broken"
      exit(-1)
    end

    $ ruby check_amberbit.rb phantomjs
    All shiny, captain!

Shiny.
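The same check can grow into the kind of monitor mentioned earlier. Here is a minimal sketch, assuming you want to check several pages in one run; failed_checks is a hypothetical helper name, and it accepts any object that responds to visit and has_content?, which a Capybara session does.

```ruby
# Visits every URL and returns a list of failure messages.
# An empty list means every page contained its expected text.
# `failed_checks` is a made-up helper for this example; `session` is expected
# to behave like a Capybara::Session (it must respond to #visit and #has_content?).
def failed_checks(session, checks)
  checks.each_with_object([]) do |(url, text), failures|
    session.visit url
    failures << "#{url}: missing #{text.inspect}" unless session.has_content?(text)
  end
end

# Hypothetical usage with a real headless session:
#   require 'capybara'
#   require 'capybara/poltergeist'
#   session  = Capybara::Session.new(:poltergeist)
#   failures = failed_checks(session, "http://www.amberbit.com" => "Ruby on Rails web development")
#   abort failures.join("\n") unless failures.empty?
```

Passing the session in, instead of creating it inside the method, means the Selenium or Poltergeist choice stays in one place and the check itself does not care which browser it drives.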

More advanced example

We can also use Capybara’s DSL instead of manually starting the session. Let’s use Google Image Search to find out which web sites use our Ruby development company’s logo:

    # search_for_image.rb
    require 'cgi'
    require 'capybara'

    class GoogleImagesSearcher
      include Capybara::DSL

      def initialize
        Capybara.default_driver = :selenium
      end

      def find_sites_with_image(image_url)
        urls = []
        link = "http://images.google.com/searchbyimage?image_url=#{CGI.escape(image_url)}&filter=0"
        visit link

        return urls unless page.has_content?("Pages that include matching images")

        # Collect result links page by page. click_link raises
        # Capybara::ElementNotFound once there is no "Next" link left,
        # which ends the loop through the rescue below.
        loop do
          page.all("h3.r a").each do |a|
            urls << a[:href]
          end

          within "#nav" do
            click_link "Next"
          end
        end
      rescue Capybara::ElementNotFound
        urls.uniq
      end
    end

    images = GoogleImagesSearcher.new.find_sites_with_image ARGV[0]

    puts "Found #{images.count} pages using this image:"
    images.each do |img|
      puts img
    end

    $ ruby search_for_image.rb http://www.amberbit.com/assets/amberbit_logo_big-b1c78bb141a0fe6d092afbadf1edc7b9.png
    Found 21 pages using this image:
    http://amberbit.com/
    http://www.amberbit.com/blog
    http://amberbit.com/blog/introduction-to-rack-middleware
    http://amberbit.com/blog/geospatial-search-with-ruby-and-sphinx
    http://www.amberbit.com/blog/2014/2/4/postgresql-awesomeness-for-rails-developers/
    https://www.google.com/search?tbs=simg:CAESXRpbCxCo1NgEGgIICgwLELCMpwgaNAoyCAESDPEH2gbfBvwH3QbbBhogv2oqc6m3Z6eTfIxHGmajz_1yTppo1-MlJqEmXGPUIaXkMCxCOrv4IGgoKCAgBEgTtRfB3DA&tbm=isch&sa=X&ei=SXL7Uoa6G-aC4AShv4HoBg&ved=0CEgQsw4
    http://www.prweb.com/releases/2009/05/prweb2465382.htm
    http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/
    http://www.amberbit.com/blog/2014/1/20/angularjs-templates-in-ruby-on-rails-assets-pipeline/
    http://www.amberbit.com/blog/2014/1/20/torquebox-3-rails-4-zero-downtime-deployment-ubuntu-12-04/
    http://www.amberbit.com/blog/2011/12/27/render-views-and-partials-outside-controllers-in-rails-3/
    http://www.amberbit.com/blog/2012/2/2/building-small-sites-with-locomotivecms-and-deploying-to-heroku-and-gridfs/
    https://plus.google.com/+Amberbit
    http://www.amberbit.com/blog/2011/10/24/measuring-complexity-of-ruby-19-code-with-metric_abc/
    http://www.amberbit.com/blog/2012/02/02/building-small-sites-with-locomotivecms-and-deploying-to-heroku-and-gridfs/
    http://www.amberbit.com/blog/2011/11/28/ruby-flv-pseudostreaming-sinatra-rack-evil/
    https://plus.google.com/+Amberbit/about
    http://www.amberbit.com/work-for-us
    https://plus.google.com/+PrzemyslawWroblewski
    https://plus.google.com/+PrzemyslawWroblewski/videos
    https://plus.google.com/+PrzemyslawWroblewski/about

Not bad.
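Since the goal was to find which web sites use the logo, it can help to collapse the list of pages into hosts. Here is a minimal sketch using only Ruby’s standard library; pages_per_host is a hypothetical helper name, and the sample URLs are a handful taken from the output above. It assumes every URL can be parsed by URI.parse, which may not hold for everything a search result page throws at you.

```ruby
require 'uri'

# Groups page URLs by host name and returns a hash of host => number of pages,
# ordered with the most frequent hosts first (ties are sorted alphabetically).
# `pages_per_host` is a made-up helper for this example.
def pages_per_host(urls)
  counts = urls.group_by { |url| URI.parse(url).host }
               .map { |host, pages| [host, pages.size] }
  counts.sort_by { |host, count| [-count, host] }.to_h
end

# A few URLs copied from the search output; in the real script you would
# pass in the array returned by find_sites_with_image instead.
urls = [
  "http://amberbit.com/",
  "http://www.amberbit.com/blog",
  "http://www.amberbit.com/work-for-us",
  "https://plus.google.com/+Amberbit"
]

pages_per_host(urls).each { |host, count| puts "#{host}: #{count}" }
# Prints:
#   www.amberbit.com: 2
#   amberbit.com: 1
#   plus.google.com: 1
```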

Summary

Ruby is a perfect language for hacking these sorts of scripts together. You do not have to limit yourself to developing web applications with Ruby and Rails; it is also a perfect scripting language. Powerful regular expressions, super easy syntax and, more than anything else, great tools and libraries (like Capybara) make Ruby an excellent choice for automating tasks.

More info

Check out the documentation for Capybara.

Browse my GitHub repository with some more examples (and please do make a pull request if you have some nice scripts too!).