Custom Source Control: Refactoring

This is a continuation of my last post on implementing a “custom source control” application in ruby. In that post we built a custom source control application based on “Source Control Made Easy” by Jim Weirich.

Before starting you may want to checkout the repository for the application on github. You can then checkout the part-1-complete release. Or if you’d rather just down a zip file containing the source for custom source control: part-1-complete. Let’s refactor…

Recall our initial stories from the previous post:

Initialize a new repository As a user I want the ability to initialize a repository So that I can begin adding snapshots to it

Create snapshots of my work As a user I want to create snapshots So that I can save snapshots of my work

Checkout previous snapshots As a user I want to checkout a previous snapshot So that I can fix issues or get back to a working state

I also left you with a list of some stuff I wanted to address in future posts. Lets take another look at what I will be working on now:

Separating the tests from the actual implementation code.

DRY’ing out our code.

So without further ado…

Separating the tests from the actual implementation code.

The first thing that we’ll want to do is separate the tests from the application code. Start out by changing directories into the custom_source_control directory.

cd custom_source_control

Next lets create a directory for our code and another for our tests (which going forward I may refer to them as specs).

mkdir lib mv custom_source_control.rb lib

mkdir specs mv test_dir specs

Our directory structure should now look like the following:

├── lib/ │ └── custom_source_control.rb* └── specs/ └── test_dir/ ├── test_file_1.txt └── test_file_2.txt

Now we’re going to create a Rake file. Rake is a build system that we will use from amongst other things, running our specs. I’d like to mention that Rake is another tool brought to us by Jim Weirich.

require 'rake/testtask' Rake::TestTask.new do |t| t.libs << ['lib', 'specs'] t.test_files = FileList['specs/*_spec.rb'] t.verbose = true end

So what are we doing here? Basically we are making it easy to run our specs. So we’ll be able to do: rake test for running our tests, and something else to run the application. We’re now going to create a spec helper file that will require our application code as well as the minitest library.

touch specs/helper.rb

require 'minitest/autorun' require 'custom_source_control'

Now let’s move everything in lib/custom_source_control.rb from the require 'minitest/autotest' line down, to a new file specs/custom_source_control_spec.rb . At the top of the file we need to change the require 'minitest/autotest' to require 'helper' . We also need to change the before block so that as part of the testing, we change directory to the test_dir directory.

require 'helper' describe CustomSourceControl do ... before do repo_dir = File . join ( File . dirname ( __FILE__ ), 'test_dir' ) Dir . chdir repo_dir @csc = CustomSourceControl . new @csc . initialize_repository end ... end

At this point we should be able to run our tests with rake test , and everything should be passing.

rake test

# Running tests: .............. Finished tests in 0.107780s, 129.8942 tests/s, 194.8413 assertions/s. 14 tests, 21 assertions, 0 failures, 0 errors, 0 skips

DRY’ing out our code.

Next up on our list… DRY or Don’t repeat yourself, Basically where we find we have code doing a very similar job is a potential candidate. Doing a visual scan of the code and I notice a few control_file_exists? methods. These methods serve as a convenient way to check for the existence of some of the mandatory repository files. One way to DRY this functionality up is to use the define_method method.

So I start out by refactoring the manifest_exists? and metadata_exists? methods into:

[ 'manifest' , 'metadata' ]. each do | ctl_file | define_method ( " #{ ctl_file } _exists?" ) do File . exists? File . join ( '.esc' , "__ #{ ctl_file } __" ) end end

What we’re doing is creating an array taking the prefix of previous methods and iterating through that array. With each element we’re passing it in as an argument to the define_method method. The method body should be pretty straightforward; we’re doing a File.join on the repository directory, and the control files. Let’s run our tests to make sure everything still works.

# Running tests: .............. Finished tests in 0.032827s, 426.4782 tests/s, 639.7173 assertions/s. 14 tests, 21 assertions, 0 failures, 0 errors, 0 skips

Why stop there? Let’s add head_exists? .

[ '__manifest__' , '__metadata__' , 'HEAD' ]. each do | ctl_file | define_method ( " #{ ctl_file . downcase . gsub ( '_' , '' ) } _exists?" ) do File . exists? File . join ( '.esc' , ctl_file ) end end

So in refactoring head_exists? we had to turn the array of method names into an array of filenames. This would create methods __manifest___exists? , __metadata___exists? & HEAD_exists? . Which is not what we want. So we simply add a call to the String#downcase method which properly creates the head_exists? method. We also add a call to String.gsub to handle removing the _ ’s. The last thing is to change the File.join and remove the _ ’s as well.

(retest)

# Running tests: .............. Finished tests in 0.036157s, 387.2003 tests/s, 580.8004 assertions/s. 14 tests, 21 assertions, 0 failures, 0 errors, 0 skips

The next bit of code has to do with the cwd_hashes method.

def cwd_hashes sha1 = OpenSSL :: Digest :: SHA1 . new hashes = {} cwd_files . each do | file | hashes [ file ] = sha1 . hexdigest ( File . read ( file )) sha1 . reset end hashes end

def hash_for_file ( file = nil ) sha1 = OpenSSL :: Digest :: SHA1 . new sha1 . hexdigest ( File . read ( file )) end

It instantiates a new SHA1 object, then iterates through the current working directories files and hashes them, assigning the hash to… well our hashes hash. I am tempted to refactor the code to something like the following:

def cwd_hashes cwd_files . inject ({}) { | hash , file | hash [ file ] = hash_for_file ( file ); hash } end

Which passes our tests, but I wonder if we take a performance hit since we’re now creating multiple instances of a SHA1 object. It would be nice if we could benchmark this somehow… Well we can! First let’s add the following code:

def cwd_hashes cwd_files . inject ({}) { | hash , file | hash [ file ] = hash_for_file ( file ); hash } end alias_method :refactor_cwd_hashes , :cwd_hashes def initial_cwd_hashes sha1 = OpenSSL :: Digest :: SHA1 . new hashes = {} cwd_files . each do | file | hashes [ file ] = sha1 . hexdigest ( File . read ( file )) sha1 . reset end hashes end

Add the following to to the bottom of the file:

def benchmark_cwd_hashes repo_dir = File . join ( File . dirname ( __FILE__ ), '../' , 'specs' , 'test_dir' ) Dir . chdir repo_dir 10000 . times do | idx | File . open ( "file_ #{ idx } " , 'w' ) do | f | f . write "I am file #{ idx } " end end @csc = CustomSourceControl . new @csc . initialize_repository @csc . snapshot require 'benchmark' Benchmark . bmbm do | x | x . report ( "initial:" ) { @csc . initial_cwd_hashes } x . report ( "refactor:" ) { @csc . refactor_cwd_hashes } end if Dir . exists? '.esc' FileUtils . rm_rf '.esc' FileUtils . rm Dir . glob ( 'file_*' ) end end if __FILE__ == $0 benchmark_cwd_hashes if ARGV [ 0 ] == '--benchmark' end

Everything wrapping the benchmark stuff is the same setup code we’re using in our tests.

Next let’s add the following to our rake file:

task :benchmark do system ( 'ruby' , "-Ilib" , 'lib/custom_source_control.rb' , '--benchmark' , out: $stdout , err: :out ) end

Run it with:

rake benchmark

It should output results similar too:

Rehearsal --------------------------------------------- initial: 0.140000 0.130000 0.270000 ( 0.265694) refactor: 0.160000 0.120000 0.280000 ( 0.279160) ------------------------------------ total: 0.550000sec user system total real initial: 0.120000 0.120000 0.240000 ( 0.239919) refactor: 0.170000 0.120000 0.290000 ( 0.284348)

As you can see, not much of a difference.

The next pieces of code are the hash_and_copy_manifest & hash_and_copy_metadata methods.

[ 'manifest' , 'metadata' ]. each do | method | define_method ( "hash_and_copy_ #{ method } " ) do file = File . join ( '.esc' , "__ #{ method } __" ) hash = hash_for_file ( file ) FileUtils . cp ( file , File . join ( '.esc' , hash )) hash end end

(retest)

# Running tests: .............. Finished tests in 0.035168s, 398.0892 tests/s, 597.1338 assertions/s. 14 tests, 21 assertions, 0 failures, 0 errors, 0 skips

There is not a whole lot going on in the initialize_repository method, but in snapshot we can make a few changes.

def snapshot File . new ( File . join ( '.esc' , '__metadata__' ), 'w' ) File . new ( File . join ( '.esc' , '__manifest__' ), 'w' ) write_manifest ... write_metadata manifest_hash ... end

The creation of the metadata & manifest files doesn’t need to happen because we do a File.open and write to them within the write_manifest & write_metadata methods.

# Running tests: .............. Finished tests in 0.053743s, 260.4990 tests/s, 390.7486 assertions/s. 14 tests, 21 assertions, 0 failures, 0 errors, 0 skips

The next thing to fix is the fact that metadata timestamp was hardcoded to 2014-03-07 23:59:59 -0800 . We can remove that line of code, but the tests will fail again. So what can we do? Stub the Time.now method. If you’re not on ruby 2 or higher, you are going to need to take a few extra steps. Create a Gemfile file and add the following:

source 'https://rubygems.org' gem 'minitest'

bundle install

Now we update helper.rb with the following:

require 'minitest/autorun' require 'minitest/mock' require 'custom_source_control'

in custom_source_control.rb :

def write_metadata ( manifest_hash = nil ) File . open ( File . join ( '.esc' , '__metadata__' ), 'w' ) do | file | file . puts "Snapshot Manifest: #{ manifest_hash } " file . puts "Snapshot Parent: #{ ( head_contents . empty? ) ? 'root' : head_contents } " file . puts "Snapshot Taken: #{ Time . now } " end end

in custom_source_control_spec.rb , wrap the stuff inside our before blocks in:

before do Time . stub :now , '2014-03-07 23:59:59 -0800' do ... end end

(retest)

# Running: .............. Finished in 0.046209s, 302.9713 runs/s, 454.4569 assertions/s. 14 runs, 21 assertions, 0 failures, 0 errors, 0 skips

I left off in the last post with a question: How do I use this thing now that it’s built?. I left off with a suggestion on how to make the script runable. Now I’d like to update the script and add this functionality. Open up custom_source_control.rb and add the following to the very bottom.

if __FILE__ == $0 unless [ '--benchmark' , '--checkout' , '--initialize' , '--snapshot' ]. include? ARGV [ 0 ] puts " #{ ARGV [ 0 ] } is not a subcommand." exit 1 end csc = CustomSourceControl . new case ARGV [ 0 ] when '--benchmark' benchmark_cwd_hashes when '--checkout' if ARGV [ 1 ] csc . checkout ARGV [ 1 ] else puts "'checkout' subcommand takes a second argument, SHA1 of the metadata file to checkout." exit 1 end when '--initialize' csc . initialize_repository when '--snapshot' csc . snapshot end end

Lastly, what I’ve done is symlink the file into my /usr/local/bin directory as csc

ln -s $ HOME/developer/projects/custom_source_control/lib/custom_source_control.rb /usr/local/bin/csc export PATH=$ PATH:/usr/local/bin

Still, more to do:

We don’t really clean up after ourselves so that functionality needs to be added.

Getting the checkout hash is also a manual process so that csc log functionality we talked about would come in handy.

functionality we talked about would come in handy. We are not handling any types of errors mind you

Testing for more edge cases, and fixing any bugs we find.

Adding code coverage.

Adding the ability to handle command line arguments with OptionParser .

. Adding tests and functionality to diff checkins.

Adding tests and functionality to list the history (metadata file hashes from head all the way back to root)

Possibly turning this into a gem.

…so, again, its not quite production ready.

You can double check your work with the project I have hosted on github