Downloading source code from an SVN/Git repository over HTTP

Sites that host open source projects provide an online viewer for browsing source code without checking it out with an SVN or Git client. Checking out an entire repository takes a long time, and with Git there is no straightforward way to check out only a particular directory: cloning downloads the entire repository to the local machine, and even with sparse checkout Git still downloads the entire repository. When bandwidth is a concern, checking out the entire repository is not an option.



One can use GNU Wget to recursively download files from online code repositories. For Windows, it can be downloaded from http://users.ugent.be/~bpuype/wget/



The command below recursively downloads a directory, its child directories, and all files in them, excluding index.html. It does not download parent directories or files from external sites.



wget --cut-dirs=2 --level=15 --include-directories=src/main/java --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber http://sourcesite.com/src/main/java



Be careful with slashes: use forward slashes in the URL and in the directory paths. When I used backslashes, the command did not work.
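To avoid retyping the same directory in two places, the command can be assembled from the URL. The sketch below is only an illustration: the function name wget_cmd is made up, and it prints the command instead of running it so that it can be checked before any download starts. It assumes the URL has the form scheme://host/path.

```shell
# Print (do not run) the wget command for a directory URL.
# $1 = URL of the directory to download
# $2 = number of leading directories to cut from local paths
# wget_cmd is a hypothetical helper name, not part of wget.
wget_cmd() {
    url=$1
    cut=$2
    dir=${url#*://}    # strip the scheme, e.g. "http://"
    dir=${dir#*/}      # strip the host name, leaving e.g. src/main/java
    printf 'wget --cut-dirs=%s --level=15 --include-directories=%s' "$cut" "$dir"
    printf ' --recursive --no-parent --no-host-directories'
    printf ' --reject=index.html -e robots=off --no-clobber %s\n' "$url"
}

wget_cmd http://sourcesite.com/src/main/java 2
```

Called with the URL and cut value from the example, it prints the same command line as above.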



--cut-dirs=N Skip the first N directory levels of the URL path when creating local directories

--level=N Follow links up to N levels deep during recursion. The default is 5 levels

--include-directories=src/main/java Include only this directory and its child directories

--recursive Recursively download

--no-parent Do not go to parent directory of the given URL.

--no-host-directories Do not create a top-level directory named after the server's host name. Without this option wget creates one

--reject=index.html Do not keep index.html files. wget still fetches them to find links, then deletes them

-e robots=off Ignore the rules in robots.txt when crawling the site

--no-clobber Do not overwrite existing files
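To see how --no-host-directories and --cut-dirs together decide where a file lands on disk, here is a small sketch that mimics the path mapping. It is a model of the behaviour described above, not wget's own code. The function name local_path and the file com/example/App.java are made up for illustration.

```shell
# Illustration only: model the local path wget uses for a URL when
# --no-host-directories and --cut-dirs=N are both given.
# $1 = URL of a file, $2 = N
local_path() {
    url=$1
    cut=$2
    path=${url#*://}    # drop the scheme, e.g. "http://"
    path=${path#*/}     # drop the host name (--no-host-directories)
    while [ "$cut" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # drop one leading directory
            *)   break ;;             # only the file name is left
        esac
        cut=$((cut - 1))
    done
    printf '%s\n' "$path"
}

# prints java/com/example/App.java
local_path http://sourcesite.com/src/main/java/com/example/App.java 2
```

With --cut-dirs=2 the leading src/main is dropped, so the download starts in a local java directory instead of src/main/java.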





Here are a few source code repository addresses.



http://svn.apache.org/repos/asf/

http://selenium.googlecode.com/git



For Google Code sites, use projectname.googlecode.com/git for a Git repo or projectname.googlecode.com/svn for an SVN repo.
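The naming rule above is easy to script. This is a minimal sketch with a made-up function name, googlecode_url. It assumes the http:// scheme used in the addresses listed earlier.

```shell
# Build the HTTP address of a Google Code repository.
# $1 = project name, $2 = "git" or "svn"
# googlecode_url is a hypothetical helper name.
googlecode_url() {
    case $2 in
        git|svn) printf 'http://%s.googlecode.com/%s\n' "$1" "$2" ;;
        *) echo "unknown repository type: $2" >&2; return 1 ;;
    esac
}

# prints http://selenium.googlecode.com/git
googlecode_url selenium git
```

The printed address can then be passed to wget as the starting URL.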





