100 Scripts in 30 Days challenge: Script 11 & 12 — Web Scraping with requests Module

CryptAnon, Apr 19, 2017

The two scripts I am presenting today were written in Python 2.7. They do simple web scraping to find all the download links for the packages used by Weka, which is a good machine learning tool set.

The libraries used are requests (to make GET or POST requests), fake_useragent (to supply an HTTP User-Agent string, which is useful for scraping), and BeautifulSoup (to parse HTML).
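To show how these three libraries usually fit together, here is a minimal sketch. It is not the code from the scripts themselves: the function names (random_user_agent, fetch_html, extract_links), the fallback User-Agent string, and the 30-second timeout are all assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical fallback string, used only if fake_useragent is unavailable.
FALLBACK_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def random_user_agent():
    """Return a browser-like User-Agent string."""
    try:
        from fake_useragent import UserAgent
        return UserAgent().random
    except Exception:
        # fake_useragent is not installed or could not load its browser data.
        return FALLBACK_UA


def fetch_html(url, timeout=30):
    """GET a page with a browser-like User-Agent and return its HTML text."""
    headers = {"User-Agent": random_user_agent()}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

With these helpers, something like `extract_links(fetch_html("http://weka.sourceforge.net/packageMetaData/"))` would return the raw href values found on that page, assuming the page is still reachable.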

The first script, wekapkglst.py, scrapes http://weka.sourceforge.net/packageMetaData/ for downloadable links.

The second script, wekapkgdl.py, scrapes https://sourceforge.net/projects/weka/files/weka-packages/ for downloadable links.
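Picking out the "downloadable" links from a page mostly comes down to resolving each href against the page URL and keeping the ones that look like files. The sketch below is one way to do that, not necessarily how the scripts do it: the function name download_links and the default ".zip" suffix are assumptions, and the suffix may need changing depending on how a given page formats its file links.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse  # Python 2.7, as used in this post


def download_links(base_url, hrefs, suffixes=(".zip",)):
    """Resolve hrefs against base_url and keep unique URLs whose path
    ends with one of the given suffixes (case-insensitive)."""
    wanted = tuple(s.lower() for s in suffixes)
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # handles relative and absolute hrefs
        path = urlparse(absolute).path.lower()  # ignore any query string
        if path.endswith(wanted) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Passing a different tuple, for example `suffixes=("/download",)`, keeps links whose path ends that way instead.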

search-sample.txt gives a sample list of the URLs extracted by the tool.

The scripts are given below. I hope they are helpful.