For the longest time I’ve been meaning to delve into web scraping, and when I finally did, I ended up spending some of my time on a few small projects. One of these projects is this YouTube video downloader script.

Following this tutorial will be much easier if you have some experience with web scraping, but even if you don’t, no need to sweat, because I’ll be explaining each and every step thoroughly. By the time we are done, you will have some basic knowledge of web scraping.

The modules I have used in this project are:

requests.

bs4.

pytube.

THE MODULES

Firstly, let us start by explaining some of the modules.

Disclaimer: If you already have some experience with any of these modules, you may skip their respective parts.

requests:

The requests module puts the power of HTTP in our hands. It lets us make HTTP requests such as GET and POST from our programs. In this program we are going to use it to make a GET request to the server, i.e. we are going to send some data along with the URL and get a response in return.

If you want to know more about the HTTP protocol, you can go here.

And to learn more about the requests module, go here.

bs4:

bs4, or Beautiful Soup 4, is a must-have module for web scraping. It parses the web data and converts it into a BeautifulSoup object that lets us easily and idiomatically navigate, search and modify the data.

You can find out more about bs4 here.

pytube:

pytube is a lightweight module that lets us download YouTube videos, provided we have the link to the video.

You can find out more about pytube here.

THE CODE

What we are essentially going to do in this script is scrape YouTube to find a link to the desired video and then use the pytube module to download it.

The first thing we are going to do is import requests, BeautifulSoup (from bs4) and YouTube (from pytube).

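You may not have these modules preinstalled, in which case you will have to install them with pip, for example pip install requests beautifulsoup4 pytube. The html5lib parser used later in this tutorial is a separate package as well.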

The next thing we are going to do is declare some variables. Here,

search is a string that contains the URL of YouTube,

and search_term is a string that contains the name of the video you want to find.

Next, we are going to declare a dictionary that maps the key ‘search_query’ to the value of the variable search_term. Why we are doing this will become clear as we progress further in this tutorial.
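The original listing for these two steps is not reproduced here, so here is a minimal sketch of what the declarations could look like. The exact URL path, the search text and the dictionary name params are my assumptions for illustration; only search, search_term and the ‘search_query’ key come from the description above.

```python
# Assumed values: illustrative only, not taken from the original script.
search = "https://www.youtube.com/results"      # YouTube's search results page
search_term = "python tutorial for beginners"   # hypothetical search text

# The key has to be 'search_query'; the value is whatever we want to look up.
# The name `params` is an assumption chosen to match the requests argument.
params = {"search_query": search_term}

print(params)
```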

In this step we will make the HTTP GET request. This is done using the requests.get function, which returns a ‘requests.models.Response’ object containing the server's response, including the HTML of the page. The one argument that get requires is the URL of the website, so we provide it with the link to YouTube. We can also send extra data with the request, such as search queries, by passing a dictionary through the ‘params’ argument.

As you can see, YouTube accepts ‘search_query’ as a parameter, with the search term as its value. That is why we added this parameter to the dictionary we declared in the step above.
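To make the role of the dictionary concrete, here is a small sketch that asks requests to build the final URL without sending anything, so you can see where ‘search_query’ ends up. The helper names build_search_url and fetch_search_page, the example search text and the "/results" path are my own assumptions; requests.Request(...).prepare() and requests.get(..., params=...) are part of the requests API.

```python
import requests


def build_search_url(search, search_term):
    """Return the URL requests would fetch, without sending the request."""
    params = {"search_query": search_term}
    prepared = requests.Request("GET", search, params=params).prepare()
    return prepared.url


def fetch_search_page(search, search_term):
    """Send the GET request and return the Response object."""
    r = requests.get(search, params={"search_query": search_term}, timeout=10)
    r.raise_for_status()  # raise an exception on HTTP errors such as 404 or 500
    return r


# Only builds the URL; no network traffic happens on this line.
print(build_search_url("https://www.youtube.com/results", "lofi music"))
```

The dictionary is encoded into the query string after the "?", which is the same thing the browser does when you type into the YouTube search box. Calling fetch_search_page with the same arguments would perform the actual request and give you r.text to work with.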

Finally, we can create the soup, which is essentially a BeautifulSoup object. The BeautifulSoup constructor takes a string of HTML, here the one that r.text returns, and the name of the parser that will parse the data. Here I’m using the ‘html5lib’ parser.
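Here is a tiny sketch of creating a soup, using a hand-written HTML string as a stand-in for r.text so it runs without touching the network. Note that I have swapped in Python's built-in "html.parser" so the example needs nothing beyond bs4; the script described here uses ‘html5lib’, which is passed in exactly the same position.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for r.text (an assumption, not real YouTube markup).
html = "<html><head><title>demo</title></head><body><p>hello</p></body></html>"

# Second argument is the parser name. "html.parser" ships with Python;
# "html5lib" would go here instead once that package is installed.
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)   # the class of the object we just built
print(soup.title.string)     # text inside the <title> tag
print(soup.p.get_text())     # text inside the first <p> tag
```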

Then we are going to use the .find() method of the BeautifulSoup object to search for the link to the video. We are looking for the first <a> tag that has the attribute ‘aria-hidden’ = ‘true’, since it contains the link to the video.

Then, using result['href'], a dictionary-style lookup that BeautifulSoup also provides on tags, we extract the link from it.
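The search and the lookup can be sketched together on a small hand-written page. The HTML and the placeholder video IDs below are made up for illustration, and the markup YouTube actually serves may look different, so treat this as a demonstration of .find() and ['href'] rather than of YouTube itself. I am again using "html.parser" instead of ‘html5lib’ to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Made-up markup with placeholder IDs, standing in for the search results page.
html = """
<div>
  <a href="/about">About</a>
  <a aria-hidden="true" href="/watch?v=VIDEO_ID_1">First result</a>
  <a aria-hidden="true" href="/watch?v=VIDEO_ID_2">Second result</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 'aria-hidden' contains a hyphen, so it cannot be a keyword argument;
# it goes in the attrs dictionary instead.
# .find() returns the first matching tag, or None when nothing matches.
result = soup.find("a", attrs={"aria-hidden": "true"})

# Guard against None so a page without a match does not raise a TypeError.
link = result["href"] if result is not None else None

print(link)
```

If you wanted every matching link instead of just the first one, .find_all() takes the same arguments and returns a list.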

And finally, we use the YouTube class from pytube to download the video. The link we obtained in the previous step is a relative one, so we concatenate it onto the string “https://www.youtube.com” to get the full link.
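A rough sketch of this last step is below. The helper names build_watch_url and download_video and the output_path default are my own assumptions, and picking the stream with get_highest_resolution() is one option among several that pytube offers, not necessarily the one used in the original script. The pytube import sits inside the function only so that the URL helper can be tried without pytube installed.

```python
YOUTUBE_BASE = "https://www.youtube.com"


def build_watch_url(link):
    """Turn a relative link such as '/watch?v=...' into a full URL."""
    return YOUTUBE_BASE + link


def download_video(link, output_path="."):
    """Download the video behind a relative link and return the saved file path."""
    # Imported here so build_watch_url works even without pytube installed.
    from pytube import YouTube

    yt = YouTube(build_watch_url(link))
    # Assumption: take the highest-resolution progressive (audio + video) stream.
    stream = yt.streams.get_highest_resolution()
    return stream.download(output_path=output_path)


print(build_watch_url("/watch?v=VIDEO_ID_1"))
```

Calling download_video(link) with the link extracted in the previous step would start the actual download into the current directory.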

CONCLUSION

Thus, with a little bit of web scraping and the pytube module, we were able to code a script that downloads YouTube videos, with just enough functionality to barely call it a YouTube downloader.

You may also see a second part to this tutorial in the near future, as I have plans to add some more features to this script. I may also turn this script into a command-line application. You can find more of the code I used here.

And as always,

I am Prabhjit Dutta and this is –

THE CODE CAFE

HAPPY CODING