In this entry, we’re going to look up our public-facing IP address using the Python modules re, requests, and BeautifulSoup. We’ll send a request to whatismyip.com with requests, parse the returned HTML with BeautifulSoup so it’s a lot easier to break up and navigate, and finally use re to grab the IP address and print it to the screen.


UPDATE: User rdssassin on reddit just told me that http://jsonip.com exists. Here’s the easiest way to get your IP in 4 lines:

    import requests
    r = requests.get(r'http://jsonip.com')
    ip = r.json()['ip']
    print 'Your IP is', ip

Anyways, below is the original post. It sort of works as a BeautifulSoup/regex tutorial, but really, just use the 4 liner above.

UPDATE 2: It turns out there’s actually a bunch of great websites for doing this, instead of whatismyip.com:

http://curlmyip.com/ credit to Buttscicles

ipaddr.me. credit to -ajp-

http://www.icanhazip.com/ credit to Koooba

    import requests
    r = requests.get(URL FROM THE LIST ABOVE)
    ip = r.text
    print 'Your IP is', ip
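Whichever service you pick, it's worth checking that what came back really is an IP address before printing it. Here's a minimal sketch of that check, assuming Python 3's standard library (ipaddress and json are not used in the original post). The function names and the sample response bodies are made up for illustration, and the exact format each service returns is an assumption.

```python
import ipaddress
import json


def ip_from_plain_text(body):
    """Validate a plain-text response body such as r.text.

    Surrounding whitespace (for example a trailing newline) is stripped
    first. Raises ValueError if the remainder is not a valid IP address.
    """
    return str(ipaddress.ip_address(body.strip()))


def ip_from_json(body, key='ip'):
    """Validate the address stored under `key` in a JSON response body."""
    return str(ipaddress.ip_address(json.loads(body)[key]))


# Hypothetical response bodies, not real service output.
print(ip_from_plain_text('203.0.113.7\n'))
print(ip_from_json('{"ip": "203.0.113.7"}'))
```

With requests you would pass in r.text for either function. A ValueError then tells you the service sent back something other than an address, such as an error page.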

I’m running a Windows 7 64bit machine, with Python 2.7 64bit, so you might have to make some changes to the code to account for the differences between our machines.

Naturally, you’ll need to grab the two third-party modules from PyPI or their homepages, or maybe check out the 64bit repository, if that’s what you need, like I did. As usual, I’ll go through the code piece by piece and include the whole thing at the very bottom of the page. So first, we’re going to import the modules we’ll need: requests, the BeautifulSoup class from the BeautifulSoup library, and Python’s regex library, re.

    import requests
    from BeautifulSoup import BeautifulSoup
    import re

Then we’ll need to set a customized User-Agent for the GET call, since WhatIsMyIP seems to ban any unrecognized user-agent (they put up a page explaining why). To send a custom header with requests, all we need to do is pass a dictionary with the proper key-value pairs to the get function as the ‘headers’ argument when we send the request. The key is the string ‘user-agent’ and the value is the Mozilla one you see below.

    #set the user-agent to the one specified by whatismyip.com
    header = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0',}
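If you want to see the header dictionary attached to a request without actually hitting the site, here's a small sketch. Note that it swaps in urllib.request from Python 3's standard library as a stand-in, purely so it runs offline. It is not the requests call this post uses, although requests takes the same kind of dictionary through its headers keyword.

```python
from urllib.request import Request

header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) '
                  'Gecko/20100101 Firefox/12.0',
}
url = 'http://whatismyip.com'

# Building a Request object does not open a connection.
request = Request(url, headers=header)

# urllib stores header names capitalized, so the key becomes 'User-agent'.
print(request.get_header('User-agent'))
```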

Then we create a reference to the proper URL at whatismyip.com so we can pass it to the get request, along with the custom header, as discussed above. The requests.get function returns a response object that contains all sorts of information, like the status code (i.e. whether it’s a 404 Not Found or not) and the response’s headers. We’ll specifically need the content attribute later, when we parse it for the IP.

    #the url we want to grab the ip from
    url = r'http://whatismyip.com'
    #build the request, with the proper user-agent in the custom header
    r = requests.get(url, headers=header)

Next up, we convert the response’s content to a BeautifulSoup object, which lets us parse the returned HTML very easily. We pass the response’s content to the BeautifulSoup class and call the resulting instance soup. Then we call its findAll method to find all ‘div’ tags whose id attribute is ‘ip’. This returns a list of BeautifulSoup objects, one for each matching div. Lastly, we assume there is only one div with that id, and keep only the first object in the list, at index 0. You can easily add a few lines that make sure it’s the right soup you have, instead of making that assumption here.

    #convert html to soup
    soup = BeautifulSoup(r.content)
    #find the line with the IP address on it
    ip = soup.findAll('div', id='ip')[0]
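Those "few lines" could look something like the sketch below. The helper name only_match is made up for illustration. It only relies on findAll returning a list-like result, so the example feeds it plain lists as stand-ins instead of real soup.

```python
def only_match(matches, description='div with id "ip"'):
    """Return the single item in `matches`, or raise if the count is not one.

    `matches` is any list-like result, such as what findAll returns.
    """
    if len(matches) != 1:
        raise ValueError('expected exactly one {}, found {}'.format(
            description, len(matches)))
    return matches[0]


# Stand-in data: plain strings instead of BeautifulSoup objects.
print(only_match(['<div id="ip">...</div>']))
```

In the script, that would replace the [0] index with something like ip = only_match(soup.findAll('div', id='ip')), so a changed page layout fails loudly instead of raising a bare IndexError or silently grabbing the wrong div.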

Now we need to make sure that the text we have found is human-readable, so we convert all the HTML entities to plain characters; for example, ‘&#50;’ would become ‘2’. This conversion lets us parse the string more easily with regex. We create a new BeautifulSoup instance from the text attribute of the result of our findAll call above, and set the convertEntities argument to BeautifulSoup.HTML_ENTITIES to get the proper conversion.

    #then, convert the HTML entities
    raw_public_ip = BeautifulSoup(ip.text, convertEntities=BeautifulSoup.HTML_ENTITIES)
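To see what the entity conversion does on its own, here's a tiny sketch using html.unescape from Python 3's standard library. That is a different tool from the BeautifulSoup convertEntities argument used in this post, and the encoded string is a made-up sample, not real output from whatismyip.com.

```python
import html

# Made-up sample: every digit written as a numeric character reference.
encoded = '&#49;&#54;&#55;.&#49;&#54;.&#50;&#53;.&#52;'

decoded = html.unescape(encoded)
print(decoded)  # 167.16.25.4
```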

I won’t go deep into the expression below, but essentially it finds four groups of digits, separated by periods. {1,3} means find 1, 2, or 3 instances of the preceding token. The preceding token is [0-9], which means any digit from 0 to 9. These two tokens are wrapped in parens, along with a period (escaped with a backslash so it’s treated as a literal period rather than a wildcard token), so that they’re treated as one combined token, like so: ([0-9]{1,3}\.) . Following the parens there’s a {3} token, meaning find 3 of the preceding token. Then, finally, find another run of 1, 2, or 3 digits. So:

1. find a digit between 0 to 9 –> 1

2. repeat step one up to two more times for a total of 3 –> 167

3. find a period after the results of step 2. –> 167.

4. repeat steps 1 to 3 two more times, for a total of 3 –> 167.16.254.

5. repeat steps 1 and 2 once more. –> 167.16.254.4
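The steps above can be tried out one at a time. This is a small sketch that builds the pattern up piece by piece and shows what each piece matches. The sample string is made up, and the code is written for Python 3 (it only uses re).

```python
import re

sample = 'Your IP Address Is: 167.16.254.4'  # made-up sample text

steps = [
    r'[0-9]',                        # step 1: a single digit
    r'[0-9]{1,3}',                   # step 2: one to three digits
    r'[0-9]{1,3}\.',                 # step 3: ...followed by a literal period
    r'([0-9]{1,3}\.){3}',            # step 4: that whole group, three times
    r'([0-9]{1,3}\.){3}[0-9]{1,3}',  # step 5: plus a final run of digits
]

# First match of each partial pattern in the sample text.
partial_matches = [re.search(p, sample).group(0) for p in steps]

for pattern, text in zip(steps, partial_matches):
    print('{:<30} -> {}'.format(pattern, text))
```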

Then we compile the pattern into a regex object with re.compile and apply it to the text we built above using the re.search function. re.search returns a match object, so public_ip.group(0) holds the string with your IP in it.

    #match this regexp: ([0-9]{1,3}\.){3}([0-9]{1,3})
    #compile the regex pattern for re
    pattern = re.compile(r'([0-9]{1,3}\.){3}([0-9]{1,3})')
    #find the first instance of the pattern in raw_public_ip's text attribute
    public_ip = re.search(pattern, raw_public_ip.text)
    #print it out
    print 'Your Public IP is: {}'.format(public_ip.group(0))
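One caveat: the pattern only looks at the shape of the text, so it will happily match something like 999.1.1.1, and re.search returns None when nothing matches at all. Here's a hedged sketch of a more defensive version, assuming Python 3 and its ipaddress module (not used in the original script). The function name extract_ip is made up for illustration.

```python
import ipaddress
import re

IP_PATTERN = re.compile(r'([0-9]{1,3}\.){3}[0-9]{1,3}')


def extract_ip(text):
    """Return the first IP-shaped substring of `text` if it is a valid
    address, otherwise None."""
    match = IP_PATTERN.search(text)
    if match is None:
        return None  # nothing IP-shaped in the text
    candidate = match.group(0)
    try:
        ipaddress.ip_address(candidate)  # rejects octets above 255
    except ValueError:
        return None
    return candidate


print(extract_ip('Your IP Address Is: 167.16.254.4'))
```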

Simple eh? Here’s the full code:

    import requests
    from BeautifulSoup import BeautifulSoup
    import re

    #set the user-agent to the one specified by whatismyip.com
    header = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0',}

    #the url we want to grab the ip from
    url = r'http://whatismyip.com'

    #build the request, with the proper user-agent in the custom header
    r = requests.get(url, headers=header)

    #convert html to soup
    soup = BeautifulSoup(r.content)

    #find the line with the ip in it
    # which is in a div, with id 'ip'
    ip = soup.findAll('div', id='ip')[0]

    #then, convert the HTML entities
    raw_public_ip = BeautifulSoup(ip.text, convertEntities=BeautifulSoup.HTML_ENTITIES)

    #match this regexp '[0-9].*?(?=N)'
    # or '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
    # or ([0-9]{1,3}\.){3}([0-9]{1,3})
    #compile the regex pattern for re
    #pattern = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
    pattern = re.compile(r'([0-9]{1,3}\.){3}([0-9]{1,3})')

    #find the first instance of the pattern in raw_public_ip's text attribute
    public_ip = re.search(pattern, raw_public_ip.text)

    #print it out
    print 'Your Public IP is: {}'.format(public_ip.group(0))