Hello Luvs,

I have used Python for many years, and I feel very comfortable with it, but one issue with Python has been there from the beginning: speed! Python isn't the most performant language in the world (it's not designed to be!), so I ended up switching to Go for my scan engine, because the last time I tried Python's asyncio (a few years ago), it was a mess. A few days ago I gave it another shot, and to my surprise, it has matured and is quite usable. As you may see on my project page, I've been working on a project called Hunter Suite, which aims to automate the tedious parts of penetration testing and bug bounty hunting.

Table of Contents

Introduction

Concurrency is hard

Sending millions of HTTP requests

Conclusion

Introduction

As a Python developer, you probably write a lot of custom scripts. Most of the time, you find yourself in a situation where you are talking to some API or network protocol for many reasons; in the case of information security research, that means fuzzing or delivering an exploit payload. Now imagine your scripts getting results around 100x faster. How cool is that?

Concurrency is hard

No matter what programming language we use, getting concurrency right is hard. Having a clear goal before we write concurrent code helps a lot. For the parallel client part, my goal was to achieve a few things:

1. Send as many HTTP requests as possible to a single host (used for directory, password and parameter brute force, and API fuzzing).
2. Resolve as many different hosts as possible concurrently (used for bulk host, virtual host and subdomain discovery, DNS brute force, and checking HTTP security headers).

You can't expect 100% accuracy all the time, especially when talking to web servers. Still, we aim for as few errors as possible.
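Part of keeping the error count down is making sure one failed request doesn't abort the whole batch. Here is a minimal sketch of that idea, using a hypothetical flaky() coroutine (not from the original code) to stand in for real network calls; asyncio.gather with return_exceptions=True collects failures as values instead of raising:

```python
import asyncio

async def flaky(i):
    # hypothetical stand-in for a real request; every third "host" fails
    if i % 3 == 0:
        raise ConnectionError(f"host {i} unreachable")
    return f"host {i} ok"

async def main():
    # return_exceptions=True collects failures as values
    # instead of letting one exception abort the whole batch
    results = await asyncio.gather(
        *(flaky(i) for i in range(6)), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    errors = [r for r in results if isinstance(r, Exception)]
    return ok, errors

ok, errors = asyncio.run(main())
print(f"{len(ok)} succeeded, {len(errors)} failed")  # 4 succeeded, 2 failed
```

This way a scanner can log the failures and keep going instead of crashing mid-run.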

I use Python 3.7 here; the code samples may not run correctly on older versions. Here is a slightly modified version of an example from the Python docs.

```python
import asyncio
import time

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

async def main():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    task2 = asyncio.create_task(say_after(20, 'world'))
    task3 = asyncio.create_task(say_after(4, 'more'))
    task4 = asyncio.create_task(say_after(6, 'words'))

    print(f"started at {time.strftime('%X')}")

    await task1
    await task2
    await task3
    await task4

    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())
```

Result:

```
started at 15:11:22
hello
more
words
world
finished at 15:11:42

Process finished with exit code 0
```

As we can see, the word 'world' printed last only because it has the longest sleep time. Here is how it works.

First, we import asyncio so we can talk to its API.

import asyncio

We create four tasks using the create_task() function.

```python
async def main():
    task1 = asyncio.create_task(say_after(1, 'hello'))
    task2 = asyncio.create_task(say_after(20, 'world'))
    task3 = asyncio.create_task(say_after(4, 'more'))
    task4 = asyncio.create_task(say_after(6, 'words'))
```

There are two new keywords here: async and await. Whenever we want to make a function asynchronous, we put the "async" keyword before the function definition, and we "await" the time-consuming call, in our case sleep.

```python
async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)
```

We print the time (with seconds) to record the execution time, then we "await" all the tasks; in other words, we run our four tasks.

```python
print(f"started at {time.strftime('%X')}")

await task1
await task2
await task3
await task4

print(f"finished at {time.strftime('%X')}")
```

Awaiting tasks one by one is tedious; we can gather the tasks together instead. We use * (asterisk) before the list to unpack it.

```python
# await task1
# await task2
# await task3
# await task4

await asyncio.gather(*[task1, task2, task3, task4])
```

We get the same result. Now that we know how to create and run tasks, let's take it further by making HTTP requests asynchronously.

Sending millions of HTTP requests

But before we start, let me show you another simple example.

```python
def code():
    await asyncio.sleep(delay)

def main():
    print('code')
    await code
```

If we run this, we get this error: SyntaxError: 'await' outside async function. So why do I show you this example? Because I want you to understand that you cannot just take any third-party Python library and make it asynchronous.
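If you really are stuck with a synchronous library, asyncio can still overlap its calls by pushing them onto a thread pool with run_in_executor. A minimal sketch, with a made-up blocking_fetch() standing in for something like requests.get:

```python
import asyncio
import time

def blocking_fetch(url):
    # hypothetical stand-in for a synchronous call like requests.get(url)
    time.sleep(0.1)
    return f"fetched {url}"

async def main():
    loop = asyncio.get_running_loop()
    urls = [f"https://example.com/{i}" for i in range(5)]
    # run_in_executor pushes each blocking call onto a thread pool,
    # so the calls overlap even though the library itself is synchronous
    return await asyncio.gather(
        *(loop.run_in_executor(None, blocking_fetch, u) for u in urls)
    )

results = asyncio.run(main())
print(results)
```

This is a workaround rather than true async I/O: you still pay for one thread per in-flight call.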

But don't worry, Python has one of the most exceptional programming communities in the world. There is always a library. Meet aiohttp.



pip3 install aiohttp

and let's run a simple web request.

The code is self-explanatory: we create an async function that makes a GET request and awaits its response.

```python
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    asyncio.run(main())
```

Now, using our previous knowledge, let's create a simple script to compare performance: one version using requests (synchronous) and one using aiohttp (asynchronous).

```python
import requests
import time

urls = ["https://0xsha.io", "https://twitter.com", "https://google.com", ...]

if __name__ == '__main__':
    print('requests version')
    start = time.time()
    print(f"started at {time.strftime('%X')}")
    for url in urls:
        requests.get(url)
    end = time.time()
    print(f"Ended at {time.strftime('%X')}")
    print(end - start)
```

```
sync-1.py
requests version
started at 16:09:26
Ended at 16:10:36
69.44833111763
```

Now let's look at the asyncio version.

```python
import aiohttp
import asyncio
import time

# urls: the same list as in the synchronous version

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        start = time.time()
        print(f"started at {time.strftime('%X')}")
        for url in urls:
            await fetch(session, url)
        end = time.time()
        print(f"finished at {time.strftime('%X')}")
        print(end - start)

if __name__ == '__main__':
    asyncio.run(main())
```





```
async-1.py
started at 16:06:27
finished at 16:07:17
50.67312693595886
```

As you can see, we are about 20 seconds faster just by switching libraries, and with only a few URLs.

But you may ask: can it be faster? It's not even 10x. The answer lies in the Python docs, and to be honest with you, I think the Python documentation for asyncio still needs a lot of improvement.

There are three main types of awaitable objects: coroutines, Tasks, and Futures. In our async-1 example we used coroutines, and they wait for each other to finish; we never said they should run concurrently! To make them concurrent, we have to create a list of tasks and gather it. Also, please read this.
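The difference is easy to see without any networking at all. A minimal timing sketch: two coroutines awaited one by one take the sum of their sleeps, while the same coroutines wrapped in Tasks overlap.

```python
import asyncio
import time

async def work():
    await asyncio.sleep(0.1)

async def sequential():
    # plain coroutines awaited one by one: each waits for the previous
    await work()
    await work()

async def concurrent():
    # wrapping the coroutines in Tasks starts them immediately,
    # so both sleeps overlap
    await asyncio.gather(asyncio.create_task(work()),
                         asyncio.create_task(work()))

start = time.time()
asyncio.run(sequential())
seq_time = time.time() - start

start = time.time()
asyncio.run(concurrent())
conc_time = time.time() - start

print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

The sequential version takes at least 0.2 seconds; the concurrent one takes roughly 0.1.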

Here is a modified version that runs the tasks concurrently.

```python
import aiohttp
import asyncio
import time

urls = ["https://0xsha.io", "https://twitter.com", "https://google.com", ...]

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        start = time.time()
        print(f"started at {time.strftime('%X')}")
        for url in urls:
            # await fetch(session, url)
            tasks.append(asyncio.create_task(fetch(session, url)))
        await asyncio.gather(*tasks)
        end = time.time()
        print(f"finished at {time.strftime('%X')}")
        print(end - start)

if __name__ == '__main__':
    asyncio.run(main())
```

```
started at 16:25:39
finished at 16:25:45
5.263195037841797
```

5 seconds! Amazing!

Now let's re-create a fast DirBuster. To achieve this, we can't merely loop through a large file and send requests to the server as fast as possible; after a few requests, the server may drop our future requests. So we can use an asyncio queue.

We mix all the previous examples: we read the file synchronously (for large files we can do that asynchronously as well), we fill the queue with data, we join the queue and cancel the remaining tasks, and finally we gather the jobs and print any file that returns a 200 response.

Here is the code.

```python
import asyncio
import time

from aiohttp import ClientSession

# all matching response bodies end up in this list
lst = []

async def fetch(url, session, queue):
    while True:
        # Get a "work item" out of the queue.
        x = await queue.get()
        print(x.strip())
        try:
            async with session.get(url + x.strip()) as response:
                if response.status == 200:
                    print(url + x.strip() + "----" + str(response.status))
                    body = await response.read()
                    lst.append(body)
        finally:
            # Notify the queue that the "work item" has been processed,
            # even if the request failed; otherwise queue.join() hangs.
            queue.task_done()

# wordlist: https://github.com/Bo0oM/fuzz.txt/blob/master/fuzz.txt
with open("fuzz.txt", "r") as file:
    x = file.readlines()

url = "http://url.com/"

async def run(r):
    start = time.time()
    queue = asyncio.Queue()
    for line in x:
        queue.put_nowait(line)
    tasks = []
    async with ClientSession() as session:
        # r = how many worker tasks
        for _ in range(r):
            tasks.append(asyncio.create_task(fetch(url, session, queue)))
        # wait until every queued item has been processed
        await queue.join()
        # cancel the (now idle) workers
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        # https://bugs.python.org/issue29432
    end = time.time()
    print(end - start)

if __name__ == '__main__':
    asyncio.run(run(500))
```

Results:





```
http://example.com/access_log----200
http://example.com/index.php----200
http://example.com/error_log----200
http://example.com/forum/----200
http://example.com/flash/----200
http://example.com/robots.txt----200
http://example.com/roundcube/index.php----200
```

Just 20.817044019699097 seconds for ~4500 files! It would take ages using traditional libraries like requests.

Here is the full code.

This script is in no way a replacement for tools like dirsearch, dir buster, etc.
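If the queue-plus-cancel pattern feels heavy, a simpler way to stay polite toward a single target is to cap in-flight requests with asyncio.Semaphore. A sketch with a hypothetical fetch_one() coroutine, where asyncio.sleep stands in for the real aiohttp call:

```python
import asyncio

async def fetch_one(sem, path, results):
    # the semaphore caps how many "requests" are in flight at once
    async with sem:
        # hypothetical stand-in for: async with session.get(url + path) ...
        await asyncio.sleep(0.01)
        results.append(path)

async def main(paths, limit=50):
    sem = asyncio.Semaphore(limit)
    results = []
    # create all the tasks up front; the semaphore throttles them
    await asyncio.gather(*(fetch_one(sem, p, results) for p in paths))
    return results

paths = [f"/path{i}" for i in range(200)]
results = asyncio.run(main(paths))
print(len(results))
```

The trade-off: a semaphore limits concurrency per batch of tasks, while the worker/queue pattern above also gives you clean cancellation and backpressure for very large wordlists.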

Conclusion:

Now we have learned about Python asyncio. We can leverage it with any async library, or we can create our own async libraries. Check out an impressive curated list here.

If you write Python code, start using asyncio whenever you can. It will save you a lot of time during your research.

Till then luvs