This article explains how to write a tiny and basic SOCKS 5 server in Python 3.6. I am assuming that you already have a basic understanding of proxy servers.

Introduction

SOCKS is a generic proxy protocol that relays TCP connections from one point to another through an intermediate connection (the SOCKS server). Originally, SOCKS proxies were mostly used as circuit-level gateways, that is, as firewalls between local and external resources (the internet). Nowadays, however, the protocol is also popular for censorship circumvention and web scraping.

Throughout the article, I will refer to RFC 1928, the specification that describes the SOCKS 5 protocol.

Before reading this article, I recommend cloning a completed version of the implementation so you can see the full picture.

TCP sessions handling

The SOCKS protocol is implemented on top of TCP in such a way that the client must establish a separate TCP connection to the SOCKS server for each remote server it wants to exchange data with.

So, first of all, we need to create a regular TCP session handler. Python has a built-in socketserver module, which simplifies the task of writing network servers.

    from socketserver import ThreadingMixIn, TCPServer, StreamRequestHandler


    class ThreadingTCPServer(ThreadingMixIn, TCPServer):
        pass


    class SocksProxy(StreamRequestHandler):
        def handle(self):
            # Our main logic will be here
            pass


    if __name__ == '__main__':
        with ThreadingTCPServer(('127.0.0.1', 9011), SocksProxy) as server:
            server.serve_forever()

Here ThreadingTCPServer is a threaded version of the TCP server that listens for incoming connections on the specified address and port. Every time a new TCP connection (session) comes in, the server spawns a new thread with a SocksProxy instance running inside it. This gives us an easy way to handle concurrent connections.

The ThreadingMixIn can be replaced by ForkingMixIn, which uses a forking approach, that is, the server spawns a new process for each TCP session. Note that ForkingMixIn is only available on platforms that support fork().
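To check that each connection really gets its own thread, we can run a small experiment. The sketch below is not part of the proxy: ThreadNameHandler, ask_thread_name and run_demo are hypothetical names made up for this illustration, and port 0 is used so that the operating system picks a free port.

```python
import socket
import threading
from socketserver import StreamRequestHandler, ThreadingTCPServer


class ThreadNameHandler(StreamRequestHandler):
    """Hypothetical handler: replies with the name of the thread serving it."""

    def handle(self):
        name = threading.current_thread().name
        self.wfile.write(name.encode("ascii") + b"\n")


def ask_thread_name(address):
    """Open one connection and return the worker thread name sent back."""
    with socket.create_connection(address, timeout=5) as conn:
        with conn.makefile("rb") as reader:
            return reader.readline().decode("ascii").strip()


def run_demo():
    # Port 0 asks the OS for any free port, so the demo cannot clash with 9011.
    with ThreadingTCPServer(("127.0.0.1", 0), ThreadNameHandler) as server:
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            main_name = threading.current_thread().name
            names = [ask_thread_name(server.server_address) for _ in range(3)]
        finally:
            server.shutdown()
    return main_name, names


if __name__ == "__main__":
    print(run_demo())
```

None of the returned names should match the name of the calling thread, because the handler never runs in the thread that started the server.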

Connection establishment and negotiation

When a client establishes a TCP session to the SOCKS server, it must send a greeting message.

The message consists of 3 fields:

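+---------+----------+----------------+
| version | nmethods | methods        |
+---------+----------+----------------+
| 1 byte  | 1 byte   | 1 to 255 bytes |
+---------+----------+----------------+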

Here the version field holds the protocol version, which equals 5 in our case. The nmethods field contains the number of authentication methods supported by the client. The methods field is the sequence of those methods, one byte per method. Thus the nmethods field indicates the length of the methods sequence.

According to RFC 1928, the supported values of the methods field are defined as follows:

X'00' NO AUTHENTICATION REQUIRED

X'01' GSSAPI

X'02' USERNAME/PASSWORD

X'03' to X'7F' IANA ASSIGNED

X'80' to X'FE' RESERVED FOR PRIVATE METHODS

X'FF' NO ACCEPTABLE METHODS
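To make the greeting format concrete, here is a minimal sketch that builds a greeting on the client side, parses it on the server side and picks a method. The function and constant names (build_greeting, parse_greeting, choose_method, NO_AUTH and so on) are my own for this illustration and do not come from the RFC or from the final implementation.

```python
import struct

SOCKS_VERSION = 5
NO_AUTH = 0x00
USERNAME_PASSWORD = 0x02
NO_ACCEPTABLE_METHODS = 0xFF


def build_greeting(methods):
    """Client side: version, nmethods, then one byte per offered method."""
    return struct.pack("!BB", SOCKS_VERSION, len(methods)) + bytes(methods)


def parse_greeting(data):
    """Server side: return the list of method codes offered by the client."""
    version, nmethods = struct.unpack("!BB", data[:2])
    if version != SOCKS_VERSION:
        raise ValueError("unsupported SOCKS version: %d" % version)
    methods = data[2:2 + nmethods]
    if len(methods) != nmethods:
        raise ValueError("greeting is shorter than nmethods announces")
    return list(methods)


def choose_method(offered, supported=(USERNAME_PASSWORD,)):
    """Pick the first supported method the client offers, else X'FF'."""
    for method in supported:
        if method in offered:
            return method
    return NO_ACCEPTABLE_METHODS


greeting = build_greeting([NO_AUTH, USERNAME_PASSWORD])
print(greeting.hex())                           # 05020002
print(choose_method(parse_greeting(greeting)))  # 2
```

A client that only offers X'00' would get X'FF' back from choose_method with the default supported tuple, which mirrors a server that insists on credentials.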

When the SOCKS server receives such a message, it should choose an appropriate method and answer back. Let's pretend we only support the USERNAME/PASSWORD method.

The format of the answer looks as follows:

+---------+--------+
| version | method |
+---------+--------+
| 1 byte  | 1 byte |
+---------+--------+

Here is how the whole process looks in Python:

    import struct

    SOCKS_VERSION = 5


    # Both functions below are methods of the SocksProxy class.
    def handle(self):
        # Greeting header
        # read and unpack 2 bytes from a client
        header = self.connection.recv(2)
        version, nmethods = struct.unpack("!BB", header)

        # socks 5
        assert version == SOCKS_VERSION
        assert nmethods > 0

        # Get available methods
        methods = self.get_available_methods(nmethods)

        # accept only USERNAME/PASSWORD auth
        if 2 not in set(methods):
            # tell the client that no method is acceptable, then close the connection
            self.connection.sendall(struct.pack("!BB", SOCKS_VERSION, 255))
            self.server.close_request(self.request)
            return

        # Send server choice
        self.connection.sendall(struct.pack("!BB", SOCKS_VERSION, 2))

    def get_available_methods(self, n):
        methods = []
        for i in range(n):
            methods.append(ord(self.connection.recv(1)))
        return methods

Here the recv function reads up to n bytes from the client, and the struct module helps to pack and unpack binary data using a specified format.
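Because recv may return fewer bytes than requested when the data arrives in several TCP segments, a more defensive server would keep reading until it has the whole field. The helper below is a sketch of that idea; recv_exact is a hypothetical name and is not used in the snippets of this article.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # socketpair() gives two connected sockets, handy for a local experiment.
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"\x05")      # the header is sent in two pieces
        left.sendall(b"\x01\x00")
        print(recv_exact(right, 2))  # b'\x05\x01'
        print(recv_exact(right, 1))  # b'\x00'
```

Raising an exception on a short read also avoids the confusing struct.error that struct.unpack would otherwise produce on a truncated header.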

Once the client has received the server's choice, it responds with its username and password.

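+---------+--------+----------------+--------+----------------+
| version | ulen   | uname          | plen   | passwd         |
+---------+--------+----------------+--------+----------------+
| 1 byte  | 1 byte | 1 to 255 bytes | 1 byte | 1 to 255 bytes |
+---------+--------+----------------+--------+----------------+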

The version field represents the version of the authentication sub-protocol, which equals 1 in our case. The ulen and plen fields hold the lengths of the uname and passwd fields, so the server knows how much data it should read from the client.

The server response should look as follows:

+---------+--------+
| version | status |
+---------+--------+
| 1 byte  | 1 byte |
+---------+--------+

A status field of 0 indicates successful authentication, while any other value is treated as a failure.
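Seen from the client side, this exchange can be sketched in a few lines. The names build_auth_request, auth_succeeded and AUTH_VERSION are hypothetical and only serve this illustration; the credentials are the placeholder pair used in the curl test of this article.

```python
import struct

AUTH_VERSION = 1  # version of the username/password sub-negotiation


def build_auth_request(username, password):
    """Client side: version, ulen, uname, plen, passwd."""
    uname = username.encode("utf-8")
    passwd = password.encode("utf-8")
    # A single length byte cannot describe more than 255 bytes.
    if len(uname) > 255 or len(passwd) > 255:
        raise ValueError("username and password are limited to 255 bytes each")
    return (struct.pack("!BB", AUTH_VERSION, len(uname)) + uname
            + struct.pack("!B", len(passwd)) + passwd)


def auth_succeeded(response):
    """Client side: a status of 0 means success, anything else is a failure."""
    version, status = struct.unpack("!BB", response[:2])
    return version == AUTH_VERSION and status == 0


request = build_auth_request("username", "password")
print(request)                        # b'\x01\x08username\x08password'
print(auth_succeeded(b"\x01\x00"))    # True
print(auth_succeeded(b"\x01\xff"))    # False
```

Note that the credentials travel in clear text, so anyone who can observe the connection between the client and the proxy can read them.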

The Python version of the authentication step looks as follows:

    def verify_credentials(self):
        version = ord(self.connection.recv(1))
        assert version == 1

        username_len = ord(self.connection.recv(1))
        username = self.connection.recv(username_len).decode('utf-8')

        password_len = ord(self.connection.recv(1))
        password = self.connection.recv(password_len).decode('utf-8')

        if username == self.username and password == self.password:
            # Success, status = 0
            response = struct.pack("!BB", version, 0)
            self.connection.sendall(response)
            return True

        # Failure, status != 0
        response = struct.pack("!BB", version, 0xFF)
        self.connection.sendall(response)
        self.server.close_request(self.request)
        return False

Once authentication has completed, the client can send the request details.

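+---------+--------+--------+--------+----------------+----------+
| version | cmd    | rsv    | atyp   | dst.addr       | dst.port |
+---------+--------+--------+--------+----------------+----------+
| 1 byte  | 1 byte | 1 byte | 1 byte | 4 to 255 bytes | 2 bytes  |
+---------+--------+--------+--------+----------------+----------+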

Where:

VERSION protocol version: X'05'

CMD command:
    CONNECT X'01'
    BIND X'02'
    UDP ASSOCIATE X'03'

RSV RESERVED

ATYP address type of following address:
    IP V4 address: X'01'
    DOMAINNAME: X'03'
    IP V6 address: X'04'

DST.ADDR desired destination address

DST.PORT desired destination port in network octet order

The cmd field indicates the type of connection. This article is limited to the CONNECT command, which is used for TCP connections. For more details, please read the SOCKS RFC.

If the client sends a domain name, the SOCKS server resolves it via DNS on its own side. The client therefore does not need a working DNS resolver when it talks through SOCKS.
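The request layout can be sketched as a pair of helpers that encode and decode a CONNECT request. This is only an illustration under a few assumptions: the names build_connect_request and parse_connect_request are hypothetical, IPv6 is left out, and domain names are assumed to be plain ASCII.

```python
import ipaddress
import socket
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 1
ATYP_IPV4 = 1
ATYP_DOMAIN = 3


def build_connect_request(host, port):
    """Client side: send host as IPv4 when it parses as one, else as a name."""
    header = struct.pack("!BBB", SOCKS_VERSION, CMD_CONNECT, 0)
    try:
        address = struct.pack("!B", ATYP_IPV4) + ipaddress.IPv4Address(host).packed
    except ValueError:
        # Not a dotted-quad address: send it as a length-prefixed domain name
        # and let the SOCKS server do the DNS lookup.
        name = host.encode("ascii")
        address = struct.pack("!BB", ATYP_DOMAIN, len(name)) + name
    return header + address + struct.pack("!H", port)


def parse_connect_request(data):
    """Server side: return (cmd, host, port) from a complete request."""
    version, cmd, _rsv, atyp = struct.unpack("!BBBB", data[:4])
    if version != SOCKS_VERSION:
        raise ValueError("unsupported SOCKS version")
    if atyp == ATYP_IPV4:
        host = socket.inet_ntoa(data[4:8])
        rest = data[8:]
    elif atyp == ATYP_DOMAIN:
        length = data[4]
        host = data[5:5 + length].decode("ascii")
        rest = data[5 + length:]
    else:
        raise ValueError("address type not handled in this sketch")
    (port,) = struct.unpack("!H", rest[:2])
    return cmd, host, port


request = build_connect_request("github.com", 443)
print(request)                         # b'\x05\x01\x00\x03\ngithub.com\x01\xbb'
print(parse_connect_request(request))  # (1, 'github.com', 443)
```

The domain name is carried with a one-byte length prefix and no terminating NUL, which is why the parser reads the length first.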

As soon as the server establishes a connection to the desired destination, it should reply with a status and the address it is bound to.

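+---------+--------+--------+--------+----------------+----------+
| version | rep    | rsv    | atyp   | bnd.addr       | bnd.port |
+---------+--------+--------+--------+----------------+----------+
| 1 byte  | 1 byte | 1 byte | 1 byte | 4 to 255 bytes | 2 bytes  |
+---------+--------+--------+--------+----------------+----------+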

Where:

VERSION protocol version: X'05'

REP reply field:
    X'00' succeeded
    X'01' general SOCKS server failure
    X'02' connection not allowed by ruleset
    X'03' Network unreachable
    X'04' Host unreachable
    X'05' Connection refused
    X'06' TTL expired
    X'07' Command not supported
    X'08' Address type not supported
    X'09' to X'FF' unassigned

RSV RESERVED

ATYP address type of following address:
    IP V4 address: X'01'
    DOMAINNAME: X'03'
    IP V6 address: X'04'

BND.ADDR server bound address

BND.PORT server bound port in network octet order

Here is how it looks in Python:

    # This fragment continues inside handle(); it needs "import socket" and "import struct".
    # client request
    version, cmd, _, address_type = struct.unpack("!BBBB", self.connection.recv(4))
    assert version == SOCKS_VERSION

    if address_type == 1:  # IPv4
        address = socket.inet_ntoa(self.connection.recv(4))
    elif address_type == 3:  # domain name
        # indexing bytes already yields an int in Python 3, so ord() is not needed
        domain_length = self.connection.recv(1)[0]
        address = self.connection.recv(domain_length)
    # read from the socket directly instead of mixing in the buffered rfile
    port = struct.unpack('!H', self.connection.recv(2))[0]

    if cmd != 1:
        # only CONNECT is supported: reply with "Command not supported" and stop
        self.connection.sendall(self.generate_failed_reply(address_type, 7))
        self.server.close_request(self.request)
        return

    # server reply
    try:
        remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        remote.connect((address, port))
        bind_address = remote.getsockname()

        addr = struct.unpack("!I", socket.inet_aton(bind_address[0]))[0]
        port = bind_address[1]
        # the bound address is packed as 4 bytes, so ATYP must be 1 (IPv4)
        reply = struct.pack("!BBBBIH", SOCKS_VERSION, 0, 0, 1, addr, port)
    except Exception as err:
        # return Connection refused error
        reply = self.generate_failed_reply(address_type, 5)

    self.connection.sendall(reply)

    # Establish data exchange
    if reply[1] == 0:
        self.exchange_loop(self.connection, remote)

    self.server.close_request(self.request)
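The snippet above calls generate_failed_reply, which is not shown in this article. Here is one possible sketch of it, written as a plain function; inside SocksProxy it would take self as its first argument. I am assuming that the helper only needs to carry the error code, so the bound address and port are zero-filled and reported as IPv4 regardless of the address type the client used.

```python
import struct

SOCKS_VERSION = 5
ATYP_IPV4 = 1


def generate_failed_reply(address_type, error_number):
    """Build a reply that carries only an error code (assumed behaviour).

    address_type is accepted to match the call in the snippet above but is
    not used: the zero-filled address below is 4 bytes long, so the reply
    always declares an IPv4 address.
    """
    return struct.pack("!BBBBIH", SOCKS_VERSION, error_number, 0, ATYP_IPV4, 0, 0)


print(generate_failed_reply(3, 5))  # b'\x05\x05\x00\x01\x00\x00\x00\x00\x00\x00'
```

Echoing the client's address type together with a 4-byte address would produce a malformed reply for domain-name requests, which is why the sketch does not do that.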

If the server's reply indicates success, the client may now start passing data. In order to work with both the client and the remote host concurrently, we can use the select module, which supports the select and poll Unix interfaces.

Here is how we can read and forward data from both the client and the remote host in a single loop:

    def exchange_loop(self, client, remote):
        while True:
            # wait until client or remote is available for read
            r, w, e = select.select([client, remote], [], [])

            if client in r:
                data = client.recv(4096)
                # empty data means the client closed the connection
                if not data:
                    break
                # sendall keeps sending until the whole chunk is written
                remote.sendall(data)

            if remote in r:
                data = remote.recv(4096)
                if not data:
                    break
                client.sendall(data)
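The relay idea can be tried without any network by letting socket pairs play the roles of the client and the remote host. The sketch below is a standalone variant of the loop, not the method from the proxy class; relay and run_demo are hypothetical names used only here.

```python
import select
import socket
import threading


def relay(client, remote):
    """Copy bytes in both directions until either side closes."""
    while True:
        readable, _, _ = select.select([client, remote], [], [])
        for source in readable:
            target = remote if source is client else client
            data = source.recv(4096)
            if not data:
                return
            target.sendall(data)


def run_demo():
    # Two socket pairs stand in for the client<->proxy and proxy<->remote links.
    client_app, proxy_client_side = socket.socketpair()
    proxy_remote_side, remote_app = socket.socketpair()
    worker = threading.Thread(
        target=relay, args=(proxy_client_side, proxy_remote_side), daemon=True
    )
    worker.start()

    client_app.sendall(b"ping")
    request = remote_app.recv(4096)    # what the "remote host" receives
    remote_app.sendall(b"pong")
    response = client_app.recv(4096)   # what the "client" gets back

    client_app.close()                 # relay sees b"" and returns
    worker.join(timeout=5)
    for sock in (proxy_client_side, proxy_remote_side, remote_app):
        sock.close()
    return request, response


if __name__ == "__main__":
    print(run_demo())
```

Closing either end makes recv return an empty bytes object, which is the signal the loop uses to stop relaying.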

That's it! We have got a working SOCKS 5 proxy.

Now we can test it using curl:

    curl -v --socks5 127.0.0.1:9011 -U username:password https://github.com



