You may use the socket.makefile method to use this file i/o approach for sockets.
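As a rough illustration of the file I/O approach over a socket, here is a minimal sketch that uses only the standard library `socket` and `struct` modules rather than protlib itself. The record layout (`POINT_FORMAT`) and the helper names `send_point` and `recv_point` are assumptions made for this example, and `socket.socketpair()` merely stands in for a real client/server connection.

```python
import socket
import struct

# Hypothetical wire layout for illustration: a big-endian short and two floats.
POINT_FORMAT = "!hff"
POINT_SIZE = struct.calcsize(POINT_FORMAT)

def send_point(wfile, code, x, y):
    """Write one packed record to a file-like object and flush it."""
    wfile.write(struct.pack(POINT_FORMAT, code, x, y))
    wfile.flush()

def recv_point(rfile):
    """Read exactly one packed record from a file-like object."""
    data = rfile.read(POINT_SIZE)
    if len(data) < POINT_SIZE:
        raise EOFError("connection closed before a full record arrived")
    return struct.unpack(POINT_FORMAT, data)

# socketpair() gives two connected sockets, standing in for client and server.
left, right = socket.socketpair()
wfile = left.makefile("wb")    # file-like wrapper for writing
rfile = right.makefile("rb")   # file-like wrapper for reading
send_point(wfile, 1, 66.0, 27.0)
received = recv_point(rfile)
for obj in (wfile, rfile, left, right):
    obj.close()
```

Once a socket is wrapped with `makefile`, the reading and writing code no longer needs to know whether it is talking to a socket or to a file on disk.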

Here’s an example of defining, instantiating, writing, and reading a struct using file i/o:

protlib builds on the struct and SocketServer modules in the standard library to make it easy to implement binary network protocols. It provides support for default and constant struct fields, nested structs, arrays of structs, better handling for strings and arrays, struct inheritance, and convenient syntax for instantiating and using your custom structs.

You may download older versions of protlib and view older versions of the protlib documentation here.

You may also check out the development version of protlib with this command:

You may click here to download protlib. You may also run easy_install protlib if you have EasyInstall on your system. The project page for protlib in the Cheese Shop (aka the Python Package Index or PyPI) may be found here.

protlib is free under the BSD license. It requires Python 2.6 or later and has no other dependencies. Because protlib supports Python 3, the code snippets in this documentation are copied from a Python 3 interpreter.

then the order of the x and y fields is undefined since they share the same CInt instance. In this second case, a CWarning will be triggered, but the first case is not automatically detected by the protlib library.

then when you serialize your struct, the y field will come before the x field because its CInt value was instantiated first. Similarly, if you say

The order of struct fields is defined by the order in which the CType subclasses for those fields were instantiated. In other words, if you say

Returns a list of the CType objects which define the fields of this struct in the order in which they were declared.
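The instantiation-order rule described above can be illustrated with a creation counter, a common technique for recovering declaration order. This is only a sketch of the idea: the `Field` class and `ordered_field_names` helper are hypothetical stand-ins, and nothing here is taken from protlib's actual implementation.

```python
import itertools

class Field:
    """Hypothetical stand-in for a field type; not protlib's CType."""
    _counter = itertools.count()

    def __init__(self):
        # Each instance remembers when it was created.
        self.order = next(Field._counter)

def ordered_field_names(cls):
    """Return field names sorted by the instantiation order of their values."""
    fields = [(v.order, k) for k, v in vars(cls).items() if isinstance(v, Field)]
    return [name for _, name in sorted(fields)]

y_field = Field()   # instantiated first

class Example:
    x = Field()     # instantiated second
    y = y_field     # declared after x, but created earlier

names = ordered_field_names(Example)
```

Here `names` puts `y` ahead of `x`, because ordering follows when each field object was created rather than where it appears in the class body.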

Returns an object which may be used to declare a CStruct as a field in another CStruct. This accepts the same default and always parameters as the CType constructor. For example:

When you assign a value to one of a struct’s fields, protlib converts the value to the proper Python type for that field’s data type. For example:

Returns the packed binary data representing this CStruct. This is what should be written to files and sockets.

Accepts a string or file-like object and returns an instance of this CStruct drawn from that data source.

Returns the size of the packed binary data needed to hold this CStruct. This method takes no arguments on a fixed-size struct, but if any of this struct’s fields has a variable length, this method will throw an exception if called with no arguments. You can pass an instance of this CStruct to get the size of that particular instance, for example:

This should never be instantiated directly. Instead, you should subclass this when defining a custom struct. Your subclass will be given a constructor which takes the fields of your struct as positional and/or keyword arguments. However, you don’t have to provide values for your fields at this time. For example:

Arrays may either be given default/always values themselves or use the default/always values of the CType they are given. For example:

You can make an array of any CType. Arrays pack and unpack to and from Python lists. For example:

This code works in both Python 2 and Python 3 and demonstrates the three methods you can override to define your custom parsing and serialization:

Some projects might require you to write custom parsing and serializing code; protlib makes this easy by allowing you to subclass CType classes. Here’s an example, which you can find in examples/ctype_subclassing/testing.py:

Because protlib is built on top of the struct module, each basic data type in protlib uses a struct format string. The list of struct format strings is here and the protlib types which use them are listed below. These sizes are constant on all processor architectures by default, but this will change if you change the value of protlib.BYTE_ORDER.
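To see why an explicit byte order keeps sizes constant, here is a small sketch using the standard library `struct` module directly. The choice of the `"!"` (network) prefix is an assumption for illustration; it is consistent with the sample bytes in the raw log shown in this documentation, which are reused below.

```python
import struct

# With an explicit byte-order prefix ("!", "<", ">" or "="), struct uses
# standard sizes that do not depend on the platform.
sizes = {fmt: struct.calcsize("!" + fmt) for fmt in ("h", "i", "f", "d")}

# Sample bytes copied from the raw log shown in this documentation:
# a 2-byte code followed by two 4-byte floats.
raw = b"\x00\x01B\x84\x00\x00A\xd8\x00\x00"
code, x, y = struct.unpack("!hff", raw)
```

With native byte order (the `"@"` prefix, or no prefix at all), `struct` may also insert alignment padding and use platform-dependent sizes, which is why changing the byte order can change the packed size.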

Serializes the value according to the specific CType class. Note that this takes no argument when called on a CStruct instance.

Note that this is a classmethod on subclasses of CStruct.

Accepts either a string or a file-like object (anything with a read method) and returns a Python object with the appropriate value.
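Accepting either raw bytes or a file-like object can be sketched with the standard library alone. The helpers `read_exact` and `parse_int` below are hypothetical names invented for this example, and the 4-byte big-endian integer format is just an illustrative choice.

```python
import io
import struct

def read_exact(source, size):
    """Return `size` bytes from a bytes object or anything with a read method."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    data = source.read(size)
    if len(data) < size:
        raise ValueError("needed {0} bytes but got {1}".format(size, len(data)))
    return data

def parse_int(source):
    """Parse one big-endian 4-byte integer (format and name are illustrative)."""
    return struct.unpack("!i", read_exact(source, 4))[0]

from_bytes = parse_int(b"\x00\x00\x00\x05")
from_file = parse_int(io.BytesIO(b"\x00\x00\x00\x06extra"))
```

Wrapping bytes in `io.BytesIO` lets the rest of the parsing code deal with a single interface, the `read` method.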

The format string used by the underlying struct module to represent the packed binary data format. Note that this is a classmethod for subclasses of CStruct.

The size of the packed binary data representing this CType. Note that this is a classmethod for subclasses of CStruct.

Some unicode character encodings commonly contain null bytes, which makes it inadvisable to use those encodings with an AUTOSIZED string. For example:
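The problem can be seen with plain Python strings, independent of protlib. In this sketch, `cut_at_first_null` is a hypothetical helper that mimics how a null-terminated field would be read.

```python
text = "abc"
utf8 = text.encode("utf-8")        # no null bytes for this text
utf16 = text.encode("utf-16-le")   # every ASCII character gains a null byte

def cut_at_first_null(data):
    """Mimic reading a null-terminated field: stop at the first zero byte."""
    return data.split(b"\x00", 1)[0]

truncated = cut_at_first_null(utf16)
```

Because the UTF-16 form of `"abc"` has a null byte right after the first character, a reader that stops at the first null recovers only a fragment of the original string.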

warn("CUnicode value has length {0} and was told to serialize an encoded string of length {1} {2!r}".format(self.real_length(cstruct), len(encoded), encoded), CWarning)

/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:384: CWarning: CUnicode value has length 5 and was told to serialize an encoded string of length 6 b'andr\xc3\xa9'

The length parameter of the CUnicode class indicates the max length of the raw serialized bytes of the CUnicode field. It does not indicate the number of unicode characters. For example, a 5-character unicode string might serialize to more than 5 bytes:
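The difference between characters and bytes can be checked in plain Python, without protlib. This sketch uses the same string that appears in the warning output in this documentation.

```python
name = "andr\u00e9"             # "andré": five characters
encoded = name.encode("utf-8")  # the accented letter takes two bytes
char_count = len(name)
byte_count = len(encoded)
```

A field declared with a length of 5 would therefore be too short for this value once it is encoded as UTF-8.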

enc_errors : This optional parameter is only valid for CUnicode objects. It defines how encoding and decoding errors are handled, e.g. by being passed as the errors argument to the unicode builtin. If omitted, it defaults to “strict”. For example:

encoding : This is required for CUnicode objects but invalid for all other types. It specifies the encoding to use when translating to and from unicode and raw bytes. For example:

full_string : Unlike the struct module, protlib right-strips strings when they’re parsed, starting with the first null byte. This default behavior can be overridden by setting this parameter to True. For example:
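The difference from the plain struct module can be sketched as follows. The `strip_at_null` helper is a hypothetical illustration of the stripping behavior described above, not protlib's own code.

```python
import struct

packed = struct.pack("8s", b"abc")          # struct pads with null bytes
(raw_value,) = struct.unpack("8s", packed)  # struct returns the padding too

def strip_at_null(value):
    """Drop everything from the first null byte onward."""
    return value.split(b"\x00", 1)[0]

stripped = strip_at_null(raw_value)
```

Keeping the full padded value, as `full_string` requests, matters when the bytes after the first null are meaningful data rather than padding.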

default : Like the always parameter, except that no warnings are raised when a different value is parsed or serialized. Also, a default parameter may be either a value or a callable object. For example:

warn("{0}.{1} should always be {2!r} but was given a value of {3!r}".format(self.__class__.__name__, name, field.always, value), CWarning)

/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:733: CWarning: OriginPoint.x should always be 0 but was given a value of 5

always : Use this to set a constant value for a field. You won’t need to specify this value, and a CWarning will be triggered if this field is ever assigned a different value. For example:

length : Only valid for the CString, CUnicode, and CArray data types, for which it is required. This may be one of three things: an integer which represents the length of the string or array; the special value protlib.AUTOSIZED, which indicates that the string is null-terminated and can be any size; or a string denoting the field where the actual length value may be found. For example:
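Two of these length styles, a length stored in another field and a null-terminated value, can be sketched with the standard library. The function names and the byte layout below are assumptions for illustration and do not show how protlib implements them.

```python
import io
import struct

def read_counted_ints(f):
    """Read a 4-byte count followed by that many 4-byte integers."""
    (count,) = struct.unpack("!i", f.read(4))
    return list(struct.unpack("!{0}i".format(count), f.read(4 * count)))

def read_null_terminated(f):
    """Read bytes up to, but not including, the next null byte."""
    chunks = []
    while True:
        byte = f.read(1)
        if byte in (b"", b"\x00"):
            return b"".join(chunks)
        chunks.append(byte)

# A count of 3, three integers, then a null-terminated string.
stream = io.BytesIO(struct.pack("!i3i", 3, 10, 20, 30) + b"hello\x00rest")
values = read_counted_ints(stream)
text = read_null_terminated(stream)
```

In both cases the size of the data is not known until part of the message has been read, which is why such fields cannot have a fixed size.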

This is the root class of all classes representing C data types in the protlib library. It may not be directly instantiated; you must always use one of its subtypes instead. There are five optional keyword arguments which you may pass to a CType:

Protocol Handlers

protlib also provides a convenient framework for implementing servers which receive and send CStruct objects. This makes it easy to implement custom binary protocols in which structs are passed back and forth over socket connections. This is based on the SocketServer module in the Python standard library.

In order to use these examples, you must do only two things.

First, make sure that each struct which represents a message begins with a constant value which uniquely identifies that struct.

Second, define a subclass of the appropriate handler class, either TCPHandler or UDPHandler, and define a handler method for each message type you wish to respond to.

An example client/server

Let’s walk through a simple example. We’ll define several structs to represent geometric concepts: a Point, a Vector, and a Rectangle. Each of these structs is a message which can be sent between the client and server. We’ll also define a variable-length message called PointGroup, which demonstrates using variable-length arrays. Note that the first field in each of these messages is a constant value that uniquely identifies the message.

This entire example can be found in the examples/geocalc directory. Here’s the common.py file, which is imported by both the server.py and client.py programs:

    import logging
    logging.basicConfig(level = logging.INFO)

    from protlib import *

    SERVER_ADDR = ("127.0.0.1", 32123)

    class Point(CStruct):
        code = CShort(always = 1)
        x = CFloat()
        y = CFloat()

    class Vector(CStruct):
        code = CShort(always = 2)
        p1 = Point.get_type()
        p2 = Point.get_type()

    class Rectangle(CStruct):
        code = CShort(always = 4)
        points = CArray(4, Point)

    class PointGroup(CStruct):
        code = CShort(always = 3)
        count = CInt()
        points = CArray("count", Point)

For our server, we define a handler class with a handler method for each message we wish to accept. The name of each handler method should be the name of the message class in lower case with the words separated by underscores. For example, the Vector class is handled by the vector method, and the PointGroup class is handled by the point_group method. Each of these handler methods takes a single parameter other than self, which is the actual message read and parsed from the socket.

Here’s the server.py file, which uses our subclasses of the SocketServer module classes to accept and handle incoming messages:

    from math import sqrt

    from common import *

    class Handler(TCPHandler):
        LOG_TO_SCREEN = True

        def vector(self, v):
            """returns the mid-point of the line segment"""
            return Point(x = (v.p1.x + v.p2.x) / 2,
                         y = (v.p1.y + v.p2.y) / 2)

        def rectangle(self, r):
            """returns the endpoint closest to the origin"""
            dists = [(sqrt(p.x ** 2 + p.y ** 2), p) for p in r.points]
            return min(dists)[1]

        def point_group(self, pg):
            """returns a rectangle which encompasses all points in the group"""
            xmin = min(p.x for p in pg.points)
            xmax = max(p.x for p in pg.points)
            ymin = min(p.y for p in pg.points)
            ymax = max(p.y for p in pg.points)
            return Rectangle(points = [
                Point(x = xmin, y = ymin),
                Point(x = xmin, y = ymax),
                Point(x = xmax, y = ymin),
                Point(x = xmax, y = ymax)
            ])

    if __name__ == "__main__":
        LoggingTCPServer(SERVER_ADDR, Handler).serve_forever()

To test this server, we have a simple client which sends a series of messages to the server and then reads back the responses, logging everything with our protlib.Logger class. Here’s our client.py script:

    import socket
    from random import randrange

    from common import *

    def rand_point():
        return Point(x = randrange(100), y = randrange(100))

    logger = Logger(also_print = True)
    parser = Parser(logger)

    sock = socket.create_connection(SERVER_ADDR)
    f = sock.makefile("rwb", 0)

    vec = Vector(p1 = rand_point(), p2 = rand_point())
    logger.log_and_write(f, vec)
    pt = parser.parse(f)
    assert vec.p1.x < pt.x < vec.p2.x or vec.p1.x > pt.x > vec.p2.x
    assert vec.p1.y < pt.y < vec.p2.y or vec.p1.y > pt.y > vec.p2.y

    rect = Rectangle(points = [Point(x = 1, y = 1), Point(x = 1, y = 5),
                               Point(x = 5, y = 1), Point(x = 5, y = 5)])
    logger.log_and_write(f, rect)
    pt = parser.parse(f)
    assert pt.x == pt.y == 1

    points = [rand_point() for i in range(10)]
    logger.log_and_write(f, PointGroup(count = 10, points = points))
    rect = parser.parse(f)
    assert rect.code == Rectangle.code.always

    sock.close()

Our server does all of our logging automatically, but we need to manually invoke the logger on the client. The logs created and their format are explained below.
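The naming convention for handler methods (class name in lower case, words separated by underscores) can be sketched with a regular expression. The `handler_name` function is a hypothetical helper written for this illustration; it is not taken from protlib, whose own conversion may differ in edge cases.

```python
import re

def handler_name(class_name):
    """Convert a CamelCase class name to lower case with underscores."""
    # Insert an underscore before every capital letter except the first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()

names = [handler_name(n) for n in ("Vector", "PointGroup", "Rectangle")]
```

This matches the examples given above: Vector maps to vector and PointGroup maps to point_group.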

Logging

protlib uses the logging module to provide 5 different logs, each with their own suffix: hex, raw, struct, error, and stack. By default, the prefix of these logs will be the name of the current script. A RotatingFileHandler is created for each of these logs if no handlers already exist when the logs are first accessed by protlib. For example, if you’re running the script server.py then these will be the log names, log file names, logging level used for the log messages, and type of messages written to each log:

    log name        default filename     level      messages
    server.hex      server.hex_log       DEBUG      nicely formatted hex dumps of the binary data sent and received
    server.raw      server.raw_log       INFO       Python string literals of the binary data sent and received
    server.struct   server.struct_log    WARNING    literal representations of each struct sent and received
    server.error    server.error_log     ERROR      error messages
    server.stack    server.stack_log     CRITICAL   stack traces of uncaught exceptions thrown by handler methods

Each log message generated by one of our protocol handlers contains a unique identifier which indicates the binary protocol message received. This makes it easy to match the log messages in the different files to one another, since this unique message identifier will be present in each of the 5 logs.

Log examples

Here’s a description of each log:

struct

This contains the literal representation of each request and response, for example:

    2010-03-15 18:54:07,664: (1268693647_0) received Vector(code=2, p1=Point(code=1, x=39.0, y=41.0), p2=Point(code=1, x=93.0, y=13.0))
    2010-03-15 18:54:07,664: (1268693647_0) sending Point(code=1, x=66.0, y=27.0)

This is convenient because the structs are logged with the Python code which represents them. Therefore we can paste them directly into a Python command prompt to inspect and play around with them:

    >>> from common import *
    >>> p = Point(code=1, x=66.0, y=27.0)
    >>> p
    Point(code=1, x=66.0, y=27.0)

raw

This contains the raw data in the form of a Python string of each request and response, for example:

    2010-03-15 18:54:07,664: (1268693647_0) sending b'\x00\x01B\x84\x00\x00A\xd8\x00\x00'
    2010-03-15 18:54:07,667: (1268693647_1) received b'\x00\x04\x00\x01?\x80\x00\x00?\x80\x00\x00\x00\x01?\x80\x00\x00@\xa0\x00\x00\x00\x01@\xa0\x00\x00?\x80\x00\x00\x00\x01@\xa0\x00\x00@\xa0\x00\x00'

This is convenient because we can paste these strings into a Python command prompt and play around with them. If they are valid then we can parse them into structs, and if they aren’t then we can examine exactly why; this log will always contain what we receive even in the case of unparsable binary data:

    >>> from common import *
    >>> s = b'\x00\x01B\x84\x00\x00A\xd8\x00\x00'
    >>> p = Point.parse(s)
    >>> p
    Point(code=1, x=66.0, y=27.0)
    >>>
    >>> s = b"bad"
    >>> p = Point.parse(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "protlib.py", line 230, in parse
        return cls.get_type(cached=True).parse(f)
      File "protlib.py", line 141, in parse
        raise CError("{0} requires {1} bytes and was only given {2} ({3!r})".format(self.subclass.__name__, self.sizeof, len(buf), buf))
    protlib.CError: Point requires 10 bytes and was only given 3 ('bad')
    >>>
    >>> s = b"invalid but with enough data"
    >>> p = Point.parse(s)
    ../../protlib.py:526: CWarning: Point.code should always be 1 but was given a value of 26990
      warn("{0}.{1} should always be {2!r} but was given a value of {3!r}".format(self.__class__.__name__, name, field.always, value), CWarning)
    >>> p
    Point(code=26990, x=1.1430328245747994e+33, y=1.1834294514326081e+22)

hex

This contains nicely-formatted tables of the binary data sent and received in hexadecimal notation. For example:

    2010-03-15 18:38:50,978: (1268692730_0) received
          0  1  2  3  4  5  6  7
     0   00 02 00 01 42 30 00 00
     8   42 74 00 00 00 01 42 aa
    16   00 00 42 18 00 00

error

This contains messages for common errors, such as when a message is too short, or when we have no handler to match a message we’ve received, etc. These messages contain as much information as possible to help reconstruct the problem, which usually includes the raw data involved (also present in the raw log).

stack

This contains stack traces from exceptions thrown in your handler methods.

Logger objects

Although logging is performed automatically when using SocketServer classes, you may find it useful to instantiate your own logger objects, then manually make use of the 5 logs listed above. Use this object to do that; note that this class uses but does not inherit from the logging.Logger class.

class Logger([prefix[, also_print=False]])

A logging object which uses the 5 logs listed above.

Parameters:

prefix – Pass a string as this parameter to replace the default prefix (which is the name of the script being executed). For example, if you pass the string "foo" as this parameter, then your logs will be named foo.hex, foo.raw, etc.

also_print – whether to also print log messages to the screen

log_struct(inst[, trans_type="received"])

Logs the repr of an instance of a CStruct subclass to the struct log.

Parameters:

inst – the instance of the struct to be logged

trans_type – a prefix to the log message, generally this should be either "sending" or "received"

log_binary(data[, trans_type="received"])

Logs the repr of the packed binary data to the raw log, then logs a nicely formatted table of the data to the hex log.

Parameters:

data – the packed binary data, such as what’s produced by calling s.serialize() on an instance of a CStruct subclass

trans_type – a prefix to the log message, generally this should be either "sending" or "received"

log_error(message, *args, **kwargs)

Logs the message to the error log. The message parameter should be a string, and the *args and **kwargs to this method are used as the parameters to str.format.

log_stacktrace()

Logs the value of traceback.format_exc() to the stack log.

log_and_write(f, data)

Logs a string or CStruct instance to the appropriate logs, then writes it to a file.

Parameters:

f – a file object to which data will be written

data – a string or CStruct instance

Advanced logging

As mentioned above, protlib automatically sets up a RotatingFileHandler when you instantiate protlib.Logger on each of the 5 logs for which no other logging handlers are defined. Because protlib uses the logging module from the standard library, you can use your own configuration, handlers, formatters, etc. This is demonstrated by the following example, which is included as the file examples/custom_logging/testing.py, although you’ll need to replace the string "smtp.example.com" with a valid outgoing mail server for the code to run properly.

    import sys
    import time
    import logging
    from logging.handlers import SMTPHandler, TimedRotatingFileHandler

    from protlib import *

    class Point(CStruct):
        code = CShort(always = 0x1234)
        x = CInt()
        y = CInt()

    logging.basicConfig(level = logging.DEBUG)

    # added before Logger() is instantiated, so it replaces the default handler
    trfh = TimedRotatingFileHandler("testing.rotating_log", "s", 1)
    logging.getLogger("testing.hex").addHandler(trfh)

    logger = Logger()
    parser = Parser(logger)

    # added after Logger() is instantiated, so it supplements the default handler
    smtp = SMTPHandler("smtp.example.com", "bugs@example.com", ["eli@example.com"], "Stack Trace")
    logging.getLogger("testing.stack").addHandler(smtp)

    if __name__ == "__main__":
        # binary mode, since the packed data is bytes
        with open("point.dat", "wb") as f:
            p1 = Point(x = 5, y = 6)
            logger.log_and_write(f, p1)

        time.sleep(2)
        with open("point.dat", "rb") as f:
            p2 = parser.parse(f)

        try:
            Point(x = "not an integer")
        except CError:
            logger.log_stacktrace()

Here’s an explanation of the customizations made to our logging:

The logging level is set to logging.DEBUG, which differs from the default value of logging.WARNING.

We use a TimedRotatingFileHandler for our hex log. Because we add this handler before instantiating protlib.Logger, this handler is used instead of the default RotatingFileHandler.

We use an SMTPHandler for our stack log. Because we add this handler after instantiating protlib.Logger, this is used in addition to the default RotatingFileHandler.
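The "before versus after" behavior described above follows from a simple pattern: attach a default handler only when a logger has none. The sketch below shows that pattern with the standard library alone. The `ensure_default_handler` function and the `sketch.*` logger names are hypothetical, and `NullHandler` stands in for the file and mail handlers so that the example has no side effects.

```python
import logging

def ensure_default_handler(logger, default_handler):
    """Attach default_handler only if the logger has no handlers yet."""
    if not logger.handlers:
        logger.addHandler(default_handler)
        return True
    return False

default = logging.NullHandler()      # stands in for a RotatingFileHandler

# Case 1: a custom handler added first suppresses the default.
early = logging.getLogger("sketch.hex")
early.addHandler(logging.NullHandler())
early_got_default = ensure_default_handler(early, default)

# Case 2: a custom handler added afterwards joins the default.
late = logging.getLogger("sketch.stack")
late_got_default = ensure_default_handler(late, default)
late.addHandler(logging.NullHandler())

handler_counts = (len(early.handlers), len(late.handlers))
```

The first logger ends up with only its custom handler, while the second ends up with both the default and the custom one.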

Protocol Handler Classes

As mentioned above, you should always have your protocol classes extend either the TCPHandler or UDPHandler class, depending on what type of SocketServer you’re using. Each of these classes inherits from ProtHandler, and you may use these methods and fields to affect the behavior of your custom protocol handlers:

class ProtHandler

The user does not instantiate this class or any of its subclasses directly. Instead, you declare your own handler class which subclasses either TCPHandler or UDPHandler, which are themselves subclasses of ProtHandler. They also extend the StreamRequestHandler and DatagramRequestHandler classes of the SocketServer module, respectively.

This class also inherits from the protlib.Logger class, so you can call the log functions listed above from your handler methods by simply calling self.log_stacktrace(), self.log_error("Boo!"), etc.

STRUCT_MOD

By default, your handler will detect all messages present in the same module where the handler class itself is defined. So you can either define your handler in the same module where your structs are defined, or you can import those structs into the handler’s module. This is the recommended way to integrate your handlers with your struct definitions.

However, you may instead set the STRUCT_MOD field to the module where the structs are declared. (Technically this can be anything with __dict__ and __name__ fields.) You may also set this to a string which is the name of the module where they are declared. For example:

    import module_with_structs

    class SomeHandler(TCPHandler):
        STRUCT_MOD = module_with_structs
        # handler methods would go here

    class AnotherHandler(UDPHandler):
        STRUCT_MOD = "module_with_structs"
        # handler methods would go here

LOG_TO_SCREEN

This is False by default, but if set to True, every log message will be printed to the screen in addition to being written to the appropriate log.

LOG_PREFIX

Changes the prefix of each log from the name of the current script to whatever is specified. For example, if you set the LOG_PREFIX to "foo", then your logs will be foo.hex, foo.raw, etc.

These attributes are best set where your custom handler class is defined, for example:

    class Handler(TCPHandler):
        LOG_TO_SCREEN = True
        LOG_PREFIX = "unified"
        # handler methods would go here

raw_data(data)

This is the default handler for any message for which no struct has been defined. By default this logs an error message and sends no reply. Override this if you wish to have your own handler for unclassified binary messages; the data parameter is a string containing the binary data of the message.

reply(data)

Anything you return from a handler method is sent back to the client, whether it’s a struct or just binary data in a string. However, sometimes you may need to send multiple messages back to the client. You can manually concatenate the binary data strings, or you can use the reply method, for example:

    class RepeatRequest(CStruct):
        code = CShort(always = 1)
        name = CString(length = 25)
        repititions = CInt()

    class Handler(TCPHandler):
        def repeat_request(self, rr):
            # send one greeting per requested repetition
            for i in range(rr.repititions):
                self.reply(b"Hello " + rr.name + b"!\n")

class LoggingTCPServer(addr, handler_class)
class LoggingUDPServer(addr, handler_class)

These classes extend the TCPServer and the UDPServer classes from the SocketServer module, respectively. There are only two differences between these and their parent classes:

The allow_reuse_address field is set to True for these classes.

When your protocol handler is used with one of these classes, the logging level of the default RotatingFileHandler objects is set to INFO. When it’s used with other classes, it’s set to CRITICAL + 1. Note that this is the level of the handlers, which is independent of the level of the loggers themselves, as explained here.

So basically, using these classes simply provides sensible default settings for your logs and sockets.

class Parser([logger[, module]])

If you know what struct you want, then you can use the CStruct.parse classmethod to read an instance of that struct from a file, e.g. p = Point.parse(f). However, in some cases you want to read some data from a file or socket but aren’t sure what message is coming across. This class’s parse method figures out which message is being read and returns an instance of the correct struct.

Parameters:

module – This is exactly the same as the ProtHandler.STRUCT_MOD field; if present then it indicates which module contains the struct classes you want to use. If omitted, then the module where this class is instantiated is used.

logger – The instance of the Logger class to use to perform logging. If omitted, the logging level of each default RotatingFileHandler will be CRITICAL + 1.

parse(f)

This method accepts a string or file and returns an instance of the struct it reads from that string/file. If the data it finds cannot be parsed into a struct, then it just returns all of the data it is able to read. This may be an empty string if no data is available. Any data returned will be written to the appropriate logs. None will be returned in the case of an incomplete message. In this case a message will be written to the error log.
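The idea of picking the right struct from a leading constant can be sketched with the standard library `struct` module. Everything below is a hypothetical illustration: the `MESSAGES` registry, the `parse_message` function, and the tuple return value are assumptions for this example and do not reflect how protlib's Parser is implemented.

```python
import struct

# Hypothetical registry: leading 2-byte code -> (name, struct format).
MESSAGES = {
    1: ("Point", "!hff"),
    2: ("Vector", "!hhffhff"),
}

def parse_message(data):
    """Return (name, fields), the raw data if unrecognized, or None if incomplete."""
    if len(data) < 2:
        return data                  # not even a code to inspect
    (code,) = struct.unpack("!h", data[:2])
    if code not in MESSAGES:
        return data                  # unknown message: hand back the raw bytes
    name, fmt = MESSAGES[code]
    size = struct.calcsize(fmt)
    if len(data) < size:
        return None                  # recognized but incomplete
    return name, struct.unpack(fmt, data[:size])

point = parse_message(b"\x00\x01B\x84\x00\x00A\xd8\x00\x00")
unknown = parse_message(b"bad")
incomplete = parse_message(b"\x00\x01B\x84")
```

The three calls mirror the three outcomes described above: a parsed message, unparsable data returned as-is, and None for a message that was cut short.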