Snyppets - Python snippets

This page contains a collection of miscellaneous Python code snippets, recipes, mini-guides, links, examples, tutorials and ideas, ranging from very (very) basic things to advanced ones. I hope they will be useful to you. All snippets are kept in a single HTML page so that you can easily ❶ save it for offline reading (and keep it on a USB key) and ❷ search in it.

(Don't forget to read my main Python page ( http://sebsauvage.net/python/ ): there are a handful of other programs and guides.)


Send a file using FTP

Piece of cake.

import ftplib # We import the FTP module

session = ftplib.FTP('myserver.com','login','password') # Connect to the FTP server

myfile = open('toto.txt','rb') # Open the file to send

session.storbinary('STOR toto.txt', myfile) # Send the file

myfile.close() # Close the file

session.quit() # Close FTP session



Queues (FIFO) and stacks (LIFO)

Python makes using queues and stacks a piece of cake (Did I already say "piece of cake" ?).

No use creating a specific class: simply use list objects.

For a stack (LIFO), stack with append() and unstack with pop() :

>>> a = [5,8,9]

>>> a.append(11)

>>> a

[5, 8, 9, 11]

>>> a.pop()

11

>>> a.pop()

9

>>> a

[5, 8]

>>>





For a queue (FIFO), enqueue with append() and dequeue with pop(0) :

>>> a = [5,8,9]

>>> a.append(11)

>>> a

[5, 8, 9, 11]

>>> a.pop(0)

5

>>> a.pop(0)

8

>>> a

[9, 11]





As lists can contain any type of object, you can create queues and stacks of any type of objects !

(Note that there is also a Queue module, but it is mainly useful with threads.)
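One caveat with the list-based queue: pop(0) has to shift every remaining item, so it gets slower as the list grows. A minimal sketch of the same FIFO using collections.deque from the standard library, written for Python 3 (unlike the Python 2 sessions on this page); the variable names are only illustrative:

```python
from collections import deque

queue = deque([5, 8, 9])
queue.append(11)           # enqueue at the right end
first = queue.popleft()    # dequeue from the left end
second = queue.popleft()
remaining = list(queue)
print(first, second, remaining)
```

For small lists the difference is negligible, so plain lists remain a reasonable default.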

A function which returns several values

When you're not accustomed to Python, it's easy to forget that a function can return just any type of object, including tuples.

This is a great way to create functions which return several values. This is typically the kind of thing that cannot be done in other languages without some code overhead.

>>> def myfunction(a):
...     return (a+1,a*2,a*a)
...

>>> print myfunction(3)

(4, 6, 9)



You can also use multiple assignment:

>>> (a,b,c) = myfunction(3)

>>> print b

6

>>> print c

9



And of course your functions can return any combination/composition of objects (strings, integers, lists, tuples, dictionaries, lists of tuples, etc.).
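As a small illustration of returning a composition of objects, the sketch below returns a tuple holding a number, a list and a dictionary, then unpacks it. The function name and data are made up for the example, and the code targets Python 3:

```python
def describe(words):
    """Return (count, sorted words, length of each word) in one call."""
    lengths = {word: len(word) for word in words}
    return len(words), sorted(words), lengths

count, ordered, lengths = describe(["tuple", "list", "dict"])
print(count, ordered, lengths["tuple"])
```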







Exchanging the content of 2 variables

In most languages, exchanging the content of two variables involves using a temporary variable.

In Python, this can be done with multiple assignment.

>>> a=3

>>> b=7

>>> (a,b)=(b,a)

>>> print a

7

>>> print b

3



In Python, tuples, lists and dictionaries are your friends, really !

Highly recommended reading: Dive into Python (http://diveintopython.net/). The first chapter contains a nice tutorial on tuples, lists and dictionaries. And don't forget to read the rest of the book (you can download the entire book for free).







Getting rid of duplicate items in a list

The trick is to temporarily convert the list into a dictionary:

>>> mylist = [3,5,8,5,3,12]

>>> print dict().fromkeys(mylist).keys()

[8, 3, 12, 5]

>>>



Since Python 2.4, you can also use sets:

>>> mylist = [3,5,8,5,3,12]

>>> print list(set(mylist))

[8, 3, 12, 5]

>>>
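Both tricks lose the original order of the items. If order matters, one possible approach (a sketch for Python 3; the helper name unique_in_order is hypothetical) is to remember what has already been seen while walking the list:

```python
def unique_in_order(items):
    """Return items without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

mylist = [3, 5, 8, 5, 3, 12]
print(unique_in_order(mylist))
```

Like the dictionary and set tricks, this requires the items to be hashable.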









Get all links in a web page (1)

... or regular expression marvels.

import re, urllib

htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)

linksList = re.findall('<a href=(.*?)>.*?</a>',htmlSource)

for link in linksList:
    print link



Get all links in a web page (2)

You can also use the HTMLParser module.

import HTMLParser, urllib



class linkParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag=='a':
            href = dict(attrs).get('href')  # Anchors such as <a name="..."> have no href
            if href:
                self.links.append(href)



htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)

p = linkParser()

p.feed(htmlSource)

for link in p.links:
    print link





For each HTML start tag encountered, the handle_starttag() method will be called.

For example <a href="http://google.com"> will trigger the method handle_starttag(self,'a',[('href','http://google.com')]) (tag names are converted to lowercase).

See also all the other handle_*() methods in the Python manual.

(Note that HTMLParser is not bullet-proof: it will choke on ill-formed HTML. In this case, use the sgmllib module, go back to regular expressions or use BeautifulSoup.)
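For readers on Python 3, where the module is named html.parser, the same idea can be sketched as follows. This version parses a literal string instead of fetching a page, and the class name LinkParser and the sample HTML are assumptions made for the example:

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> start tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

sample = '<p><A HREF="http://google.com">Google</A> <a name="top">no link</a></p>'
parser = LinkParser()
parser.feed(sample)
print(parser.links)
```

Using .get('href') rather than ['href'] avoids an exception on anchors that carry no href attribute.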



Get all links in a web page (3)

Still hungry ?



Beautiful Soup is a Python module which is quite good at extracting data from HTML.

Beautiful Soup's main advantages are its ability to handle very bad HTML code and its simplicity. Its drawback is its speed (it's slow).

import urllib

import BeautifulSoup



htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)

soup = BeautifulSoup.BeautifulSoup(htmlSource)

for item in soup.fetch('a'):
    print item['href']

Get all links in a web page (4)

Look ma ! No parser nor regex.



import urllib



htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)

for chunk in htmlSource.lower().split('href=')[1:]:
    indexes = [i for i in [chunk.find('"',1),chunk.find('>'),chunk.find(' ')] if i>-1]
    print chunk[:min(indexes)]



Zipping/unzipping files

Zipping a file:

import zipfile

f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)

f.write('file_to_add.py')

f.close()



Replace 'w' with 'a' to add files to the zip archive.

Unzipping all files from a zip archive:

import zipfile

zfile = zipfile.ZipFile('archive.zip','r')

for filename in zfile.namelist():
    data = zfile.read(filename)
    file = open(filename, 'w+b')
    file.write(data)
    file.close()

Zipping all files in a directory:

import zipfile, os
f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)
startdir = "c:\\mydirectory"
for dirpath, dirnames, filenames in os.walk(startdir):
    for filename in filenames:
        f.write(os.path.join(dirpath,filename))
f.close()
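The directory example stores each file under its full path, and the unzip loop assumes the target directories already exist. A sketch for Python 3 that stores paths relative to the start directory and then unpacks with extractall(); it works in a temporary directory, and all file and function names here are placeholders:

```python
import os
import tempfile
import zipfile

def zip_directory(startdir, archive_path):
    """Zip every file under startdir, storing paths relative to startdir."""
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(startdir):
            for filename in filenames:
                fullpath = os.path.join(dirpath, filename)
                archive.write(fullpath, os.path.relpath(fullpath, startdir))

# Build a tiny directory tree to zip.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source')
os.makedirs(os.path.join(source, 'sub'))
with open(os.path.join(source, 'sub', 'hello.txt'), 'w') as f:
    f.write('hello')

archive_path = os.path.join(workdir, 'archive.zip')
zip_directory(source, archive_path)

# Unzip everything; extractall() recreates subdirectories as needed.
target = os.path.join(workdir, 'unzipped')
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
    archive.extractall(target)
print(names)
```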



Listing the content of a directory

You have 4 ways of doing this, depending on your needs.

The listdir() function returns the list of all entries (files and subdirectories) in a directory:

import os
for filename in os.listdir(r'c:\windows'):
    print filename

Note that you can use the fnmatch module to filter file names.
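A minimal sketch of that filtering, for Python 3. It filters a hard-coded list of names so that it runs anywhere; with a real directory you would pass the result of os.listdir() instead. The file names are invented:

```python
import fnmatch

names = ['notepad.exe', 'win.ini', 'explorer.exe', 'readme.txt']
executables = fnmatch.filter(names, '*.exe')   # keep only the matching names
is_ini = fnmatch.fnmatch('win.ini', '*.ini')   # test a single name
print(executables, is_ini)
```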

The glob module wraps listdir() and fnmatch() into a single method:

import glob

for filename in glob.glob(r'c:\windows\*.exe'):
    print filename



And if you need to collect subdirectories, use os.path.walk() :

import os.path

def processDirectory ( args, dirname, filenames ):
    print 'Directory',dirname
    for filename in filenames:
        print ' File',filename

os.path.walk(r'c:\windows', processDirectory, None )



os.path.walk() works with a callback: processDirectory() will be called for each directory encountered.

dirname will contain the path of the directory.

filenames will contain a list of filenames in this directory.

You can also use os.walk(), which is used in a plain for loop (no callback) and is somewhat easier to understand.

import os
for dirpath, dirnames, filenames in os.walk('c:\\winnt'):
    print 'Directory', dirpath
    for filename in filenames:
        print ' File', filename



A webserver in 3 lines of code

import BaseHTTPServer, SimpleHTTPServer

server = BaseHTTPServer.HTTPServer(('',80),SimpleHTTPServer.SimpleHTTPRequestHandler)

server.serve_forever()



This webserver will serve files in the current directory. You can use os.chdir() to change the directory.

This trick is handy to serve or transfer files between computers on a local network.

Note that this webserver is pretty fast, but can only serve one HTTP request at a time. It's not recommended for high-traffic servers.

If you want better performance, have a look at asynchronous sockets (asyncore, Medusa...) or multi-thread webservers.
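For the multi-threaded route, Python 3's standard library offers http.server.ThreadingHTTPServer (available since Python 3.7), which handles each request in its own thread. A sketch, assuming Python 3; the helper name and the choice of binding to 127.0.0.1 on a free port are mine, not the original's:

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_server(host='127.0.0.1', port=0):
    """Create a threaded file server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), SimpleHTTPRequestHandler)

server = make_server()
print('Would serve on port', server.server_address[1])
# Call server.serve_forever() here to start serving the current directory.
server.server_close()
```

This is still meant for a local network, not for high-traffic production use.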







Creating and raising your own exceptions

Do not consider exceptions as nasty things which want to break your programs. Exceptions are your friends. Exceptions are a Good Thing. Exceptions are messengers which tell you that something's wrong, and what is wrong. And try/except blocks will give you the chance to handle the problem.

In your programs, you should also wrap in try/except all calls that may fail (file access, network connections...).

It's often useful to define your own exceptions to signal errors specific to your class/module.

Here's an example of defining an exception and a class (say in myclass.py ):

class myexception(Exception):
    pass

class myclass:
    def __init__(self):
        pass
    def dosomething(self,i):
        if i<0:
            raise myexception, 'You made a mistake !'



( myexception is a no-brainer exception: it contains nothing. Yet, it is useful because the exception itself is a message.)

If you use the class, you could do:

import myclass

myobject = myclass.myclass()

myobject.dosomething(-2)



If you execute this program, you will get:

Traceback (most recent call last):

File "a.py", line 3, in ?

myobject.dosomething(-2)

File "myclass.py", line 9, in dosomething

raise myexception, 'You made a mistake !'

myclass.myexception: You made a mistake !



myclass tells you that you did something wrong. So you'd better use try/except, just in case there's a problem:

import myclass
myobject = myclass.myclass()
try:
    myobject.dosomething(-2)
except myclass.myexception:
    print 'oops ! myclass tells me I did something wrong.'

This is better ! You have a chance to do something if there's a problem.
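An exception can also carry data about what went wrong. The sketch below (Python 3 syntax, with hypothetical names) stores the offending value on the exception so that the caller can inspect it:

```python
class NegativeValueError(Exception):
    """Raised when a negative value is passed; keeps the value around."""
    def __init__(self, value):
        super().__init__('Negative value: %d' % value)
        self.value = value

def dosomething(i):
    if i < 0:
        raise NegativeValueError(i)
    return i * 2

try:
    dosomething(-2)
except NegativeValueError as error:
    message = str(error)
    bad_value = error.value
    print(message)
```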







Scripting Microsoft SQL Server with Python

If you have Microsoft SQL Server, you must have encountered this situation where you tell yourself «If only I was able to script all those clicks in Enterprise Manager (aka the MMC) !».

You can ! It's possible to script in Python whatever you can do in the MMC.

You just need the win32all python module to access COM objects from within Python (see http://starship.python.net/crew/mhammond/win32/)

(The win32all module is also provided with ActiveState's Python distribution: http://www.activestate.com/Products/ActivePython/)

Once installed, just use the SQL-DMO objects.

For example, get the list of databases in a server:

from win32com.client import gencache

s = gencache.EnsureDispatch('SQLDMO.SQLServer')

s.Connect('servername','login','password')

for i in range(1,s.Databases.Count+1): # The collection is 1-based: include the last item
    print s.Databases.Item(i).Name

Or get the script of a table:

database = s.Databases('COMMERCE')

script = database.Tables('CLIENTS').Script()

print script

Accessing a database with ODBC

Under Windows, ODBC provides an easy way to access almost any database. It's not very fast, but it's ok.

You need the win32all python module.

First, create a DSN (for example: 'mydsn'), then:

import dbi, odbc

conn = odbc.odbc('mydsn/login/password')

c = conn.cursor()

c.execute('select clientid, name, city from client')

print c.fetchall()

Nice and easy !

You can also use fetchone() or fetchmany(n) to fetch - respectively - one or n rows at once.

Note : On big datasets, I get quite bizarre and irregular data truncations on tables with a high number of columns. Is that a bug in ODBC, or in the SQL Server ODBC driver ? I will have to investigate...







Accessing a database with ADO

Under Windows, you can also use ADO (Microsoft ActiveX Data Objects) instead of ODBC to access databases. The following code uses ADO COM objects to connect to a Microsoft SQL Server database, then retrieve and display a table.

import win32com.client

connexion = win32com.client.gencache.EnsureDispatch('ADODB.Connection')

connexion.Open("Provider='SQLOLEDB';Data Source='myserver';Initial Catalog='mydatabase';User ID='mylogin';Password='mypassword';")

recordset = connexion.Execute('SELECT clientid, clientName FROM clients')[0]

while not recordset.EOF:
    print 'clientid=',recordset.Fields(0).Value,' client name=',recordset.Fields(1).Value
    recordset.MoveNext()

connexion.Close()

CGI under Windows with TinyWeb

TinyWeb is a one-file webserver for Windows (the exe is only 53 kb). It's fantastic for making instant webservers and share files. TinyWeb is also capable of serving CGI.

Let's have some fun and create some CGI with Python !

First, let's get and install TinyWeb:

Get TinyWeb from http://www.ritlabs.com/tinyweb/ (it's free, even for commercial use !) and unzip it to c:\somedirectory (or any directory you'd like).

Create the " www " subdirectory in this directory.

Create index.html in the www directory:

<html><body>Hello, world !</body></html>

Run the server: tiny.exe c:\somedirectory\www (make sure you use an absolute path).

Point your browser at http://localhost

If you see "Hello, world !", it means that TinyWeb is up and running.

Let's start making some CGI:

In the www directory, create the " cgi-bin " subdirectory.

Create hello.py containing:

print "Content-type: text/html"
print
print "Hello, this is Python talking !"

Make sure Windows always uses python.exe when you double-click .py files (SHIFT+rightclick on a .py file, "Open with...", choose python.exe, check the box "Always use this program...", click Ok).

Point your browser at http://localhost/cgi-bin/hello.py

You should see "Hello, this is Python talking !" (and not the source code).

If it's ok, you're done !

Now you can make some nice CGI.

(If this does not work, make sure the path to python.exe is ok and that you used an absolute path in tinyweb's command line.)

Note that this will never be as fast as mod_python under Apache (because TinyWeb will spawn a new instance of the Python interpreter for each request on a Python CGI). Thus it's not appropriate for high-traffic production servers, but for a small LAN, it can be quite handy to serve CGI like this.

Refer to Python documentation for CGI tutorials and reference.

Hint 1: Don't forget that you can also use TinySSL, which is the SSL/HTTPS enabled version of TinyWeb. That's fantastic for making secure webservers (especially to prevent LAN sniffing, when authentication is required).

Hint 2: If you wrap your Python CGI with py2exe, you'll be able to run your CGI on computers where Python is not installed.

Sub-hint: Compress all exe/dll/pyd with UPX, and you can take the whole webserver and its CGI on a floppy disk and run it everywhere ! (A typical " Hello, world ! " CGI example and TinyWeb weigh together only 375 kb with Python 2.2 !)

Hint 3: When serving files (not CGI), TinyWeb uses Windows file extension Content-type mapping (like .zip = application/x-zip-compressed ). If you find that Content-type is wrong, you can correct it using the following file: tinyweb.reg.

Hint 4: Under Windows there is a trick to send binary files correctly in CGI: You need to change stdout mode from text mode to binary mode. This is required on Windows only:

import sys
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

(code taken from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65443/ )

Creating .exe files from Python programs

You can use py2exe, cx_Freeze or pyInstaller.

py2exe

To convert myprogram.py into myprogram.exe, create setup.py containing:

from distutils.core import setup

import py2exe

setup(name="myprogram",scripts=["myprogram.py"],)

Then run:

python setup.py py2exe

py2exe will get all dependent files and write them in the \dist subdirectory. You will typically find your program as .exe , pythonXX.dll and complementary .pyd files. Your program will run on any computer even if Python is not installed. This also works for CGI.

(Note that if your program uses tkinter, there is a trick.)

Hint : Use UPX to compress all dll / exe / pyd files. This will greatly reduce file size. Use: upx --best *.dll *.exe *.pyd (Typically, python22.dll shrinks from 848 kb to 324 kb.)

creating a single EXE

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

from distutils.core import setup

import py2exe



setup(
    options = {"py2exe": {"compressed": 1, "optimize": 0, "bundle_files": 1, } },
    zipfile = None,
    console=["myprogram.py"]
)

cx_Freeze

You can also use cx_Freeze, which is an alternative to py2exe (This is what I used in webGobbler).

Console version:

cx_Freeze\FreezePython.exe --install-dir bin --target-name=myprogram.exe myprogram.py

Console-less version:

cx_Freeze\FreezePython.exe --install-dir bin --target-name=myprogram.exe --base-binary=Win32GUI.exe myprogram.py



Tip for the console-less version: there is no console to write to, so print statements can fail. Put the following at the beginning of your program to redirect the standard streams to a dummy stream when stdout is not usable:

import sys
try:
    sys.stdout.write("\n")
    sys.stdout.flush()
except IOError:
    class dummyStream:
        ''' dummyStream behaves like a stream but does nothing. '''
        def __init__(self): pass
        def write(self,data): pass
        def read(self,data): pass
        def flush(self): pass
        def close(self): pass
    # and now redirect all default streams to this dummyStream:
    sys.stdout = dummyStream()
    sys.stderr = dummyStream()
    sys.stdin = dummyStream()
    sys.__stdout__ = dummyStream()
    sys.__stderr__ = dummyStream()
    sys.__stdin__ = dummyStream()

This way, print statements do nothing instead of failing.

pyInstaller

pyInstaller is based on McMillan Installer.

python pyinstaller_1.1\Configure.py
python pyinstaller_1.1\Makespec.py myprogram.py myprogram.spec
python pyinstaller_1.1\Build.py myprogram.spec

The result is written to \distmyprogram .

--onefile will create a single EXE file. E.g.:

python pyinstaller_1.1\Makespec.py --onefile myprogram.py myprogram.spec

Note that this EXE, when run, unpacks all files in a temporary directory, runs the unpacked program from there, then deletes all files when finished. You may or may not like this behaviour (I don't).

--noconsole allows the creation of pure Windows executables (with no console window).

python pyinstaller_1.1\Makespec.py --noconsole myprogram.py myprogram.spec

--tk is a really nice option of pyInstaller which packs all necessary files for tkinter (tcl/tk).

Reading Windows registry

import _winreg

key = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER, 'Software\\Microsoft\\Internet Explorer', 0, _winreg.KEY_READ)

(value, valuetype) = _winreg.QueryValueEx(key, 'Download Directory')

print value

print valuetype



valuetype is the type of the registry value (an integer constant such as _winreg.REG_SZ ).

Measuring the performance of Python programs

Python is provided with a code profiling module: profile . It's rather easy to use.

For example, if you want to profile myfunction(), instead of calling it with:

myfunction()

you just have to do:

import profile

profile.run('myfunction()','myfunction.profile')

import pstats

pstats.Stats('myfunction.profile').sort_stats('time').print_stats()

This will display a report like this:

Thu Jul 03 15:20:26 2003 myfunction.profile



1822 function calls (1792 primitive calls) in 0.737 CPU seconds



Ordered by: internal time



ncalls tottime percall cumtime percall filename:lineno(function)

1 0.224 0.224 0.279 0.279 myprogram.py:512(compute)

10 0.078 0.008 0.078 0.008 myprogram.py:234(first)

1 0.077 0.077 0.502 0.502 myprogram.py:249(give_first)

1 0.051 0.051 0.051 0.051 myprogram.py:1315(give_last)

3 0.043 0.014 0.205 0.068 myprogram.py:107(sort)

1 0.039 0.039 0.039 0.039 myprogram.py:55(display)

139 0.034 0.000 0.106 0.001 myprogram.py:239(save)

139 0.030 0.000 0.072 0.001 myprogram.py:314(load)

...



This report tells you, for each function/method:

how many times it was called ( ncalls )

total time spent in function (minus time spent in sub-functions) ( tottime )

total time spent in function (including time spent in sub-functions) ( cumtime )

average time per call ( percall )

As you can see, the profile module displays the precise filename, line and function name. This is precious information and will help you to spot the slowest parts of your programs.

But don't try to optimize too early in development stage. This is evil ! :-)

Note that Python is also provided with a similar module named hotshot , which is more accurate but does not work well with threads.
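On current Python versions the standard library also ships cProfile, which has the same interface as profile with less overhead. A sketch for Python 3 that profiles a toy function and captures the report in a string rather than a file; slow_sum is a stand-in for your own code:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Record only what happens between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Same pstats calls as in the profile example, but written to a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('time').print_stats(5)
print(report.getvalue())
```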

Speed up your Python programs

To speed up your Python program, there's nothing like optimizing or redesigning your algorithms.

In case you think you can't do better, you can always use Psyco: Psyco is a Just-In-Time-like compiler for Python for Intel 80x86-compatible processors. It's very easy to use and provides x2 to x100 instant speed-up.

Download psyco for your Python version (http://psyco.sourceforge.net), unzip it and copy the \psyco directory to your Python site-packages directory (should be something like c:\pythonXX\Lib\site-packages\psyco\ under Windows).

Then, put this at the beginning of your programs:

import psyco

psyco.full()

Or even better:

try:
    import psyco
    psyco.full()
except ImportError:
    pass

This way, if psyco is installed, your program will run faster.

If psyco is not available, your program will run as usual.

(And if psyco is still not enough, you can rewrite the code which is too slow in C or C++ and wrap it with SWIG (http://swig.org).)



Note: Do not use Psyco when debugging, profiling or tracing your code. You may get inaccurate results and strange behaviours.



Regular expressions are sometimes overkill

I helped someone on a forum who wanted to process a text file: he wanted to extract the text following "Two words" in all lines starting with these 2 words. He had started writing a regular expression for this: r = re.compile("Two\sword\s(.*?)") .

His problem was better solved with:

[...]

for line in file:
    if line.startswith("Two words "):
        print line[10:]

Regular expressions are sometimes overkill. They are not always the best choice, because:

They involve some overhead: You have to compile the regular expression ( re.compile() ). This means parsing the regular expression and transforming it into a state machine. This consumes CPU time. When using the regular expression, you run the state machine against the text, which makes the state machine change state according to many rules. This also eats CPU time.

Regular expressions are not failsafe: they can fail sometimes on specific input. You may get a " maximum recursion limit exceeded " exception. This means that you should also enclose all match() , search() and findall() methods in try/except blocks.

The Zen of Python ( import this :-) says «Readability counts». That's a good thing. And regular expressions quickly become difficult to read, debug and change.

Besides, string methods like find() , rfind() or startswith() are very fast, much faster than regular expressions.

Do not try to use regular expressions everywhere. Often a bunch of string operations will do the job faster.
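To make the comparison concrete, here is a sketch (Python 3) that extracts the text after "Two words " both ways and checks that they agree. The sample lines and the pattern are illustrative, not the forum poster's actual data:

```python
import re

lines = ["Two words and the rest", "Other line", "Two words again"]

# String-method version: no pattern to compile or debug.
prefix = "Two words "
with_strings = [line[len(prefix):] for line in lines if line.startswith(prefix)]

# Regular-expression version of the same extraction.
pattern = re.compile(r"Two\swords\s(.*)")
with_regex = [m.group(1) for m in map(pattern.match, lines) if m]

print(with_strings)
```

Both lists hold the same strings; the first version is simply easier to read and to change.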







Executing another Python program

execfile("anotherprogram.py")
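A sketch of two standard-library ways to run another script under Python 3: runpy.run_path() runs it in the current process, and subprocess runs it in a separate interpreter. The script is generated in a temporary directory so that the example is self-contained; its name and content are placeholders:

```python
import os
import runpy
import subprocess
import sys
import tempfile

script = os.path.join(tempfile.mkdtemp(), 'anotherprogram.py')
with open(script, 'w') as f:
    f.write('answer = 6 * 7\nprint("answer is", answer)\n')

# Same process: returns the script's global variables as a dictionary.
script_globals = runpy.run_path(script)

# Separate process: captures what the script printed.
output = subprocess.check_output([sys.executable, script]).decode()
print(script_globals['answer'], output.strip())
```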

Bayesian filtering

Bayesian filtering is the latest buzz-word of spam fighting. And it works very well indeed !

Reverend is a free Bayesian module for Python. You can download it from http://divmod.org/trac/wiki/DivmodReverend

Here's an example: Recognizing the language of a text.

First, train it on a few sentences:

from reverend.thomas import Bayes

guesser = Bayes()

guesser.train('french','La souris est rentrée dans son trou.')

guesser.train('english','my tailor is rich.')

guesser.train('french','Je ne sais pas si je viendrai demain.')

guesser.train('english','I do not plan to update my website soon.')

And now let it guess the language:

>>> print guesser.guess('Jumping out of cliffs it not a good idea.')

[('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]

The Bayesian filter says: "It's english, with a 99.99% probability."

Let's try another one:

>>> print guesser.guess('Demain il fera très probablement chaud.')

[('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]

It says: "It's french, with a 99.99% probability."

Not bad, is it ?

You can train it on even more languages at the same time. You can also train it to classify any kind of text.







Tkinter and cx_Freeze

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

import Tkinter



class myApplication:

    def __init__(self,root):
        self.root = root
        self.initializeGui()

    def initializeGui(self):
        Tkinter.Label(self.root,text="Hello, world").grid(column=0,row=0)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()





FreezePython.exe --install-dir bin --target-name=test.exe test.py

The dynamic link library tk84.dll could not be found in the specified path [...]

FreezePython.exe --install-dir bin --target-name=test.exe test.py

copy C:\Python24\DLLs\tcl84.dll .\bin\

copy C:\Python24\DLLs\tk84.dll .\bin\

Traceback (most recent call last):

File "cx_Freeze\initscripts\console.py", line 26, in ?

exec code in m.__dict__

File "test.py", line 20, in ?

File "test.py", line 14, in main

File "C:\Python24\Lib\lib-tk\Tkinter.py", line 1569, in __init__

_tkinter.TclError: Can't find a usable init.tcl in the following directories:

[...]

cx_Freeze\FreezePython.exe --install-dir bin --target-name=test.exe test.py

copy C:\Python24\DLLs\tcl84.dll .\bin\

copy C:\Python24\DLLs\tk84.dll .\bin\

xcopy /S /I /Y "C:\Python24\tcl\tcl8.4\*.*" "bin\libtcltk84\tcl8.4"

xcopy /S /I /Y "C:\Python24\tcl\tk8.4\*.*" "bin\libtcltk84\tk8.4"

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-



import os, os.path

# Take the tcl/tk library from local subdirectory if available.

if os.path.isdir('libtcltk84'):
    os.environ['TCL_LIBRARY'] = 'libtcltk84\\tcl8.4'
    os.environ['TK_LIBRARY'] = 'libtcltk84\\tk8.4'

import Tkinter

class myApplication:

    def __init__(self,root):
        self.root = root
        self.initializeGui()

    def initializeGui(self):
        Tkinter.Label(self.root,text="Hello, world").grid(column=0,row=0)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()

Possible improvement:



You surely could get rid of some tcl/tk script you don't need. Example: bin\libtcltk84\tk8.4\demos (around 500 kb) are only tk demonstrations. They are not necessary.

This depends on which features of Tkinter your program will use.

(cx_Freeze and - AFAIK - all other packagers are not capable of resolving tcl/tk dependencies.)







A few Tkinter tips

import Tkinter



class myApplication: #1
    def __init__(self,root):
        self.root = root #2
        self.initialisation() #3

    def initialisation(self): #3
        Tkinter.Label(self.root,text="Hello, world !").grid(column=0,row=0) #4

def main(): #5
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()

.pack()

grid()

.pack()

.grid()

main()

Tkinter file dialogs

import Tkinter

import tkFileDialog



root = Tkinter.Tk()

directory = tkFileDialog.askdirectory(parent=root,initialdir="/",title='Please select a directory')

if len(directory) > 0:
    print "You chose directory %s" % directory

askopenfile

file

import Tkinter

import tkFileDialog



root = Tkinter.Tk()

file = tkFileDialog.askopenfile(parent=root,mode='rb',title='Please select a file')

if file != None:
    data = file.read()
    file.close()
    print "I got %d bytes from the file." % len(data)

import Tkinter

import tkFileDialog



myFormats = [
    ('Windows Bitmap','*.bmp'),
    ('Portable Network Graphics','*.png'),
    ('JPEG / JFIF','*.jpg'),
    ('CompuServe GIF','*.gif'),
    ]

root = Tkinter.Tk()
filename = tkFileDialog.asksaveasfilename(parent=root,filetypes=myFormats,title="Save image as...")
if len(filename) > 0:
    print "Now saving as %s" % (filename)

Including binaries in your sources

import base64,zlib

data = open('myimage.gif','rb').read()

print base64.encodestring(zlib.compress(data))

import base64,zlib

myFile = zlib.decompress(base64.decodestring("""

eJxz93SzsExUZlBn2MzA8P///zNnzvz79+/IgUMTJ05cu2aNaBmDzhIGHj7u58+fO11ksLO3Kyou

ikqIEvLkcYyxV/zJwsgABDogAmQGA8t/gROejlpLMuau+j+1QdQxk20xwzqhslmHH5/xC94Q58ST

72nRllBw7cUDHZYbL8VtLOYbP/b6LhXB7tAcfPCpHA/fSvcJb1jZWB9c2/3XLmQ+03mZBBP+GOak

/AAZGXPL1BJe39jqjoqEAhFr1fBi1dao9g4Ovjo+lh6GFDVWJqbisLKoCq5p1X5s/Jw9IenrFvUz

+mRXTeviY+4p2sKUflA1cjkX37TKWYwFzRpFYeqTs2fOqEuwXsfgOeGCfmZ57MP4WSpaZ0vSJy97

WPeY5ca8F1sYI5f5r2bjec+67nmaTcarm7+Z0hgY2Z7++fpCzHmBQCrPF94dAi/jj1oZt8R4qxsy

6liJX/UVyLjwoHFxFK/VMWbN90rNrLKMGQ7iQSc7mXgTkpwPXVp0mlWz/JVC4NK0s0zcDWkcFxxF

mrvdlBdOnBySvtNvq8SBFZo8rF2MvAIMoZoPmZrZPj2buEDr2isXi0V8egpelyUvbXNc7yVQkKgS

sM7g0KOr7kq3WRIkitSuRj1VXbSk8v4zh8fljqtOhyobP91izvh0c2hwqKz3jPaHhvMMXVQspYq8

aiV9ivkmHri5u2NH8fvPpVWuK65I3OMUX+f4Lee+3Hmfux96Vq5RVqxTN38YeK3wRbVz5v06FSYG

awWFgMzkktKiVIXkotTEktQUhaRKheDUpMTikszUPIVgx9AwR3dXBZvi1KTixNKyxPRUhcQSBSRe

Sn6JQl5qiZ2CrkJGSUmBlb4+QlIPKKGgAADBbgMp"""))



print "I have a file of %d bytes." % len(myFile)

import Image,StringIO

myimage = Image.open(StringIO.StringIO(myFile))

myimage.show()
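The encode and decode steps can be checked with a round trip. A sketch for Python 3, where base64.encodestring() and decodestring() have been replaced by encodebytes() and decodebytes(); the sample bytes stand in for a real image file:

```python
import base64
import zlib

data = b'GIF89a' + bytes(range(256)) * 4      # placeholder for file content

# What you would paste into your source code:
encoded = base64.encodebytes(zlib.compress(data)).decode('ascii')

# What the program does at run time to get the bytes back:
restored = zlib.decompress(base64.decodebytes(encoded.encode('ascii')))
print(len(data), len(encoded), restored == data)
```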

Good practice: try/except non-standard import statements

Ease their pain with a simple try/except statement which tells the module name (which is not always the same name as stated in the import statement) and where to get it.

Example:

try:
    import win32com.client
except ImportError:
    raise ImportError, 'This program requires the win32all extensions for Python. See http://starship.python.net/crew/mhammond/win32/'

Good practice: Readable objects

class client:
    def __init__(self,number,name):
        self.number = number
        self.name = name

my_client = client(5,"Smith")

print my_client

<__main__.client instance at 0x007D0E40>

class client:
    def __init__(self,number,name):
        self.number = number
        self.name = name
    def __repr__(self):
        return '<client id="%s" name="%s">' % (self.number, self.name)

my_client = client(5,"Smith")
print my_client

<client id="5" name="Smith">

class directory:

    def __init__(self):
        self.clients = []

    def addClient(self, client):
        self.clients.append(client)

    def __repr__(self):
        lines = []
        lines.append("<directory>")
        for client in self.clients:
            lines.append(" "+repr(client))
        lines.append("</directory>")
        return "\n".join(lines)

my_directory = directory()

my_directory.addClient( client(5,"Smith") )

my_directory.addClient( client(12,"Doe") )



print my_directory

<directory>

<client id="5" name="Smith">

<client id="12" name="Doe">

</directory>

except

try/except

Good practice: No blank-check read()

.read()

# Read from a file:

file = open("a_file.dat","rb")

data = file.read()

file.close()



# Read from an URL:

import urllib

url = urllib.urlopen("http://sebsauvage.net")

html = url.read()

url.close()

You should always bound your read().

# Read from a file:

file = open("a_file.dat","rb")

data = file.read(10000000)

file.close()



# Read from an URL:

import urllib

url = urllib.urlopen("http://sebsauvage.net")

html = url.read(200000)

url.close()
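When a file may be larger than the bound, a common pattern is to read it in fixed-size chunks so that memory use stays limited. A sketch for Python 3; the chunk size and file content are arbitrary, and the helper name is hypothetical:

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=65536):
    """Yield the file content piece by piece, never more than chunk_size bytes."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Create a 150000-byte sample file to read back.
path = os.path.join(tempfile.mkdtemp(), 'a_file.dat')
with open(path, 'wb') as f:
    f.write(b'x' * 150000)

sizes = [len(chunk) for chunk in read_in_chunks(path)]
print(sizes)
```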

also

1.7 is different than 1.7 ?

Never confuse data and its representation on screen.



textual representation

binary data stored in computer's memory

>>> import datetime

>>> print datetime.datetime.now()

2006-03-21 15:23:20.904000

>>>

NOT

textual representation

print

a = 1.7

b = 0.9 + 0.8 # This should be 1.7



print a

print b



if a == b:
    print "a and b are equal."
else:
    print "a and b are different !"

a and b are equal ?

1.7

1.7

a and b are different !

textual representation

almost

are

if abs(a-b) < 0.00001 :
    print "a and b are equal."
else:
    print "a and b are different !"

if str(a) == str(b) :
    print "a and b are equal."
else:
    print "a and b are different !"

a=1.7

a

not contain 1.7

binary approximation of the decimal number 1.7.
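The comparison problem above can be reproduced in a few lines. This is a small sketch written in Python 3 syntax (the rest of this page uses Python 2); math.isclose() is a standard library helper that the original snippet does not use, shown here as one possible alternative to the hand-written tolerance test.

```python
import math

a = 1.7
b = 0.9 + 0.8  # Mathematically 1.7, but stored as a slightly different binary value

# repr() shows enough digits to tell the two stored values apart.
print(repr(a), repr(b))

# Exact comparison fails; comparison within a tolerance succeeds.
print(a == b)
print(abs(a - b) < 0.00001)
print(math.isclose(a, b, rel_tol=1e-9))
```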

Get user's home directory path

import os.path

print os.path.expanduser('~')

Python's virtual machine

virtual machine

bytecode

machine language

simulates a microprocessor

mymodule.py

def myfunction(a):
    print "I have",a
    b = a * 3
    if b<50:
        b = b + 77
    return b

C:\>python

Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>> import mymodule

>>> print mymodule.myfunction(5)

I have 5

92

>>>

mymodule.pyc

>>> import dis

>>> dis.dis(mymodule.myfunction)

2 0 LOAD_CONST 1 ('I have')

3 PRINT_ITEM

4 LOAD_FAST 0 (a)

7 PRINT_ITEM

8 PRINT_NEWLINE



3 9 LOAD_FAST 0 (a)

12 LOAD_CONST 2 (3)

15 BINARY_MULTIPLY

16 STORE_FAST 1 (b)



4 19 LOAD_FAST 1 (b)

22 LOAD_CONST 3 (50)

25 COMPARE_OP 0 (<)

28 JUMP_IF_FALSE 14 (to 45)

31 POP_TOP



5 32 LOAD_FAST 1 (b)

35 LOAD_CONST 4 (77)

38 BINARY_ADD

39 STORE_FAST 1 (b)

42 JUMP_FORWARD 1 (to 46)

>> 45 POP_TOP



6 >> 46 LOAD_FAST 1 (b)

49 RETURN_VALUE

>>>



LOAD_CONST, PRINT_ITEM, COMPARE_OP

0

1

b = a * 3

3 9 LOAD_FAST 0 (a) # Load variable a on the stack.

12 LOAD_CONST 2 (3) # Load the value 3 on the stack

15 BINARY_MULTIPLY # Multiply them

16 STORE_FAST 1 (b) # Store result in variable b

SQLite - databases made simple

tremendous

mean

Not designed for concurrent access (database-wide lock on writing).

Only works locally (no network service, although you can use things like sqlrelay).

Does not handle foreign keys.

No rights management (grant/revoke).

Advantages:

Very fast (faster than MySQL on most operations).

Respects almost the whole SQL-92 standard.

Does not require installation of a service.

No database administration to perform.

Does not eat computer memory and CPU when not in use.

SQLite databases are compact

1 database = 1 file (easy to move/deploy/backup/transfer/email).

SQLite databases are portable across platforms (Windows, MacOS, Linux, PDA...)

SQLite is ACID (data consistency is assured even on computer failure or crash)

Supports transactions

Fields can store Nulls, integers, reals (floats), text or blob (binary data).

Can handle up to 2 Tera-bytes of data (although going over 12 Gb is not recommended).

Can work as an in-memory database (blazing performance !)

free

public domain

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

from sqlite3 import dbapi2 as sqlite



# Create a database:

con = sqlite.connect('mydatabase.db3')

cur = con.cursor()



# Create a table:

cur.execute('create table clients (id INT PRIMARY KEY, name CHAR(60))')



# Insert a single line:

client = (5,"John Smith")

cur.execute("insert into clients (id, name) values (?, ?)", client )

con.commit()



# Insert several lines at once:

clients = [ (7,"Ella Fitzgerald"),

(8,"Louis Armstrong"),

(9,"Miles Davis")

]

cur.executemany("insert into clients (id, name) values (?, ?)", clients )

con.commit()



cur.close()

con.close()

Now let's use the database:

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

from sqlite3 import dbapi2 as sqlite



# Connect to an existing database

con = sqlite.connect('mydatabase.db3')

cur = con.cursor()



# Get row by row

print "Row by row:"

cur.execute('select id, name from clients order by name;')

row = cur.fetchone()

while row:

print row

row = cur.fetchone()



# Get all rows at once:

print "All rows at once:"

cur.execute('select id, name from clients order by name;')

print cur.fetchall()



cur.close()

con.close()

Row by row:

(7, u'Ella Fitzgerald')

(5, u'John Smith')

(8, u'Louis Armstrong')

(9, u'Miles Davis')

All rows at once:

[(7, u'Ella Fitzgerald'), (5, u'John Smith'), (8, u'Louis Armstrong'), (9, u'Miles Davis')]



sqlite.connect()

SQLiteSpy

Hint 1:

sqlite.connect(':memory:')

very
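As an illustration of the in-memory hint, here is a minimal sketch in Python 3 syntax that runs the same create/insert/select sequence as the scripts above against a ':memory:' database. Nothing is written to disk, and the data is lost when the connection is closed.

```python
import sqlite3

# Create a database that lives only in memory.
con = sqlite3.connect(':memory:')
cur = con.cursor()

cur.execute('create table clients (id INT PRIMARY KEY, name CHAR(60))')
clients = [(5, "John Smith"),
           (7, "Ella Fitzgerald"),
           (8, "Louis Armstrong"),
           (9, "Miles Davis")]
cur.executemany("insert into clients (id, name) values (?, ?)", clients)
con.commit()

# Read everything back, sorted by name.
cur.execute('select id, name from clients order by name')
rows = cur.fetchall()
print(rows)

cur.close()
con.close()
```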

Hint 2:

and

sqlite = None
try:
    from sqlite3 import dbapi2 as sqlite # For Python 2.5
except ImportError:
    pass

if not sqlite:
    try:
        from pysqlite2 import dbapi2 as sqlite # For Python 2.4 and pySQLite
    except ImportError:
        pass

if not sqlite: # If module not imported successfully, raise an error.
    raise ImportError, "This module requires either: Python 2.5 or Python 2.4 with the pySQLite module (http://initd.org/tracker/pysqlite)"



# Then use it

con = sqlite.connect("mydatabase.db3")

...



pySQLite homepage: http://initd.org/tracker/pysqlite

SQLite homepage (useful information on the database engine itself): http://www.sqlite.org/

Dive into Python

Dive into Python

...now !

Creating a mutex under Windows

webGobbler

InnoSetup

import sys

CTYPES_AVAILABLE = True
try:
    import ctypes
except ImportError:
    CTYPES_AVAILABLE = False

WEBGOBBLER_MUTEX = None
if CTYPES_AVAILABLE and sys.platform=="win32":
    try:
        WEBGOBBLER_MUTEX=ctypes.windll.kernel32.CreateMutexA(None,False,"sebsauvage_net_webGobbler_running")
    except:
        pass



except:pass

urllib2 and proxies

urllib2

# The proxy address and port:

proxy_info = { 'host' : 'proxy.myisp.com',

'port' : 3128

}



# We create a handler for the proxy

proxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})



# We create an opener which uses this handler:

opener = urllib2.build_opener(proxy_support)



# Then we install this opener as the default opener for urllib2:

urllib2.install_opener(opener)



# Now we can send our HTTP request:

htmlpage = urllib2.urlopen("http://sebsauvage.net/").read(200000)



whole

proxy_info = { 'host' : 'proxy.myisp.com',

'port' : 3128,

'user' : 'John Doe',

'pass' : 'mysecret007'

}

proxy_support = urllib2.ProxyHandler({"http" : "http://%(user)s:%(pass)s@%(host)s:%(port)d" % proxy_info})

opener = urllib2.build_opener(proxy_support)

urllib2.install_opener(opener)

htmlpage = urllib2.urlopen("http://sebsauvage.net/").read(200000)



(Code in this snippet was heavily inspired from http://groups.google.com/groups?selm=mailman.983901970.11969.python-list%40python.org )

Basic

Digest

NTLM

import os

os.environ['HTTP_PROXY'] = 'http://proxy.myisp.com:3128'



os.environ['FTP_PROXY']

A proper User-agent in your HTTP requests

Python-urllib/1.16

request_headers = { 'User-Agent': 'PeekABoo/1.3.7' }

request = urllib2.Request('http://sebsauvage.net', None, request_headers)

urlfile = urllib2.urlopen(request)



Make sure the program name you use in User-Agent is really unique (Search on Google !).

Adopt the form: applicationName/version , such as webGobbler/1.2.4 .

If your program spiders websites, you should respect robot rules.

Always use bound reads. (eg. .read(200000) , not .read() alone).

Choose the network timeout wisely. You can use the following code to set the timeout in your whole program:

socket.setdefaulttimeout(60) # A 60 seconds timeout.



Error handling with urllib2

try:
    urlfile = urllib2.urlopen('http://sebsauvage.net/nonexistingpage.html')
except urllib2.HTTPError , exc:
    if exc.code == 404:
        print "Not found !"
    else:
        print "HTTP request failed with error %d (%s)" % (exc.code, exc.msg)
except urllib2.URLError , exc:
    print "Failed because:", exc.reason



urllib2: What am I getting ?

type of data

Content-type

urlfile = urllib2.urlopen('http://www.commentcamarche.net/')

print "Document type is", urlfile.info().getheader("Content-Type","")



Document type is text/html

Warning

after

Document type is text/html; charset=iso-8859-1

print "Document type is", urlfile.info().getheader("Content-Type","").split(';')[0].strip()
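The header clean-up can be isolated in a small helper so that it is easy to check without a network connection. This is only a sketch in Python 3 syntax; the function name media_type is an illustrative choice, not part of urllib2.

```python
def media_type(content_type_header):
    """Return the bare document type, without parameters such as charset."""
    return content_type_header.split(';')[0].strip()

print(media_type("text/html; charset=iso-8859-1"))
print(media_type("text/html"))
```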

.info()

HTTP response headers

print "HTTP Response headers:"

print urlfile.info()

HTTP Response headers:

Date: Thu, 23 Mar 2006 15:13:29 GMT

Content-Type: text/html; charset=iso-8859-1

Server: Apache

X-Powered-By: PHP/5.1.2-1.dotdeb.2

Connection: close

Reading (and writing) large XLS (Excel) files

import os

import win32com.client



filename = 'myfile.xls'

filepath = os.path.abspath(filename) # Always make sure you use an absolute path !



# Start Excel and open the XLS file:

excel = win32com.client.Dispatch('Excel.Application')

excel.Visible = True

workbook = excel.Workbooks.Open(filepath)



# Save as CSV:

xlCSVWindows =0x17 # from enum XlFileFormat

workbook.SaveAs(Filename=filepath+".csv",FileFormat=xlCSVWindows)



# Close workbook and Excel

workbook.Close(SaveChanges=False)

excel.Quit()



Hint:

much

Hint:

excel.Workbooks.Open()

os.path.abspath()

Hint:

Hint:

Hint:

Hint:

Run makepy.py (eg. C:\Python24\Lib\site-packages\win32com\client\makepy.py)

In the list, choose " Microsoft Excel 9.0 Object Library (1.3) " (or similar) and click ok.

Have a look in C:\Python24\Lib\site-packages\win32com\gen_py\ directory. You will find the wrapper (such as 00020813-0000-0000-C000-000000000046x0x1x3.py)

Open this file: it contains Excel constants and their values (You can copy/paste them in your code.)

For example:

xlCSVMSDOS =0x18 # from enum XlFileFormat

xlCSVWindows =0x17 # from enum XlFileFormat



Hint:

into

Saving the stack trace

import traceback



def fifths(a):
    return 5/a

def myfunction(value):
    b = fifths(value) * 100

try:
    print myfunction(0)
except Exception, ex:
    logfile = open('mylog.log','a')
    traceback.print_exc(file=logfile)
    logfile.close()
    print "Oops ! Something went wrong. Please look in the log file."





mylog.log

Traceback (most recent call last):

File "a.py", line 10, in ?

print myfunction(0)

File "a.py", line 7, in myfunction

b = fifths(value) * 100

File "a.py", line 4, in fifths

return 5/a

ZeroDivisionError: integer division or modulo by zero
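The same technique can be tried without touching the disk. The sketch below is in Python 3 syntax and uses an io.StringIO object as a stand-in for the log file; run_and_log is a hypothetical helper name introduced only for this illustration.

```python
import io
import traceback

def fifths(a):
    return 5 / a

def run_and_log(value, logfile):
    """Call fifths(); on failure, write the stack trace to logfile and return None."""
    try:
        return fifths(value) * 100
    except Exception:
        traceback.print_exc(file=logfile)
        return None

log = io.StringIO()            # Stands in for open('mylog.log','a')
result = run_and_log(0, log)   # Division by zero: the trace goes to the log
log_text = log.getvalue()
print(log_text)
```

Any object with a write() method can be passed as the file argument, which is why sys.stdout or an open file both work.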





Hint:

traceback.print_exc(file=sys.stdout)

Hint:

Filtering out warnings

usefull

should be taken care of

import warnings

warnings.filterwarnings(action = 'ignore',message=r'.*?no locals\(\) in functions bound by Psyco')
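To see the effect of such a filter, the following sketch (Python 3 syntax) records warnings instead of printing them. The two warning texts are made up for the demonstration; only the one matching the message pattern is dropped.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')   # Record every warning by default...
    # ...except those whose message matches this regular expression:
    warnings.filterwarnings(action='ignore', message=r'.*?no locals\(\)')
    warnings.warn('psyco: no locals() in functions bound by Psyco')  # Filtered out
    warnings.warn('something else worth reading')                     # Kept

messages = [str(w.message) for w in caught]
print(messages)
```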



message

too much

Saving an image as progressive JPEG with PIL

webGobbler

Image

myimage.save('myimage.jpg', quality=60, optimize=True, progressive=True)



myimage

Image

Charsets and encoding

( There is a french translation of this article: http://sebsauvage.net/python/charsets_et_encoding.html )

Charsets and encoding



nothing

binary representation

The character set

First

Symbol → number



number

symbol

Unicode table 0000 to 007F (0 to 127) (Latin characters)

Unicode table 0080 to 00FF (128 to 255) (Latin characters, including accented characters)

Unicode table 0900 to 097F (2304 to 2431) (devanagari)

Unicode table 1100 to 117F (4352 to 4479) (hangul jamo)

bébé

baby

98 233 98 233

The encoding

Number → Bits



8 bits are not enough.

Unicode

UTF-8

Unicode value (in hexadecimal)    Bits to output

00000000 to 0000007F    0xxxxxxx

00000080 to 000007FF    110xxxxx 10xxxxxx

00000800 to 0000FFFF    1110xxxx 10xxxxxx 10xxxxxx

00010000 to 001FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

00200000 to 03FFFFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

04000000 to 7FFFFFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

any

Let's sum up all this

Symbol → Number → Bits

charset

encoding



charset

encoding

é → 233 → C3 A9



in Unicode

in UTF-8

baby

bébé → 98 233 98 233 → 62 C3 A9 62 C3 A9



in Unicode

in UTF-8

62 C3 A9 62 C3 A9

encoding

charset
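The two steps (symbol to number, then number to bits) can be checked directly. This is a small sketch in Python 3 syntax, where every string is already a Unicode string; the word is written with \u escapes so the example does not depend on the encoding of the source file.

```python
word = 'b\u00e9b\u00e9'   # The word "bébé"

# Charset step: each symbol maps to a number in the Unicode table.
numbers = [ord(ch) for ch in word]

# Encoding step: the numbers are turned into bytes, here with UTF-8.
encoded = word.encode('utf-8')
hex_bytes = ' '.join('%02X' % byte for byte in encoded)

print(numbers)
print(hex_bytes)
```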

Why am I getting those strange characters ?

Transmitting a text alone is useless.

If you transmit a text, you must always also tell which charset/encoding was used.









guess

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This is the same for emails: Any good email client will indicate which charset/encoding the text is encoded in.





Python and Unicode

Use them as much as possible.

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

u

badString = "Bad string !"

bestString = u"Good unicode string."

anotherGoodString = u"Ma vie, mon \u0153uvre."

latin-1

myUnicodeString = unicode(mystring)



myUnicodeString = mystring.decode('iso-8859-1')



myString = myUnicodeString.encode('iso-8859-1')

print

can fail

A simple print instruction can fail.



>>> a = u'\u0153uvre'

>>> print a

Traceback (most recent call last):

File "<stdin>", line 1, in ?

File "c:\python24\lib\encodings\cp437.py", line 18, in encode

return codecs.charmap_encode(input,errors,encoding_map)

UnicodeEncodeError: 'charmap' codec can't encode character u'\u0153' in position 0: character maps to <undefined>

>>> import sys

>>> print sys.stdout.encoding

cp437

>>> import sys

>>> a = u'\u0153uvre'

>>> print a.encode(sys.stdout.encoding,'replace')

?uvre

>>>

Special note:

>>> a = u'\u0153uvre'

>>> file = open('myfile.txt','w')

>>> file.write( a.encode('utf-8') )

>>> file.close()

>>> file = open('myfile.txt','r')

>>> print file.read()

┼ôuvre

>>>

>>> file=open('myfile.txt','r')

>>> print repr( file.read().decode('utf-8') )

u'\u0153uvre'

>>>

repr()

>>> import sys

>>> file=open('myfile.txt','r')

>>> print file.read().decode('utf-8').encode(sys.stdout.encoding,'replace')

?uvre

>>>

3 modes

UTF-8 → Unicode → cp437

The input file. → .decode('utf-8') → The Python unicode string. → .encode('cp437') → The console.
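The whole chain can be replayed on plain byte strings, without a file or a console. The sketch below uses Python 3 syntax; cp437 is hard-coded because it is the console encoding shown in the session above, whereas a real program would rather read sys.stdout.encoding.

```python
text = '\u0153uvre'

# What gets written to the file: the UTF-8 bytes.
utf8_bytes = text.encode('utf-8')

# Reading those bytes with the wrong encoding gives strange characters.
wrong = utf8_bytes.decode('cp437')

# Reading them with the right encoding gives back the original string.
right = utf8_bytes.decode('utf-8')

# Converting for a cp437 console: unsupported characters become '?'.
for_console = right.encode('cp437', 'replace')

print(repr(wrong), repr(right), for_console)
```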

explicitly

Iterating

A shorter syntax

countries = ['France','Germany','Belgium','Spain']
for i in range(0,len(countries)):
    print countries[i]

countries = ['France','Germany','Belgium','Spain']
i = 0
while i<len(countries):
    print countries[i]
    i = i+1

countries = ['France','Germany','Belgium','Spain']
for country in countries:
    print country

You've spared a variable (i).

The code is more compact.

It's more readable.

for country in countries

file = open('file.txt','r')
for line in file.readlines():
    print line
file.close()

file = open('file.txt','r')
for line in file:
    print line
file.close()

shorter

more readable

Iterating with multiple items

data = [ ('France',523,'Jean Dupont'),
         ('Germany',114,'Wolf Spietzer'),
         ('Belgium',227,'Serge Ressant')
       ]

for (country,nbclients,manager) in data:
    print manager,'manages',nbclients,'clients in',country

data = { 'France':523, 'Germany':114, 'Belgium':227 }
for country in data: # This is the same as for country in data.keys()
    print 'We have',data[country],'clients in',country

data = { 'France':523, 'Germany':114, 'Belgium':227 }
for (country,nbclients) in data.items():
    print 'We have',nbclients,'clients in',country

Creating iterators

COUNTRY NBCLIENTS

France 523

Germany 114

Spain 127

Belgium 227

clientFileReader

class clientFileReader:

    def __init__(self,filename):
        self.file=open(filename,'r')
        self.file.readline() # We discard the first line.

    def close(self):
        self.file.close()

    def __iter__(self):
        return self

    def next(self):
        line = self.file.readline()
        if not line:
            raise StopIteration()
        return ( line[:13], int(line[13:]) )

Create a __iter__() method which returns the iterator (which happens to be ourselves !)

The iterator must have a next() method which returns the next item.

The next() method must raise the StopIteration() exception when no more data is available.

clientFile = clientFileReader('file.txt')



for (country,nbclients) in clientFile:

print 'We have',nbclients,'clients in',country



clientFile.close()

for (country,nbclients) in clientFile:
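For readers on Python 3, the iterator protocol is the same except that the method is named __next__() instead of next(). The sketch below is an adaptation, not the original class: it takes any iterable of lines (here an io.StringIO standing in for the file) and splits on whitespace instead of using fixed column positions, both of which are assumptions made to keep the example self-contained.

```python
import io

class ClientFileReader:
    def __init__(self, lines):
        self.lines = iter(lines)
        next(self.lines, None)          # We discard the first (header) line.

    def __iter__(self):
        return self                     # We are our own iterator.

    def __next__(self):
        line = next(self.lines)         # Raises StopIteration when exhausted.
        country, nbclients = line.split()
        return (country, int(nbclients))

sample = io.StringIO("COUNTRY      NBCLIENTS\n"
                     "France       523\n"
                     "Germany      114\n"
                     "Spain        127\n"
                     "Belgium      227\n")

records = list(ClientFileReader(sample))
for (country, nbclients) in records:
    print('We have', nbclients, 'clients in', country)
```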

Parsing the command-line

sys.argv

Parsing the command-line is not as trivial as it seems to be.

getopt

optparse

optparse

getopt

reverses all lines in a text file

a mandatory argument: file , the file to process.

an optional parameter with value: -o to specify an output file (such as -o myoutputfile.txt )

an optional parameter without value: -c to capitalize all letters.

an optional parameter: -h to display program help.

getopt

getopt

import sys
import getopt

if __name__ == "__main__":

    opts, args = None, None
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hco:",["help", "capitalize","output="])
    except getopt.GetoptError, e:
        # String exceptions are not supported by recent Python versions: report and exit instead.
        print 'Unknown argument "%s" in command-line.' % e.opt
        sys.exit(2)

    for option, value in opts:
        if option in ('-h','--help'):
            print 'You asked for the program help.'
            sys.exit(0)
        if option in ('-c','--capitalize'):
            print "You used the --capitalize option !"
        elif option in ('-o','--output'):
            print "You used the --output option with value",value

    # Make sure we have our mandatory argument (file)
    if len(args) != 1:
        print 'You must specify one file to process. Use -h for help.'
        sys.exit(1)

    print "The file to process is",args[0]

    # The rest of the code goes here...

The getopt.getopt() will parse the command-line:

sys.argv[1:] skips the program name itself (which is sys.argv[0] ).

"hco:" gives the list of possible options ( -h , -c and -o ). The colon ( : ) tells that -o requires a value.

["help", "capitalize","output="] allows the user to use the long options version (--help/--capitalize/--output).

Users can even mix short and long options in the command-line, such as: reverse --capitalize -o output.txt myfile.txt

The for loop will check all options. It's typically in this loop that you will modify your program options according to command-line options.

The --help will display the help page and exit ( sys.exit(0) ).

The if len(args)!=1 is used to make sure our mandatory argument ( file ) is provided. You can choose to allow (or not) several arguments.
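getopt can also be given an explicit list instead of sys.argv[1:], which makes the parsing easy to try out. The following sketch is in Python 3 syntax; parse_command_line and the settings dictionary are illustrative names, not part of the getopt module.

```python
import getopt

def parse_command_line(argv):
    """Parse a list of arguments; return (settings, remaining arguments)."""
    opts, args = getopt.getopt(argv, "hco:", ["help", "capitalize", "output="])
    settings = {'help': False, 'capitalize': False, 'output': None}
    for option, value in opts:
        if option in ('-h', '--help'):
            settings['help'] = True
        elif option in ('-c', '--capitalize'):
            settings['capitalize'] = True
        elif option in ('-o', '--output'):
            settings['output'] = value
    return settings, args

# Short and long options can be mixed:
settings, args = parse_command_line(['--capitalize', '-o', 'output.txt', 'myfile.txt'])
print(settings, args)
```

An unknown option makes getopt.getopt() raise getopt.GetoptError, which the caller can catch to print a message and exit.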

C:\>python reverse.py -c -o output.txt myfile.txt

You used the --capitalize option !

You used the --output option with value output.txt

The file to process is myfile.txt

C:\>python reverse.py -h

You asked for the program help.

optparse

import sys
import optparse

if __name__ == "__main__":

    parser = optparse.OptionParser()
    parser.add_option("-c","--capitalize",action="store_true",dest="capitalize")
    parser.add_option("-o","--output",action="store",type="string",dest="outputFilename")

    (options, args) = parser.parse_args()

    if options.capitalize:
        print "You used the --capitalize option !"

    if options.outputFilename:
        print "You used the --output option with value",options.outputFilename

    # Make sure we have our mandatory argument (file)
    if len(args) != 1:
        print 'You must specify one file to process. Use -h for help.'
        sys.exit(1)

    print "The file to process is",args[0]

    # The rest of the code goes here...

You first create a parser ( optparse.OptionParser() ), add options to this parser ( parser.add_option(...) ) then ask it to parse the command-line ( parser.parse_args() ).

Option -c does not take a value. We merely record the presence of -c with action="store_true" . dest="capitalize" will store this option in the attribute capitalize of the options object returned by parse_args().

For -o , we specify a string to store in the outputFilename attribute.

We later simply access our options through options.capitalize and options.outputFilename . No loop.

args still gives us our file argument.
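parse_args() also accepts an explicit list of arguments, so the parser can be exercised without a real command line. A minimal sketch in Python 3 syntax follows; build_parser is a hypothetical helper name, and default=False is added here so that options.capitalize is False rather than None when -c is absent.

```python
import optparse

def build_parser():
    parser = optparse.OptionParser()
    parser.add_option("-c", "--capitalize", action="store_true",
                      dest="capitalize", default=False,
                      help="Capitalize all letters")
    parser.add_option("-o", "--output", action="store", type="string",
                      dest="outputFilename", help="Write output to a file")
    return parser

(options, args) = build_parser().parse_args(["-c", "-o", "output.txt", "myfile.txt"])
print(options.capitalize, options.outputFilename, args)
```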

C:\>python reverse2.py -c -o output.txt myfile.txt

You used the --capitalize option !

You used the --output option with value output.txt

The file to process is myfile.txt

C:\>python reverse2.py -h

usage: reverse2.py [options]



options:

-h, --help show this help message and exit

-c, --capitalize

-o OUTPUTFILENAME, --output=OUTPUTFILENAME

--help

help

parser.add_option("-c","--capitalize",action="store_true",dest="capitalize", help="Capitalize all letters" )

parser.add_option("-o","--output",action="store",type="string",dest="outputFilename", help="Write output to a file" )

C:\>python reverse2.py -h

usage: reverse2.py [options]



options:

-h, --help show this help message and exit

-c, --capitalize Capitalize all letters

-o OUTPUTFILENAME, --output=OUTPUTFILENAME

Write output to a file

Using AutoIt from Python

import win32com.client



autoit = win32com.client.Dispatch("AutoItX3.Control")

autoit.Run("notepad.exe")

autoit.AutoItSetOption("WinTitleMatchMode", 4)

autoit.WinWait("classname=Notepad")

autoit.send("Hello, world.")

class

not

title

regsvr32 AutoItX3.dll

import os
import sys

# Import the Win32 COM client
try:
    import win32com.client
except ImportError:
    raise ImportError, 'This program requires the pywin32 extensions for Python. See http://starship.python.net/crew/mhammond/win32/'

import pywintypes # to handle COM errors.

# Import AutoIT (first try)
autoit = None
try:
    autoit = win32com.client.Dispatch("AutoItX3.Control")
except pywintypes.com_error:
    # If can't instantiate, try to register COM control again:
    os.system("regsvr32 /s AutoItX3.dll")

# Import AutoIT (second try if necessary)
if not autoit:
    try:
        autoit = win32com.client.Dispatch("AutoItX3.Control")
    except pywintypes.com_error, e:
        raise ImportError, "Could not instantiate AutoIT COM module because %s" % e

if not autoit:
    print "Could not instantiate AutoIT COM module."
    sys.exit(1)

# Now we have AutoIT, let's start Notepad and write some text:
autoit.Run("notepad.exe")
autoit.AutoItSetOption("WinTitleMatchMode", 4)
autoit.WinWait("classname=Notepad")
autoit.send("Hello, world.")



What's in a main

if __name__ == "__main__":

executed directly: python mymodule.py

imported: import mymodule

What is under the if __name__=="__main__" will only be run if the module is run directly.







Parse the command-line in the main and call the methods/functions, so that the module can be used from the command line.

Run the unit tests (unittest) in the main, so that the module performs a self-test when run.

Run example code in the main (for example, for a tkinter widget).

Example: Parsing the command-line

import re



class linkextractor:
    def __init__(self,htmlPage):
        self.htmlcode = htmlPage
    def getLinks(self):
        linksList = re.findall('<a href=(.*?)>.*?</a>',self.htmlcode)
        links = []
        for link in linksList:
            if link.startswith('"'): link=link[1:] # Remove quotes
            if link.endswith('"'): link=link[:-1]
            links.append(link)
        return links

if __name__ == "__main__":
    import sys,getopt
    opts, args = getopt.getopt(sys.argv[1:],"")
    if len(args) != 1:
        print "You must specify a file to process."
        sys.exit(1)
    print "Linkextractor is processing %s..." % args[0]
    file = open(args[0],"rb")
    htmlpage = file.read(500000)
    file.close()
    le = linkextractor(htmlpage)
    print le.getLinks()

The class linkextractor contains our program logic.

The main only parses the command-line, reads the specified file and uses our linkextractor class to process it.

C:\>python linkextractor.py myPage.html

Linkextractor is processing myPage.html...

[...]

import linkextractor, urllib



htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)

le = linkextractor.linkextractor(htmlSource)

print le.getLinks()

Example: Running self-tests

import re, unittest



class linkextractor:
    def __init__(self,htmlPage):
        self.htmlcode = htmlPage
    def getLinks(self):
        linksList = re.findall('<a href=(.*?)>.*?</a>',self.htmlcode)
        links = []
        for link in linksList:
            if link.startswith('"'): link=link[1:] # Remove quotes
            if link.endswith('"'): link=link[:-1]
            links.append(link)
        return links

class _TestExtraction(unittest.TestCase):
    def testLinksWithQuotes(self):
        htmlcode = """<html><body>
Welcome to <a href="http://sebsauvage.net/">sebsauvage.net/</a><br>
How about some <a href="http://python.org">Python</a> ?</body></html>"""
        le = linkextractor(htmlcode)
        links = le.getLinks()
        self.assertEqual(links[0], 'http://sebsauvage.net/',
                         'First link is %s. It should be http://sebsauvage.net/ without quotes.' % links[0])
        self.assertEqual(links[1], 'http://python.org',
                         'Second link is %s. It should be http://python.org without quotes.' % links[1])

if __name__ == "__main__":
    print "Performing self-tests..."
    unittest.main()

C:\>python linkextractor.py

Performing self-tests...

.

----------------------------------------------------------------------

Ran 1 test in 0.000s



OK



C:\>



Dive into Python

Mixing both

If nothing is provided in the command-line (or a special --selftest option is provided), perform the self-test.

Otherwise perform what the user asked in the command line.
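One possible way to mix both behaviours is sketched below in Python 3 syntax. The names get_links, run_selftest and main are choices made for this illustration; the regular expression is the one used by the linkextractor class.

```python
import io
import re
import unittest

def get_links(html):
    """Return the href values found in html."""
    links = []
    for link in re.findall('<a href=(.*?)>.*?</a>', html):
        links.append(link.strip('"'))   # Remove surrounding quotes
    return links

class _TestExtraction(unittest.TestCase):
    def testLinksWithQuotes(self):
        html = 'Go to <a href="http://python.org">Python</a> now'
        self.assertEqual(get_links(html), ['http://python.org'])

def run_selftest():
    """Run the unit tests quietly and tell whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(_TestExtraction)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()

def main(argv):
    """Self-test when called with no argument or --selftest, else process a file."""
    if not argv or argv[0] == '--selftest':
        return 0 if run_selftest() else 1
    with open(argv[0], 'r') as f:
        html = f.read(500000)
    print(get_links(html))
    return 0
```

In a real module, the last lines would be the usual guard: under if __name__ == "__main__", call sys.exit(main(sys.argv[1:])).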



Disable all javascript in a html page

html = html.replace('<script','<noscript')



import re

re_noscript = re.compile('<(/?)script',re.IGNORECASE)

html = re_noscript.sub(r'<\1noscript',html)



<noscript>
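The regular expression version can be wrapped in a function and checked on a short string. This sketch is in Python 3 syntax and disable_scripts is an illustrative name; note that the replacement always writes noscript in lower case, whatever the case of the original tag.

```python
import re

re_noscript = re.compile('<(/?)script', re.IGNORECASE)

def disable_scripts(html):
    """Turn every <script> and </script> tag into <noscript> and </noscript>."""
    return re_noscript.sub(r'<\1noscript', html)

print(disable_scripts('<SCRIPT src="a.js"></Script><p>hi</p>'))
```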

Multiplying

>>> 3*'a'

'aaa'



>>> 3*'hello'

'hellohellohello'



>>> 3*('hello')

'hellohellohello'



>>> 3*('hello',)

('hello', 'hello', 'hello')



>>> 3*['hello']

['hello', 'hello', 'hello']



>>> 3*('hello','world')

('hello', 'world', 'hello', 'world', 'hello', 'world')

('hello')

string

('hello',)

tuple

>>> print 3*'a' + 2*'b'

aaabb



>>> print 3*('a',) + 2*('b',)

('a', 'a', 'a', 'b', 'b')



>>> print 3*['a'] + 2*['b']

['a', 'a', 'a', 'b', 'b']

Creating and reading .tar.bz2 archives

import tarfile

import bz2

archive = tarfile.open('myarchive.tar.bz2','w:bz2')

archive.debug = 1 # Display the files being compressed.

archive.add(r'd:\myfiles') # d:\myfiles contains the files to compress

archive.close()

import tarfile

import bz2

archive = tarfile.open('myarchive.tar.bz2','r:bz2')

archive.debug = 1 # Display the files being decompressed.
for tarinfo in archive:
    archive.extract(tarinfo, r'd:\mydirectory') # d:\mydirectory is where I want to uncompress the files.

archive.close()
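Here is a self-contained round trip in Python 3 syntax that can be run anywhere. It works in a temporary directory instead of d:\myfiles, and it reads a member back with extractfile() rather than extracting to disk; both are choices made for this sketch, and the file names are made up.

```python
import os
import tarfile
import tempfile

# Prepare a small directory to compress.
workdir = tempfile.mkdtemp()
source_dir = os.path.join(workdir, 'myfiles')
os.mkdir(source_dir)
with open(os.path.join(source_dir, 'hello.txt'), 'w') as f:
    f.write('Hello, world.')

# Create the .tar.bz2 archive.
archive_path = os.path.join(workdir, 'myarchive.tar.bz2')
archive = tarfile.open(archive_path, 'w:bz2')
archive.add(source_dir, arcname='myfiles')   # Store paths relative to 'myfiles'
archive.close()

# Read it back: list the members and read one file's content.
archive = tarfile.open(archive_path, 'r:bz2')
names = archive.getnames()
content = archive.extractfile('myfiles/hello.txt').read()
archive.close()

print(names, content)
```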



Enumerating

enumerate()

>>> for i in enumerate( ['abc','def','ghi','jkl'] ):

...     print i

...

(0, 'abc')

(1, 'def')

(2, 'ghi')

(3, 'jkl')

>>>

>>> for i in enumerate('hello world'):

...     print i

...

(0, 'h')

(1, 'e')

(2, 'l')

(3, 'l')

(4, 'o')

(5, ' ')

(6, 'w')

(7, 'o')

(8, 'r')

(9, 'l')

(10, 'd')

>>>

Zip that thing

zip

,

map

filter

List comprehension

>>> mylist = (1,3,5,7,9)

>>> print [value*2 for value in mylist]

[2, 6, 10, 14, 18]

compute value*2 for each value in my list

>>> mylist = (1,3,5,7,9)

>>> print [i*2 for i in mylist if i>4 ]

[10, 14, 18]

zip

,

map

filter

zip

zip

>>> print zip( ['a','b','c'], [1,2,3] )

[('a', 1), ('b', 2), ('c', 3)]



multiple

>>> print zip( ['a','b','c'], [1,2,3], ['U','V','W'] )

[('a', 1, 'U'), ('b', 2, 'V'), ('c', 3, 'W')]



>>> print zip('abcd','1234')

[('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]



>>> print zip( [1,2,3,4,5], ['a','b'] )

[(1, 'a'), (2, 'b')]



map

map

>>> print map(abs, [-5,7,-12] )

[5, 7, 12]

>>> print [abs(i) for i in [-5,7,-12]]

[5, 7, 12]

map

>>> def myfunction(value):

...     return value*10+1

...

>>> print map(myfunction, [1,2,3,4] )

[11, 21, 31, 41]

>>>

several

max()

>>> print map(max, [4,5,6], [1,2,9] )

[4, 5, 9]

>>> [ max(4,1), max(5,2), max(6,9) ]

[4, 5, 9]

filter

filter

map

None

None

None

>>> print filter(abs, [-5,7,0,-12] )

[-5, 7, -12]

>>> print [i for i in [-5,7,0,-12] if abs(i)]

[-5, 7, -12]

filter

So... map/filter or list comprehension ?

always

>>> print [abs(i+5) for i in [-5,7,0,-12] if i<5]

[0, 5, 7]

filter

maps

lambda

>>> map( lambda x:abs(x+5), filter(lambda x:x<5 ,[-5,7,0,-12]) )

[0, 5, 7]
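The two forms can be compared side by side. The sketch below is in Python 3 syntax, where map() and filter() return iterators, hence the extra list() call that the Python 2 session above does not need.

```python
values = [-5, 7, 0, -12]

# List comprehension: filter and transform in one expression.
with_comprehension = [abs(i + 5) for i in values if i < 5]

# Same result with filter, map and two lambdas.
with_map_filter = list(map(lambda x: abs(x + 5),
                           filter(lambda x: x < 5, values)))

print(with_comprehension)
print(with_map_filter)
```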

faster

reduce

>>> def myfunction(a,b):

...     return a*b

...

>>> mylist = [1,2,3,4,5]

>>> print reduce(myfunction, mylist)

120

>>> print ((((1*2)*3)*4)*5)

120

operator

>>> import operator

>>> mylist = [1,2,3,4,5]

>>> print reduce(operator.mul, mylist)

120

>>> print reduce(operator.add, mylist)

15
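In Python 3, reduce is no longer a built-in and has to be imported from functools. A short sketch of the same computation:

```python
from functools import reduce
import operator

mylist = [1, 2, 3, 4, 5]
product = reduce(operator.mul, mylist)   # ((((1*2)*3)*4)*5)
total = reduce(operator.add, mylist)     # ((((1+2)+3)+4)+5)

print(product, total)
```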



(Reduce hint is taken from http://jaynes.colorado.edu/PythonIdioms.html#operator )

Conversions

>>> mytuple = (1,2,3)

>>> print list(mytuple) # Tuple to list

[1, 2, 3]

>>>

>>> mylist = [1,2,3] # List to tuple

>>> print tuple(mylist)

(1, 2, 3)

>>>

>>> mylist2 = [ ('blue',5), ('red',3), ('yellow',7) ]

>>> print dict(mylist2) # List to dictionary

{'blue': 5, 'yellow': 7, 'red': 3}

>>>

>>> mystring = 'hello'

>>> print list(mystring) # String to list

['h', 'e', 'l', 'l', 'o']

>>>

>>> mylist3 = ['w','or','ld']

>>> print ''.join(mylist3) # List to string

world

>>>

sequences

string

list

>>> mystring = 'hello'

>>> for character in list(mystring): # This is BAD . Don't do this.

...     print character

...

h

e

l

l

o

>>> for character in mystring: # Simply do that !

...     print character

...

h

e

l

l

o

>>>

sequence

lists

>>> print [i+'*' for i in 'Hello']

['H*', 'e*', 'l*', 'l*', 'o*']

>>> print max('Hello, world !')

w

max()

do not have

string

list

A Tkinter widget which expands in a grid

pack()

grid()

Grid

Pack

never ever

(expand=1,fill=BOTH)

pack()

When using grid() , specify sticky (usually 'NSEW')

Then use grid_columnconfigure() and grid_rowconfigure() to set the weights (usually 1).

import Tkinter

class myApplication:
    def __init__(self,root):
        self.root = root
        self.initialisation()

    def initialisation(self):
        canvas1 = Tkinter.Canvas(self.root)
        canvas1.config(background="red")
        canvas1.grid(row=0,column=0, sticky='NSEW')

        canvas2 = Tkinter.Canvas(self.root)
        canvas2.config(background="blue")
        canvas2.grid(row=1,column=0, sticky='NSEW')

        self.root.grid_columnconfigure(0,weight=1)
        self.root.grid_rowconfigure(0,weight=1)
        self.root.grid_rowconfigure(1,weight=1)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()

grid_columnconfigure

grid_rowconfigure

self.root.grid_rowconfigure(0,weight=1)

self.root.grid_rowconfigure(1, weight=2 )



Convert a string date to a datetime object

datetime

>>> import datetime,time

>>> stringDate = "2006-05-18 19:35:00"

>>> dt = datetime.datetime.fromtimestamp(time.mktime(time.strptime(stringDate,"%Y-%m-%d %H:%M:%S")))

>>> print dt

2006-05-18 19:35:00

>>> print type(dt)

<type 'datetime.datetime'>

>>>

time.strptime() converts the string to a struct_time tuple.

time.mktime() converts this tuple into seconds (elapsed since epoch, C-style).

datetime.fromtimestamp() converts the seconds to a Python datetime object.
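On Python 2.5 and later (including Python 3), the datetime class has its own strptime() method, which gives the same result in a single call. A short sketch in Python 3 syntax:

```python
import datetime

stringDate = "2006-05-18 19:35:00"
dt = datetime.datetime.strptime(stringDate, "%Y-%m-%d %H:%M:%S")

print(dt)
print(type(dt))
```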

Compute the difference between two dates, in seconds

>>> import datetime,time

>>> def dateDiffInSeconds(date1, date2):

...     timedelta = date2 - date1
...     return timedelta.days*24*3600 + timedelta.seconds

...

>>> date1 = datetime.datetime(2006,02,17,15,30,00)

>>> date2 = datetime.datetime(2006,05,18,11,01,00)

>>> print dateDiffInSeconds(date1,date2)

7759860

>>>
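On Python 2.7 and later, timedelta objects have a total_seconds() method that performs this computation (microseconds included, as a float). The sketch below, in Python 3 syntax, compares it with the manual formula on the same two dates.

```python
import datetime

date1 = datetime.datetime(2006, 2, 17, 15, 30, 0)
date2 = datetime.datetime(2006, 5, 18, 11, 1, 0)

delta = date2 - date1
manual = delta.days * 24 * 3600 + delta.seconds   # Same formula as dateDiffInSeconds()
builtin = delta.total_seconds()                   # Returns a float

print(manual, builtin)
```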

Managed attributes, read-only attributes

Create a private attribute ( self.__x )

Create accessor functions to this attribute ( getx,setx,delx )

Create a property() and assign it these accessors.

class myclass(object):
    def __init__(self):
        self.__x = None

    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")

a = myclass()
a.x = 5 # Set
print a.x # Get
del a.x # Del

getx/setx/delx

class myclass(object):
    def __init__(self):
        self.__x = None

    def getx(self): return self.__x
    def setx(self, value): raise AttributeError,'Property x is read-only.'
    def delx(self): raise AttributeError,'Property x cannot be deleted.'
    x = property(getx, setx, delx, "I'm the 'x' property.")

a = myclass()
a.x = 5 # This line will fail
print a.x
del a.x

Traceback (most recent call last):

File "example.py", line 11, in ?

a.x = 5 # This line will fail

File "example.py", line 6, in setx

def setx(self, value): raise AttributeError,'Property x is read-only.'

AttributeError: Property x is read-only.
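A read-only attribute can also be obtained by giving property() only a getter, in which case Python itself raises AttributeError on assignment. The sketch below uses the decorator form in Python 3 syntax; the class name and the initial value 42 are arbitrary choices for the demonstration.

```python
class MyClass(object):
    def __init__(self):
        self.__x = 42

    @property
    def x(self):
        """I'm the read-only 'x' property."""
        return self.__x

a = MyClass()
print(a.x)           # Reading works

try:
    a.x = 5          # No setter was defined, so this fails
    assigned = True
except AttributeError:
    assigned = False

print(assigned)
```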



First day of the month

>>> import datetime

>>> def firstDayOfMonth(dt):

...     return (dt+datetime.timedelta(days=-dt.day+1)).replace(hour=0,minute=0,second=0,microsecond=0)

...

>>> print firstDayOfMonth( datetime.datetime(2006,05,13) )

2006-05-01 00:00:00

>>>
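A shorter variant, as a sketch assuming Python 3: replace() can set the day directly, so the timedelta arithmetic is not needed. The name first_day_of_month is just an illustrative rename.

```python
import datetime

def first_day_of_month(dt):
    """Return midnight on the first day of dt's month (dt is a datetime.datetime)."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

print(first_day_of_month(datetime.datetime(2006, 5, 13, 14, 30)))  # 2006-05-01 00:00:00
```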




Fetch, read and parse a RSS 2.0 feed in 6 lines

This example fetches the RSS feed of sebsauvage.net and prints the title of each item:

import urllib, sys, xml.dom.minidom

address = 'http://www.sebsauvage.net/rss/updates.xml'

document = xml.dom.minidom.parse(urllib.urlopen(address))

for item in document.getElementsByTagName('item'):

title = item.getElementsByTagName('title')[0].firstChild.data

print "Title:", title.encode('latin-1','replace')
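The parsing step can be tried without a network connection. The sketch below assumes Python 3 and parses a small made-up feed (SAMPLE_FEED is invented for the illustration) with the same minidom calls.

```python
import xml.dom.minidom

# A tiny, made-up RSS 2.0 document used instead of a real download.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First entry</title></item>
    <item><title>Second entry</title></item>
  </channel>
</rss>"""

def item_titles(rss_text):
    """Return the title of every <item> element found in rss_text."""
    document = xml.dom.minidom.parseString(rss_text)
    titles = []
    for item in document.getElementsByTagName('item'):
        title_node = item.getElementsByTagName('title')[0]
        titles.append(title_node.firstChild.data)
    return titles

print(item_titles(SAMPLE_FEED))  # ['First entry', 'Second entry']
```

Only the titles inside item elements are returned; the channel title is skipped because the search starts from each item.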



Get a login from BugMeNot

import re,urllib2,urlparse



def getLoginPassword(url):

''' Returns a login/password for a given domain using BugMeNot.



Input: url (string) -- the URL or domain to get a login for.



Output: a tuple (login,password)

Will return (None,None) if no login is available.



Examples:

print getLoginPassword("http://www.nytimes.com/auth/login")

('goaway147', 'goaway')



print getLoginPassword("imdb.com")

('bobshit@mailinator.com', 'diedie')

'''

if not url.lower().startswith('http://'): url = "http://"+url

domain = urlparse.urlsplit(url)[1].split(':')[0]

address = 'http://www.bugmenot.com/view/%s?utm_source=extension&utm_medium=firefox' % domain

request = urllib2.Request(address, None, {'User-Agent':'Mozilla/5.0'})

page = urllib2.urlopen(request).read(50000)

re_loginpwd = re.compile('<th>Username.*?<td>(.+?)</td>.*?<th>Password.*?<td>(.+?)</td>',re.IGNORECASE|re.DOTALL)

match = re_loginpwd.search(page)

if match:

return match.groups()

else:

return (None,None)



>>> print getLoginPassword("http://www.nytimes.com/auth/login")

('goaway147', 'goaway')

>>> print getLoginPassword("imdb.com")

('bobshit@mailinator.com', 'diedie')



Logging into a site and handling session cookies

import cookielib, urllib, urllib2



login = 'ismellbacon123@yahoo.com'

password = 'login'



# Enable cookie support for urllib2

cookiejar = cookielib.CookieJar()

urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))



# Send login/password to the site and get the session cookie

values = {'login':login, 'password':password }

data = urllib.urlencode(values)

request = urllib2.Request("http://www.imdb.com/register/login", data)

url = urlOpener.open(request) # Our cookiejar automatically receives the cookies

page = url.read(500000)



# Make sure we are logged in by checking the presence of the cookie "id".

# (which is the cookie containing the session identifier.)

if not 'id' in [cookie.name for cookie in cookiejar]:

raise ValueError, "Login failed with login=%s, password=%s" % (login,password)



print "We are logged in !"



# Make another request with our session cookie

# (Our urlOpener automatically uses cookies from our cookiejar)

url = urlOpener.open('http://imdb.com/find?s=all&q=grave')

page = url.read(200000)



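cookielib is part of the standard library since Python 2.4. For older versions, the third-party ClientCookie module offers similar features.

To find the names of the form fields and of the cookies used by a site, you can use Firefox: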

For forms: Menu "Tools" > "Page info" > "Forms" tab.

For cookies: Menu "Tools" > "Options" > "Privacy" tab > "Cookies" tab > "View cookies" button.
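In Python 3 the same modules live under different names (http.cookiejar and urllib.request). The sketch below only builds the cookie-aware opener and checks the jar, without contacting any site; has_cookie is a hypothetical helper added for the illustration.

```python
import http.cookiejar
import urllib.request

# Enable cookie support: every response handled by this opener feeds the jar.
cookiejar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar))

def has_cookie(jar, name):
    """Return True if the jar holds a cookie with this name."""
    return any(cookie.name == name for cookie in jar)

# No request has been sent yet, so the jar is still empty.
print(has_cookie(cookiejar, 'id'))  # False
```

After a login request sent with opener.open(), the same has_cookie() test would tell whether the session cookie was received.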

Searching on Google

import re,urllib,urllib2



class GoogleHarvester:

re_links = re.compile(r'<a class=l href="(.+?)"',re.IGNORECASE|re.DOTALL)

def __init__(self):

pass

def harvest(self,terms):

'''Searches Google for these terms. Returns only the links (URL).



Input: terms (string) -- one or several words to search.



Output: A list of urls (strings).

Duplicate links are removed, links are sorted.



Example: print GoogleHarvester().harvest('monthy pythons')

'''

print "Google: Searching for '%s'" % terms

links = {}

currentPage = 0

while True:

print "Google: Querying page %d (%d links found so far)" % (currentPage/100+1, len(links))

address = "http://www.google.com/search?q=%s&num=100&hl=en&start=%d" % (urllib.quote_plus(terms),currentPage)

request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )

urlfile = urllib2.urlopen(request)

page = urlfile.read(200000)

urlfile.close()

for url in GoogleHarvester.re_links.findall(page):

links[url] = 0

if "</div>Next</a></table></div><center>" in page: # Is there a "Next" link for next page of results ?

currentPage += 100 # Yes, go to next page of results.

else:

break # No, break out of the while True loop.

print "Google: Found %d links." % len(links)

return sorted(links.keys())



# Example: Search for "monthy pythons"

links = GoogleHarvester().harvest('monthy pythons')

open("links.txt","w+b").write("\n".join(links))



The links are saved in the file links.txt, one per line.

Building a basic GUI application step-by-step in Python with Tkinter and wxPython

Flatten nested lists and tuples

import types



def flatten(L):

''' Flattens nested lists and tuples in L. '''

def _flatten(L,a):

for x in L:

if type(x) in (types.ListType,types.TupleType): _flatten(x,a)

else: a(x)

R = []

_flatten(L,R.append)

return R





>>> a = [ 5, 'foo', (-52.5, 'bar'), ('foo',['bar','bar']), [1,2,[3,4,(5,6)]],('foo',['bar']) ]

>>> print flatten(a)

[5, 'foo', -52.5, 'bar', 'foo', 'bar', 'bar', 1, 2, 3, 4, 5, 6, 'foo', 'bar']

>>>
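The types.ListType and types.TupleType names no longer exist in Python 3. A minimal equivalent sketch, assuming Python 3 and using isinstance() instead:

```python
def flatten(items):
    """Flatten nested lists and tuples into a single flat list."""
    result = []
    for x in items:
        if isinstance(x, (list, tuple)):
            result.extend(flatten(x))   # Recurse into nested containers.
        else:
            result.append(x)
    return result

a = [5, 'foo', (-52.5, 'bar'), ('foo', ['bar', 'bar']), [1, 2, [3, 4, (5, 6)]], ('foo', ['bar'])]
print(flatten(a))
```

Strings are deliberately left alone: they are iterable, but here they are treated as single values, as in the original snippet.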



Efficiently iterating over large tables in databases

fetchone() : Read one row at a time.

fetchmany() : Read several rows at a time.

fetchall() : Read all rows at once.

The simplest way is to use fetchall():

con = sqlite.connect('mydatabase.db3'); cur = con.cursor()

cur.execute('select discid,body from discussion_body;')

for row in cur.fetchall():

pass



This is inefficient for large tables, because fetchall() loads all the rows in memory at once.

Instead of fetchall(), you can use fetchone():

con = sqlite.connect('mydatabase.db3'); cur = con.cursor()

cur.execute('select discid,body from discussion_body;')

for row in iter(cur.fetchone, None):

pass

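fetchone() returns None when there are no more rows. iter(cur.fetchone, None) lets the for loop call fetchone() again and again until it returns None.

You can also use fetchmany():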

con = sqlite.connect('mydatabase.db3'); cur = con.cursor()

cur.execute('select discid,body from discussion_body;')

for row in iter(cur.fetchmany, []):

pass

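fetchmany() returns an empty list when there are no more rows, so the for loop calls fetchmany() until it returns []. Note that each iteration yields a list of rows, not a single row.

(You can pass the number of rows to fetchmany(). It's better to let the database backend choose the best threshold.)

fetchmany() is a middle ground between fetchall() (everything in memory) and fetchone() (one call per row). The larger the table, the greater the benefit of fetchone/fetchmany.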
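Both idioms can be tried on a throwaway database. The sketch below assumes Python 3 and the standard sqlite3 module; the in-memory database, its ten rows and the batch size of 4 are made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(':memory:')   # Hypothetical throwaway database.
cur = con.cursor()
cur.execute('create table discussion_body (discid integer, body text)')
cur.executemany('insert into discussion_body values (?, ?)',
                [(i, 'message %d' % i) for i in range(10)])

# One row at a time: fetchone() returns None when the result set is exhausted.
cur.execute('select discid, body from discussion_body')
rows_seen = 0
for row in iter(cur.fetchone, None):
    rows_seen += 1

# In batches: fetchmany() returns an empty list when nothing is left.
cur.execute('select discid, body from discussion_body')
batch_sizes = []
for batch in iter(lambda: cur.fetchmany(4), []):
    batch_sizes.append(len(batch))

con.close()
print(rows_seen, batch_sizes)  # 10 [4, 4, 2]
```

The explicit size of 4 is only there to make the batches visible; as noted above, leaving the size out lets the backend decide.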

A range of floats

The built-in range() function only works with integers:

>>> print range(2,15,3)

[2, 5, 8, 11, 14]



def floatrange(start,stop,steps):

''' Computes a range of floating-point values.



Input:

start (float) : Start value.

stop (float) : End value

steps (integer): Number of values



Output:

A list of floats



Example:

>>> print floatrange(0.25, 1.3, 5)

[0.25, 0.51249999999999996, 0.77500000000000002, 1.0375000000000001, 1.3]

'''

return [start+float(i)*(stop-start)/(float(steps)-1) for i in range(steps)]



>>> print floatrange(0.25, 1.3, 5)

[0.25, 0.51249999999999996, 0.77500000000000002, 1.0375000000000001, 1.3]
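The one-liner divides by steps-1, so it fails with a ZeroDivisionError when steps is 1. Here is a sketch for Python 3 that guards this case; returning [start] for a single step is an assumption of this sketch, not something the original defines.

```python
def floatrange(start, stop, steps):
    """Return `steps` evenly spaced floats from start to stop (both included)."""
    if steps < 1:
        return []
    if steps == 1:
        return [float(start)]           # Assumption: a single value is just the start.
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

values = floatrange(0.25, 1.3, 5)
print(values)
```

As in the original, the values are subject to the usual floating-point rounding, so compare them with a tolerance rather than with ==.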



Converting RGB to HSL and back

Hue : The tint (red, blue, pink, green...)

Saturation : Does the color fall toward grey or toward the pure color itself ? (It's like the "color" setting of your TV). 0=grey 1=the pure color itself.

Lightness : 0=black, 0.5=the pure color itself, 1=white

def HSL_to_RGB(h,s,l) :

''' Converts HSL colorspace (Hue/Saturation/Lightness) to RGB colorspace.

Formula from http://www.easyrgb.com/math.php?MATH=M19#text19



Input:

h (float) : Hue (0...1, but can be above or below

(This is a rotation around the chromatic circle))

s (float) : Saturation (0...1) (0=toward grey, 1=pure color)

l (float) : Lightness (0...1) (0=black 0.5=pure color 1=white)



Output:

(r,g,b) (integers 0...255) : Corresponding RGB values



Examples:

>>> print HSL_to_RGB(0.7,0.7,0.6)

(110, 82, 224)

>>> r,g,b = HSL_to_RGB(0.7,0.7,0.6)

>>> print g

82

'''

def Hue_2_RGB( v1, v2, vH ):

while vH<0.0: vH += 1.0

while vH>1.0: vH -= 1.0

if 6*vH < 1.0 : return v1 + (v2-v1)*6.0*vH

if 2*vH < 1.0 : return v2

if 3*vH < 2.0 : return v1 + (v2-v1)*((2.0/3.0)-vH)*6.0

return v1



if not (0 <= s <=1): raise ValueError,"s (saturation) parameter must be between 0 and 1."

if not (0 <= l <=1): raise ValueError,"l (lightness) parameter must be between 0 and 1."



r,b,g = (l*255,)*3

if s!=0.0:

if l<0.5 : var_2 = l * ( 1.0 + s )

else : var_2 = ( l + s ) - ( s * l )

var_1 = 2.0 * l - var_2

r = 255 * Hue_2_RGB( var_1, var_2, h + ( 1.0 / 3.0 ) )

g = 255 * Hue_2_RGB( var_1, var_2, h )

b = 255 * Hue_2_RGB( var_1, var_2, h - ( 1.0 / 3.0 ) )



return (int(round(r)),int(round(g)),int(round(b)))





def RGB_to_HSL(r,g,b) :

''' Converts RGB colorspace to HSL (Hue/Saturation/Lightness) colorspace.

Formula from http://www.easyrgb.com/math.php?MATH=M18#text18



Input:

(r,g,b) (integers 0...255) : RGB values



Output:

(h,s,l) (floats 0...1): corresponding HSL values



Example:

>>> print RGB_to_HSL(110,82,224)

(0.69953051643192476, 0.69607843137254899, 0.59999999999999998)

>>> h,s,l = RGB_to_HSL(110,82,224)

>>> print s

0.696078431373

'''

if not (0 <= r <=255): raise ValueError,"r (red) parameter must be between 0 and 255."

if not (0 <= g <=255): raise ValueError,"g (green) parameter must be between 0 and 255."

if not (0 <= b <=255): raise ValueError,"b (blue) parameter must be between 0 and 255."



var_R = r/255.0

var_G = g/255.0

var_B = b/255.0



var_Min = min( var_R, var_G, var_B ) # Min. value of RGB

var_Max = max( var_R, var_G, var_B ) # Max. value of RGB

del_Max = var_Max - var_Min # Delta RGB value



l = ( var_Max + var_Min ) / 2.0

h = 0.0

s = 0.0

if del_Max!=0.0:

if l<0.5: s = del_Max / ( var_Max + var_Min )

else: s = del_Max / ( 2.0 - var_Max - var_Min )

del_R = ( ( ( var_Max - var_R ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max

del_G = ( ( ( var_Max - var_G ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max

del_B = ( ( ( var_Max - var_B ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max

if var_R == var_Max : h = del_B - del_G

elif var_G == var_Max : h = ( 1.0 / 3.0 ) + del_R - del_B

elif var_B == var_Max : h = ( 2.0 / 3.0 ) + del_G - del_R

while h < 0.0: h += 1.0

while h > 1.0: h -= 1.0



return (h,s,l)
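The standard library also ships a colorsys module which performs the same kind of conversion. The sketch below assumes Python 3; note that colorsys works with floats in 0...1 and uses the order H, L, S (not H, S, L), so the two wrappers (hypothetical names) only reorder and rescale the values.

```python
import colorsys

def hsl_to_rgb(h, s, l):
    """HSL floats (0...1) to an (r, g, b) tuple of integers (0...255)."""
    r, g, b = colorsys.hls_to_rgb(h, l, s)      # Note the H, L, S argument order.
    return tuple(int(round(c * 255)) for c in (r, g, b))

def rgb_to_hsl(r, g, b):
    """RGB integers (0...255) to an (h, s, l) tuple of floats (0...1)."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return (h, s, l)

print(hsl_to_rgb(0.7, 0.7, 0.6))   # (110, 82, 224)
print(rgb_to_hsl(110, 82, 224))
```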




Generate a palette of rainbow-like pastel colors

This function uses HSL_to_RGB() and floatrange() from the previous snippets.

def generatePastelColors (n):

""" Return different pastel colours.



Input:

n (integer) : The number of colors to return



Output:

A list of colors in HTML notation (eg.['#cce0ff', '#ffcccc', '#ccffe0', '#f5ccff', '#f5ffcc'])



Example:

>>> print generatePastelColors(5)

['#cce0ff', '#f5ccff', '#ffcccc', '#f5ffcc', '#ccffe0']

"""

if n==0:

return []



# To generate colors, we use the HSL colorspace (see http://en.wikipedia.org/wiki/HSL_color_space)

start_hue = 0.6 # 0=red 1/3=0.333=green 2/3=0.666=blue

saturation = 1.0

lightness = 0.9

# We take points around the chromatic circle (hue):

# (Note: we generate n+1 colors, then drop the last one ([:-1]) because it equals the first one (hue 0 = hue 1))

return ['#%02x%02x%02x' % HSL_to_RGB(hue,saturation,lightness) for hue in floatrange(start_hue,start_hue+1,n+1)][:-1]
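A self-contained variant, as a sketch assuming Python 3: it takes n evenly spaced hues directly (so there is no extra color to drop) and relies on the standard colorsys module instead of the two helper functions. The keyword parameters are additions of this sketch.

```python
import colorsys

def generate_pastel_colors(n, start_hue=0.6, saturation=1.0, lightness=0.9):
    """Return n pastel colors in HTML notation, spread around the chromatic circle."""
    colors = []
    for i in range(n):
        hue = start_hue + i / n      # n evenly spaced hues; colorsys wraps hues above 1.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append('#%02x%02x%02x' % tuple(int(round(c * 255)) for c in (r, g, b)))
    return colors

print(generate_pastel_colors(5))
# ['#cce0ff', '#f5ccff', '#ffcccc', '#f5ffcc', '#ccffe0']
```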



Columns to rows (and vice-versa)

table = [ ('Person', 'Disks', 'Books'),

('Zoe' , 12, 24 ),

('John' , 17, 5 ),

('Julien', 3, 11 )

]



print zip(*table)

[ ('Person', 'Zoe', 'John', 'Julien'),

('Disks' , 12, 17, 3 ),

('Books' , 24, 5, 11 )

]
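In Python 3, zip() returns an iterator, so it has to be wrapped in list() to get the same result. A short sketch, assuming Python 3:

```python
table = [('Person', 'Disks', 'Books'),
         ('Zoe',    12,      24),
         ('John',   17,      5),
         ('Julien', 3,       11)]

columns = list(zip(*table))         # Rows become columns.
rows_again = list(zip(*columns))    # Applying it twice gives the rows back.

print(columns[0])   # ('Person', 'Zoe', 'John', 'Julien')
```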

How do I create an abstract class in Python ?

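Python relies on duck typing: "If it quacks like a duck, then it's a duck".

An object does not have to inherit from a given class: if the caller needs a .quack() method, any object with a .quack() method will do. For example: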

import sys



class myLogger:

def __init__(self):

pass

def write(self,data):

file = open("mylog.txt","a")

file.write(data)

file.close()



sys.stderr = myLogger() # Use my class to output errors instead of the console.



print 5/0 # This will trigger an exception



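The exception is written to mylog.txt instead of the console. myLogger does not derive from any file class: it only needs a .write() method.

But if you really do want an abstract class, you can write one: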

class myAbstractClass:

def __init__(self):

if self.__class__ is myAbstractClass:

raise NotImplementedError ,"Class %s does not implement __init__(self)" % self.__class__



def method1(self):

raise NotImplementedError ,"Class %s does not implement method1(self)" % self.__class__



class myClass(myAbstractClass):

def __init__(self):

pass



m = myClass()

m.method1()

Traceback (most recent call last):

File "myprogram.py", line 19, in <module>

m.method1()

File "myprogram.py", line 10, in method1

raise NotImplementedError,"Class %s does not implement method1(self)" % self.__class__

NotImplementedError: Class __main__.myClass does not implement method1(self)
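More recent versions of Python provide the abc module in the standard library for this purpose. The sketch below assumes Python 3.4 or later; Shape and Square are hypothetical classes. Unlike the approach above, the error is raised when the object is created, not when the missing method is called.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstract class: it cannot be instantiated directly."""

    @abstractmethod
    def area(self):
        """Subclasses must implement this method."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

try:
    Shape()                       # Abstract method not implemented.
except TypeError as error:
    print("Cannot instantiate:", error)

print(Square(3).area())           # 9
```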

matplotlib, PIL, transparent PNG/GIF and conversions between ARGB to RGBA

Generate a matplotlib figure without using pylab

Get a transparent bitmap from a matplotlib figure

Get a PIL Image object from a matplotlib Figure

Convert ARGB to RGBA

Save a transparent GIF and PNG

# Import matplotlib and PIL

import matplotlib, matplotlib.backends.backend_agg

import Image



# Generate a figure with matplotlib

figure = matplotlib.figure.Figure(frameon=False)

plot = figure.add_subplot(111)

plot.plot([1,3,2,5,6])



# If you want, you can use figure.set_dpi() to change the bitmap resolution

# or use figure.set_size_inches() to resize it.

# Example:

#figure.set_dpi(150)

# See also the SciPy matplotlib cookbook: http://www.scipy.org/Cookbook/Matplotlib/

# and especially this example:

# http://www.scipy.org/Cookbook/Matplotlib/AdjustingImageSize?action=AttachFile&do=get&target=MPL_size_test.py



# Ask matplotlib to render the figure to a bitmap using the Agg backend

canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(figure)

canvas.draw()



# Get the buffer from the bitmap

stringImage = canvas.tostring_argb()



# Convert the buffer from ARGB to RGBA:

tempBuffer = [None]*len(stringImage) # Create an empty array of the same size as stringImage

tempBuffer[0::4] = stringImage[1::4]

tempBuffer[1::4] = stringImage[2::4]

tempBuffer[2::4] = stringImage[3::4]

tempBuffer[3::4] = stringImage[0::4]

stringImage = ''.join(tempBuffer)



# Convert the RGBA buffer to a PIL Image

l,b,w,h = canvas.figure.bbox.get_bounds()

im = Image.fromstring("RGBA", (int(w),int(h)), stringImage)



# Display the image with PIL

im.show()



# Save it as a transparent PNG file

im.save('mychart.png')



# Want a transparent GIF ? You can do it too

im = im.convert('RGB').convert("P", dither=Image.NONE, palette=Image.ADAPTIVE)

# PIL ADAPTIVE palette uses the first color index (0) for the white (RGB=255,255,255),

# so we use color index 0 as the transparent color.

im.info["transparency"] = 0

im.save('mychart.gif',transparency=im.info["transparency"])
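The ARGB to RGBA step can be isolated and tried on its own. The sketch below assumes Python 3, where the buffer is a bytes object rather than a str, hence the bytearray; argb_to_rgba is a hypothetical helper name.

```python
def argb_to_rgba(buffer):
    """Reorder a flat ARGB byte buffer (4 bytes per pixel) into RGBA."""
    rgba = bytearray(len(buffer))
    rgba[0::4] = buffer[1::4]   # Red
    rgba[1::4] = buffer[2::4]   # Green
    rgba[2::4] = buffer[3::4]   # Blue
    rgba[3::4] = buffer[0::4]   # Alpha moves from first to last position.
    return bytes(rgba)

# Two pixels: (A=255, R=1, G=2, B=3) and (A=128, R=4, G=5, B=6)
print(list(argb_to_rgba(bytes([255, 1, 2, 3, 128, 4, 5, 6]))))
# [1, 2, 3, 255, 4, 5, 6, 128]
```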


<html><body bgcolor="#31F2F2"><img src="mychart.png"><img src="mychart.gif"></body></html>

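Caveat: most browsers display transparent PNG files properly. Except Internet Explorer 5.5 and 6 ! They do not support the alpha transparency of PNG files.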

Automatically crop an image

In case of transparent images, the image transparency is used to determine what to crop.

Otherwise, this function will try to find the most popular color on the edges of the image and consider this color "whitespace". (You can override this color with the backgroundColor parameter)

import Image, ImageChops



def autoCrop(image,backgroundColor=None):

'''Intelligent automatic image cropping.

This function removes the useless "white" space around an image.



If the image has an alpha (transparency) channel, it will be used

to choose what to crop.



Otherwise, this function will try to find the most popular color

on the edges of the image and consider this color "whitespace".

(You can override this color with the backgroundColor parameter)



Input:

image (a PIL Image object): The image to crop.

backgroundColor (3 integers tuple): eg. (0,0,255)

The color to consider "background to crop".

If the image is transparent, this parameter will be ignored.

If the image is not transparent and this parameter is not

provided, it will be automatically calculated.



Output:

a PIL Image object : The cropped image.

'''



def mostPopularEdgeColor(image):

''' Compute the most popular color on the edges of an image.

(left,right,top,bottom)



Input:

image: a PIL Image object



Output:

The most popular color (A tuple of integers (R,G,B))

'''

im = image

if im.mode != 'RGB':

im = image.convert("RGB")



# Get pixels from the edges of the image:

width,height = im.size

left = im.crop((0,1,1,height-1))

right = im.crop((width-1,1,width,height-1))

top = im.crop((0,0,width,1))

bottom = im.crop((0,height-1,width,height))

pixels = left.tostring() + right.tostring() + top.tostring() + bottom.tostring()



# Compute the most popular RGB triplet

counts = {}

for i in range(0,len(pixels),3):

RGB = pixels[i]+pixels[i+1]+pixels[i+2]

if RGB in counts:

counts[RGB] += 1

else:

counts[RGB] = 1



# Get the colour which is the most popular:

mostPopularColor = sorted([(count,rgba) for (rgba,count) in counts.items()],reverse=True)[0][1]

return ord(mostPopularColor[0]),ord(mostPopularColor[1]),ord(mostPopularColor[2])



bbox = None



# If the image has an alpha (transparency) layer, we use it to crop the image.

# Otherwise, we look at the pixels around the image (top, left, bottom and right)

# and use the most used color as the color to crop.



# --- For transparent images -----------------------------------------------

if 'A' in image.getbands(): # If the image has a transparency layer, use it.

# This works for all modes which have transparency layer

bbox = image.split()[list(image.getbands()).index('A')].getbbox()

# --- For non-transparent images -------------------------------------------

elif image.mode=='RGB':

if not backgroundColor:

backgroundColor = mostPopularEdgeColor(image)

# Crop a non-transparent image.

# .getbbox() always crops the black color.

# So we need to subtract the "background" color from our image.

bg = Image.new("RGB", image.size, backgroundColor)

diff = ImageChops.difference(image, bg) # Subtract background color from image

bbox = diff.getbbox() # Try to find the real bounding box of the image.

else:

raise NotImplementedError, "Sorry, this function is not implemented yet for images in mode '%s'." % image.mode



if bbox:

image = image.crop(bbox)



return image

Cropping a transparent image:

im = Image.open('myTransparentImage.png')

cropped = autoCrop(im)

cropped.show()

Cropping a non-transparent image:

im = Image.open('myImage.png')

cropped = autoCrop(im)

cropped.show()

To do:

Crop non-transparent image in other modes (palette, black & white).
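The idea behind the bounding box can be sketched without PIL, on a plain list of rows. This assumes Python 3; bounding_box and crop are hypothetical helpers, and the box uses exclusive right and bottom coordinates, like the boxes passed to crop() above.

```python
def bounding_box(grid, background):
    """Return (left, top, right, bottom) of the non-background cells, or None.

    grid is a list of rows, each row a list of pixel values.
    right and bottom are exclusive.
    """
    rows = [y for y, row in enumerate(grid) if any(p != background for p in row)]
    if not rows:
        return None                      # Nothing but background.
    cols = [x for row in grid for x, p in enumerate(row) if p != background]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

def crop(grid, box):
    """Return the part of grid located inside box."""
    left, top, right, bottom = box
    return [row[left:right] for row in grid[top:bottom]]

image = [[0, 0, 0, 0],
         [0, 5, 7, 0],
         [0, 0, 9, 0],
         [0, 0, 0, 0]]
box = bounding_box(image, background=0)
print(box)               # (1, 1, 3, 3)
print(crop(image, box))  # [[5, 7], [0, 9]]
```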

Counting the different words

text = "ga bu zo meuh ga zo bu meuh meuh ga zo zo meuh zo bu zo"

items = text.split(' ')



counters = {}

for item in items:

if item in counters:

counters[item] += 1

else:

counters[item] = 1



print "Count of different word:"

print counters



print "Most popular word:"

print sorted([(counter,word) for word,counter in counters.items()],reverse=True)[0][1]

Count of different word:

{'bu': 3, 'zo': 6, 'meuh': 4, 'ga': 3}

Most popular word:

zo

The for loop can also be written with try/except:

for item in items:

try:

counters[item] += 1

except KeyError:

counters[item] = 1



This avoids the if item in counters test for each word.
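Since Python 2.7, the standard library also offers collections.Counter, which does the counting for you. A minimal sketch, assuming Python 3:

```python
from collections import Counter

text = "ga bu zo meuh ga zo bu meuh meuh ga zo zo meuh zo bu zo"
counters = Counter(text.split())

print(counters['meuh'])           # 4
print(counters.most_common(1))    # [('zo', 6)]
```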

Quick code coverage

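The Trace module of the standard library can tell you which lines of your program were executed.

Assuming your program is started by main():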

import trace,sys

tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix],trace=0,count=1,outfile=r'./coverage_dir/counts')

tracer.run(' main() ')

r = tracer.results()

r.write_results(show_missing=True, coverdir=r'./coverage_dir')

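This creates a .cover file for each module in the coverage_dir directory. Lines which were not executed are prefixed with >>>>>>.

The following script converts the .cover files to HTML: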

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

import os,glob,cgi



def cover2html(directory=''):

''' Converts .cover files generated by the Python Trace module to .html files.

You can generate cover files this way:

import trace,sys

tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix],trace=0,count=1,outfile=r'./coverage_dir/counts')

tracer.run('main()')

r = tracer.results()

r.write_results(show_missing=True, coverdir=r'./coverage_dir')



Input:

directory (string): The directory where the *.cover files are located.



Output:

None

The html files are written in the input directory.



Example:

cover2html('coverage_dir')

'''

# Note: This function is a quick & dirty hack.



# Write the CSS file:

file = open("style.css","w+")

file.write('''

body {

font-family:"Trebuchet MS",Verdana,"DejaVuSans","VeraSans",Arial,Helvetica,sans-serif;

font-size: 10pt;

background-color: white;

}

.noncovered { background-color:#ffcaca; }

.covered { }

td,th { padding-left:5px;

padding-right:5px;

border: 1px solid #ccc;

font-family:"DejaVu Sans Mono","Bitstream Vera Sans Mono",monospace;

font-size: 8pt;

}

th { font-weight:bold; background-color:#eee;}

table { border-collapse: collapse; }

''')

file.close()





indexHtml = "" # Index html table.



# Convert each .cover file to html.

for filename in glob.glob(os.path.join(directory,'*.cover')):

print "Processing %s" % filename

filein = open(filename,'r')

htmlTable = '<table><thead><th>Run count</th><th>Line n°</th><th>Code</th></thead><tbody>'

linecounter = 0

noncoveredLineCounter = 0

for line in filein:

linecounter += 1

runcount = ''

if line[5] == ':': runcount = cgi.escape(line[:5].strip())

cssClass = 'covered'

if line.startswith('>>>>>>'):

noncoveredLineCounter += 1

cssClass="noncovered"

runcount = '►'

htmlTable += '<tr class="%s"><td align="right">%s</td><td align="right">%d</td><td nowrap>%s</td></tr>\n' % (cssClass,runcount,linecounter,cgi.escape(line[7:].rstrip()).replace(' ','&nbsp;'))

filein.close()

htmlTable += '</tbody></table>'

sourceFilename = filename[:-6]+'.py'

coveragePercent = int(100*float(linecounter-noncoveredLineCounter)/float(linecounter))

html = '''<html><!-- Generated by cover2html.py - http://sebsauvage.net --><head><link rel="stylesheet" href="style.css" type="text/css"></head><body>

<b>File:</b> %s<br>

<b>Coverage:</b> %d%% ( <span class="noncovered"> ► </span> = Code not executed. )<br>

<br>

''' % (cgi.escape(sourceFilename),coveragePercent) + htmlTable + '</body></html>'

fileout = open(filename+'.html','w+')

fileout.write(html)

fileout.close()

indexHtml += '<tr><td><a href="%s">%s</a></td><td>%d%%</td></tr>\n' % (filename+'.html',cgi.escape(sourceFilename),coveragePercent)



# Then write the index:

print "Writing index.html"

file = open('index.html','w+')

file.write('''<html><head><link rel="stylesheet" href="style.css" type="text/css"></head>

<body><table><thead><th>File</th><th>Coverage</th></thead><tbody>%s</tbody></table></body></html>''' % indexHtml)

file.close()



print "Done."





cover2html()

cover2html() converts each .cover file to an .html file and writes an index.html page listing the coverage of each file.
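The line format that cover2html() relies on can be sketched separately. This assumes Python 3 and the same layout as the script above (a run count in the first 5 columns followed by a colon, or a >>>>>> prefix for lines never executed); parse_cover_line and the sample lines are made up for the illustration.

```python
def parse_cover_line(line):
    """Return (run_count, executed, source) for one line of a .cover file.

    run_count is None for lines without a counter (blank lines, comments...).
    """
    if line.startswith('>>>>>>'):
        return (0, False, line[7:].rstrip())
    run_count = None
    if len(line) > 5 and line[5] == ':':
        run_count = int(line[:5])        # int() ignores the leading spaces.
    return (run_count, True, line[7:].rstrip())

# Made-up sample lines following the layout described above.
SAMPLE = ['%5d: x = 1\n' % 3, '>>>>>> y = 2\n', ' ' * 7 + '# comment\n']
for sample_line in SAMPLE:
    print(parse_cover_line(sample_line))
```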

Trapping exceptions to the console under wxPython

import sys

STDERR = sys.stderr # Keep stderr because wxPython will redirect it.



import wx



[...your wxPython program goes here...]



if __name__ == "__main__":

import traceback,sys

try:

app = MyWxApplication() # Start your wxPython application here.

app.MainLoop()

except:

traceback.print_exc(file=STDERR)

Get a random "interesting" image from Flickr


#!/usr/bin/python

# -*- coding: iso-8859-1 -*-

import datetime,random,urllib2,re



def getInterestingFlickrImage (filename=None):

''' Returns a random "interesting" image from Flickr.com.

The image is saved in current directory.



In case the image is not valid (eg.photo not available, etc.)

the image is not saved and None is returned.



Input:

filename (string): An optional filename.

If filename is not provided, a name will be automatically provided.

None



Output:

(string) Name of the file.

None if the image is not available.

'''

# Get a random "interesting" page from Flickr:

print 'Getting a random "interesting" Flickr page...'

# Choose a random date between the beginning of flickr and yesterday.

yesterday = datetime.datetime.now() - datetime.timedelta(days=1)

flickrStart = datetime.datetime(2004,7,1)

nbOfDays = (yesterday-flickrStart).days

randomDay = flickrStart + datetime.timedelta(days=random.randint(0,nbOfDays))

# Get a random page for this date.

url = 'http://flickr.com/explore/interesting/%s/page%d/' % (randomDay.strftime('%Y/%m/%d'),random.randint(1,20))

urlfile = urllib2.urlopen(url)

html = urlfile.read(500000)

urlfile.close()



# Extract images URLs from this page

re_imageurl = re.compile('src="(http://farm\d+.static.flickr.com/\d+/\d+_\w+_m.jpg)"',re.IGNORECASE|re.DOTALL)

urls = re_imageurl.findall(html)

if len(urls)==0:

raise ValueError,"Oops... could not find images URL in this page. Either Flickr has problem, or the website has changed."

urls = [url.replace('_m.jpg','_o.jpg') for url in urls]



# Choose a random image

url = random.choice(urls)



# Download the image:

print 'Downloading %s' % url

filein = urllib2.urlopen(url)

try:

image = filein.read(5000000)

except MemoryError: # I sometimes get this exception. Why ?

return None



filein.close()



# Check it.

if len(image)==0:

return None # Sometimes flickr returns nothing.

if len(image)==5000000:

return None # Image too big. Discard it.

if image.startswith('GIF89a'):

return None # "This image is not available" image.



# Save to disk.

if not filename:

filename = url[url.rindex('/')+1:]

fileout = open(filename,'w+b')

fileout.write(image)

fileout.close()



return filename



print getInterestingFlickrImage()
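The "random date between two dates" step of this function works on its own. Here is a sketch assuming Python 3; random_day and its rng parameter are hypothetical, the latter added so that the result can be made reproducible.

```python
import datetime
import random

def random_day(start, end, rng=random):
    """Return a random datetime between start and end (both included), in whole days."""
    nb_of_days = (end - start).days
    return start + datetime.timedelta(days=rng.randint(0, nb_of_days))

flickr_start = datetime.datetime(2004, 7, 1)
yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
day = random_day(flickr_start, yesterday)
print(day.strftime('%Y/%m/%d'))
```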

Why is Python a good beginner language ?

print "Hello, world !"

a = input()

b = a + 2

print b

a = input()

b = a + 2

if b > 10:

print "More than 10 !"

def square(value):

return value*value



print square(5)

class myClass:

def __init__(self,value):

self.value = value

def bark(self):

print "Woof woof !"



myObject = myClass(5)

print myObject.value

myObject.bark()

With Python, you can introduce one concept at a time, and students can start experimenting right away.

Java:



class myfirstjavaprog

{

public static void main ( String args[] )

{

System.out.println ( "Hello World!" ) ;

}

}

Student asks:



What is a class?, What is that funny looking bracket?, What is public?, What is static?, What is void for?, What is main?, What are the parenthesis for?, What is a String?, What is args?, How come there are funny square brackets?, What is system?, What does the dot do?, What is out?, What is println?, Why are there quotes there?, What does the semicolon do?, How come it's all indented like that?.



C:



#include <stdio.h>



main()

{

printf ( "Hello, World!\n" ) ;

}

Student asks:



What is #include?, What are the greater than and less than signs doing there?, What is stdio.h?, What is main? What are the parenthesis for?, What is the funny bracket for?, What is printf?, Why is hello world in quotes?, What is the backslash-N doing at the end?, What is the semicolon for?



Python:



print "Hello World"

Student asks:



What is print?, Why is hello world in quotes?



Get the picture?

Why Python is not a good beginner language.

Memory allocation problems (malloc/free and try/except/finally blocks). More generally, inexperienced Python programmers may not be aware of resource allocation issues, as the Python garbage collector takes care of most problems (file handles, network connections, etc.).

Pointers and low-level operations. Python only manipulates references and objects, which is higher-level programming. Python programmers may have a hard time with pointers and arrays in C or C++. (Do you like sizeof() ?)

Specific API. Python comes with batteries included: It has the same API on all platforms (Windows, Linux, etc.). Other languages have their own API (Java), or a platform-specific API (C/C++). Programmers coming from Python will probably have to learn platform specifics (which are mostly hidden in Python, eg. os.path.join())

Static typing. Python programmers will have to cope with mandatory variable and type declaration, casting and eventually templates in statically-typed languages (C++, Java, C#...) in order to achieve the same things they did naturally in Python.

Compilation. Compilation is not an issue in itself, but it adds a burden.

Well, after learning Python, other languages will look like a pain in the ass to the Python developer. This can lead to demotivation.

Reading LDIF files

I strongly recommend that you do not write your own LDIF parser: use the ldif.py module from python-ldap instead.

#!/usr/bin/python

# -*- coding: iso-8859-1 -*-



import ldif # ldif module from http://python-ldap.sourceforge.net



class testParser(ldif.LDIFParser):

def __init__(self,input_file,ignored_attr_types=None,max_entries=0,process_url_schemes=None,line_sep='\n' ):

ldif.LDIFParser.__init__(self,input_file,ignored_attr_types,max_entries,process_url_schemes,line_sep)



def handle(self,dn,entry):

if 'person' in entry['objectclass']:

print "Identifier = ",entry['uid'][0]

print "FirstName = ",entry.get('givenname',[''])[0]

print "LastName = ",entry.get('sn',[''])[0]

print



f = open('myfile.ldif','r')

ldif_parser = testParser(f)

ldif_parser.parse()

Capture the output of a program

It's easy to capture the output of a command-line program.



For example, under Windows, we will get the number of bytes received by the workstation by picking up the "Bytes received" line displayed by this command: net statistics workstation

#!/usr/bin/python

import subprocess

myprocess = subprocess.Popen(['net','statistics','workstation'],stdout=subprocess.PIPE)

(sout,serr) = myprocess.communicate()

for line in sout.split('\n'):

if line.strip().startswith('Bytes received'):

print "This workstation received %s bytes." % line.strip().split(' ')[-1]



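You can also send data to the input of the program (see the stdin parameter of subprocess.Popen and the argument of communicate()).

To get the return code of the program: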

#!/usr/bin/python

import subprocess

myprocess = subprocess.Popen(['net','statistics','workstation'],stdout=subprocess.PIPE)

(sout,serr) = myprocess.communicate()

for line in sout.split('\n'):

if line.strip().startswith('Bytes received'):

print "This workstation received %s bytes." % line.strip().split(' ')[-1]

myprocess.wait() # We wait for process to finish

print myprocess.returncode # then we get its returncode.
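Here is a portable sketch of the same technique, assuming Python 3.7 or later. Instead of the Windows-only net command, it runs a small child Python process whose output is made up for the illustration, then picks up one line of the captured output and the return code.

```python
import subprocess
import sys

# Hypothetical child program standing in for "net statistics workstation".
child_code = 'print("Bytes received 12345"); print("Bytes sent 678")'

result = subprocess.run([sys.executable, '-c', child_code],
                        capture_output=True, text=True)

received = None
for line in result.stdout.splitlines():
    if line.strip().startswith('Bytes received'):
        received = line.strip().split(' ')[-1]

print("This workstation received %s bytes." % received)
print(result.returncode)   # 0 when the child program ended normally
```

subprocess.run() waits for the program to finish, so the output and the return code are both available on the returned object.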



Writing your own webserver

A webserver is relatively easy to understand:



The client (browser) connects to the webserver and sends it an HTTP GET or POST request (including path, cookies, etc.)



The server parses the incoming request (path (eg. /some/file), cookies, etc.), responds with an HTTP code (404 for "not found", 200 for "ok", etc.) and sends the content itself (html page, image...)



Browser (HTTP Client) sends:

    GET /path/hello.html HTTP/1.1
    Host: www.myserver.com

Server (HTTP Server) responds:

    HTTP/1.1 200 OK
    Content-Type: text/html

    <html><body>Hello, world !</body></html>
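To make the "server parses the incoming request" step concrete, here is a small Python 3 sketch which splits the request shown above into its parts. The function parse_request is hypothetical and only written for this illustration: it ignores the request body and does no validation, so it is not a real HTTP parser.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, path, version, headers)."""
    # Headers end at the first empty line; a body (if any) comes after it.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # First line looks like: GET /path/hello.html HTTP/1.1
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw_request = "GET /path/hello.html HTTP/1.1\r\nHost: www.myserver.com\r\n\r\n"
method, path, version, headers = parse_request(raw_request)
print(method, path, version, headers["host"])
```

In practice you do not write this yourself: the request handler classes of the standard library do the parsing and give you the path and the headers.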

You can take full control of this process and write your own webserver in Python.

Here is a simple webserver which says "Hello, world !" on http://localhost:8088/





#!/usr/bin/python
import BaseHTTPServer

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type','text/html')
        self.end_headers()
        self.wfile.write('<html><body>Hello, world !</body></html>')
        return

print "Listening on port 8088..."
server = BaseHTTPServer.HTTPServer(('', 8088), MyHandler)
server.serve_forever()



We create a class which will handle HTTP requests arriving on the port ( MyHandler ).

We only handle GET requests ( do_GET ).

We respond with HTTP code 200, which means "everything is ok." ( self.send_response(200) ).

We tell the browser that we're about to send HTML data ( self.send_header('Content-type','text/html') ).

Then we send the HTML itself ( self.wfile.write(...) )



That's easy.
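In Python 3 the BaseHTTPServer module became http.server, and the data written to wfile must be bytes. Here is a sketch of the same "Hello, world !" handler under these assumptions. To make it checkable without a browser, it does not call serve_forever() in the main thread: the helper fetch_once (a name made up for this example) starts the server on a free local port in a background thread, sends one GET request, then stops the server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Hello, world !</body></html>"  # bytes in Python 3
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for this demo

def fetch_once():
    """Start the server on a free port, send one GET request, stop the server."""
    server = HTTPServer(("127.0.0.1", 0), MyHandler)  # port 0 = any free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()      # stops serve_forever()
        server.server_close()  # releases the socket

status, body = fetch_once()
print(status, body)
```

For a long-running server, you would simply create HTTPServer(('', 8088), MyHandler) and call serve_forever(), as in the Python 2 version.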

From there, you can extend the server:



by responding with specific HTTP error codes if something goes wrong (404 for "Not found", 400 for "Invalid request", 401 for "Not authorized", 500 for "Internal server error", etc.)

by serving different html depending on the requested path ( self.path ).

by serving files from disk or pages (or images !) generated on the fly.

by sending html data (text/html), plain text (text/plain), JPEG images (image/jpeg), PNG files (image/png), etc.

by handling cookies (from self.headers)

by handling POST requests (for forms and file uploads)

etc.

Possibilities are endless.
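As an illustration of the first ideas in this list (different content depending on self.path, different content types, 404 for unknown paths), here is a Python 3 sketch. The PAGES table, its paths and the fetch helper are assumptions made up for this example. The server is started on a free local port in a background thread so that the example can query itself.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths and contents, chosen only for this example.
PAGES = {
    "/": ("text/html", b"<html><body>Home</body></html>"),
    "/hello.txt": ("text/plain", b"Hello, world !"),
}

class RoutingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        if path not in PAGES:
            self.send_error(404, "Not found")  # sends headers and an error page
            return
        content_type, body = PAGES[path]
        self.send_response(200)
        self.send_header("Content-type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch(paths):
    """Request each path from a temporary local server; return (path, status, type)."""
    server = HTTPServer(("127.0.0.1", 0), RoutingHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    results = []
    try:
        for path in paths:
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()
            results.append((path, response.status, response.getheader("Content-type")))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results

results = fetch(["/", "/hello.txt", "/missing"])
for row in results:
    print(row)
```

Looking the path up in a fixed table (instead of mapping it to the filesystem) avoids the path parsing problems that make a home-made file server risky.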





But there are some reasons why you should not try to write your own webserver:



Your webserver can only serve one request at a time. For high-traffic websites, you will need to either fork, use threads or use asynchronous sockets. There are plenty of webservers which are already highly optimized for speed and will be much faster than what you are writing.



Webservers provide great flexibility through configuration files. You don't have to code everything (virtual paths, virtual hosts, MIME handling, password protection, etc.). That's a great timesaver.



SECURITY ! Writing your own webserver can be tricky (path parsing, etc.). There are plenty of existing webservers developed with security in mind and which take care of these issues.



There are already plenty of ways to incorporate Python code in an existing webserver (Apache module, CGI, Fast-CGI, etc.).
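About the first point: if you only want to experiment with the "use threads" option, the standard library can do it by mixing ThreadingMixIn into the server class. Here is a Python 3 sketch (the class and function names are made up for this example). To show that two requests are really handled at the same time, each handler waits on a barrier until the other request has arrived; with the plain single-request server, the first request would wait alone until the timeout.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handles each request in its own thread."""
    daemon_threads = True

# Each handler waits here until a second request is being handled too.
barrier = threading.Barrier(2)

class WaitingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            barrier.wait(timeout=5)
            body = b"served concurrently"
        except threading.BrokenBarrierError:
            body = b"served alone"  # nobody else showed up in time
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def run_two_clients():
    """Send two simultaneous GET requests and return both response bodies."""
    server = ThreadedHTTPServer(("127.0.0.1", 0), WaitingHandler)
    port = server.server_address[1]
    server_thread = threading.Thread(target=server.serve_forever)
    server_thread.daemon = True
    server_thread.start()
    bodies = []

    def client():
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        conn.request("GET", "/")
        bodies.append(conn.getresponse().read())
        conn.close()

    clients = [threading.Thread(target=client) for _ in range(2)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    server.shutdown()
    server.server_close()
    return bodies

bodies = run_two_clients()
print(bodies)
```

This only removes the "one request at a time" limit. The other reasons in the list (configuration, security, existing integrations) still apply.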





While writing your own webserver can be fun, think twice before putting this into production.

SOAP clients

First try: SOAPy. Huu... last updated April 26, 2001 ? Try to run it. Oops... it is based on xmllib which is deprecated in Python. No luck !

Next one: SOAP.py 2005 ? I fetch SOAPpy-0.12.0.zip, unzip, run "python setup.py install":

SyntaxError: from __future__ imports must occur at the beginning of the file. WTF ?

By the way, SOAP.py depends on pyXML... which has not been maintained since late 2004 and is not available for Python 2.5 !

What am I supposed to do with this ?

Ok, let's try another one:





? I fetch SOAPpy-0.12.0.zip, unzip, run " ": B