Python has input/output features which are very easy to use. Files are accessed through file objects, and you can open, read, write and close files using simple functions from the standard library. To manage files and handle special file formats (XML, JSON, ...), Python provides dedicated packages that make the developer's life even easier.

Filtering Files – fileinput module

If you write scripts for automation, you are probably familiar with commands like grep, head, tail and more. These commands take one or more files as input and filter their content to the output. If you want to write a similar task in Python, the fileinput module is very helpful:

import fileinput

for line in fileinput.input():
    print(fileinput.filename())
    print(line + ':' + str(fileinput.lineno()))

Run this script with a list of filenames or patterns as parameters:

# inpdemo *.txt

The script loops over all the lines in all txt files, printing the file name, line content and line number. To write a filter program, just add conditions:

import sys, fileinput, re, glob

pattern = sys.argv.pop(1)
# uncomment on Windows
#sys.argv[1:] = glob.glob(sys.argv[1])
for line in fileinput.input():
    res = re.search(pattern, line)
    if res:
        print(fileinput.filename())
        print(line + ':' + str(fileinput.lineno()))

Using the script (searching for lines containing the string hello):

# inpdemo hello *.txt

Serialization with pickle

The pickle module converts Python objects into a stream of bytes, usually written to a file or sent across a network. To use pickle, open a binary file and dump/load your objects.

import pickle

d1 = {'name': 'liran', 'id': 1000, 'age': 45}
outp = open('customer', 'wb')
pickle.dump(d1, outp)
outp.close()

and load:

import pickle

inp = open('customer', 'rb')
cust = pickle.load(inp)
print(cust)
inp.close()

Types supported by pickle:

All primitive types

strings, bytearrays

collections of picklable objects (set, list, tuple, dictionary)

custom types:

import pickle

class Student(object):
    def __init__(self, id=0, name=''):
        self.__id = id
        self.__name = name

    def pr_student(self):
        print("id=" + str(self.__id) + " name=" + str(self.__name))

s = Student(100, 'avi')
s.pr_student()
outp = open('students', 'wb')
pickle.dump(s, outp)
outp.close()

inp = open('students', 'rb')
d = pickle.load(inp)
d.pr_student()
inp.close()

Note the pickle protocol attribute (see the docs).
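For example, you can pass the protocol explicitly when dumping. A minimal sketch, using the in-memory dumps/loads variants instead of a file:

```python
import pickle

d = {'name': 'liran', 'id': 1000}

# pickle.HIGHEST_PROTOCOL selects the newest (most compact) format
data = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)

# the protocol used when none is given (varies by Python version)
print(pickle.DEFAULT_PROTOCOL)

restored = pickle.loads(data)
print(restored == d)   # True
```

Newer protocols are more compact but cannot be read by older Python versions, so pick the protocol based on who needs to load the data.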

File Compression with bz2, gzip

With the bz2 and gzip modules, you can create a compressed archive file:

import pickle, gzip, bz2

s = Student(100, 'avi')   # Student class from the previous example
s.pr_student()
outp = bz2.open('customer.bz2', 'wb')
pickle.dump(s, outp)
outp.close()

inp = bz2.open('customer.bz2', 'rb')
d = pickle.load(inp)
d.pr_student()
inp.close()
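gzip exposes the same open interface, so the identical pattern produces a .gz archive. A minimal self-contained sketch (the customer.gz file name and the dictionary are just sample data):

```python
import pickle, gzip

d1 = {'name': 'liran', 'id': 1000, 'age': 45}

# gzip.open returns a file-like object, so pickle can write to it directly
outp = gzip.open('customer.gz', 'wb')
pickle.dump(d1, outp)
outp.close()

inp = gzip.open('customer.gz', 'rb')
d = pickle.load(inp)
print(d == d1)   # True
inp.close()
```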

Pickling on a large scale with shelve

If you are going to use pickle on a large scale, the shelve module uses a database to store pickled objects by a string key:

import shelve

s = Student(100, 'avi')
s.pr_student()
outp = shelve.open('customer.dat')
outp['c1'] = s
outp.close()

inp = shelve.open('customer.dat')
d = inp['c1']
d.pr_student()
inp.close()

JSON Files

JSON files are very useful for saving data offline, saving configuration and more. In the following example we use the json module to dump a Python object to a JSON file:

import json

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
with open("dict.json", 'w') as d:
    json.dump(data, d)

The generated file (note the conversions):

["foo", {"bar": ["baz", null, 1.0, 2]}] 1 [ "foo" , { "bar" : [ "baz" , null , 1.0 , 2 ] } ]

XML Files

You can find many modules and packages for handling and parsing XML files. One simple module is minidom:

Parsing XML string:

import xml.dom.minidom

doc = xml.dom.minidom.parseString('<site>devarea.com</site>')

Parsing XML file:

doc = xml.dom.minidom.parse('sites.xml')

And navigating in the DOM object:

print(doc.childNodes)
print(doc.firstChild.tagName)
...

And many more navigation methods are available.
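For instance, getElementsByTagName collects all elements with a given name. A minimal sketch, using a small hypothetical document with repeated elements:

```python
import xml.dom.minidom

# sample document with two <site> elements (hypothetical data)
doc = xml.dom.minidom.parseString(
    '<sites><site>devarea.com</site><site>python.org</site></sites>')

# each <site> element's firstChild is its text node
for node in doc.getElementsByTagName('site'):
    print(node.firstChild.data)
```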

CSV Files

CSV files are used to store tables. All database systems can import and export data in CSV format. Use the csv module to handle CSV files:

import csv

with open('students.csv') as my_file:
    reader = csv.DictReader(my_file)
    for row in reader:
        print(row['name'], row['city'])

The DictReader converts each row to a dictionary keyed by the header row.
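The reverse direction works with csv.DictWriter. A minimal sketch that creates a students.csv and reads it back (the field names and rows are hypothetical sample data):

```python
import csv

rows = [{'name': 'avi', 'city': 'tel-aviv'},
        {'name': 'dana', 'city': 'haifa'}]

# DictWriter maps each dictionary onto the declared fieldnames
with open('students.csv', 'w', newline='') as my_file:
    writer = csv.DictWriter(my_file, fieldnames=['name', 'city'])
    writer.writeheader()
    writer.writerows(rows)

# reading back: each row is a dictionary keyed by the header
with open('students.csv') as my_file:
    for row in csv.DictReader(my_file):
        print(row['name'], row['city'])
```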

Configuration Files – INI files

Warning!!! – Not a Windows fan? Skip this section.

Windows uses INI files for settings and configuration. The configparser module helps you write and parse those files:

from configparser import *

config = ConfigParser()
config.add_section('GLOBALS')
config.set('GLOBALS', 'TRACE', 'True')
config.add_section('FILENAMES')
config.set('FILENAMES', 'DIR', 'myapp')
config.set('FILENAMES', 'MASTER', '%(dir)s\\master')
config.set('FILENAMES', 'SLAVE', '%(dir)s\\slave')

fh = open("config.ini", "w")
config.write(fh)
fh.close()

# now read the file
config.read('config.ini')
master = config.get('FILENAMES', 'master')
print(master)
print(config.getboolean('GLOBALS', 'TRACE'))

The generated file:
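ConfigParser lowercases option names by default (via optionxform), and the %(dir)s references are stored unexpanded, so the written file should look roughly like:

```
[GLOBALS]
trace = True

[FILENAMES]
dir = myapp
master = %(dir)s\master
slave = %(dir)s\slave
```

When reading it back, config.get('FILENAMES', 'master') interpolates %(dir)s and returns myapp\master.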