Retrieving information from Picasa is not easy: the software is quite limited and offers hardly any data export function.

What I would like to do: extract the raw face recognition information from the Picasa database. I need the person names, the image filename and the rectangle associated with each face.

On Windows 7, the Picasa database is located in C:\Users\USERNAME\AppData\Local\Google\Picasa2. This folder mainly contains pmp and db files. The pmp files store tabular data: each PMP file holds one column of one table in the database, and the file is named table_column.pmp. The Picasa database contains 3 tables:

albumdata, which contains information on the albums (folders and face album)

catdata, categories data (almost empty on my computer)

imagedata, images data (includes rectangles and references albums)

How to read PMP files?

PMP files are binary files in little-endian format. The header is described by the following table:

Size      Description
4 bytes   magic constant: 0x3fcccccd
2 bytes   field type (unsigned short)
2 bytes   constant: 0x1332
4 bytes   constant: 0x00000002
2 bytes   field type (unsigned short)
2 bytes   constant: 0x1332
4 bytes   number of entries (unsigned int)

The field-type values are:

Value   Description
0x0     null-terminated strings
0x1     unsigned int, 4 bytes
0x2     dates, Microsoft Variant Time format, 8 bytes
0x3     byte field, 1 byte
0x4     unsigned long, 8 bytes
0x5     unsigned short, 2 bytes
0x6     null-terminated strings
0x7     unsigned int, 4 bytes

See http://sbktech.blogspot.fr/2011/12/picasa-pmp-format.html for more information on pmp files.
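The header layout above can be checked with a few lines of Java. This is only a minimal sketch: the class name PmpHeader and its methods are made up for illustration and are not taken from any existing tool. It assumes the two field-type values in the header are always equal, which the table suggests but does not state.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reads and validates the 20-byte PMP header.
class PmpHeader {
    static final int HEADER_SIZE = 20; // 4 + 2 + 2 + 4 + 2 + 2 + 4 bytes

    final int fieldType;   // one of the values 0x0 to 0x7 from the table
    final long entryCount; // unsigned 32-bit count, widened to a long

    PmpHeader(int fieldType, long entryCount) {
        this.fieldType = fieldType;
        this.entryCount = entryCount;
    }

    static PmpHeader parse(byte[] data) {
        if (data.length < HEADER_SIZE) {
            throw new IllegalArgumentException("too short for a PMP header");
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int magic = buf.getInt();
        int type1 = buf.getShort() & 0xFFFF;
        int const1 = buf.getShort() & 0xFFFF;
        int const2 = buf.getInt();
        int type2 = buf.getShort() & 0xFFFF;
        int const3 = buf.getShort() & 0xFFFF;
        long count = buf.getInt() & 0xFFFFFFFFL;
        // Assumption: both field-type values must match.
        if (magic != 0x3fcccccd || const1 != 0x1332 || const2 != 2
                || const3 != 0x1332 || type1 != type2) {
            throw new IllegalArgumentException("unexpected PMP header");
        }
        return new PmpHeader(type1, count);
    }

    // Width in bytes of one entry, or -1 for null-terminated strings.
    static int fixedSize(int fieldType) {
        switch (fieldType) {
            case 0x0: case 0x6: return -1;
            case 0x1: case 0x7: return 4;
            case 0x2: case 0x4: return 8;
            case 0x3: return 1;
            case 0x5: return 2;
            default: throw new IllegalArgumentException("unknown field type " + fieldType);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            PmpHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
            System.out.println("type=" + h.fieldType + " entries=" + h.entryCount);
        }
    }
}
```

The entries follow directly after the header, so for the fixed-width types the expected file size is 20 + entryCount * fixedSize(fieldType), which makes a convenient sanity check.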

The interesting values for faces are facerect (rectangle coordinates) and personalbumid (album reference) in the table imagedata, and the values token (album reference) and name (person name) in the table albumdata.

Example (keeping only 4 columns):

The rectangle is stored in the rectangle64 format: a 64-bit number that splits into four 16-bit numbers. Each of the four numbers, once divided by 2^16-1 (65535, the maximum value), is a relative coordinate: first the top left corner (x1, y1), then the bottom right corner (x2, y2). The absolute values are obtained by multiplying the x values by the width of the picture and the y values by its height.

Example:

Original number (64 bits)                        0x67873bec9e1e933d
Split into four 16-bit numbers                   0x6787 0x3bec 0x9e1e 0x933d
Convert to decimal                               26503 15340 40478 37693
Divide by 2^16-1 (65535)                         0.4044 0.2341 0.6176 0.5751
Multiply by the width (3264) and height (2448)   x1=1319 y1=573 x2=2016 y2=1407
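The same conversion can be written as a short Java sketch. The class and method names (FaceRect, toPixels) are invented for this illustration. Integer division truncates the result, which is what reproduces the pixel values of the example.

```java
// Hypothetical helper: decodes a rectangle64 value into pixel coordinates.
class FaceRect {
    // Returns {x1, y1, x2, y2} in pixels for a picture of the given size.
    static int[] toPixels(long rect64, int width, int height) {
        // Split the 64-bit value into four 16-bit parts, highest bits first.
        long x1 = (rect64 >>> 48) & 0xFFFF;
        long y1 = (rect64 >>> 32) & 0xFFFF;
        long x2 = (rect64 >>> 16) & 0xFFFF;
        long y2 = rect64 & 0xFFFF;
        // Scale from the 0..65535 range to pixels; integer division truncates.
        return new int[] {
            (int) (x1 * width / 65535),
            (int) (y1 * height / 65535),
            (int) (x2 * width / 65535),
            (int) (y2 * height / 65535)
        };
    }

    public static void main(String[] args) {
        int[] r = toPixels(0x67873bec9e1e933dL, 3264, 2448);
        System.out.println("x1=" + r[0] + " y1=" + r[1] + " x2=" + r[2] + " y2=" + r[3]);
    }
}
```

Run on the number from the example, it prints x1=1319 y1=573 x2=2016 y2=1407.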

With this information, we know that image x has a rectangle corresponding to a specific person, but we don’t yet know the file name of the picture.

This information is held in the file thumbindex.db, which contains the whole list of folders and files indexed in the Picasa database. Line x of thumbindex.db corresponds to the same image as line x of the table imagedata.

How to read thumbindex.db?

Header:

Size      Description
4 bytes   magic constant: 0x40466666
4 bytes   number of entries (unsigned int)

And each line follows this schema:

Size                   Description
until null character   null-terminated strings
26 bytes               useless content
4 bytes                index

A line will be either a folder with its complete path and a specific index value (4294967295), or an image with its filename and an index value pointing to the parent folder.

In this example the image 266 is in the folder 5.

See http://projects.mindtunnel.com/picasa3meta/docs/picasa3meta.thumbindex.ThumbIndex-class.html for more information on the thumbindex.db file.
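The thumbindex.db layout described above can be read with the following sketch. Several things here are assumptions and not facts from the format description: the class name ThumbIndex is hypothetical, the integers are assumed to be little-endian like in the PMP files, and the names are decoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for thumbindex.db.
class ThumbIndex {
    static final long FOLDER = 0xFFFFFFFFL; // 4294967295 marks a folder entry

    final List<String> names = new ArrayList<>();
    final List<Long> parents = new ArrayList<>();

    static ThumbIndex parse(byte[] data) {
        // Assumption: little-endian, like the PMP files.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != 0x40466666) {
            throw new IllegalArgumentException("not a thumbindex.db file");
        }
        long count = buf.getInt() & 0xFFFFFFFFL;
        ThumbIndex index = new ThumbIndex();
        for (long i = 0; i < count; i++) {
            // Read the name up to (and not including) the null character.
            ByteArrayOutputStream name = new ByteArrayOutputStream();
            for (byte b = buf.get(); b != 0; b = buf.get()) {
                name.write(b);
            }
            buf.position(buf.position() + 26); // skip the 26 unused bytes
            // Assumption: names are UTF-8 encoded.
            index.names.add(new String(name.toByteArray(), StandardCharsets.UTF_8));
            index.parents.add(buf.getInt() & 0xFFFFFFFFL);
        }
        return index;
    }

    boolean isFolder(int i) {
        return parents.get(i) == FOLDER;
    }

    // Plain concatenation of folder and file name; add a separator here
    // if the folder entries in your database lack a trailing one.
    String fullPath(int i) {
        long parent = parents.get(i);
        return parent == FOLDER ? names.get(i) : names.get((int) parent) + names.get(i);
    }
}
```

Because line x of thumbindex.db matches line x of imagedata, the position in the names list can be used directly as the row number in the imagedata columns.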

If we merge the table imagedata and the data from thumbindex.db, for a specific image, we have the filename, the face rectangle but no album reference to associate the face to a person!

Picasa actually adds a virtual image to store this information. In the previous example, the virtual picture 268 (which has no filename) is linked to the image 266 and holds the face information of one person (one virtual image per person). The rectangle of the image 266 itself contains all the face rectangles present in the image; when only one face is identified, it is the same as the rectangle of that single person. So we just need to read the album reference of the image 268 and associate it with the image file 266.

I have written a program that parses all this information and stores it in csv files: one file per pmp table and one file for the faces. If ImageMagick is installed (and the convert application is in the path), the program can also create thumbnails of all the faces.

How to use the program to parse the Picasa database?

There are in fact 2 programs: PMPDB, which converts the pmp tables into csv files, and PicasaFaces, which creates a human-readable csv with all the face information, plus the face thumbnails.

Usage:

java -classpath ".:bin/:commons-cli-1.2.jar" PMPDB -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder

java -classpath ".:bin/:commons-cli-1.2.jar:commons-io-2.4.jar" PicasaFaces -folder "/path/to/PicasaDB/Picasa2/db3/" -output ./OutputFolder -replaceRegex C: -replacement /media/HardDrive -convert /path/to/convert(.exe)



If the command line contains the argument -convert, ImageMagick creates all the face thumbnails (in the output folder, with one folder per person). The original image paths can be rewritten by string replacement if the pictures are no longer where the database recorded them (in the example, “C:” is replaced by “/media/HardDrive”).

The source is available on GitHub: https://github.com/skisoo/PicasaDBReader

It works on Windows and Linux.
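To give an idea of what cutting out a face with ImageMagick can look like, here is a sketch that builds a convert command from a face rectangle. It is not necessarily how PicasaFaces calls convert: the class FaceCrop and its method are hypothetical, and only the -crop geometry (WIDTHxHEIGHT+X+Y) and +repage options are standard ImageMagick.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds an ImageMagick command that crops one face.
class FaceCrop {
    static List<String> cropCommand(String convert, String source, String target,
                                    int x1, int y1, int x2, int y2) {
        // ImageMagick geometry: WIDTHxHEIGHT+XOFFSET+YOFFSET
        String geometry = (x2 - x1) + "x" + (y2 - y1) + "+" + x1 + "+" + y1;
        // +repage resets the canvas so the output is just the cropped area.
        return Arrays.asList(convert, source, "-crop", geometry, "+repage", target);
    }

    public static void main(String[] args) {
        // File names are placeholders; the rectangle is the one from the example.
        System.out.println(String.join(" ",
                cropCommand("convert", "in.jpg", "face.jpg", 1319, 573, 2016, 1407)));
    }
}
```

The returned list can be passed to new ProcessBuilder(command).inheritIO().start() to actually run convert.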