Once we hit this number, we felt that we could be confident that our data would accurately reflect the trends in publishing that have taken place since 1979.

Data Coding

Then we had to do some manual work. The database didn’t give us everything we wanted: it didn’t list authors’ genders, for instance. But it did list their first names.

Thus began the lengthy process of manually coding the unique first names in our database. We couldn’t code all of them (we’d still be sat doing it now if we tried), but we did assign genders to all the names that appeared more than 150 times, which gave us plenty of data with which to examine the gender politics of the Iranian publishing sector.
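
For readers who want to try something similar, here is a minimal Python sketch of a frequency cut-off like the one described above. It is an illustration, not the code we ran: the function names, the list-of-first-names input, and the `manual_codes` lookup table are all assumptions made for the example.

```python
from collections import Counter

# Cut-off from the text: only names seen more than 150 times were coded.
THRESHOLD = 150


def names_to_code(first_names, threshold=THRESHOLD):
    """Return the set of first names that appear more than `threshold` times."""
    counts = Counter(first_names)
    return {name for name, n in counts.items() if n > threshold}


def assign_gender(first_names, manual_codes, threshold=THRESHOLD):
    """Attach a hand-assigned gender to each record, or None if uncoded.

    `manual_codes` is a hypothetical dict built by hand, mapping a first
    name to whatever label the coder chose.
    """
    frequent = names_to_code(first_names, threshold)
    return [
        manual_codes.get(name) if name in frequent else None
        for name in first_names
    ]
```

Names that fall below the cut-off come back as `None`, so they can be left out of any gender analysis rather than guessed at.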

The Dewey Decimal System

Some of the information we wanted was in the database, but obscured. Information about genre is a good example—our scrape gathered the Dewey Decimal codes for all the books in the database, so we had to match these up with full topic descriptions in order to make use of them.

The Iranian classification system also makes use of a number of special classifications, which we found on the website of the Online Computer Library Center. Once we’d integrated this information into our database, we could undertake some proper analysis of the genres published in post-Revolutionary Iran.
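
To give a flavour of the matching step, the sketch below maps a Dewey code to its top-level class in Python. It is deliberately simplified and rests on assumptions: it only resolves the ten standard main classes rather than full topic descriptions, it assumes codes are stored as strings with three digits before any decimal point, and the `special_classes` argument is a hypothetical placeholder for the additional classifications, which are not reproduced here.

```python
import re

# The ten main classes of the Dewey Decimal Classification.
DEWEY_MAIN_CLASSES = {
    "000": "Computer science, information & general works",
    "100": "Philosophy & psychology",
    "200": "Religion",
    "300": "Social sciences",
    "400": "Language",
    "500": "Science",
    "600": "Technology",
    "700": "Arts & recreation",
    "800": "Literature",
    "900": "History & geography",
}


def dewey_main_class(code, special_classes=None):
    """Return a topic label for a Dewey code such as "297.12".

    `special_classes` is a hypothetical dict of extra code-to-label
    entries; an exact match there takes priority over the main classes.
    Returns None if the code does not start with three digits.
    """
    code = code.strip()
    if special_classes and code in special_classes:
        return special_classes[code]
    match = re.match(r"(\d{3})", code)
    if match is None:
        return None
    # The first digit selects the main class, e.g. "297" belongs to "200".
    return DEWEY_MAIN_CLASSES[match.group(1)[0] + "00"]
```

A call such as `dewey_main_class("297.12")` would fall under the 200s, Religion. A fuller version would look up the complete three-digit (and decimal) descriptions instead of stopping at the first digit.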