Corieltauvi stater, between 50 BC and 20 BC. Findspot in North Lincolnshire.

This is the project page for the batch upload project populating Category:Portable Antiquities Scheme adding photographs of archaeological interest to Wikimedia Commons.

As of November 2019 there were 555,000 files in this project, with a total file size of 556 gigabytes (query/15896), and around 350 volunteers helping with categorization and descriptions. The uploads are done by Fæ; feel free to ask questions about the project in general at User talk:Fæ.

Reports

Dog brooch, Roman 1st-2nd century, found in Lincolnshire.

Scope

The Portable Antiquities Scheme (PAS) publishes photographs and descriptions of archaeological finds registered in England and Wales on its website, https://finds.org.uk, under a free CC-BY license. Each artefact has a main "record", which may have one or several photographs and drawings associated with it. Where there are photographs from different angles, in most cases these have been merged into one photograph; for example, most coin photographs show the obverse and reverse in the same image.

The images vary significantly in quality: some are older amateur photographs, while others are recent high-resolution, research-quality images taken by institutions such as the British Museum.

Category:Portable Antiquities Scheme is the "bucket category" for all images. It contains all the images uploaded within the batch upload project, but also images from other sources, such as volunteer photographs uploaded directly and images taken from PAS events and publications. Before the batch upload project started there were 1,200 photographs in this category and its subcategories. All images in the batch upload project use the template {{Portable Antiquities Scheme}} as a credit line, which has the benefit that the credit can easily be amended should PAS wish to be credited differently.

Advice for reuse

Searching

Figurine of Harpocrates, 1st-2nd century, found in Hertfordshire. Photograph shows views from four directions.

Categorizing

During the batch upload there has been no automatic categorization. The reasoning is that with such a large upload, including many thousands of coin images, category flooding would be highly likely; for example, a search for "farthing" on the PAS database returns over 5,000 photographs. The aim is that once the batch upload is complete, volunteers will use searches, or browse through larger categories, to find images of interest and add detailed categories to the images that best illustrate each topic. Over time this gives a default form of curation, with less educationally valuable images ending up with the fewest matches in searches.

As an exception, one sub-category is added to every image: a category of the form "Portable Antiquities Scheme, <institution>", where <institution> is the abbreviation for the identifying institution. This makes it easier to browse visually through images related by the region of their findspot, as an identifying institution is normally approached only to identify local finds. For example, "CORN" covers the Cornwall region, the institution being the Royal Institution of Cornwall. Note that abbreviations beginning "FA" are associated with a named assessor rather than a region: "FAIL" covers finds identified by Ian Leins, who specializes in identifying ancient coins nationally, so Category:Portable Antiquities Scheme, FAIL is populated with photographs of ancient coins.
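A minimal sketch of how the per-institution sub-category could be derived from the "institution" field. The helper names are hypothetical, and treating a missing institution as "add no sub-category" is an assumption, not documented behaviour of the upload.

```python
# Hypothetical helpers, not the upload bot's actual code.
CATEGORY_PREFIX = "Portable Antiquities Scheme"


def institution_category(institution):
    """Return the wikitext category link for a PAS institution abbreviation.

    Assumption: a missing, null or blank institution yields no sub-category.
    """
    if not institution or not institution.strip():
        return ""
    return f"[[Category:{CATEGORY_PREFIX}, {institution.strip()}]]"


def is_named_assessor(institution):
    """Abbreviations beginning "FA" denote a named finds assessor, not a region."""
    return bool(institution) and institution.strip().upper().startswith("FA")


print(institution_category("CORN"))  # [[Category:Portable Antiquities Scheme, CORN]]
print(is_named_assessor("FAIL"))     # True
```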

The quickest way to categorize a few hundred images at a time is to run a relevant search, such as this search for Louis XIV silver artefacts, then use the Cat-a-lot tool to select which images you would like in a specific category, such as Category:Coins of Louis XIV of France. If there are more matches than images to exclude, you can do a 'select all' and then deselect the non-matches. Take the time to examine the target category: it may have better sub-categories to apply to the images, or it may already be flooded and need further diffusion or sub-categorization. In practice, a category reaching thousands of images should be considered over-full unless it is an intentional 'bucket category'.

Crops

Roman gold intaglio, 100-200 AD, Essex. Photograph cropped to remove measurement marker.

Rulers - the majority of photographs of artefacts have measurement rulers or marks in the photograph, and some have identity labels. These can safely be cropped off to give a better focus on the artefact, as the original remains in the file history, where anyone wanting to refer to it can open it from the Commons file page. The built-in Commons CropTool can produce a lossless crop.

Detail crops - if a section of the photograph would be useful, such as the main front view of a figurine where the original photograph shows several views, a crop can be created as a new file for reusers. Again the standard CropTool can do this; you just need to select the option to save the crop as a new file. For coin images, it is normally best to keep both the obverse (front) and reverse (back) in the same image rather than separating them.

Batch upload project

17th century rowel (of a horse rider's spur), found in Cornwall.

For the original discussion that started this upload project refer to User_talk:Fæ#Portable_antiquities. The upload was suggested by Pigsonthewing and executed by Fæ with the benefit of reusing upload techniques from other projects.

The upload relies on the PAS database being available as JSON records, which are linked from every catalogue page. The batch upload runs through the entire set of images using URLs like https://finds.org.uk/database/images/index/page/8, which by default return 18 images per page. Each unique image ID is then used to pull the JSON metadata for that image. The image metadata has title and label fields (which appear to be consistently identical), but no long description. Where a main record can be found using the findID, an attempt is made to improve the description using the main record. One disadvantage is that a main record description may refer to several objects within one find, so many images may end up sharing the same description; however, this appears to be a relatively rare exception.
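The paging and description fallback described above can be sketched as two small functions. This is an illustration under stated assumptions: the index URL pattern and the 18-per-page default come from this page, while numbering pages from 1, the function names, and falling back from label to title are assumptions. Fetching and decoding the JSON is left out; the functions take already-decoded dicts.

```python
import math

# URL pattern and page size as described on this page.
# Assumption: index pages are numbered from 1.
INDEX_URL = "https://finds.org.uk/database/images/index/page/{page}"
IMAGES_PER_PAGE = 18


def index_page_urls(total_images):
    """Yield the gallery index URLs needed to cover total_images images."""
    pages = math.ceil(total_images / IMAGES_PER_PAGE)
    for page in range(1, pages + 1):
        yield INDEX_URL.format(page=page)


def choose_description(image_meta, record=None):
    """Prefer the main record's description; otherwise fall back to the image label.

    image_meta and record are dicts decoded from the PAS JSON; record is None
    when no main record could be found for the image's findID.
    """
    if record and record.get("description"):
        return record["description"]
    return image_meta.get("label") or image_meta.get("title") or ""


urls = list(index_page_urls(40))
print(len(urls), urls[-1])  # 3 https://finds.org.uk/database/images/index/page/3
```

Passing the record in, rather than fetching it inside `choose_description`, keeps the fallback rule testable without network access.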

Since 2017-01-23 the finds site has been whitelisted for upload-from-url (phab:T155844). This avoids uploading from a local client over home bandwidth, and the upload rate can be significantly faster.
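For readers unfamiliar with upload-from-url, the MediaWiki Action API's `action=upload` module accepts a `url` parameter instead of file contents, so Commons fetches the image directly from the PAS server. The sketch below only builds the POST fields; the function name and default edit summary are hypothetical, and it is not the batch upload's actual code.

```python
# Sketch of Action API fields for an upload-from-url request (not sent here).
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def upload_by_url_params(filename, source_url, page_text, csrf_token,
                         comment="Portable Antiquities Scheme batch upload"):
    """Return POST fields for action=upload using the `url` parameter.

    The default edit summary is a hypothetical placeholder.
    """
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,
        "url": source_url,    # Commons fetches the file itself from this URL
        "text": page_text,    # initial wikitext of the file description page
        "comment": comment,
        "token": csrf_token,  # CSRF token, e.g. from action=query&meta=tokens
    }
```

The dict would be POSTed to `COMMONS_API` from a logged-in session whose account is allowed to upload by URL; the request only succeeds for source domains on the whitelist.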

Copyright

Licenses are read from each image's "license" metadata tag, and any attribution is defined under "imagerights". The licenses tested for are:

Attribution-ShareAlike License -> {{cc-by-sa-2.0}}
Attribution License -> {{cc-by-2.0}}

The CC version numbers are as given in the database links. Any license not matching the precise text above, such as "All Rights Reserved" or "Attribution-NonCommercial-ShareAlike License", is rejected and the image is skipped. Where no attribution exists, a default of "The Portable Antiquities Scheme/The Trustees of the British Museum" is used.
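The license check above amounts to an exact-match lookup with a default attribution. A minimal sketch, with a hypothetical function name; the license strings, templates and default attribution are the ones quoted on this page.

```python
# Exact license strings accepted, mapped to Commons license templates.
LICENSE_TEMPLATES = {
    "Attribution-ShareAlike License": "{{cc-by-sa-2.0}}",
    "Attribution License": "{{cc-by-2.0}}",
}
DEFAULT_ATTRIBUTION = "The Portable Antiquities Scheme/The Trustees of the British Museum"


def permission_for(image_meta):
    """Return (license_template, attribution), or None if the image must be skipped."""
    template = LICENSE_TEMPLATES.get(image_meta.get("license"))
    if template is None:
        # Anything not matching exactly, e.g. "All Rights Reserved", is rejected.
        return None
    attribution = image_meta.get("imagerights") or DEFAULT_ATTRIBUTION
    return template, attribution
```

Using an exact dictionary lookup rather than substring matching matters here: "Attribution-NonCommercial-ShareAlike License" contains "Attribution" and "ShareAlike", but must still be rejected.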

JSON mapping

This table shows how PAS JSON metadata is mapped to parameters in the standard Commons templates {{Photograph}} and, when relevant, {{Object location}}.

results (i.e. image gallery)
- broadperiod -> date (default)
- title -> title
- findID -> accession number AND (filename = <title> + (FindID <findID>))
- county -> depicted place
- old_findID -> accession number
- filename -> accession number
- institution -> [[Category:Portable Antiquities Scheme, <institution>]]

image
- id -> imageID (1)
- filename -> accession number
- label -> description (default)
- imagerights -> author
- mimetype -> (internal type checks)
- fullname -> author
- license -> permission

record
- description -> description
- daterange -> date
- numdate1 -> date
- numdate2 -> date
- centreLat -> object location (lat)
- centreLon -> object location (lon)

1. imageID is used only where titles are non-unique across multiple photographs of the same artefact.

Where a field may be 'null', there is a fallback to default values. If 'daterange' exists, it overrides the 'numdate' fields. Where numdate2 does not exist or equals numdate1, the date falls back to numdate1, on the presumption that the specific year has been identified. If centreLat does not exist, {{Object location}} is skipped; note that for some finds the location is suppressed from public view on the PAS database and so is not used on Commons.
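The date and location rules can be sketched as follows, with JSON null arriving as Python `None`. The precedence (daterange, then numdates, then broadperiod as the default) follows the mapping above. Two details are assumptions: the "n1 to n2" format for a genuine range is a placeholder, as this page does not say how ranges are rendered, and the location check requires both coordinates rather than only centreLat.

```python
def commons_date(meta):
    """Pick the {{Photograph}} date value from PAS record fields."""
    if meta.get("daterange"):
        return str(meta["daterange"])  # daterange overrides the numdates
    n1, n2 = meta.get("numdate1"), meta.get("numdate2")
    if n1 is not None:
        if n2 is None or n2 == n1:
            return str(n1)  # presume a specific year has been identified
        return f"{n1} to {n2}"  # assumption: placeholder range format
    return meta.get("broadperiod") or ""  # default


def object_location(meta):
    """Return {{Object location|lat|lon}}, or "" when coordinates are absent."""
    lat, lon = meta.get("centreLat"), meta.get("centreLon")
    if lat is None or lon is None:
        return ""  # location missing or suppressed on the PAS database
    return "{{Object location|%s|%s}}" % (lat, lon)
```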

Galleries for multiple views

Taunton civil war hoard, c.1645

Where there are multiple images for an artefact, a housekeeping process adds galleries to each file so that reusers can find and navigate between views of the artefact. This process deliberately lags behind the batch upload and may run many days after the original upload.

An example is File:Socketed gouge (profile) (FindID 661462).jpg, which has two other views, "convex face" and "concave face", added as a gallery on its file page.

In the case of File:Taunton civil war hoard (FindID 643649).jpg, there are photographs of each coin in the hoard shown in the gallery, making it easy to surf through the collection.

You can find other multiple view galleries with this search.
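A sketch of how an "other views" gallery could be assembled in MediaWiki `<gallery>` syntax. It assumes, for illustration only, that related files are identified by the findID embedded in their Commons file names; the real housekeeping process may well work from the PAS metadata instead, and the regex and function names are hypothetical.

```python
import re

# Matches "(FindID 661462)" or "(FindID 438056-324489)" in a Commons file name.
FIND_ID_RE = re.compile(r"\(FindID (\d+)(?:-\d+)?\)")


def find_id_of(filename):
    """Return the PAS findID embedded in a file name, or None."""
    match = FIND_ID_RE.search(filename)
    return match.group(1) if match else None


def other_views_gallery(current_file, candidate_files):
    """Build a <gallery> of the other files sharing current_file's findID."""
    find_id = find_id_of(current_file)
    if find_id is None:
        return ""
    others = [f for f in candidate_files
              if f != current_file and find_id_of(f) == find_id]
    if not others:
        return ""
    lines = ["<gallery>"] + [f"File:{name}" for name in others] + ["</gallery>"]
    return "\n".join(lines)
```

Because the regex also accepts the "-<imageID>" suffix, files uploaded later under the alternative naming scheme are grouped with the original views.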

Where image titles are not unique, a separate upload procedure is applied to generate alternative file names and upload the missing files. Where an artefact has a mix of repeated and unique titles, the gallery will be incomplete (there is a limit on how much volunteer time is worth spending on rare edge cases!); however, the missing views should still be findable by navigating the gallery and checking the what-links-here list. Where apparent duplicates have been uploaded, they exist as separate images on the PAS database and were uploaded because they are not digitally identical, possibly due to minor image enhancement or changes in the EXIF data. As part of later 'housekeeping', missing files uploaded afterwards include a search link above the gallery which shows all possible matches, including those missed in the other versions gallery; for example, File:Roman headstud brooch, close up of decoration (FindID 438056-324489).jpg.

The alternative naming scheme for multiple files with duplicated titles for an artefact is:

File:<title> (FindID <findID>-<imageID>).jpg

Example: File:Post-medieval crotal bell (FindID 660227-501403).jpg
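Both naming schemes can be expressed in one small helper. The `assign_filenames` function is a hypothetical illustration that assumes the first image with a given title and findID keeps the plain name and later repeats get the imageID suffix. It does not sanitize characters that Commons forbids in file names, which a real upload would also have to handle.

```python
def commons_filename(title, find_id, image_id=None, ext="jpg"):
    """<title> (FindID <findID>).jpg, or with -<imageID> for repeated titles."""
    if image_id is None:
        return f"{title} (FindID {find_id}).{ext}"
    return f"{title} (FindID {find_id}-{image_id}).{ext}"


def assign_filenames(images):
    """Name each image; repeats of a (title, findID) pair get the imageID suffix.

    images: list of dicts with "title", "findID" and "id" keys.
    Assumption: the first occurrence keeps the plain name.
    """
    seen = set()
    names = []
    for img in images:
        key = (img["title"], img["findID"])
        image_id = img["id"] if key in seen else None
        seen.add(key)
        names.append(commons_filename(img["title"], img["findID"], image_id))
    return names


print(commons_filename("Post-medieval crotal bell", 660227, 501403))
# Post-medieval crotal bell (FindID 660227-501403).jpg
```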

The gallery process also identifies missing files and attempts to upload them; a number of TIFF drawings with titles matching the main photographs were identified this way (see search).

Wayback Machine

Archive links to the Internet Archive Wayback Machine are automatically added to each image page, initially as a retrospective housekeeping task and now as a pre-upload job. Where a page is not already archived, the archive copy is created at the Internet Archive before the upload to Commons. This batch upload is the first where archive links have been added, to support verification and avoid reliance on manual license reviews.
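One way to implement the "archive if not already archived" step is with the Wayback Machine's public availability endpoint and its Save Page Now URL. This sketch is an assumption about approach, not a description of the bot's code. It only builds URLs and parses an availability response, so no network access is needed to follow the logic.

```python
import json
from urllib.parse import quote


def availability_url(page_url):
    """URL of the Wayback availability API query for page_url."""
    return "https://archive.org/wayback/available?url=" + quote(page_url, safe="")


def save_url(page_url):
    """Save Page Now URL that asks the Internet Archive to capture page_url."""
    return "https://web.archive.org/save/" + page_url


def closest_snapshot(availability_json):
    """Return the closest archived snapshot URL, or None if nothing is archived."""
    data = json.loads(availability_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

If `closest_snapshot` returns None, the page would be requested via `save_url` before the Commons upload, and the resulting snapshot link added to the file page.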

Known bugs and features

X-ray of a 4th-century Roman bowl from Irchester. Title and description metadata were not available for the photograph, so replacements were inherited from the find record.

These were discovered when pulling records from the PAS database. Some caused the batch upload process to Commons to fall over, and other users of the external database may benefit from planning to handle them. It is worth keeping in mind that contributors to the database are not based in a central institution and the database includes records created by non-professionals, so some inconsistency in usage is to be expected.

Bugs

Wrong format - https://finds.org.uk/database/artefacts/record/id/67545/format/json returns XML rather than JSON. It is unknown how many records may be affected; the presumption is that this is limited to a few early records on the Finds database.

Missing (or TIFFs and PNGs?) - https://finds.org.uk/database/artefacts/record/id/52689 references images that are mysteriously missing. There are quite a few of these, though probably significantly fewer than 0.5% of the database. TIFF and PNG drawings do not seem to be viewable on the PAS website, but do appear as thumbnails on record views; this may also result in unexpected errors. It seems likely that the website was designed to cope with viewing JPEG files but is inconsistent in displaying other formats, even though these are on the database. For example, https://finds.org.uk/database/ajax/download/id/659837 should download a PNG file, but results in an inconclusive text error message. For links to missing TIFFs, see the Petscan report for image pages with gallery cross-links to missing images.

Images wrongly linked to artefacts - https://finds.org.uk/database/artefacts/record/id/515346 is an example showing medieval manuscript images linked to a record for a thimble. This is hopefully rare and probably down to human error, but there seems to be no automatic check in the database to stop incorrect record numbers being applied to images.

Blank image records - possibly where images have been deleted from the database. Example
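A defensive sketch for two of the bugs above: records served as XML instead of JSON, and blank or unsupported image records. The set of accepted MIME types and the "no filename means blank record" rule are assumptions made for illustration, not documented PAS or upload-bot behaviour.

```python
import json

# Assumption: formats the upload is prepared to handle.
ACCEPTED_MIMETYPES = {"image/jpeg", "image/png", "image/tiff"}


def parse_pas_json(body):
    """Return decoded JSON, or None when the body is XML/HTML or otherwise invalid."""
    text = body.lstrip()
    if text.startswith("<"):
        return None  # e.g. XML served for a /format/json URL
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def usable_image(meta):
    """Skip blank image records and formats outside ACCEPTED_MIMETYPES."""
    return bool(meta.get("filename")) and meta.get("mimetype") in ACCEPTED_MIMETYPES
```

Returning None rather than raising lets a long batch run log and skip a bad record instead of falling over.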

Features