Listening to Chris Taggart talking about OpenCorporates at netzwerk recherche conf – data, research, stories, I figured I really should start to have a play…

Looking through the example data available from an opencorporates company ID via the API, I spotted that registered trademark data was available. So here’s a quick roundabout way of previewing trademarked images using OpenCorporates and Google Refine.

First step is to grab the data – the opencorporates API reference docs give an example URL for grabbing a company’s (i.e. a legal entity’s) data: http://api.opencorporates.com/companies/gb/00102498/data
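If you would rather script this step than use Refine's importer, here is a minimal Python sketch of the same fetch. The function names are my own, and I'm assuming the endpoint still answers at this address and returns JSON without an API key, which may no longer be true.

```python
import json
from urllib.request import urlopen

# Base URL taken from the example in the API reference docs
API_BASE = "http://api.opencorporates.com"


def company_data_url(jurisdiction, company_number):
    """Build the URL that lists the data items held for one company."""
    return f"{API_BASE}/companies/{jurisdiction}/{company_number}/data"


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(company_data_url("gb", "00102498"))` should then hand back the same JSON that Refine loads from the URL.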

Google Refine supports the import of JSON from a URL:

(Hmm, it seems as if we could load in data from several URLs in one go… maybe data from different BP companies?)

Having grabbed the JSON, we can say which blocks we want to import as row items:

We can preview the rows to check we’re bringing in what we expect…

We’ll take this data by clicking on Create Project, and then start to work on it. Because the plan is to grab trademark images, we need to grab data back from OpenCorporates relating to each trademark. We can generate the API call URLs from the datum – id column:

The OpenCorporates data item API calls are of the form http://api.opencorporates.com/data/2601371, which we can generate as follows:
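Outside Refine, the same URL construction is a one-liner. A sketch in Python, where the function name is my own and only the URL pattern comes from the example above:

```python
def data_item_url(datum_id):
    """Return the OpenCorporates data item URL for one datum id."""
    # Ids may arrive as int or str depending on how the JSON was parsed
    return f"http://api.opencorporates.com/data/{str(datum_id).strip()}"
```

Applied to each value in the datum – id column, this gives one URL per trademark record.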

Here’s what we get back:

If we look through the data, there are several fields that may be interesting: the representative_name_lines (the person/group that registered the trademark), the representative_address_lines, the mark_image_type and, most importantly of all, the international_registration_number. Note that some of the trademarks are not images; we’ll end up ignoring those (for the purposes of this post, at least!)

We can pull out these data items into separate columns by creating columns directly from the trademark data column:

The elements are pulled in using expressions of the following form:

Here are the expressions I used (each expression is used to create a new column from the trademark data column that was imported from automatically constructed URLs):

value.parseJson().datum.attributes.mark_image_type – the first part of the expression parses the data as JSON, then we navigate using dot notation to the part of the Javascript object we want…

value.parseJson().datum.attributes.mark_text

value.parseJson().datum.attributes.representative_address_lines

value.parseJson().datum.attributes.representative_name_lines

value.parseJson().datum.attributes.international_registration_number
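For comparison, here is roughly what those five expressions do, sketched in Python. The sample record is made up; I'm only assuming the datum.attributes nesting that the expressions above rely on, and the function name is hypothetical.

```python
import json

FIELDS = [
    "mark_image_type",
    "mark_text",
    "representative_address_lines",
    "representative_name_lines",
    "international_registration_number",
]


def extract_attributes(raw_json, fields=FIELDS):
    """Mirror value.parseJson().datum.attributes.<field> for each field."""
    attributes = json.loads(raw_json).get("datum", {}).get("attributes", {})
    # Missing fields come back as None, much like a blank cell in Refine
    return {field: attributes.get(field) for field in fields}


# Made-up record, for illustration only (not real API output)
sample = json.dumps({
    "datum": {
        "attributes": {
            "mark_image_type": "JPG",
            "mark_text": "EXAMPLE",
            "international_registration_number": "777839",
        }
    }
})
row = extract_attributes(sample)
```

Each key in `row` corresponds to one of the new columns created from the trademark data column.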

Finding how to get images from international registration numbers was a bit of a faff. In the end, I looked up several records on the WIPO website that displayed trademarked images, then looked at the pattern of their URLs. The ones I checked seemed to have the form:

http://www.wipo.int/romarin/images/XX/YY/XXYYNN.typ

where typ is gif or jpg and XXYYNN is the international registration number. (This may or may not be a robust convention, but it worked for the examples I tried…)

The following GREL expression generates the appropriate URL from the trademark column:

if(or(value.parseJson().datum.attributes.mark_image_type=='JPG', value.parseJson().datum.attributes.mark_image_type=='GIF'), 'http://www.wipo.int/romarin/images/' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2)[0] + '/' + splitByLengths(value.parseJson().datum.attributes.international_registration_number, 2, 2)[1] + '/' + value.parseJson().datum.attributes.international_registration_number + '.' + toLowercase(value.parseJson().datum.attributes.mark_image_type), '')

The first part checks that we have a GIF or JPG image type identified. If we do, we construct the URL path, casting the file type to lower case for the suffix; otherwise we return an empty string.
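The same logic reads a little more easily as a Python function. This is a sketch of the URL pattern I guessed at above, not anything WIPO documents; the function name is my own, and the extra guard for a missing registration number is an addition that the GREL version doesn't have.

```python
def trademark_image_url(attributes):
    """Build a WIPO ROMARIN image URL, or return "" for non-image marks."""
    image_type = attributes.get("mark_image_type")
    number = attributes.get("international_registration_number")
    # Only JPG and GIF marks have an image; the number check is an added guard
    if image_type not in ("JPG", "GIF") or not number:
        return ""
    number = str(number)
    # The first two characters, then the next two, give the two directory levels
    return (
        "http://www.wipo.int/romarin/images/"
        + number[0:2] + "/" + number[2:4] + "/"
        + number + "." + image_type.lower()
    )
```

The slices `number[0:2]` and `number[2:4]` play the part of `splitByLengths(…, 2)[0]` and `splitByLengths(…, 2, 2)[1]`.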

Now we can filter the data to only show rows that contain a trademark image URL:

Finally, we can create a template to export a simple HTML file that will let us preview the image:

Here’s a crude template I tried:

The file is exported as a .txt file, but it’s easy enough to change the suffix to .html so that we can view the file in a browser, or I can cut and paste the HTML into this page…
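If you wanted to do the filter-and-template step in code instead, something like the following Python sketch would produce a similar preview page. The row keys (`image_url`, `mark_text`, `representative_name_lines`) are stand-ins for whatever the Refine columns are called, and the markup is just as crude as my template.

```python
from html import escape


def preview_html(rows):
    """Render the rows that have an image URL as a bare-bones HTML page."""
    parts = ["<html><body>"]
    for row in rows:
        if not row.get("image_url"):
            continue  # same effect as filtering out the blank cells in Refine
        parts.append(
            '<div><img src="{src}" alt="{alt}"/><p>{name}</p></div>'.format(
                src=escape(row["image_url"], quote=True),
                alt=escape(row.get("mark_text") or ""),
                name=escape(str(row.get("representative_name_lines") or "")),
            )
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string straight to a file with an .html suffix saves the renaming step.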

[UPDATE: images look like they now have the form: https://i1.wp.com/www.wipo.int/romarin/images/77/78/777839.jpg ? The IDs may also have changed…]

null null null null
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"
"[\"Murgitroyd & Company\"]" "[\"Scotland House,\",\"165-169 Scotland Street\",\"Glasgow G5 8PL\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"
"[\"BP Group Trade Marks\"]" "[\"20 Canada Square, Canary Wharf\",\"London E14 5NJ\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"ROBERT WILLIAM BOAD\",\"BP p.l.c. – GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"MURGITROYD & COMPANY\"]" "[\"17 Lansdowne Road\",\"Croydon, Surrey CRO 2BX\"]"
"[\"A.C. CHILLINGWORTH\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON EC2M 7BA\"]"
"[\"BP Group Trade Marks\"]" "[\"20 Canada Square, Canary Wharf\",\"London E14 5NJ\"]"
"[\"ROBERT WILLIAM BOAD\",\"GROUP TRADE MARKS\"]" "[\"Britannic House,\",\"1 Finsbury Circus\",\"LONDON, EC2M 7BA\"]"
"[\"BP GROUP TRADE MARKS\"]" "[\"20 Canada Square,\",\"Canary Wharf\",\"London E14 5NJ\"]"

Okay – so maybe I need to tidy up the registration related columns, but as a recipe, it sort of works. (Note that it took way longer to create this blog post than it did to come up with the recipe…)

A couple of things that came to mind: having used Google Refine to sketch out this hack, we could now code it up, maybe in something like Scraperwiki. For example, here I only looked at trademarks registered to one legal entity associated with BP, rather than checking for trademarks held by the myriad legal entities associated with BP. I also wonder whether it would be possible to “compile” what Google Refine is doing (import from URL, select row items, run operations against columns, export templated data) as code so that it could be run elsewhere (so for example, could all the steps be exported as a single Javascript or Python script, maybe calling on a GREL/Google Refine library that provides some sort of abstraction layer or virtual machine for the script to make use of?)

PS What’s next…? The trademark data also identifies one or more areas in which the trademark applies; I need to find some way of pulling out each of the “en” attribute values from the items listed in the value.parseJson().datum.attributes.goods_and_services_classifications.
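As a first guess at how that might go outside Refine, here is a Python sketch. I haven't checked the actual shape of the classification data, so the assumption that goods_and_services_classifications is a list of objects keyed by language code is exactly that, an assumption, and the function name and sample record are made up.

```python
import json


def english_classifications(raw_json):
    """Collect the "en" value from each goods and services classification."""
    attributes = json.loads(raw_json).get("datum", {}).get("attributes", {})
    items = attributes.get("goods_and_services_classifications") or []
    # ASSUMPTION: items is a list of objects keyed by language code
    return [item["en"] for item in items if isinstance(item, dict) and "en" in item]


# Made-up record, for illustration only (not real API output)
sample = json.dumps({
    "datum": {
        "attributes": {
            "goods_and_services_classifications": [
                {"en": "Fuels", "fr": "Combustibles"},
                {"fr": "Huiles"},
            ]
        }
    }
})
labels = english_classifications(sample)
```

Items without an "en" entry are simply skipped, so a record with no English labels gives an empty list.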
