@udb23 said in Versatile C++ game scraper: Skyscraper:

@muldjord said in Versatile C++ game scraper: Skyscraper: it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database. Concerning local scraping, you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would also allow scraping the huge info database and images that @Used2BeRX has created (not available anywhere online yet).

I looked into it, and it seems they are going in a bit of a different direction than what I want to do. I am not going to create a database with game entries. I am going to create a database with single resources. This allows me to expand it at any given time, with any piece of information I see fit. So basically you have a primary key, which will be the rom sha1. Then you have a bunch of resources connected to that sha1. But they won't be bound in a <game> node. Instead each one will just be a '<resource type="something" sha1="rom checksum" timestamp="something">This is an example</resource>'. The resource type can then be anything, for instance "description" or "players". When reading from the database I will compute a checksum of the source rom file, search through the database, and collect every resource that matches the sha1. I then assemble those to give me as much game detail as the database contains.
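To make the lookup idea concrete, here is a rough sketch in Python (not Skyscraper's actual C++ code). The `<resources>` root element, the function names and the sample data are all made up for illustration; only the `<resource>` node with its `type`, `sha1` and `timestamp` attributes comes from the description above.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical database contents. The <resources> root element is an assumption.
DB_XML = """<resources>
  <resource type="description" sha1="{sha}" timestamp="1500000000">This is an example</resource>
  <resource type="players" sha1="{sha}" timestamp="1500000000">2</resource>
  <resource type="description" sha1="0000" timestamp="1500000000">Some other game</resource>
</resources>"""


def rom_sha1(data: bytes) -> str:
    """Return the hex SHA-1 of the raw rom contents."""
    return hashlib.sha1(data).hexdigest()


def assemble_game(db_root: ET.Element, sha1: str) -> dict:
    """Collect every resource whose sha1 matches into one dict keyed by type."""
    game = {}
    for res in db_root.iter("resource"):
        if res.get("sha1") == sha1:
            game[res.get("type")] = res.text
    return game


rom_bytes = b"fake rom contents"  # stands in for reading the real rom file
sha = rom_sha1(rom_bytes)
db = ET.fromstring(DB_XML.format(sha=sha))
game = assemble_game(db, sha)
print(game)
```

The resulting dict only has keys for the resource types the db actually holds for that rom, which is the point of not having a fixed <game> node.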

If you later scrape from a web source that contains information the local db doesn't have, it will simply create a resource for it.
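Roughly, that "create it if it's missing" step could look like the Python sketch below. Again this is only an illustration: `add_if_missing`, the `<resources>` root and the unix-time `timestamp` format are assumptions, not something that is decided yet.

```python
import time
import xml.etree.ElementTree as ET


def add_if_missing(db_root, sha1, res_type, value, now=None):
    """Append a resource only when the db has none of that type for this rom.

    Returns True if a new resource was created, False if one already existed.
    """
    for res in db_root.iter("resource"):
        if res.get("sha1") == sha1 and res.get("type") == res_type:
            return False
    stamp = str(int(now if now is not None else time.time()))
    new = ET.SubElement(
        db_root, "resource",
        {"type": res_type, "sha1": sha1, "timestamp": stamp},
    )
    new.text = value
    return True


# Hypothetical local db that only has a description for rom "abc".
db = ET.fromstring(
    '<resources>'
    '<resource type="description" sha1="abc" timestamp="1">Old text</resource>'
    '</resources>'
)
added_players = add_if_missing(db, "abc", "players", "2", now=2)       # new info
added_desc = add_if_missing(db, "abc", "description", "New text", now=2)  # already there
```

In this sketch an existing resource is never overwritten. Whether a newer timestamp should win instead is an open design choice.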

This approach is much more flexible than having a <game> node where all the subnodes are predefined.

So this is what I am planning to do. I will probably also create a stitcher, which allows any user's Skyscraper local db xml file to be stitched together with some other person's file. It will then correlate the two and add whatever data is missing. So users can basically just share the xml files with each other. So if you have a great 'snes' xml local db, you can just send it to your friend, and he'll be able to stitch it into his own local Skyscraper db. And then he can scrape from it using a 'local' scraper module.
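A stitcher along those lines might boil down to something like this Python sketch. It assumes a resource counts as "already there" when both the sha1 and the type match, and that your own data is kept on a conflict. Both rules, the `stitch` name and the `<resources>` root are my assumptions for the example.

```python
import copy
import xml.etree.ElementTree as ET


def stitch(own_root, other_root):
    """Copy resources from other_root that own_root lacks; return how many were added."""
    known = {(r.get("sha1"), r.get("type")) for r in own_root.iter("resource")}
    added = 0
    for res in other_root.iter("resource"):
        key = (res.get("sha1"), res.get("type"))
        if key not in known:
            own_root.append(copy.deepcopy(res))  # leave the friend's tree untouched
            known.add(key)
            added += 1
    return added


# Hypothetical dbs: mine and the one a friend sent me.
own = ET.fromstring(
    '<resources>'
    '<resource type="description" sha1="abc" timestamp="1">My description</resource>'
    '</resources>'
)
other = ET.fromstring(
    '<resources>'
    '<resource type="description" sha1="abc" timestamp="5">Their description</resource>'
    '<resource type="players" sha1="abc" timestamp="5">2</resource>'
    '<resource type="description" sha1="def" timestamp="5">Another game</resource>'
    '</resources>'
)
added = stitch(own, other)
```

Here two resources get pulled in (the players count and the second game's description), while my existing description for "abc" stays as it was.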

Ok, so that was a bit of an essay. I like to keep things simple. And I feel like this approach makes more sense for what I want to achieve with Skyscraper. An expandable database, only containing the details that you actually /have/. This will also save space, not having a bunch of empty xml subnodes in the db.

Thanks for sending me their way. I might do a collaboration with them at some point. But for now, I wanna try out this method instead.