I guess one question is what the ontology will be used for?

I can see an ontology being useful for interacting with expert and natural-language systems, but the schema.org ontology isn’t really sufficient for describing categories as discovered by learning algorithms, or for fully describing the constraints/requirements on types of inputs and outputs. That isn’t surprising: it’s an artificial categorization of reality, as opposed to what our services could be learning directly from data.

My intuition says that an ontology should be an opt-in layer of detail. You don’t need it to interact with services, and I think it will be a while before services are intelligent enough to introspect the published ontology/schema of other services to figure out their utility. It’s also pretty heavyweight, so requiring the details of schema.org in API calls would be a pretty big burden on new developers.

It would help me (and the community) understand its importance if we had some clear examples of how it will be used. Only in API responses? Only in reference to entities defined in a canonical SNet database (see below)?

In terms of making rapid progress, and making SNet services easy to use by the community, a priority for us should be to create a repository of data types representing common objects. My current preference, as I’ve expressed internally, is for them to be defined in protobuf files.

These common protobuf files can then be easily imported by grpc service definitions, and protobuf will take care of automatically generating client code across multiple languages. The protobuf compiler also supports extended “options”, so when it comes to it, we can annotate types with ontological information as needed. The compiler will also carry documentation comments from the protobuf files into the generated client code. Lastly, there are open-source proxy servers that can convert grpc calls to REST endpoints if interacting with grpc isn’t your thing.
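To make this concrete, here’s a sketch of what a shared datatype file could look like, with an ontology annotation attached via a custom option. The package name, option name, and field numbers are all hypothetical, not an agreed SNet convention; the custom-option extension number just needs to fall in the 50000–99999 range protobuf reserves for organisation-internal options.

```protobuf
syntax = "proto3";

package snet.types;

import "google/protobuf/descriptor.proto";

// Hypothetical custom option for attaching ontological information
// to fields later, without changing the wire format.
extend google.protobuf.FieldOptions {
  string ontology_uri = 50001;
}

// A common image type that any service definition can import.
message Image {
  // Raw image bytes, e.g. PNG or JPEG encoded.
  bytes data = 1;
  // MIME type such as "image/png", annotated with a schema.org property.
  string mime_type = 2 [(ontology_uri) = "https://schema.org/encodingFormat"];
}
```

A service’s own .proto would then just `import` this file and use `snet.types.Image` in its request/response messages, and the generated client code in each language comes along for free.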

Having a common datatype repository naturally gives us a process for the community to propose new datatypes by pull request, keeps them under version control, and makes them historically accessible to developers. A service can publish a git hash or tag of the datatype repo to ensure the correct version is being used, and so long as the client is using a newer version it should be ok (protobuf provides ways of gracefully dealing with unknown new fields, so long as existing fields don’t change their semantic meaning).
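As an illustration of that compatibility guarantee (hypothetical message, but the rules are standard protobuf behaviour), a field added in a newer tag of the repo doesn’t break either side:

```protobuf
syntax = "proto3";

package snet.types;

// In the repo version tagged v1.0.0 this message had only:
//   string label = 1;
//   float score = 2;

// v1.1.0 adds field 3. An old client parsing a v1.1.0 message
// preserves field 3 as an unknown field; a new client parsing a
// v1.0.0 message just sees the field's default value (empty string).
message Classification {
  string label = 1;
  float score = 2;
  string model_id = 3;  // Safe addition: fields 1 and 2 keep their meaning.
}
```

What would break things is reusing a field number for different data, or silently changing what an existing field means, which is why the semantic-stability rule matters more than the git tag itself.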

The reason for asserting this, even under a post about ontologies, is that I think it’s easier to solve the ontology question after we have a functional network of agents. The applications will become clearer, and agents that need an ontology will come up with one. In the spirit of community-driven projects and open-source software, if a service author comes up with a good representation, other people will see the value and start using it.

Having expressed all that, I do agree a shared ontological grounding is needed for services to interoperate, and a hand-crafted one like schema.org is one option. An alternative to this approach could be an exemplar-based representation of entities. For example, in image classification, neural-network models will have subtly different understandings of what a “cat” is. The models could have been trained on different datasets, and even if their training data was the same, the random initialisation of weights before training can result in different minima being found.

I think the way to share common representations here is to provide example imagery. A model may assign high probability to output neuron 5, which maps to the symbol “cat”, but the model should also provide an exemplar of a cat. For image classification that would be an RGB image of a canonical, high-scoring cat. For the natural language domain, perhaps it would ground itself with an image that’s been highly associated in webpages with sentences about “cats”, or it might show high-frequency word associations for “cat”.
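A sketch of what an exemplar-carrying response might look like as a shared datatype (the message and field names are my invention, just to show the shape of the idea):

```protobuf
syntax = "proto3";

package snet.types;

message ClassificationResult {
  // The symbolic label, e.g. "cat", that this model's output maps to.
  string label = 1;
  float probability = 2;
  // A canonical, high-scoring exemplar grounding what *this model*
  // means by the label, e.g. an encoded RGB image for vision models.
  bytes exemplar_image = 3;
  // For NLP models: high-frequency word associations instead of imagery.
  repeated string associated_terms = 4;
}
```

The point is that the grounding travels with the answer, so two services can compare exemplars to check they mean roughly the same thing by “cat”, rather than trusting that the symbol alone is shared.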

I think this approach to ontology, one of exemplar-based representation, would be more powerful for shared representation and ensuring common semantic meaning even if it is less clean than a predefined and structured view of the world.

Another approach is to say: ok, yes, we do want a structured ontology, but we want it applied to a shared database that SNet services can contribute to. We might have the “cat” symbol in the schema.org representation, and an image classification service can push to it and say “I know about cats”, including its high-scoring cat images. The inverse is that clients can directly ask each service what it knows about the “cat” entity, if anything. If we have a central database then we’d need to curate it; if instead individual services each replied with parts of the data, it would be extremely sparsely populated and not necessarily sufficient to ensure two results map to the same entity. The reason it would be sparse is that schema.org has a lot(!) of fields, and there’s no way people are going to bother filling them all out when building a service!
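In that model, the per-service query side could be a simple RPC along these lines (again, all names are hypothetical, and the sparse-population problem shows up as a map that most services would barely fill):

```protobuf
syntax = "proto3";

package snet.types;

service EntityKnowledge {
  // Ask a service what, if anything, it knows about an entity symbol.
  rpc DescribeEntity (EntityQuery) returns (EntityDescription);
}

message EntityQuery {
  string symbol = 1;  // e.g. "cat"
}

message EntityDescription {
  bool known = 1;
  // Whatever sparse subset of schema.org-style properties the
  // service bothers to fill in.
  map<string, string> properties = 2;
  // Exemplars, as argued above, to ground the symbol.
  repeated bytes exemplar_images = 3;
}
```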

To summarise my opinions:

- we should avoid the ontology discussion getting in the way of API implementation
- the use of an ontology is probably best initially confined to services where it is necessary (i.e. what are we using the ontology for?)
- exemplar-based ontologies are more useful for long-term shared understanding between services

Of course, I don’t have quite as much historical context as others in the team, so I might be missing the core reason for using an ontology. But if I’m missing it, others in the community may be as well.