JPL: Storage on SAFE currently offers few abstractions. It is a low-level data store, so @oetyng is looking to apply structures or abstractions on top that developers can use, one example being an event sourcing database. Others (later) include queues and dictionaries.

That’s right. An EventStore is a database, while a queue and a dictionary are data structures. Using the former is far less common, and has a greater impact on application design, than the latter two. I could have been more specific: what I meant was implementing something like IDictionary, which is an interface in C#, or maybe IReliableDictionary (of the ServiceFabric framework). That way, existing applications could swap out their current implementations without breaking anything.

This will enable coders to use the network without even changing anything they are doing, and without needing to learn the details of the mutable data implementations for example.

I.e. quickly get up to speed with producing SAFE Network based apps.
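To make the "swap out the implementation" idea concrete, here is a minimal sketch in Python rather than C#, with `collections.abc.MutableMapping` playing the role of IDictionary. `FakeNetworkStore`, `NetworkDict` and `count_words` are hypothetical names, and the store is only an in-memory placeholder, not the real SAFE API.

```python
import json
from collections.abc import MutableMapping


class FakeNetworkStore:
    """Hypothetical stand-in for a network-backed key-value store (not the SAFE API)."""

    def __init__(self):
        self._entries = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._entries[key] = value

    def get(self, key: bytes) -> bytes:
        return self._entries[key]  # raises KeyError if the key is missing

    def delete(self, key: bytes) -> None:
        del self._entries[key]

    def keys(self):
        return list(self._entries)


class NetworkDict(MutableMapping):
    """Dictionary interface on top of the store; callers only see dict semantics."""

    def __init__(self, store):
        self._store = store

    def __getitem__(self, key):
        return json.loads(self._store.get(key.encode()))

    def __setitem__(self, key, value):
        self._store.put(key.encode(), json.dumps(value).encode())

    def __delitem__(self, key):
        self._store.delete(key.encode())

    def __iter__(self):
        return (k.decode() for k in self._store.keys())

    def __len__(self):
        return len(self._store.keys())


def count_words(words, counts):
    """'Existing' application code: works with any mapping, local or network-backed."""
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts
```

Calling `count_words(words, {})` and `count_words(words, NetworkDict(FakeNetworkStore()))` goes through exactly the same application code; only the object passed in differs.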

JPL: An event sourcing (ES) database is used to store everything that happens to a particular object as a stream.

The short answer would be yes. The longer answer would be that what you store in a stream is not necessarily limited to everything that happens to a particular object. How you partition your streams is an application design choice. An aggregate of objects could have one stream, but a single IoT device could have its own stream as well.

JPL: State is not stored, but instead, the state of a particular stream (eg a bank account) at a particular time can be calculated by ‘replaying’ all the events (eg deposits and withdrawals) prior to that time.

Yes, almost correct, with a slight modification:

Current state is not stored, but instead all the changes to its state are stored as separate events.

Usually in an application, an object that represents the current state as the result of all events (for example an Account, where we have chosen to give it one stream) would be cached. So replaying events would not be necessary when accessing this object often. Any additional change would be applied directly to the current state, and the new events appended to the event store. That way you keep the application responsive. (There is also something called snapshotting, but it complicates things, and it is best if you can avoid it by design.)
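As a rough illustration of replaying and caching, here is a minimal Python sketch. The event names `Deposited` and `Withdrawn`, the `AccountRepository` class and the in-memory `_streams` dict are all assumptions made up for the example; a real event store would persist the streams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


@dataclass
class Account:
    balance: int = 0

    def apply(self, event) -> None:
        """Update current state from a single event."""
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount


class AccountRepository:
    def __init__(self):
        self._streams = {}  # stream id -> append-only list of events
        self._cache = {}    # stream id -> Account holding current state

    def load(self, stream_id):
        """Replay the stream only on a cache miss."""
        if stream_id not in self._cache:
            account = Account()
            for event in self._streams.get(stream_id, []):
                account.apply(event)
            self._cache[stream_id] = account
        return self._cache[stream_id]

    def record(self, stream_id, event):
        """Apply the change to cached state, then append the event to the stream."""
        self.load(stream_id).apply(event)
        self._streams.setdefault(stream_id, []).append(event)
```

Only the events are the source of truth here. The cached `Account` can be thrown away at any time and rebuilt by replaying the stream.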

JPL: The fact there are no duplicates means that the database serves as a single source of the truth, similar to blockchains.

Yes, an event is a fact. And it is supposed to be the single source of truth.

JPL: ES databases are useful for storing events in the order they happen (they’re append only) and have a built-in audit log (again like blockchains).

Yes and yes. Depending on how you design your application, though, you could be using events that say something about a different date than the date the event was actually stored. It is all about how you interpret the data when you calculate current state. As an example:

We have customers that make deposits. If the data from the bank has not been read on the same day that the deposit was registered at the bank, then it is useful to add a property, say BankReceiveDate, which could differ from the event TimeStamp. When our system records this event the day after, we are then still building up current state with correct information (for the accounting, for example). So if you later want to display the balance for every day, you would have the correct balance for each day, even though the events were not stored in the order in which the things they model “happened”. (What the event is actually supposed to reflect is a matter of design: DepositMade and BankDepositRegistered reflect two different things.)
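A small Python sketch of that deposit example, under the assumption that the event carries both dates. The field names `bank_receive_date` and `timestamp` mirror BankReceiveDate and TimeStamp; the `daily_balances` function and the sample data are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class BankDepositRegistered:
    amount: int
    bank_receive_date: date  # when the bank registered the deposit
    timestamp: datetime      # when our system stored the event


def daily_balances(events):
    """Return {day: balance at end of that day}, keyed by bank_receive_date."""
    per_day = defaultdict(int)
    for e in events:
        per_day[e.bank_receive_date] += e.amount
    balances, running = {}, 0
    for day in sorted(per_day):
        running += per_day[day]
        balances[day] = running
    return balances


events = [
    BankDepositRegistered(100, date(2024, 3, 1), datetime(2024, 3, 1, 9, 0)),
    BankDepositRegistered(50, date(2024, 3, 2), datetime(2024, 3, 2, 9, 0)),
    # Registered at the bank on 1 March, but only stored by us on 3 March:
    BankDepositRegistered(25, date(2024, 3, 1), datetime(2024, 3, 3, 9, 0)),
]

balances = daily_balances(events)
# balances[date(2024, 3, 1)] is 125 and balances[date(2024, 3, 2)] is 175
```

The late event is stored last, yet it still lands on the correct day because the calculation interprets `bank_receive_date` instead of the storage order.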

If you want to record something that happened before the facts of a previous event that has already been stored, then it can become complicated. But it is solvable. If you had only stored current state, and overwritten it with each new state, then you would be totally lost when trying to go back in time and apply something as if you were at that point in time.

JPL: Advantages of ES include massive scalability and fast writes. However, queries are more difficult than in a relational db and have to be modelled (so slower, less precise).

Yes, you could have massive scalability and very fast writes if designed correctly. As for the queries part, the statement is partly true.

However, the separation of reads and writes can be especially beneficial for fast queries.

This requires the use of projections (i.e. modelled, as you say), which are a way to make use of event-sourced data, but in no way a requirement for an event store.

I did not get very far into the subject of projections. These can be built and scrapped at will; you always have the events to rebuild any projection you want. You would just choose any events, and stream them through a function that uses the data from the events to build up some particular state (could be in memory, could be persisted). The events need not come from the same stream or stream category. Most often, projections use events from various streams and categories. The projections are then used for querying data. You could create very efficient queries this way. If you want to build an OLAP cube by projecting the events, you can do that too, and then you get the kind of powerful ad hoc queries that you would be used to from relational databases.

So the event streams themselves are not efficient to query, and setting up projections is additional work: true. But once you have constructed whatever projection you want, there is nothing saying that the queries would be slower. On the contrary, you could have much faster queries.
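A projection in this sense can be sketched as a plain function that folds events into a read model. The Python example below is only an illustration: events are plain dicts, and apart from DepositMade the event names (`WithdrawalMade`, `SupportContacted`), the streams and the shape of the view are assumptions.

```python
from collections import defaultdict

# Events from two different streams / categories, as plain dicts.
account_stream = [
    {"type": "DepositMade", "customer": "alice", "amount": 100},
    {"type": "WithdrawalMade", "customer": "alice", "amount": 40},
]
support_stream = [
    {"type": "SupportContacted", "customer": "alice"},
    {"type": "SupportContacted", "customer": "bob"},
]


def project_customer_summary(events):
    """Fold events into a queryable read model: one row per customer."""
    view = defaultdict(lambda: {"balance": 0, "support_contacts": 0})
    for e in events:
        row = view[e["customer"]]
        if e["type"] == "DepositMade":
            row["balance"] += e["amount"]
        elif e["type"] == "WithdrawalMade":
            row["balance"] -= e["amount"]
        elif e["type"] == "SupportContacted":
            row["support_contacts"] += 1
    return dict(view)


# Build the projection from both streams; it can be scrapped and rebuilt at will.
summary = project_customer_summary(account_stream + support_stream)
```

Querying is then a dictionary lookup such as `summary["alice"]`, with no replay of the underlying streams at query time.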

JPL: The main advantages SAFE can offer as a backend to an ES database:

JPL: It acts as a single storage source (a virtual hard drive) instead of multiple distributed nodes, removes the need for replicated copies, simplifies management. This is important because an event sourcing database stores more data (i.e. deltas of every event going back through time) than would typically be the case with a relational database, so avoiding duplication for redundancy is an important efficiency saving.

I would say that it is true for any database. You don’t need to move data around to access it. You just add permissions. Backups are unnecessary. You can have high availability by always being able to spin up an instance of your application somewhere, and have instant access to the data (as long as a good internet connection is as mundane as steady electricity). But even with an intermittent or weak internet connection it has similar benefits.

Event-sourced applications often use a lot of messaging, and I can see how messaging infrastructure could be cut down on, since you don’t need to report to some other physical location what is stored at this one.

JPL: The architecture is also simpler as there is effectively just one database that developers need to address.

Yes, partly because of that. But also because any database can be created and accessed, and it can be of any size it needs to be, and because we could (depending on app logic design) cut down on messaging. What we are saying is that, as long as a good internet connection is as mundane and taken for granted as a good electricity source, you basically have one shared “infinite” hard drive, which you can use for keeping databases on, for example.

JPL: The data cannot be tampered with – therefore it is reliable. This is vital if you are looking to build a single source of the truth. There is no central control and no-one can remove the data. Removing data would break the entire stream in an ES db.

Yes!

(There are situations when editing and deleting in streams are justified - although event sourcing theory states streams shall be immutable. It mostly has to do with correcting bugs. It can be solved by redirecting pointers to events if you have designed events to be immutable.)

JPL: If the file already exists on SAFE you won’t need to store it again.

It could be designed this way, which could minimize data storage needs when you can expect a large number of events to be identical. It would, however, require that things like Id and TimeStamp be considered metadata and stored separately from the event body, to be able to take advantage of that feature.
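A minimal Python sketch of that separation, assuming event bodies are keyed by a content hash while Id and TimeStamp live in a separate metadata record. `DedupEventStore`, the SHA-256 choice and the JSON encoding are assumptions for the example, and the two dicts only imitate storage; nothing here uses the network's actual deduplication.

```python
import hashlib
import json


class DedupEventStore:
    def __init__(self):
        self.bodies = {}  # content hash -> encoded event body, stored once
        self.stream = []  # one metadata record per event, in append order

    def append(self, event_id, timestamp, body: dict) -> str:
        # Canonical encoding so that identical bodies give identical hashes.
        encoded = json.dumps(body, sort_keys=True).encode()
        digest = hashlib.sha256(encoded).hexdigest()
        self.bodies.setdefault(digest, encoded)  # no second copy if already present
        self.stream.append({"id": event_id, "timestamp": timestamp, "body": digest})
        return digest

    def read(self):
        """Yield events with metadata and body joined back together."""
        for meta in self.stream:
            yield {**meta, "body": json.loads(self.bodies[meta["body"]])}
```

If Id and TimeStamp were part of the hashed body, every event would hash differently and nothing could be shared, which is why they are kept outside it in this sketch.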

JPL: Is that about right, as a 101?

About right!

JPL: First, what are the main use cases? You say you develop fintech applications. For what type of projects do you use an ES database? Presumably, ES databases only work with immutable data?

Event sourcing theory builds on the assumption that the streams are modeling facts. Things that happened. So you cannot change what happened. But you can add new things that happened which could change the meaning of previous things that happened. Compensating actions revert state. Like with accounting. You do not erase in the accounting ledger.
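The accounting analogy can be sketched in a few lines of Python. The `Ledger` class and its method names are hypothetical; the point is only that a mistake is corrected by appending a compensating entry, never by editing or erasing an earlier one.

```python
class Ledger:
    def __init__(self):
        self.entries = []  # append-only list of (description, amount)

    def post(self, description, amount):
        """Record a new fact and return its position in the ledger."""
        self.entries.append((description, amount))
        return len(self.entries) - 1

    def reverse(self, index):
        """Compensate an earlier entry by appending its negation."""
        description, amount = self.entries[index]
        return self.post(f"Reversal of: {description}", -amount)

    def balance(self):
        return sum(amount for _, amount in self.entries)
```

After posting a deposit of 100, posting a mistaken deposit of 50 and then reversing it, the balance is 100 again while all three entries remain in the ledger as an audit trail.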

Wherever we are interested in the deltas in our reality, for example anywhere we are measuring things, event sourcing lets you build a good representation of them and keep information that is of use.

What applications it would be suited for really comes down to how much and what information you need to store (and, at some time after conception, also use).

If you only need to store a form, with the name and address fields of a user for example, then an event store is not needed.

If you want to track the orbit of something real or abstract, and later correlate various influences on the orbit (NB: this is in very abstract terms and can refer to almost anything), then event sourcing is just the sort of storage solution you would use.

In customer support, we get great use out of storing every action some of our customers take, since it can later be correlated with the probability of reaching out to customer support. We can devise help messages and nudging hints before the users themselves know that they need help.

In fraud detection, we can analyze behavior.

And financial applications of course, where the transactions and the requirements of accounting, audit etc. make it very suitable.

JPL: Second, even if a file already exists on SAFE you still have to pay for the PUT in order to protect the anonymity of the original uploader. Would this undermine the efficiency gains offered by SAFE?

My first suggestion for an implementation of an event store makes use of the entries in a mutable data structure. Each entry is a potential holder of a set of events resulting from one command.

In this version, we do not make use of the deduplication of the network.
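Very roughly, and purely as an assumption about how such a layout could look, the idea can be sketched in Python with a plain dict standing in for the entries of one mutable data structure. `MutableDataStream`, the sequence-number keys and the JSON encoding are hypothetical choices for the example, not the actual implementation or the SAFE API.

```python
import json


class MutableDataStream:
    """Hypothetical stand-in: `entries` imitates the entries of one mutable data structure."""

    def __init__(self):
        self.entries = {}  # entry key -> serialized batch of events from one command

    def append_batch(self, events):
        """Store all events resulting from one command in the next free entry."""
        key = str(len(self.entries))
        self.entries[key] = json.dumps(events)
        return key

    def read_all(self):
        """Yield every event, batch by batch, in the order the entries were written."""
        for i in range(len(self.entries)):
            yield from json.loads(self.entries[str(i)])
```

One command that results in two events therefore occupies a single entry, and reading the stream back is a walk over the entry keys in order.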

Generally, though, I would not say that PUT costs undermine the efficiency gains. There will be certain drawbacks and benefits with using the network, and I am absolutely sure there will be plenty of cases where the benefits heavily outweigh the drawbacks. But as always, some cases might be better solved with other approaches.

If, for example, you need low latency a lot more than you need the security, accessibility and so on, then you might want some solution where the data is physically closer.