Dataset Versioning & Naming

Me a few seconds ago: “So that’s a qri dataset”. Me right now: “Actually, that’s not a dataset, it’s a version of a dataset”. 😇 What we call a dataset at qri is a combination of one or more versions under a single dataset name.

Think about how you might do version control with a folder and several files on your computer. You could name the folder for the dataset, then append version numbers to the CSVs:

The filesystem-version-control-system. We want to make sure you never type _FINAL into a filename again.

With qri, you name the dataset and then start making versions under that name. Each version is created at a specific time, so it’s always crystal clear what the latest version is. All of the older versions are immutable; they can live forever under the dataset name as a record of changes over time.

Make changes, commit. Make changes, commit. Any change to a component constitutes a new version of the overall dataset!

To recap: a qri dataset has a name and consists of one or more chronological versions (user x saved at time y), and each version includes components (body, meta, etc.).
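
If it helps to see that recap as a data structure, here is a minimal Python sketch. To be clear, this is not how qri is implemented: the `Version` and `Dataset` classes and the `commit` method are hypothetical names made up for illustration. It only models the ideas above: a named dataset, chronological versions, immutable snapshots, and a new version for any component change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class Version:
    """One immutable snapshot: who saved it, when, and its components."""
    author: str
    saved_at: datetime
    components: Mapping[str, Any]  # e.g. {"body": ..., "meta": ...}


@dataclass
class Dataset:
    """A name plus a chronological list of versions (oldest first)."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def commit(self, author: str, changes: Mapping[str, Any],
               saved_at: Optional[datetime] = None) -> Version:
        """Save a new version; unchanged components carry over from the last one."""
        components = dict(self.versions[-1].components) if self.versions else {}
        components.update(changes)
        version = Version(
            author=author,
            saved_at=saved_at or datetime.now(timezone.utc),
            # A read-only view, so a saved version cannot be edited in place.
            components=MappingProxyType(components),
        )
        self.versions.append(version)
        return version

    @property
    def latest(self) -> Optional[Version]:
        """The most recently saved version, or None if nothing is saved yet."""
        return self.versions[-1] if self.versions else None


# Hypothetical usage: two commits produce two versions under one name.
ds = Dataset("usgs_earthquakes")
v1 = ds.commit("chriswhong", {"body": [[1, 2]], "meta": {"title": "Quakes"}})
v2 = ds.commit("chriswhong", {"meta": {"title": "USGS Earthquakes"}})
print(len(ds.versions))                      # 2
print(ds.latest.components["meta"]["title"])  # USGS Earthquakes
```

Changing only the meta component still produced a second version, and the first version keeps its original title.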

Let’s go one level higher. All qri datasets are associated with exactly one user. To use qri, you must establish your identity. This is what allows the system to keep track of who made changes and when. We use username/dataset_name notation to refer to datasets (e.g. chriswhong/usgs_earthquakes or b5/world_bank_population).
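
As a small illustration of the username/dataset_name notation, here is a Python sketch that splits a reference into its two parts. The `DatasetRef` type and `parse_ref` function are hypothetical helpers, not part of qri, and the only rule assumed here is "exactly one slash with something on each side"; qri's real naming rules may be stricter.

```python
from typing import NamedTuple


class DatasetRef(NamedTuple):
    """The two halves of a username/dataset_name reference."""
    username: str
    dataset_name: str


def parse_ref(ref: str) -> DatasetRef:
    """Split 'username/dataset_name' on its single slash."""
    username, sep, dataset_name = ref.partition("/")
    # Assumed rule: one slash, non-empty text on both sides.
    if not sep or not username or not dataset_name or "/" in dataset_name:
        raise ValueError(f"expected username/dataset_name, got {ref!r}")
    return DatasetRef(username, dataset_name)


ref = parse_ref("b5/world_bank_population")
print(ref.username)      # b5
print(ref.dataset_name)  # world_bank_population
```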

GitHub users may find our dataset naming similar to the username/repository_name notation used to version code. Just like with code, the latest version of a dataset is probably most important to most users most of the time, but keeping the full history around for reference is extremely valuable.

So, let’s recap again. Qri users have datasets. Datasets are made of versions. Versions are made of components. Components each have a specific model and purpose for storing either the data itself, or some other useful information about the data.

🤓 The qri user namespace is filling up quickly. You’d better sign up now and get your short, numberless username while it’s still available!