A dream of an ultimate OS

Oleg Kiselyov

This is a dream, and as such it is made of shreds of reality shuffled and rearranged in sometimes bizarre combinations. It has been brewing for over ten years, feeding on dissatisfaction with many of the major modern operating systems. Indeed, it is glaringly obvious that the only thing a user does at a terminal is request, read and modify textual information, arranged mainly into tables or scrollable lists. Nevertheless, the user often has to apply different and completely disparate commands to achieve exactly the same modification. Take deleting an item (line) from a list: removing a line in a text document, removing a file, killing a process (that is, removing it from the list of active processes) or canceling a print job. Furthermore, although an OS is swarming with tables -- from a hierarchical database of files to Yellow Pages maps, to a hash dictionary of an object file archive (library), to relatively flat databases of IP routes, current processes, users and code revisions -- common database functions are conspicuously missing from the core kernel services. These include inserting a record into a "table" under a hashed key, retrieving a record or field by a simple or concatenated key, and linking tables. This paper is an attempt to imagine what an OS would look like and how it would work if looking for a word 'foo' in something and deleting/closing/stopping this something -- be it a paragraph of text, a network connection, a subscribed newsgroup, a process -- all required roughly the same sequence of mouse clicks or keystrokes, and were understood and interpreted in the same spirit by the operating system.





Introduction

It seems obvious, then, that database services and text/list editing are core activities that ought to be supported at the most fundamental level of an OS. The paper flashes a few images of how this unification could be done and how one could work with it. Here is a preview:

In MacOS, TextEdit has been elevated to the level of a standard system (Toolbox) service. It was a strong statement that the OS was not only about managing files and processes. Furthermore, deleting a piece of text, a file, a directory or a file server connection can all be accomplished by the same action: highlighting the item and dragging it into the Trash. There is still room for improvement, however: for example, a list of processes is conceptually not much different from a list of files. One can imagine the Finder managing (arranging, getting info on, duplicating, trashing) files and folders that are not necessarily ordinary files and folders, but processes, open TCP connections, newsgroups, active and pending print jobs, to-do tasks, etc.

Deep down, an operating system is nothing but a manager of many databases. Indeed, a file system, the process table, routing tables, the list of known AppleShare servers, revision control system (Projector) data, Think C projects - they are all databases. Unfortunately, despite a sizable share of common functionality and interface, every one of them is implemented and managed separately. Why not trade a multitude of "custom" database managers for a single well-designed distributed database manager?

Conventional databases are usually implemented on top of a file system. The file system itself, however, is a database. Mac's HFS and Novell's file systems even use B-trees and other advanced indexing schemes of "real" databases. Querying a database for January 1994 sales and clicking on the folders "Sales", "1994", "January" are closely related activities. Why can't a "real" database, then, take over the file system entirely? Modern DBMSs have all the facilities for storing images, sound files, movies and other big and small objects, and provide flexible interfaces for linking and querying records. I wonder, what else do we need files for?

Unification of the user interface and the underlying database may also bring together parts that presently can't even be conceived of as linkable. For example, documents could be made not only of chunks of text, but of folders and applications themselves. Just as one stores a link to a picture in a word processing document, one could embed links to menus, applications, remote servers, precompiled headers, mailto: or other forms of anchors as well. This can make the desktop look just like a home page.





Everything is just editing

It is not a coincidence that deleting a line of text, removing a file, killing a process, or deleting a route or an ARP entry are all instances of the same activity: removing a row from a table. This uniformity runs deep, both at the level of implementation and at the level of presentation. Indeed, there are only a few basic ways to manage a collection of objects: as some kind of list or as a tree. Also, there are only so many ways to present a collection to a user and let him manipulate it. The only way people work at a terminal is browsing and editing, that is, moving the mouse, typing and pressing the PgDn key. Disparate interfaces, then, are not a consequence of some fundamental differences in user or system activities. They are simply the result of evolution: different subsystems and services were written by different people and modified by an even bigger crowd.
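To make "removing a row from a table" concrete, here is a minimal Python sketch, assuming a hypothetical `Table` abstraction that every OS list would share. A document, the process list and a print queue are all plain in-memory rows here, and one `remove` operation serves all three. In a real system, removing a process row would also have to send a signal and removing a print job would cancel it; the sketch only edits lists and touches no real processes.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, TypeVar

Row = TypeVar("Row")


@dataclass
class Table(Generic[Row]):
    """Hypothetical uniform collection: a document, the process list and a
    print queue are all just rows in a table."""
    name: str
    rows: List[Row] = field(default_factory=list)

    def remove(self, match: Callable[[Row], bool]) -> List[Row]:
        """Delete every row satisfying `match` and return the removed rows."""
        kept: List[Row] = []
        removed: List[Row] = []
        for row in self.rows:
            (removed if match(row) else kept).append(row)
        self.rows = kept
        return removed


# Three seemingly unrelated lists (mock data only).
document = Table("document", ["Dear Sir,", "foo bar baz", "Regards"])
processes = Table("processes", [{"pid": 1, "cmd": "init"}, {"pid": 1024, "cmd": "foo"}])
print_jobs = Table("print_jobs", [{"job": 7, "file": "foo.ps"}, {"job": 8, "file": "report.ps"}])

# "Find 'foo' and delete it" is the same operation in every case.
document.remove(lambda line: "foo" in line)
processes.remove(lambda proc: proc["cmd"] == "foo")
print_jobs.remove(lambda job: "foo" in job["file"])
```

Only the matching predicate differs between the three calls. The "delete" gesture itself is shared.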

The Macintosh definitely stands out in this respect. Many similar functions within MacOS are accomplished by exactly the same action (e.g., removing is done by dragging into the Trash, opening by double-clicking). This is especially true with a Drag&Drop Manager installed. UNIX is moving towards some UI unification, too; take CDE or the proc filesystem as examples. The latter is long overdue: given the UNIX tenet "everything is a file", one can only wonder why a process should be any different (and why it took so long to implement and popularize this idea). However, the unification is not complete. It is possible to open /proc/1024 to get hold of the process with id 1024 (if for nothing else than to find out who owns the process and when it was created). Yet one cannot rm /proc/1024 to kill the process, nor ls /proc/1024/open_files to see the list of all files the process has open. Why not?

Since the list of processes is conceptually not much different from the list of files, MacOS could conceivably have a folder "Processes" populated with "files" standing for processes. A user could then apply the standard Finder operations (View by, Get Info, Trash, Duplicate) to manipulate the processes. A Usenet News hierarchy is very similar to that of a file system (as a matter of fact, this is exactly how it is stored and managed on an NNTP server). The newsreader Nuntius presents the news hierarchy as a directory tree of "folders" and "files" in a view-by-name mode. Alas, Nuntius had to emulate much of the Finder functionality to manage these newsgroup folders. Many applications -- for example, printer and network managers, an FTP utility or a newsreader -- would be easier to develop and use if one could tell the Finder: "here is a list of files; manage it as you usually do with a list of files, just tell me when you are about to trash something."
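The request to the Finder ("manage it as you usually do, just tell me when you are about to trash something") amounts to a small protocol: a generic browser owns listing, sorting and trashing, and the application supplies only its data and a trash hook. The following Python sketch illustrates that idea. `Browser`, `view_by`, `trash` and `on_trash` are hypothetical names, not a real Finder API, and the print queue is mock data.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class Browser:
    """Hypothetical generic list manager in the spirit of the Finder: it lists,
    sorts and trashes items, but lets the owner of the list decide what
    trashing actually means (and veto it)."""

    def __init__(self, items: List[Item], on_trash: Callable[[Item], bool]):
        self.items = items
        self.on_trash = on_trash  # owner's hook; return False to refuse

    def view_by(self, key: str) -> List[Item]:
        return sorted(self.items, key=lambda item: item[key])

    def trash(self, name: str) -> bool:
        for item in self.items:
            if item["name"] == name:
                # "just tell me when you are about to trash something"
                if not self.on_trash(item):
                    return False
                self.items.remove(item)
                return True
        return False


# A print manager supplies only its data and the trash hook (mock queue).
cancelled: List[str] = []


def cancel_job(item: Item) -> bool:
    cancelled.append(item["name"])  # stand-in for really cancelling the job
    return True


queue = Browser([{"name": "report.ps", "size": 120}, {"name": "memo.ps", "size": 30}], cancel_job)
queue.trash("memo.ps")
```

A newsreader or network manager would reuse the same `Browser` unchanged and pass a different hook, for example one that unsubscribes a newsgroup.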

Rearranging file icons within a folder view and rearranging paragraphs within a document are essentially the same activity. If they are unified, the overhead and code duplication can be significantly reduced. As a bonus, this would also enable ordinary documents to contain folders, icons and applications: they automatically become hyper-documents.





The luster and dull of plain text

But it does not have to be this way: system configuration need not be spread over plain text files, each with its own syntax. If a database engine is implemented as a core system service, along with simple tools to browse and modify database records, the Gordian knot of system configuration files disappears. MacOS comes very close to this ideal, with ResEdit as the universal database editor. Much (if not all) of the system configuration can be set up by opening a resource and toggling a few buttons, retyping strings or adjusting colors. There is no need to learn the syntax of a specific configuration file, and no CPU time is wasted on parsing that text file and reporting errors, if any. Unfortunately, ResEdit and a set of templates for system resources do not come bundled with MacOS. But SimpleText always does. That is why System Folder:Hosts is a plain text file...

The very idea of an application as a mere collection of code and configuration resources under a common name is beautiful. In some applications (LaserWriter Utility, for example) it is even possible to add or delete menu items and the corresponding functionality just by adding or removing the appropriate resources, without any need to recompile or relink the code. It was with pain that I read a recommendation to refrain from creating code resources with PowerPC native code (which should be moved to the data fork instead). Now an application has one database managed by the Resource Manager and another managed by the Code Fragment Manager.





Everything is a database

A universal database frees the OS and applications from many chores: wildcard resource lookups, time stamping, permission checking, etc. Universality also offers another clear advantage: the ability to link all kinds of records and tables, which is such a pain now. For example, to a universal database manager, a link between two records representing files is no different from a link between a record in a Users table, a record in a Processes table, a few records in the Files table and one in a Print Jobs table. There is no longer any need for multiple IDs and for keeping track of them. Many-to-many links are possible as well. Performance also improves: the list of all processes belonging to user joe can be found faster by a database query than with ps aux | grep joe. Surely any database can make a better selection than a dumb search like netstat -a | grep finger (which is used to finger the fingerer). Many similar scripts are just database queries, and not very efficient ones. Makefiles would be easier to generate and maintain. The universal database would also allow an #include file to be linked directly into the file that includes it. This would make obsolete the arcane art of specifying the compiler's -I and -L flags and trying to predict which of several possible time.h files the compiler will actually pick.
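As an illustration of linking a Processes table to a Users table and replacing ps aux | grep joe with a query, here is a sketch using Python's built-in sqlite3 module and an in-memory database. The schema and rows are hypothetical; no real process table is read.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, login TEXT UNIQUE);
    CREATE TABLE processes (
        pid INTEGER PRIMARY KEY,
        uid INTEGER REFERENCES users(uid),  -- a link to the owner, not a copied name
        cmd TEXT
    );
    CREATE INDEX processes_by_uid ON processes(uid);
    INSERT INTO users VALUES (0, 'root'), (501, 'joe');
    INSERT INTO processes VALUES
        (1, 0, 'init'), (1024, 501, 'emacs'), (1025, 501, 'make'), (1030, 0, 'grep joe');
""")


def processes_of(login: str) -> list:
    """All (pid, cmd) pairs owned by `login`, found by a join on uid."""
    return db.execute(
        """SELECT p.pid, p.cmd
           FROM processes AS p JOIN users AS u ON p.uid = u.uid
           WHERE u.login = ?
           ORDER BY p.pid""",
        (login,),
    ).fetchall()
```

`processes_of("joe")` selects by ownership and uses the index on `uid`. A textual `grep joe` over the output of `ps aux` would also match root's own `grep joe` line (and any command line that merely mentions joe).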





The file system is a database, one that could use improvement

The hierarchical organization of file systems manifests itself in the nesting of directories (folders). On the other hand, directories are merely named views of a certain subset of files, selected according to some criteria. Thus a directory can be thought of as a "database view", a named database query. It follows immediately that a file may appear in as many "directories" (views) as one wishes. For example, one "folder" may show all files tagged as "sales reports", while another contains the files modified within the last five days. Searching a file system and creating and populating a folder therefore become the same activity. Since saved views are database objects themselves, one can reference views within views if one so wishes. There is no required hierarchy, however: one may create two views that refer to each other, or any other network of views that best suits the problem.
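The "folder as a named query" idea maps directly onto SQL views. This sketch again uses Python's sqlite3 module with a hypothetical `files` table. It defines the two folders from the paragraph above plus a view built on other views, and shows that "listing a folder" is just running its query. One difference from the paragraph: SQLite rejects views that refer to each other in a cycle, so the sketch shows only nesting, not mutual reference.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, tag TEXT, mtime TEXT);
    INSERT INTO files (name, tag, mtime) VALUES
        ('q1.txt',    'sales report', date('now', '-40 days')),
        ('q2.txt',    'sales report', date('now', '-2 days')),
        ('notes.txt', 'personal',     date('now', '-1 days'));

    -- "Folders" are named queries; one file may show up in several of them.
    CREATE VIEW sales_reports AS SELECT * FROM files WHERE tag = 'sales report';
    CREATE VIEW recent        AS SELECT * FROM files WHERE mtime >= date('now', '-5 days');

    -- A view may be built from other views.
    CREATE VIEW recent_sales AS
        SELECT * FROM sales_reports WHERE id IN (SELECT id FROM recent);
""")


def ls(folder: str) -> list:
    """Listing a "folder" is simply running its saved query.
    View names cannot be bound as SQL parameters, so `folder` must be trusted."""
    return [name for (name,) in db.execute(f"SELECT name FROM {folder} ORDER BY name")]
```

Here `q2.txt` appears in `sales_reports`, `recent` and `recent_sales` at once, without any copies or links.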

Every database record (item) should have some mandatory attributes: time stamps, owner, permissions, kind (of document). Beyond that, a user (or the creating application) may add any attributes they want. For some records, the body is just a lump of binary data with no further structure. On the other hand, records of kind image (or in the table images) may have additional attributes like the image width, height and depth, a signature of the compression method, etc. Thus listing all images that are 512 pixels wide, with a depth of at least 8 and a private colormap, should be as easy as viewing files in a folder by date.
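One way to picture "mandatory attributes plus whatever else the creator adds" is a record with a fixed header and an open-ended attribute map. The Python sketch below assumes that layout. The field names `width`, `depth` and `private_colormap` and all the data are hypothetical. The image query from the paragraph then is a filter of the same shape as sorting by date.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    # Mandatory attributes every record carries (a hypothetical minimal set).
    name: str
    kind: str
    owner: str
    mtime: int        # modification time, seconds since the epoch
    permissions: int  # e.g. 0o644
    # Anything else the user or the creating application chooses to add.
    attrs: Dict[str, Any] = field(default_factory=dict)


store: List[Record] = [
    Record("logo", "image", "joe", 1000, 0o644,
           {"width": 512, "height": 256, "depth": 8, "private_colormap": True}),
    Record("photo", "image", "joe", 2000, 0o644,
           {"width": 512, "height": 512, "depth": 24, "private_colormap": False}),
    Record("icon", "image", "ann", 1500, 0o600,
           {"width": 32, "height": 32, "depth": 8, "private_colormap": True}),
    Record("memo", "document", "ann", 3000, 0o644),  # opaque body, no extra attributes
]

# All 512-pixel-wide images of depth >= 8 with a private colormap...
wide_images = [r.name for r in store
               if r.kind == "image"
               and r.attrs.get("width") == 512
               and r.attrs.get("depth", 0) >= 8
               and r.attrs.get("private_colormap")]

# ...is no harder than viewing "files" by date.
by_date = [r.name for r in sorted(store, key=lambda r: r.mtime)]
```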

It goes without saying that even a naked OS must include some rudimentary database browser to view and tinker with these "files". It may look like the basic no-frills record browser in Paradox, which displays all fields of a record in columns or as name=value pairs. Of course, an OS should also be able to generate better-looking views and reports, as Paradox does. Still, the basic browser is necessary and useful in a desperate situation, just as cat or SimpleText is.

A general-purpose database as a file system does not present users with a completely alien environment. Almost all old computer skills will still apply. For example, a file may still be specified by its path. Indeed, a path is merely a list (or sequence) of keys telling how to locate a file, and as such, the meaning of a file path extends beyond hierarchical file systems. The Web gives an especially good example: consider the URL http://somehost/foo/bar.html. The most obvious interpretation of the corresponding resource is a file named bar.html located in a directory foo under the DocumentRoot on the host somehost. If foo is a CGI script, however, bar.html is merely a parameter passed to that script. The script may interpret it as a file name, or as anything else it wishes. In short, not everything between slashes in a URL is the name of a directory; it is just a key specifying the object in question. The same argument holds for the new file system: one can still locate a file by entering some of its attributes separated by slashes. The database approach, however, allows wildcards truly everywhere, as well as "directory names" such as the file modification date, size, etc. Thus running find and listing a "directory" would be exactly the same activity.
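The following Python sketch shows a path read as a sequence of keys rather than a chain of directories. Each slash-separated component narrows the selection, and shell-style wildcards (via the standard `fnmatch` module) are allowed in any component. The path syntax is a hypothetical choice made for this illustration. A component `attr=pattern` constrains one named attribute, and a bare component matches any attribute value, so `/Sales/1994/January` behaves like clicking through the folders "Sales", "1994", "January".

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Record = Dict[str, str]

FILES: List[Record] = [
    {"name": "jan.txt", "topic": "Sales", "year": "1994", "month": "January"},
    {"name": "feb.txt", "topic": "Sales", "year": "1994", "month": "February"},
    {"name": "jan.txt", "topic": "Sales", "year": "1995", "month": "January"},
    {"name": "plan.txt", "topic": "Budget", "year": "1994", "month": "January"},
]


def matches(record: Record, component: str) -> bool:
    attr, sep, pattern = component.partition("=")
    if sep:  # "attr=pattern": constrain one named attribute
        return fnmatchcase(record.get(attr, ""), pattern)
    # bare "pattern": accept the record if any attribute value matches
    return any(fnmatchcase(value, component) for value in record.values())


def resolve(path: str) -> List[Record]:
    """Each path component is one more key narrowing the selection, so
    listing a "directory" and running find are the same operation."""
    selected = FILES
    for component in filter(None, path.split("/")):
        selected = [r for r in selected if matches(r, component)]
    return selected
```

For example, `resolve("/Sales/1994/January")` finds the single January 1994 sales file, while `resolve("/topic=Sales/year=*/month=Jan*")` puts a wildcard where a directory name would normally have to be spelled out.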

Many industrial-strength DBMSs (e.g., Oracle, Informix) support efficient and transparent access to remote databases, as well as data replication. A distributed DBMS thus subsumes NFS, taking care of authorization, data transmission, local caching, data consistency, etc.





Mock-up 'hello world' session

To write proper C code for "hello world", I need to include a declaration for the standard I/O package. Working in the source code editor, I may select a tool "enter a db object" and tell it to look for and insert an object with attributes "data, C declaration, owned by system, containing the string 'standard io' in comments". Or I can simply enter "C declaration for printf()", relying on the editor to fill in the rest of the query parameters. Alternatively, I may finish typing the body of the main() function, then click on printf() and tell the editor to find and include a header containing a declaration of that function. Whatever the editor inserts, I can click on it to see what this object actually is and what it contains.
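The editor's request is an attribute query against the system database. This Python sketch imitates it with a tiny hypothetical object store. The attribute names (`kind`, `owner`, `comment`, `declares`) and the matching rules are assumptions made for illustration, not a real editor or database API.

```python
from typing import Any, Dict, List, Optional

# Hypothetical object store standing in for the system database.
OBJECTS: List[Dict[str, Any]] = [
    {"id": "stdio.h", "kind": "C declaration", "owner": "system",
     "comment": "standard io", "declares": {"printf", "fopen", "puts"}},
    {"id": "string.h", "kind": "C declaration", "owner": "system",
     "comment": "string handling", "declares": {"strlen", "strcpy"}},
    {"id": "myio.h", "kind": "C declaration", "owner": "joe",
     "comment": "my own io wrappers", "declares": {"printf"}},
]


def satisfies(obj: Dict[str, Any], attr: str, wanted: Any) -> bool:
    have = obj.get(attr)
    if attr in ("comment", "declares"):
        # substring of the comment, or membership in the declared symbols
        return have is not None and wanted in have
    return have == wanted


def find_object(**query: Any) -> Optional[Dict[str, Any]]:
    """Return the first object satisfying every query term, or None."""
    return next((obj for obj in OBJECTS
                 if all(satisfies(obj, a, w) for a, w in query.items())), None)


# "C declaration, owned by system, containing 'standard io' in comments"
header = find_object(kind="C declaration", owner="system", comment="standard io")
# Clicking on printf(): "find a header declaring this function"
for_printf = find_object(kind="C declaration", owner="system", declares="printf")
```

Leaving out `owner="system"` would let the user's own `myio.h` compete with `stdio.h`. This is the database counterpart of guessing which header the compiler's -I search order will pick.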

This kind of functionality could become available as soon as tomorrow. Also, doing away with files as we know them does not mean breaking every habit, every skill and every application. It merely means working with a computer in a more natural way.