Data Management Tools for Embedded Software Development

Sasan Montaseri, ITTIA, http://www.ittia.com/

When it comes to embedded systems and mobile devices software, developers face a choice between flat file and embedded database, as any future change can be very risky and expensive for their application, in a demanding market where shortcomings can result in failure.

For stand-alone applications that store little data, custom flat file formats are a straightforward method to save information. Plain text files use a human-readable format, such as INI, CSV, XML, or JSON that is parsed into application-specific data structures. Similarly, flat binary files are created by writing those data structures directly to disk. However, the simplicity of flat files come with a number of limitations, such as:

Although easy to write and create, even the slightest change requires the entire file to be rewritten, which in turn makes files susceptible to data loss.

The extra wear caused by repeated erasing and writing of a large text file to flash memory will eventually wear out and thus reduce the total lifetime of flash media.

Since the entire file must be read into memory to efficiently search for data, they cannot be very large because of the time required to write changes.

Flat files also have a limited data management life cycle offering, as devices and embedded systems are becoming more aware of each other and other systems.

Even with binary files, some common problems are:

Binary files can be divided into smaller pieces used independently, but special tools must be developed to use these files outside the application.

Binary files provide some protection against corruption of the entire file, but can still lose data when changes are only partially written before an unexpected power failure.

Binary files make it more difficult to store variable-width data.

If data is copied directly from memory to a binary flat file, it is not easy to open the

File on another architecture.

The need to store data in a software library takes developers of embedded systems and mobile devices to choose an embedded database. Databases can be implemented in many different ways, and the choice of algorithms in a particular database product has a profound impact on performance characteristics and what features are available. Areas most impacted by the implementation of the database include:

Frequency of costly disk or flash media I/O operations.

Time required to recover from a crash or power loss.

Ability to manage large amounts of data without severe performance degradation.

Performance impact of sharing data between tasks and other applications.

Relative performance of read and write operations.

Portability of the storage format and application code.

Effort required to integrate database technology into the application.

All embedded databases serve three main purposes for data management:

Reliable storage

Efficient queries

Safe shared access

Balancing these three areas of functionality is difficult because strong guarantees in one area can weaken the capabilities of the other two areas.

SQLite is an example of open source database available on market. Following, we will study some of the pros and cons SQLite as a database:

Atomic Transactions: SQLite supports atomic commit, but creates a rollback journal for each transaction that remembers the original value before each change is made. So when SQLite commits a transaction, it must write all modified pages to disk in their entirety and then wait for the operation to finish. This is extremely costly, but necessary to support recovery with only a rollback journal. Concurrency and Shared Access: SQLite uses the locking mechanism built-in to the file system to protect database files during modification, which allows any process on the device with access to the file to use the database. This approach relies on the stability of the file system locking framework, and therefore cannot be used with files that are shared over a network. Data Typing and Type Safety: SQLite uses dynamic run-time typing, which means that data types are only checked when a value is read from the database and used. This is useful in prototyping, before the application's requirements are fully formed, because it allows data to be written to the database without much regard for how it will be used later. Production code, however, must be carefully audited to ensure that type mismatches are not possible or can be dealt with in a reasonable way. In-Memory Storage: While SQLite supports memory databases, they store data in the same format that is used for disk tables. SQLite memory databases are private to each connection, and cannot be shared between different tasks. High Availability: The only high availability offered by SQLite is a limited backup solution. SQLite only provides basic recovery logging and limited backup, which are important tools for self- stabilization but hardly a complete solution. Replication: SQLite provides no replication functionality. Device Notification: SQLite applications can register callback functions to be notified when a row is changed. However, this feature only monitors changes made by the current connection, and cannot be used to monitor other connections to the database.

To overcome SQLite's shortcomings, developers of embedded systems and mobile devices should look for a database offering the following features:

Performance: Whether database access is limited to one connection at a time, or divided between several concurrent tasks, it must show a significant performance advantage for database writes. Low-level access to the engine: Each embedded application has unique requirements for data management and storage. Selecting the right tools requires a careful problem analysis and a thorough understanding of database technology. Using a technology with the right features makes a significant impact on the performance, maintainability, and extensibility of the application. Concurrency: Database should greatly simplify data management for applications on embedded systems and devices, but without the complexity of a back-end database server. Replication: Database should support built-in replication that duplicates changes across multiple databases, even if connectivity is not always available. ANSI Standards: Database should follow the ANSI SQL standards. Data typing and type safety: Database should offer greater protection with static typing rather than the manifest typing in SQLite. True in-memory storage engine: Database should use algorithms that are optimized for both read and write performance in RAM. High availability: The combination of built-in replication, client/server communications, online backup, and high-performance recovery should give developers a complete set of tools to achieve fault tolerance through both self-stabilization and duplication.

A feature-level comparison between open source and proprietary databases cannot be done accurately, as there is too much variation within each camp. However, the following table shows how a database combines the best qualities of both plain text and binary file formats, while also overcoming several important limitations:

Plain Text Files Flat Binary Files Databases Human-readable format Yes No No Convenient for configuration files, any editor can view and save a plain text file. Compatible with existing tools and APIs Yes No Yes Standard tools make it easy to convert to a spreadsheet or generate reports. Platform-independence Yes No Yes Variations in data structure alignment and padding tie flat binary formats to one platform. Variable-width fields Yes No Yes Compact storage formats rely on dynamic buffers, not fixed-width structures. Model hierarchical data structures Yes No Yes Nodes in a tree map readily to XML elements and database tables alike. Minimum overhead No Yes No Records in a binary file are copied directly between memory and disk. Efficient write throughput No Yes Yes In-place updates avoid rewriting an entire file after updating or deleting a record. Model relational data structures No No Yes Organize records with multiple unique and primary keys. Transaction logging No No Yes Crash recovery protects data from unexpected power loss and similar failures. Isolated shared access No No Yes Concurrent tasks must coordinate to access data safely. Scalable sorting and searching No No Yes To find records quickly, indexes must be continuously maintained over key fields. Standard query language (SQL) No No Yes Express complex queries with little or no application code. High availability No No Yes Replication and online backup mitigate media and communication failure risks.

Final Thought

Relational embedded databases offer solution to those typical problems. When you embed your application with an open source database like SQLite, you decide to become a database company and manage your own updates, upgrades, and patches. Choosing a proprietary database connects you to a dedicated team that will be available to take care of your needs from accountability, assistance in managing the danger of intellectual property and legal risks, and prediction of uncertainties. This helps application development to be simplified, and rather than building separate solutions for each of these problems, the application developer uses a standard framework for accessing and modifying data. Furthermore, reconciling these features with each other in a single application is not trivial and can become a great distraction to developing the business logic of the device.

Database Knowledge

Database Tutorials and Videos

Click here to view the complete list of archived articles

This article was originally published in the Winter 2012 issue of Methods & Tools