What Is An Embedded Database? – Embedded Database Systems

An embedded database system, sometimes also called an in-process database system, is delivered as a set of libraries that are linked with the application code, so that the database functionality exists within the application itself. The term “in-process database system” describes this architecture more accurately, because the database system operates in the same address space as the application. This is in contrast to client/server database systems, where a separate process, the database server, provides database services to client applications. The server and the client exist as separate processes, each with its own address space.
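
As a minimal sketch of the in-process model, the following uses Python's standard sqlite3 module, which binds to the SQLite library mentioned later in this article. The table and column names are illustrative assumptions. Note that nothing is started or connected to over a network: the query is an ordinary function call inside the application's own process.

```python
import os
import sqlite3

# The database engine is a library loaded into this process; there is no
# server to start and no socket to connect to.
conn = sqlite3.connect(":memory:")  # in-memory database, private to this process
conn.execute("CREATE TABLE reading (sensor TEXT, value REAL)")
conn.execute("INSERT INTO reading VALUES (?, ?)", ("temp", 21.5))

# The query executes inside the application's own address space.
row = conn.execute("SELECT sensor, value FROM reading").fetchone()
print(os.getpid(), row)
conn.close()
```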

NOTHING TO DO WITH EMBEDDED SYSTEMS

The term embedded database should not be conflated with embedded systems. Embedded database systems began in the early 1980s as a solution for line-of-business applications or departmental computing. These early embedded database systems (Empress, C-Tree, Raima, dBase and others) ran on PCs and Unix-based microcomputers and served basic business functions ranging from accounting to human resources.

In the latter half of the 1990s, embedded database systems also began to be used in embedded systems, and one company, McObject, claimed in 2001 to have created the first in-memory embedded database system written explicitly for embedded systems. That was probably the origin of the confusion between embedded databases and embedded systems.

API

Perhaps as another legacy of the early history of embedded database systems, there is sometimes an incorrect assumption that an embedded database system is also a NoSQL database system. It is true that early embedded database systems had non-SQL APIs (application programming interfaces). But the second generation of embedded database systems, such as dBase, R:Base, and SQLWindows, sported SQL APIs. It is also true that the original non-SQL embedded database systems eventually adopted the SQL language while retaining their original non-SQL APIs.

These non-SQL APIs should not, however, be confused with NoSQL. In many cases, the embedded database systems with non-SQL APIs were structured databases. NoSQL database systems offer a variety of ways of managing unstructured data: key-value pairs, documents, objects and graphs. To be sure, there were classic embedded database systems that supported unstructured data. One of the most popular was BerkeleyDB from Sleepycat, since acquired by Oracle. But at that time the term “NoSQL” was not in use. It did not gain popularity until 2009, though its first use can be traced to 1998.

Non-SQL APIs are also referred to as native APIs, navigational APIs and direct APIs. “Native” simply refers to the fact that an API is specific to one particular embedded database system because it was created by the developers of that database system. The phrase “navigational API” refers to the fact that you use the API to navigate through the database contents one record at a time. In contrast, SQL operates on a “set” of data. Lastly, “direct API” derives from the fact that the API facilitates direct access to the data, unlike SQL, which involves a layer of abstraction. Another way to explain this is that an SQL execution engine is itself implemented on top of a direct API.
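
The difference between navigating one record at a time and operating on a set can be sketched as follows. The `RecordCursor` class is a hypothetical stand-in for a navigational API and does not reproduce any vendor's actual interface; the set-based version uses Python's standard sqlite3 module. The employee data and function names are likewise assumptions made for illustration.

```python
import sqlite3

EMPLOYEES = [("alice", "sales", 100.0), ("bob", "ops", 90.0), ("carol", "sales", 110.0)]

class RecordCursor:
    """Hypothetical navigational cursor: visits one record at a time."""
    def __init__(self, records):
        self._records = records
        self._pos = -1

    def first(self):
        self._pos = 0
        return self._pos < len(self._records)

    def next(self):
        self._pos += 1
        return self._pos < len(self._records)

    def get(self):
        return self._records[self._pos]

    def put(self, record):
        self._records[self._pos] = record

def raise_navigational(records, dept, pct):
    # The application walks the records and decides what to do with each one.
    cur = RecordCursor(records)
    more = cur.first()
    while more:
        name, d, salary = cur.get()
        if d == dept:
            cur.put((name, d, salary * (1 + pct)))
        more = cur.next()
    return records

def raise_set_based(rows, dept, pct):
    # One SQL statement describes the whole set of rows to change.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    conn.execute("UPDATE employee SET salary = salary * (1 + ?) WHERE dept = ?", (pct, dept))
    result = conn.execute("SELECT name, dept, salary FROM employee ORDER BY name").fetchall()
    conn.close()
    return result
```

Both functions produce the same data; what differs is whether the loop lives in the application (navigational) or inside the database engine (SQL).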

SCALABILITY

An embedded database system, by its nature, does not scale in the same sense that a client/server or distributed database system does, and generally speaking does not need to. Recall that embedded database systems are used to implement line-of-business and departmental solutions and, in the case of embedded systems, purpose-specific devices such as cameras, navigation systems, and network and telephony infrastructure systems. They are NOT used for data warehousing, big data analytics or other types of systems that need to scale horizontally and/or vertically.

Furthermore, recall the alternative name for embedded database systems, in-process database systems: the database functionality exists within the host application’s address space. So, an embedded database system needs to scale in the sense of supporting multiple threads and/or multiple processes, preferably without serious degradation in performance. It may also need to scale vertically, within reason (vertical scalability here refers to the volume of data a system can manage).

An embedded database system’s ability to scale with multiple threads and processes is a function of its approach to concurrency and lock granularity. The simplest approach, embodied by SQLite for example, is to give exclusive access to a task that wants to update the database (create, update or delete data). Other tasks that want to write (and, in some configurations, to read) are blocked until the writer task completes its work. Locking can also be managed at the table level, database page level, or row level. Collectively, locking strategies fall under the heading of pessimistic concurrency control. A more sophisticated approach that is becoming more common is optimistic concurrency control implemented by multi-version concurrency control, which avoids locks and offers the hope of greater scalability for multiple threads and processes.
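
The database-level write lock can be observed with Python's standard sqlite3 module. This is a sketch under a few assumptions: the function name and table are illustrative, SQLite is left in its default journal mode, and the second connection uses `timeout=0` so that it fails immediately rather than waiting for the lock.

```python
import os
import sqlite3
import tempfile

def second_writer_is_blocked():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        # timeout=0: report a busy database at once instead of waiting.
        other = sqlite3.connect(path, isolation_level=None, timeout=0)
        writer.execute("CREATE TABLE t (x INTEGER)")

        writer.execute("BEGIN IMMEDIATE")        # take the write lock
        writer.execute("INSERT INTO t VALUES (1)")
        try:
            other.execute("BEGIN IMMEDIATE")     # a second writer tries to start
            other.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:         # SQLite reports the database as locked
            blocked = True
        writer.execute("COMMIT")                 # release the write lock

        other.execute("BEGIN IMMEDIATE")         # succeeds once the lock is free
        other.execute("INSERT INTO t VALUES (2)")
        other.execute("COMMIT")
        count = other.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        writer.close()
        other.close()
    return blocked, count
```

With a non-zero timeout, the second writer would wait for the lock instead of failing, which is the blocking behavior described above.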

NO MESSAGE PASSING

To be 100% embedded, an embedded database system needs to avoid message passing with any process that is external to the application hosting the embedded database system. That said, there are embedded database systems that rely on one or more separate processes for certain functionality. For example, BerkeleyDB implements concurrency through a separate process called the lock arbiter, and one implementation of Raima RDM’s transactional file system exists as a separate process when needed to support multiple processes (an embeddable version of TFS exists for solutions implemented as a single process).

So, embedded database systems exist on a continuum from fully embedded to mostly embedded with some message passing, though still less than in client/server systems.

CLIENT/SERVER

Which raises the question: can an embedded database system also be client/server? Yes, it can. Some vendors’ embedded database systems allow the host application to also be a database server. The process that requires the highest performance, and therefore cannot tolerate the latency inherent in message passing, embeds the database system and takes advantage of the direct access. Simultaneously, it uses an API of the embedded database to open one or more ports on which to listen for external processes that wish to use the same database as clients. Any embedded database system that implements a RESTful API does this.
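
The arrangement can be sketched roughly as follows. The products described above expose the listener through their own APIs; SQLite has no such feature, so this sketch stands in with Python's standard http.server wrapped around an in-process sqlite3 connection. The `/readings` endpoint, the handler class and all function names are hypothetical.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The host application owns the database in-process.
db = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()  # serialize access between host code and server thread
db.execute("CREATE TABLE reading (sensor TEXT, value REAL)")

def record(sensor, value):
    """Direct, in-process access for the latency-sensitive host code."""
    with db_lock:
        db.execute("INSERT INTO reading VALUES (?, ?)", (sensor, value))
        db.commit()

class ReadingsHandler(BaseHTTPRequestHandler):
    """Hypothetical read-only endpoint for external client processes."""
    def do_GET(self):
        if self.path != "/readings":
            self.send_error(404)
            return
        with db_lock:
            rows = db.execute("SELECT sensor, value FROM reading").fetchall()
        body = json.dumps(rows).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

def start_server(port=0):
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", port), ReadingsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch_readings(server):
    """What an external client would do; called in-process here for brevity."""
    url = f"http://127.0.0.1:{server.server_port}/readings"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())
```

The host calls `record()` directly and pays no messaging cost, while other processes reach the same data through the port and accept the extra latency.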