Embedded Database APIs: To SQL Or Not To SQL

Embedded database systems come in many flavors: key/value pair, object, relational, graph, document, etc. Obviously, for relational embedded database systems, SQL (most often via JDBC or ODBC) is the predominant means of access. (N.B. SQL is not an API, it is a language that is passed through APIs; JDBC and ODBC are APIs.) Many relational embedded database systems offer both SQL and lower level proprietary APIs that offer direct access to the database.

For the rest of the models, most employ proprietary APIs that are what the industry collectively refers to as “native”, “core” and/or “navigational” APIs.

GraphQL is an emerging technology. Like SQL, it’s a language, not an API.

Advantages of SQL

SQL is widely known, which means that there is a large pool of talent to recruit from. And, while the language is widely known, so are the APIs JDBC and ODBC. And even when proprietary APIs are used, they follow a similar pattern:

Obtain a database handle
Obtain a connection handle
Obtain a statement handle
Obtain a cursor
Pass an SQL statement through the statement handle
Associate the cursor with the result set
Iterate over the result set with the cursor
Deconstruct all the above

When SQL is used with JDBC/ODBC, the call path looks something like

Application –> Driver –> Driver Manager –> database (parse/optimize/execute)

When SQL is used with a proprietary API, the call path looks like

Application –> database (parse/optimize/execute)

It should be apparent that proprietary SQL APIs have an advantage in terms of performance and lower system complexity.

SQL can also offer great developer productivity (as long as SQL is well understood; there is a learning curve). It is pretty easy to write a query that joins a number of tables, filters on one or more columns, sorts by one or more columns, and/or calculates some aggregates. Accomplishing this with a proprietary API can range from straightforward (if the columns filtered and/or sorted on are from the same table) to very complex (if the parenthetical conditions just mentioned are not true). Likewise, a single simple SQL UPDATE or DELETE statement can act on many rows at once, whereas a non-SQL API would have to visit each row one-by-one.

SQL also offers a measure of vendor-independence and portability. However, “plug and play” database portability is a myth. Just try switching an application from Vendor A’s database with its ODBC driver to Vendor B’s database and it’s ODBC driver. There’ll be some work, but less than switching when you’ve used any sort of proprietary API (SQL or otherwise). That said, as a developer, once you’ve mastered SQL, that is a skill that is highly portable.

SQL has overhead

For all of its relative simplicity and expressiveness/productivity, SQL has unavoidable overhead. Even in the simplest scenario, where SQL is used with a proprietary API thus excluding a driver and driver manager, SQL must still be parsed, then optimized to arrive at the (hopefully) most efficient execution plan, and finally executed. There is no free lunch – these steps consume processor cycles, memory and other resources.

When your task is simple, e.g. you just want to poke one row of data into a table, or you just want to fetch one unique row from a table, SQL is kind of overkill. Completing that kind of task with any proprietary API is going to be easier to code, and execute at run time far faster with far fewer resources.

There is also the fabled “impedance mismatch” between SQL and [pick your target]. For example, there’s an impedance mismatch between SQL data types and any particular programming language’s data types. And, there’s an impedance mismatch between the relational database model that SQL is almost universally associated with and, e.g. an object model which, of course, gave rise to a lot of object-relational mapping (ORM) technologies in the past couple of decades (creating yet another layer of overhead).

Enter Proprietary APIs

Embedded database system proprietary APIs mitigate impedance mismatch. They are, after all, created by the same vendor that designed and developed the embedded database system.

Proprietary APIs also eliminate the parse/optimize/execute overhead of SQL. They offer direct access to the database content. Look, there’s no magic to SQL. Every SQL execution engine is using a native/core/navigational (anyway, proprietary) API for the low level I/O, so there’s nothing that SQL can do that you can’t do yourself in a native API as long as you have access to it (i.e., the embedded database vendor exposes it) and documentation. That’s why many embedded database systems offer both low-level native APIs and SQL APIs.

However, that also shifts the burden to the programmer to navigate the database using the API to affect joining, filtering, sorting and calculating aggregates. In some embedded database systems, these steps might be obviated by the nature of the database system. Meaning, e.g., a document database will likely store denormalized data that is already joined when it’s added to the database. Thus, there’s no need navigate the database content to join the related data at run-time. And, as mentioned above, it is often the case that your many of your applications interaction with the database are so simple that SQL is unjustified. If an embedded database system allows you to work with both an SQL API and a proprietary API at the same time, you have the best of both worlds.

Proprietary API disadvantages: no standard, vendor lock in

The heading pretty much says it all. Every embedded database system proprietary API is just that: proprietary. It’s a skill that you have to learn and, unlike SQL, it is not a skill that you an carry over to any other embedded database system.

And because of the proprietary nature of all native/navigational/direct APIs, they cause your system (if not also your skillset) to become locked in to one specific embedded database system. Even when based on open source technology, this lock in can be a bad thing if the embedded database system turns out to being falling short of necessary performance, or craters under stress, or fails any number of other scenarios that we can speculate about.

I briefly mentioned GraphQL in the introduction. To keep this article short, and because I believe it’s a subject worth exploring in depth rather than treating as an “also ran”, I’ll reserve a deeper dive into GraphQL for a future article.

Conclusion

SQL and proprietary APIs both have a place in the embedded database system universe. SQL has many advantages: it’s powerful, it has large talent pool, it’s a portable skillset, and it offers a measure of vendor-independence. But SQL also has drawbacks: there is overhead, it can be overkill, and it introduces impedance mismatch.

Proprietary APIs offer better performance characteristics, lower system complexity, lower system overhead, and a more straightforward means of carrying out simple database operations. But proprietary APIs take more effort to carry out complex database operations or operations that affect large data sets (e.g bulk update/delete), are not a portable skillset, and make it more difficult to migrate your system from one embedded database system to another should it become necessary to do so.