Half SQL: semi-opaque tables reduce developer effort
February 27, 2017 Leave a comment
Your relational data design could steal a trick from NoSQL: database representation of your application objects doesn’t need to be broken down to primitive values. Some opacity of relational data is beneficial.
Relational databases impose rigid structure on the data: data is organized in tables, which are made up of columns. Traditionally the columns could only be of primitive types (numeric or alpha), although opaque types (BLOB, CLOB) and XML are now universally available. The rigid structure makes it easy to reason about data, to guarantee data quality in the database layer and to build tooling on top of the database. Querying based on values of individual columns is well defined and highly efficient. Downstream usage (reporting, analysis) becomes easier. Relational structure works well when you need referential integrity across entities that are not updated together.
But relational databases have downsides. One of the downsides is the notorious difficulty of changing the database structure. Martin Fowler wrote: “A lot of effort in application development is tied up in working with relational databases. Although Object/ Relational Mapping frameworks have eased the load, the database is still a significant source of developer hours”. Guy Harrison blogging for Tech Republic: “Change management is a big headache for large production RDBMS. Even minor changes to the data model of an RDBMS have to be carefully managed and may necessitate downtime or reduced service levels“. There is “impedance mismatch” between RDBMS and application layer: “Application developers have been frustrated with the impedance mismatch between the relational data structures and the in-memory data structures of the application”. Even a change confined to a single table (e.g. adding a column) requires significant effort and synchronizing rollout of database and application layers.
The frustration with the amount of developers’ effort that the relational databases required was one of the drivers behind the rise of NoSQL starting about a decade ago. NoSQL databases differ from the relational in many other ways and switching to NoSQL is not an easy step. Fortunately, you can solve some of your problems by using just one item from the NoSQL bag of tricks. You can greatly reduce the impact of single-table changes (such as adding a new column, the most frequent type of change) by making your table definition semi-opaque. Don’t throw away your old and trusty RDBMS. Turn it into a Half SQL data store: in each table, select a small number of key columns that may be used in indexes and keep them. Hide all other fields from RDBMS by placing them into an opaque container field of a BLOB type. As a simplified example, Orders table may look like this:
Your application will be responsible for serializing and deserializing those blobs. Adding a new object field will be invisible to the RDBMS. When you need to add a new field, you will only need to change the code in the serializer/deserializer. And if you use a good serialization library (if your application is written in Java, please, don’t use built in serialization; there are many libraries that are faster and more flexible), even those changes in most cases will be NOOP because your library will take care of those automatically. No data migration will be needed. You will be able to write test to verify that your logic works before and after the change. And you retain the RDBMS goodness of referential integrity and super-fast queries over the indexed columns.
Stashing all “other” object fields into a BLOB column could save you quite a bit of effort.