Notes on a C# memory-resident object database

Project: MemreDb

I once had to mock an API in C# as the person implementing that part of the project was delayed due to customer support issues. I did this with some simple collection classes inside the API client project, which would simply store the objects and return them when necessary to the application.

This allowed me to get on with the development of the (native) front end application. My interface between app and API client wouldn't change.

Once the engineer got onto the project he realised that the API agreed wasn't going to cut it. Obviously we all knew this would happen, as he got to grips with the shape of the data.

I could have gone back and restarted my mock API with moq or a similar library, but I was enjoying creating the mock API with collections for the objects, collections for the foreign key constraints, and locking objects to ensure consistency.

Of course, this code could not return the actual stored object, in case it got mutated by the app, so it would have to clone the object, so support needed to be added to all the data objects.

This was great fun but there was a lot of repeated code. Surely it could be implemented better with generics?

The idea

The project wooshed on and the API was delivered with sample data. The mocked API was useful for when the remote servers were down (the system I was connected to was far too complex to run locally, and we weren't allowed our own Azure instances as we hadn't yet migrated there), but there was no more work to be done on it apart from upkeep.

However in my personal time the idea of a memory based database that returned full C# objects grew, based on generics, with foreign key constrains, indexing, locking schemes, and the ability to execute common SQL. There was no need for an SQL parser, as this was going to be a quick project. Just a database class that had generic methods such as Insert(T toInsert), Delete(WhereClause clause), and T Select(WhereClause clause).

The schema would be constructed by creating tables containing columns with names and a type, with indexing, constraints and all the things typically needed in a simple database.

The WhereClause object would be constructed by adding table names, Boolean logic, literals, and comparisons, hierarchically. If an SQL parser was ever to be written it would create a WhereClause whilst processing that part of the statement.

Now, it was never going to be useful for most people. As the first line in the readme states 'this probably isn't the database you're looking for'.

Problems

The first problem arose with not having variadic generics in C#. I could not have a select method take multiple object types as generic parameters, returning a sorted list of all objects of all types that 'comply' with the where clause.

The second problem was, did I really need to return multiple types? This was a design question. For instance, would the theoretical users of this db need a list that contained both, or just one, of a pair of class types (in a way the a SELECT statement with a join can return data with nulls in where there is a LEFT or RIGHT join).

The answer to this was probably not. What users of object data really want is class objects that contain collections of dependent class objects. So I got on with developing a select method containing just one type, with the idea that if I wanted to continue developing I would add the contained data from joins.

Also, in theory I could have the select method return a list of objects of one type, and whilst maintaining the same lock, return objects of other types through further calls.

Implementation

The data objects to be stored had to have an empty public constructor. This allowed reflection to be leveraged to make copies of data, which was necessary for insertions and selections. The data held had to be immutable and not altered either after it was inserted or after it was returned.

The object stored could also have no private state that wasn't based on the public state. And dependent private state would need to be updated automatically by the object implementation. Also, the objects should have a single responsibility - if a table name is called 'license' it should store an object of class License which only contains license information. If the class stores other data then the table 'license' is a misnomer and should be called 'licenseAndSomethingOtherStuff'.

Of course if needed related data could be denormalised into it if necessary, such as a license usage count.

The data itself is held in a generic list collection and a binary tree, keyed on the primary key. The list was ordered on insert order, and was needed for select calls that weren't based on an indexed object element.

If a table has multiple indices, there would be multiple binary trees. If the table has an auto incrementing primary key, it would be incremented on insertion.

Where clauses are constructed from a source table, and a series of and clause objects, abd or clause objects. The clause objects are constructed from either literals or from other Boolean clause objects.

When the select method is called with a WhereClause, there is an optimistic locking system that, from the necessary tables, collects the current state of all those tables. This table state is a simple counter that is incremented on any alteration to the table. An overall state object links all these state ids into one place. On retrieving the data from the select method, if the states are no longer the latest then no data is returned and a conflict exception thrown.

Selecting data

The first implementation of select simply created an iterator for the tree in the selected sort order, and compared each item against the where clauses, creating a list as it went to return.

Obviously this was a naive implantation and ignored the power of trees.

The next implementation recursed down the where clause hierarchy to find an actual comparison to local data (data that is in the tree, no consideration for foreign keys or more than one table yet). Rather than iterate over the whole tree, it now inserts/discards tree branches dependent on comparisons, returning a new tree for this clause. This tree contains new nodes that reference the same internal data objects as are stored within the database. (Comparisons against non-indexed columns still iterate through entire tree, and return a new tree ordered by the primary index).

Iterating back up the where clause hierarchy, OR clauses merge descendant tree results together, and AND clauses merge them by only taking nodes that are in both trees.

Foreign keys

The software supports foreign key relationships, and inserts enforcing these relationships. There are cascaded deletions too.

The select method currently only processes data in the table the select is on. This needs expanding to bring in objects from other tables mentioned in a where clause.

Performance

This database was never implemented to be performant, so no measurements or optimisation has taken place. I expect it to be reasonably quick but offer no guarantees.

Conclusion

This was an evening project over a couple of weeks back in 2022. It taught me a lot about the capabilities and constraints of C# generics. It also taught me a lot about database internals.

I would like to move this project forward with support for returning datasets from JOINed queries, either return the foreign data inside a collection (for one to many) or a single property (for one to one), or just separate lists for data that is joined but not hierarchical.

Also I would like to update the queries to include:

not
is null (and is not null)
multiple joined tables, including comparisons between disparate tables through a chain of joins
group by
Count, sum, average

An SQL parser would be great to implement, and is on the nice-to-have list.