NoSQL Databases & Polyglot Persistence

NoSQL Properties

NoSQL databases differ from traditional relational systems in fundamental ways. The database model is not relational, and the focus shifts to distributed and horizontal scalability. Schema restrictions are weak or absent, data replication is straightforward, and access is typically provided through an API. The consistency model usually does not follow strict ACID guarantees: many NoSQL systems relax consistency in favor of availability and partition tolerance.

Polyglot Persistence

Different database models suit different purposes, and using various database types within one application can be beneficial when each is deployed according to its strengths. This concept is called polyglot persistence and allows both SQL and NoSQL technologies to coexist within the same application, leveraging relational databases for transactional integrity and NoSQL stores for scalability or flexible schemas.
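As a rough illustration of polyglot persistence, the sketch below keeps orders in a relational store (SQLite, in memory) for transactional integrity and session data in a schema-free key-value structure. A plain Python dict stands in for the NoSQL store here; the table layout and the helpers `place_order` and `save_session` are hypothetical names chosen for the example, not a prescribed architecture.

```python
import sqlite3

# Relational store for transactional data (SQLite in memory, stdlib only).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)

# Hypothetical stand-in for a NoSQL key-value store holding session data.
sessions: dict[str, dict] = {}


def place_order(customer: str, total: float) -> int:
    # Using the connection as a context manager commits on success
    # and rolls back if the insert raises an exception.
    with db:
        cur = db.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
        )
    return cur.lastrowid


def save_session(session_id: str, data: dict) -> None:
    # Schema-free write: any dict shape is accepted as the value.
    sessions[session_id] = data


order_id = place_order("alice", 42.5)
save_session("s-123", {"user": "alice", "cart": ["book"], "theme": "dark"})
```

Each store is used only for what it is good at: the relational table enforces a fixed schema and transactions, while session data can change shape freely without migrations.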

Core NoSQL Technologies

The four core NoSQL technologies are key-value stores, column-family databases, document stores, and graph databases. Each offers distinct tradeoffs in structure, scalability, and query expressiveness.

Key-Value Stores

In key-value stores, a specific value can be stored for any key with a simple command. A database qualifies as a key-value store if it has a set of identifying data objects called keys, exactly one associated value for each key, and the ability to query the value by specifying the key. Processing speed can be increased further by keeping key-value pairs in main memory; such systems are called in-memory databases.
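The three defining properties (a set of keys, exactly one value per key, lookup by key) can be sketched as a small class. This is a minimal illustration assuming a Python dict as the backing structure, which also makes it a toy in-memory database; the class and method names (`KeyValueStore`, `put`, `get`, `delete`) are hypothetical and not the API of any particular product.

```python
from typing import Any, Optional


class KeyValueStore:
    """Minimal in-memory key-value store: exactly one value per key."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writing to an existing key overwrites the old value,
        # so each key always maps to exactly one value.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Access is by key only; the store does not inspect the value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", "Alice")
store.put("user:1", "Alice Smith")  # overwrite, not a second value
store.put("tmp", 1)
store.delete("tmp")
```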

Scalability can be extended almost without limit by fragmenting, or sharding, the data content. Partitioning is straightforward because of the simple model: individual computers within the cluster, called shards, each take on only a portion of the keyspace, which spreads the workload across machines. How evenly the load is spread depends on the partitioning function and on how keys and requests are distributed.
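A minimal sketch of hash-based sharding follows, assuming each shard is simulated by a dict and keys are routed by a stable hash modulo the number of shards. The class `ShardedStore` and its methods are hypothetical; real systems often use consistent hashing or range partitioning instead, because plain modulo routing remaps most keys whenever the shard count changes.

```python
import hashlib


class ShardedStore:
    """Hash-partitioned key-value store; each shard owns part of the keyspace."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one machine (shard) in the cluster.
        self.shards: list[dict] = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash (Python's built-in hash() is salted per process)
        # so a given key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", i)
sizes = [len(s) for s in store.shards]  # roughly, not exactly, equal
```

Because every key lives on exactly one shard, a lookup touches a single machine, and adding machines adds capacity for a larger keyspace.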

Column-Family Stores

Column-family stores enhance the key-value concept by providing additional structure. In the Bigtable model, a table is a sparse, distributed, multidimensional, sorted map. Each term has a precise meaning: it is a map because the data structure assigns elements from a domain to elements of a codomain; it is sorted because the mapping function has an order relation on the keys; it is distributed because the data is spread across different computers; it is multidimensional because addressing uses more than one parameter; and it is sparse because many possible keys have no data entry.
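The map properties can be made concrete with a toy structure addressed by three parameters (row, column, timestamp). This sketch assumes that only existing cells are stored (sparse) and that keys are kept in sorted order so a row's cells are contiguous; distribution across machines is omitted. `SortedSparseMap`, its methods, and the row and column names are hypothetical, and the sketch sorts all three key components ascending, which is a simplification.

```python
from bisect import bisect_left, insort
from typing import Optional


class SortedSparseMap:
    """Toy Bigtable-style map: (row, column, timestamp) -> value."""

    def __init__(self) -> None:
        # Only cells that actually hold data are stored (sparse).
        self._cells: dict[tuple[str, str, int], str] = {}
        # All keys kept in sorted order (order relation on keys).
        self._keys: list[tuple[str, str, int]] = []

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        key = (row, column, ts)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row: str, column: str, ts: int) -> Optional[str]:
        # Multidimensional addressing: three parameters identify one cell.
        return self._cells.get((row, column, ts))

    def scan_row(self, row: str) -> list[tuple[tuple[str, str, int], str]]:
        # Sorted keys make one row's cells contiguous: binary-search to the
        # first key of the row, then read forward until the row changes.
        start = bisect_left(self._keys, (row,))
        result = []
        for key in self._keys[start:]:
            if key[0] != row:
                break
            result.append((key, self._cells[key]))
        return result


m = SortedSparseMap()
m.put("row-b", "info:name", 2, "Bob v2")
m.put("row-a", "info:email", 1, "a@example.com")
m.put("row-b", "info:name", 1, "Bob v1")
```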

Databases using this model are called column-family stores. They store data in multidimensional tables where data objects are addressed with row keys and properties are addressed with column keys. Columns are grouped into column families, and the schema refers only to those families: within one family, arbitrary column keys can be used. In distributed architectures, the data of a column family is preferably stored physically together in one place, a practice called co-location, to optimize response times.
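The rule that the schema names only column families, while column keys inside a family are free, can be sketched as follows. The `family:qualifier` column naming, the class `ColumnFamilyTable`, and the example families are assumptions for illustration; nesting values by family only mirrors co-location logically and says nothing about physical storage.

```python
class ColumnFamilyTable:
    """Schema declares only column families; column keys inside a family are free."""

    def __init__(self, families: set[str]) -> None:
        self.families = set(families)
        # row key -> family -> column key -> value
        # (grouping by family mirrors co-location, but only logically here)
        self.rows: dict[str, dict[str, dict[str, str]]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Hypothetical "family:qualifier" naming for column keys.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get_family(self, row_key: str, family: str) -> dict[str, str]:
        return dict(self.rows.get(row_key, {}).get(family, {}))


users = ColumnFamilyTable({"profile", "contact"})
users.put("u1", "profile:name", "Alice")
users.put("u1", "contact:email", "alice@example.com")
users.put("u2", "profile:nickname", "Bob")  # new column key, no schema change

try:
    users.put("u1", "billing:iban", "CH00")  # family not in the schema
    rejected = False
except KeyError:
    rejected = True
```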

Document Stores

Document stores combine the absence of a schema with the possibility of structuring stored data. Despite the name, document stores do not handle arbitrary documents like web pages, video, or audio, but structured data in records called documents. On the first level, document stores function as a kind of key-value store where every key, or document ID, corresponds to a record stored as the value. On the second level, these documents have their own internal structure, usually in JSON format.

A document store is a key-value store where the data objects stored as values are called documents and the keys are used for identification. This dual-layer structure allows flexible schemas while still enabling efficient retrieval by key.
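The two levels can be shown in a short sketch: at the first level a document ID maps to a document, and at the second level each document has its own JSON structure that queries can reach into. The class `DocumentStore`, its methods, and the dotted-path query syntax are hypothetical and not the query language of any specific product.

```python
import json
from typing import Any, Optional


class DocumentStore:
    """Level 1: document ID -> document. Level 2: each document is structured JSON."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}  # documents stored as JSON text

    def insert(self, doc_id: str, document: dict[str, Any]) -> None:
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id: str) -> Optional[dict[str, Any]]:
        # First level: plain retrieval by key, as in a key-value store.
        raw = self._docs.get(doc_id)
        return json.loads(raw) if raw is not None else None

    def find(self, path: str, value: Any) -> list[str]:
        # Second level: look inside documents via a dotted path like "address.city".
        matches = []
        for doc_id, raw in self._docs.items():
            node: Any = json.loads(raw)
            for part in path.split("."):
                node = node.get(part) if isinstance(node, dict) else None
            if node == value:
                matches.append(doc_id)
        return matches


docs = DocumentStore()
docs.insert("c1", {"name": "Alice", "address": {"city": "Zurich"}})
docs.insert("c2", {"name": "Bob", "tags": ["vip"]})  # different shape, no schema needed
```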

Native XML Databases

A native XML database is a specialized document store where data is stored in documents compatible with the XML standard. XML technologies such as XPath, XQuery, and XSLT can be used for querying and manipulating data, making these databases well suited for applications that already rely heavily on XML for interchange or configuration.
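To give a feel for path-based querying of XML documents, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath; a native XML database would instead accept full XPath or XQuery expressions. The catalog document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document as it might be stored in a native XML database.
catalog_xml = """
<catalog>
  <book id="b1"><title>SQL Basics</title><price>30</price></book>
  <book id="b2"><title>Graph Theory</title><price>45</price></book>
</catalog>
"""

root = ET.fromstring(catalog_xml.strip())

# Attribute predicate and child step (within ElementTree's XPath subset).
title_b2 = root.find("./book[@id='b2']/title").text

# Descendant search for every <title> element.
all_titles = [t.text for t in root.findall(".//title")]

# Filtering on element content is done in Python here; XQuery could express it directly.
expensive = [b.get("id") for b in root.findall("book") if int(b.findtext("price")) > 40]
```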

Graph Databases

Graph databases represent data and schemas as graphs or graph-like structures, such as hypergraphs. Data manipulations are expressed as graph transformations or operations that directly address typical properties of graphs — paths, adjacency, subgraphs, and connections. The database supports integrity constraints to ensure data consistency, with consistency definitions tied directly to graph structures like node and edge types, attribute domains, and referential integrity of edges.

Graph databases excel at modeling highly connected data, such as social networks, recommendation engines, and knowledge graphs, where relationships between entities are as important as the entities themselves.
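The graph operations and integrity constraints described above can be sketched with a small in-memory graph: nodes carry a type, edges are checked for referential integrity (both endpoints must exist), and a path query is answered by breadth-first search. The `Graph` class, its methods, and the "follows" social-network data are hypothetical illustrations, not a graph database API.

```python
from collections import deque
from typing import Optional


class Graph:
    """Tiny graph: typed nodes, directed edges, referential integrity on edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}          # node id -> properties (incl. "type")
        self.adjacency: dict[str, set[str]] = {}  # node id -> ids of adjacent nodes

    def add_node(self, node_id: str, node_type: str, **props) -> None:
        self.nodes[node_id] = {"type": node_type, **props}
        self.adjacency.setdefault(node_id, set())

    def add_edge(self, src: str, dst: str) -> None:
        # Integrity constraint: an edge may only connect existing nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise ValueError(f"edge {src}->{dst} references a missing node")
        self.adjacency[src].add(dst)

    def shortest_path(self, start: str, goal: str) -> Optional[list[str]]:
        # Breadth-first search returns a path with the fewest edges.
        previous: dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for neighbour in sorted(self.adjacency[node]):
                if neighbour not in previous:
                    previous[neighbour] = node
                    queue.append(neighbour)
        return None


g = Graph()
for person in ["ann", "ben", "cat", "dan"]:
    g.add_node(person, "Person")
g.add_edge("ann", "ben")  # hypothetical "follows" relationships
g.add_edge("ben", "cat")
g.add_edge("cat", "dan")

try:
    g.add_edge("ann", "zoe")  # "zoe" does not exist
    rejected = False
except ValueError:
    rejected = True
```

Storing adjacency directly is what makes relationship queries like this cheap: each step follows stored neighbours rather than joining tables.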