> ## Documentation Index
> Fetch the complete documentation index at: https://mintlify.com/donnemartin/system-design-primer/llms.txt
> Use this file to discover all available pages before exploring further.

# Database Overview

> Overview of database systems including RDBMS, NoSQL, and SQL vs NoSQL considerations

<img src="https://camo.githubusercontent.com/e45e39c36eebcc4c66e1aecd4e4145112d8e88e3/687474703a2f2f692e696d6775722e636f6d2f586b6d35435873e2809c.706e67" alt="Database Scaling" />

## Relational Database Management System (RDBMS)

A relational database like SQL is a collection of data items organized in tables.

**ACID** is a set of properties of relational database [transactions](https://en.wikipedia.org/wiki/Database_transaction).

* **Atomicity** - Each transaction is all or nothing
* **Consistency** - Any transaction will bring the database from one valid state to another
* **Isolation** - Executing transactions concurrently has the same results as if the transactions were executed serially
* **Durability** - Once a transaction has been committed, it will remain so

There are many techniques to scale a relational database: **master-slave replication**, **master-master replication**, **federation**, **sharding**, **denormalization**, and **SQL tuning**.

### Scaling Techniques

* [Master-slave replication](/database/rdbms#master-slave-replication)
* [Master-master replication](/database/rdbms#master-master-replication)
* [Federation](/database/rdbms#federation)
* [Sharding](/database/rdbms#sharding)
* [Denormalization](/database/rdbms#denormalization)
* [SQL tuning](/database/rdbms#sql-tuning)

## NoSQL

NoSQL is a collection of data items represented in a **key-value store**, **document store**, **wide column store**, or a **graph database**. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor [eventual consistency](/topics/consistency-patterns#eventual-consistency).

**BASE** is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE chooses availability over consistency.

* **Basically available** - the system guarantees availability.
* **Soft state** - the state of the system may change over time, even without input.
* **Eventual consistency** - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

### NoSQL Database Types

#### Key-Value Store

**Abstraction:** hash table

A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in [lexicographic order](https://en.wikipedia.org/wiki/Lexicographical_order), allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.

Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.

**Examples:** Redis, Memcached, DynamoDB

#### Document Store

**Abstraction:** key-value store with documents stored as values

A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself.

Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.

Some document stores like [MongoDB](https://www.mongodb.com/mongodb-architecture) and [CouchDB](https://blog.couchdb.org/2016/08/01/couchdb-2-0-architecture/) also provide a SQL-like language to perform complex queries. [DynamoDB](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf) supports both key-values and documents.

**Examples:** MongoDB, CouchDB, DynamoDB

#### Wide Column Store

<img src="https://camo.githubusercontent.com/823668b07b4bff50574e934273c9244e4e5017d6/687474703a2f2f692e696d6775722e636f6d2f6e3136694f476b2e706e67" alt="Wide Column Store" />

**Abstraction:** nested map `ColumnFamily<RowKey, Columns<ColKey, Value, Timestamp>>`

A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.

Google introduced [Bigtable](http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf) as the first wide column store, which influenced the open-source [HBase](https://www.edureka.co/blog/hbase-architecture/) often-used in the Hadoop ecosystem, and [Cassandra](http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html) from Facebook.

Wide column stores offer high availability and high scalability. They are often used for very large data sets.

**Examples:** Bigtable, HBase, Cassandra

#### Graph Database

<img src="https://camo.githubusercontent.com/e302949a673c7c3e7f83c0b7c1b1a7308f382e6f/687474703a2f2f692e696d6775722e636f6d2f664e636c3635672e706e67" alt="Graph Database" />

**Abstraction:** graph

In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.

Graphs databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely-used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.

**Examples:** Neo4j, FlockDB

## SQL or NoSQL

<img src="https://camo.githubusercontent.com/c3b8e6a6e4c9e6f3c0e0e6e5e6e5e6e5e6e5e6e5/687474703a2f2f692e696d6775722e636f6d2f77586747713566e2809c.706e67" alt="SQL vs NoSQL" />

### Reasons for SQL:

* Structured data
* Strict schema
* Relational data
* Need for complex joins
* Transactions
* Clear patterns for scaling
* More established: developers, community, code, tools, etc
* Lookups by index are very fast

### Reasons for NoSQL:

* Semi-structured data
* Dynamic or flexible schema
* Non-relational data
* No need for complex joins
* Store many TB (or PB) of data
* Very data intensive workload
* Very high throughput for IOPS

### Sample Data Well-Suited for NoSQL:

* Rapid ingest of clickstream and log data
* Leaderboard or scoring data
* Temporary data, such as a shopping cart
* Frequently accessed ('hot') tables
* Metadata/lookup tables

<Note>
  **SQL vs NoSQL Trade-off:** Choose SQL for structured data with complex relationships and ACID requirements. Choose NoSQL for flexible schemas, horizontal scalability, and high-throughput workloads.
</Note>

## Source(s) and Further Reading

### RDBMS

* [Scalability, availability, stability, patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns/)
* [Multi-master replication](https://en.wikipedia.org/wiki/Multi-master_replication)
* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)

### NoSQL

* [Explanation of base terminology](http://stackoverflow.com/questions/3342497/explanation-of-base-terminology)
* [NoSQL databases a survey and decision guidance](https://medium.com/baqend-blog/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.wskogqenq)
* [Introduction to NoSQL](https://www.youtube.com/watch?v=qI_g07C_Q5I)
* [NoSQL patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html)

### SQL vs NoSQL

* [Scaling up to your first 10 million users](https://www.youtube.com/watch?v=kKjm4ehYiMs)
* [SQL vs NoSQL differences](https://www.sitepoint.com/sql-vs-nosql-differences/)
