Column vs row-oriented databases

The key difference between column and row-oriented databases is the way values are stored. Column-oriented databases store all the values for a given column in the same place, while row-oriented databases store all the values in a row (no matter the number of columns) together.

This means that if one needs to query a lot of rows but only a few columns, using a database that keeps columns stored together can provide a huge performance advantage, as data can be retrieved based only on the columns that need to be projected. If we were to use a row-oriented database, since each row is stored together we would always have to go over the entire row even when not required.

Let’s consider the following example:

id	name	age
1	john	32
2	alexa	29
3	dinis	25

Row-oriented database

In a row-oriented database, data is stored row by row, i.e. the data is represented in the disk by a sequential list of rows, like the following example:


1	john	32	2	alexa	29	3	dinis	25

This is usually how traditional databases like Postgres and MySQL work and it works quite well for OLTP (Online Transactional Processing) applications, as they can provide good performance for writes - since writing new data is just a matter of appending a new row after the last one.

When it comes to reads, it depends on whether we need to read the entire row or just a few columns - i.e. when dealing with complex operations usually the entire row has to be kept in memory even though only part of its columns are needed.

Column-oriented database

Column-oriented databases, on the other hand, were created to support fast reads (and ad-hoc queries) even if that requires compromising writes as we will see. This is why they are usually used by OLAP (Online Analytical Processing) applications. In a column-oriented database, the values for a given column are grouped, like the following example:


1	2	3	john	alexa	dinis	32	29	25

This means that adding a new row forces the database to add a new value after all the values for each of the columns the table has - this can be simplified by having each column stored in a different file/disk.

When it comes to reads, this database is more appropriate for fast reads as it’s a lot easier to go over the values for a specific column without having to go through all the other columns.