PostgreSQL Columnar Store for Analytic Workloads

We are excited to open source our columnar store extension for PostgreSQL and share it with the community! Columnar stores bring notable benefits for analytic workloads, where data is loaded in batches.

This columnar store extension uses the Optimized Row Columnar (ORC) format for its data layout. ORC improves upon the RCFile format developed at Facebook, and brings the following benefits:

  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.

Further, we used the Postgres foreign data wrapper APIs and type representations with this extension. This brings:

  • Support for 40+ Postgres data types. The user can also create new types and use them.
  • Statistics collection. PostgreSQL's query optimizer uses these stats to evaluate different query plans and pick the best one.
  • Simple setup. Create foreign table and copy data. Run SQL.

It's worth noting that the columnar store extension is self-contained. If you're a PostgreSQL user, you can get the entire source code and build using the instructions on our GitHub page. You can even join columnar store and regular Postgres tables in the same SQL query.

Now, let's see how all this fits together with an example. For this, we start by downloading customer review data from Amazon for the year 1998. In this table, each review represents an event. We find that events or fact tables, or tables that have a large number of columns are good fits for the columnar store.

wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
gzip -d customer_reviews_1998.csv.gz

Next, let's connect to the PostgreSQL database and create a foreign table.

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE customer_reviews
(
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
SERVER cstore_server
OPTIONS(filename '/opt/citusdb/3.0/cstore/customer_reviews.cstore',
        compression 'pglz');

Note that we specified the compression method as the LZ-class algorithm that's already built into PostgreSQL. The cstore extension will use this algorithm to compress column values in groups of 10K; and this group size is also configurable.

Next, we can use PostgreSQL's copy command to load data into the columnar store.

COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;

Finally, let's run an example SQL query on the column store table.

-- Do we have a correlation between a book's title's length and its review ratings?
SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
    customer_reviews
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;

This example shows that the columnar store extension works nicely with your PostgreSQL database. The natural follow up question is, what are the benefits? To quantify them, we ran some preliminary numbers using the industry standard TPC-H benchmark.

We started by measuring compression ratios for the tables in this benchmark and found that they compressed by 3.5x with pglz. This means that you can now fit 3.5x of your working set into memory. Also, when going to disk, you read proportionally less data. Finally, if you're using SSDs, you save notably from storage costs.

After looking at compression ratios, we next measured query run times on an m1.xlarge instance with rotational disks. We also flushed the page cache before each test to see the impact on disk I/O. Further, we ran Analyze on each foreign table so that PostgreSQL has the statistics it needs to choose the best query plan.

Table Type TPC-H 3 TPC-H 5 TPC-H 6 TPC-H 10
PostgreSQL 42.4 42.3 33.7 42.7
cstore 24.7 23.5 14.5 23.9
cstore (PGLZ) 22.1 22.3 12.6 20
4GB data using PostgreSQL 9.3 on m1.xlarge

This first diagram shows four representative queries from the TPC-H benchmark, and highlights query run-times improving by about 2x. It's worth noting that the tables in this benchmark have 10-15 columns each and we see these benefits grow as the number of columns increase to 100s.

The second diagram focuses on disk I/O, and shows total volume of data read from disk with regular Postgres tables, cstore tables, and cstore tables with pglz compression. The benefits of reading only relevant columns are more apparent here. With pglz, cstore reduces data read from disk by more than 10x.

Table Type TPC-H 3 TPC-H 5 TPC-H 6 TPC-H 10
PostgreSQL 4444 4444 3512 4433
cstore 786 754 756 869
cstore (PGLZ) 322 346 269 302
4GB data using PostgreSQL 9.3 on m1.xlarge

Besides all this, we are really excited by how we can further improve upon this extension! Some features that we're thinking about include new compression methods, better query cost estimation, improved read/write performance, and checksums. We outline these features in our GitHub page and welcome your opinions.

Finally, if you start using the columnar store and need for it to scale out, CitusDB could easily take care of that. Citus extends Postgres to provide distributed queries across a cluster; and we're actively working on improving it. We also have a few cool projects in the works around cstore and more. So please do get in touch with us at engage @ citusdata.com to tell us what you think!

Got comments? Join the discussion on Hacker News.

CitusDB 3.0 release notes

We are excited to announce the general availability of CitusDB v3.0. Apart from various bug fixes and performance improvements, the major features in this release include:

  • Large table joins. We now support dynamic repartitioning of tables to handle joins between any number of tables, independent of their size and partitioning method. The query planner determines the ideal join strategy from statistics gathered from the partitioned tables.
  • Integration with PostgreSQL 9.3.2. This comes with all the new Postgres features and improvements, some of which include:
    • Writable foreign tables
    • New operators and functions to extract values from JSON types
    • Reduced System V shared memory requirements
  • Appends to existing shards via a UDF. This allows smaller, incremental updates to a distributed table by appending data to existing shards rather than creating new ones.
  • Support for the array_agg aggregate function. Note that we currently do not support the order-by or distinct clauses within the aggregate function.
  • Deletion of data based on partition key ranges via a UDF.

Installation notes

CitusDB 3.0 is based on PostgreSQL 9.3.2, and is binary incompatible with CitusDB 2.0. As a result, the install path for the new version has changed from /opt/citusdb/2.0 to /opt/citusdb/3.0 in order to allow for multiple versions of Citus DB to be installed in parallel.

If you are running CitusDB 2.0 and want to upgrade to v3.0, you will need to run pg_upgrade on your all nodes, and manually reload your distributed metadata. For specifics, please get in touch with us so we can assist with the upgrade process. If you are downloading and using CitusDB for the first time, please follow the installation instructions found in the CitusDB documentation page.

Page 1 of 7

About

CitusDB is a scalable analytics database that's built on top of PostgreSQL.

In this blog, we share our ideas and experiences on databases and distributed systems.