Download Citus DB

By downloading Citus DB, you agree that you have read, understood, and accepted the CitusDB License Agreement.

Operating System (64-bit Edition / 32-bit Edition)

Ubuntu 10.04+ / Debian 6.0
Ubuntu 8.04 - 9.10 / Debian 5.0
Fedora 12+ / RHEL 6.0+
Fedora 8 - 11 / RHEL 5.0 - 5.8
Amazon Machine Image (32-bit Edition: N/A)

Quick Start Guide

The Citus DB installer puts all binaries under /opt/citusdb/2.0. It also creates a data subdirectory, /opt/citusdb/2.0/data, which holds the contents of the newly initialized database. The installer then makes the current real user the owner of that data directory. To install, run the command for your distribution:

Ubuntu / Debian
localhost# sudo dpkg --install citusdb-2.0.0-1.amd64.deb
Fedora / Red Hat
localhost# sudo rpm --install citusdb-2.0.0-1.x86_64.rpm
Amazon EC2

You can launch ami-cbd741a2 with either the AWS Management Console or the ec2-run-instances command. Here we start a single node; launching multiple nodes is covered later in the documentation. Once the instance is running, connect to it over SSH:

localhost# ssh -i <private SSH key file> ec2-user@<external hostname>
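The launch step can also be scripted. The sketch below uses the newer AWS CLI (aws ec2 run-instances) in place of the ec2-run-instances tool mentioned above. It assumes the following:

- the CLI is installed and configured;
- ami-cbd741a2 is still available in your region;
- you already have an EC2 key pair.

The function name launch_citus_node, the AWS_CLI override variable, and the m1.large default instance type are illustrative choices, not part of Citus DB.

```shell
# Hypothetical helper: launch a single Citus DB 2.0 node from the AMI named
# in this guide. Assumes the AWS CLI is installed and configured and that
# the AMI is still available in your region.
launch_citus_node() {
  local key_name="$1"                  # name of an existing EC2 key pair
  local instance_type="${2:-m1.large}" # illustrative default; choose your own
  # AWS_CLI lets you point at a different aws binary (assumption, for testing).
  "${AWS_CLI:-aws}" ec2 run-instances \
    --image-id ami-cbd741a2 \
    --count 1 \
    --instance-type "$instance_type" \
    --key-name "$key_name" \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

For example, launch_citus_node my-keypair prints the new instance ID. You can then look up the instance's external hostname in the console and connect with the ssh command above.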

Set up worker databases

In this guide, we demonstrate a setup that runs multiple database instances on the same node. We use the already installed database as the master and then initialize two more databases to act as workers. For a setup with workers on separate nodes, see the documentation page.

localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9700
localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9701
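If you want more than two workers, a small loop avoids repeating the initdb command. This is a minimal sketch. The init_workers function and the CITUS_HOME variable (defaulting to /opt/citusdb/2.0) are hypothetical names. The data.<port> naming follows the convention used above.

```shell
# Hypothetical helper: initialize one worker data directory per port,
# using the data.<port> naming from this guide.
init_workers() {
  # CITUS_HOME is an assumed variable; it defaults to the install prefix.
  local home="${CITUS_HOME:-/opt/citusdb/2.0}"
  local port
  for port in "$@"; do
    # Skip ports whose data directory already exists, so the helper can be rerun.
    if [ -d "$home/data.$port" ]; then
      echo "skipping $home/data.$port (already exists)"
      continue
    fi
    "$home/bin/initdb" -D "$home/data.$port"
  done
}
```

For example, init_workers 9700 9701 runs the same two initdb commands shown above.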

Configure

Next, we tell the master database about the workers. We do this by appending one line per worker to the pg_worker_list.conf file in the master's data directory. Each line gives the worker's hostname, optionally followed by the port on which that worker listens (and a rack, as the header comment shows).

localhost# emacs -nw /opt/citusdb/2.0/data/pg_worker_list.conf

# HOSTNAME     [PORT]     [RACK]
localhost 9700
localhost 9701
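To script this step instead of editing the file by hand, you could append the entries from the shell. This sketch makes three assumptions:

- the master data directory is /opt/citusdb/2.0/data (overridable through a hypothetical CITUS_DATA variable);
- all workers run on localhost;
- add_workers is an illustrative name.

```shell
# Hypothetical helper: append "localhost <port>" entries to the master's
# pg_worker_list.conf without opening an editor.
add_workers() {
  # CITUS_DATA is an assumed variable; it defaults to the master data directory.
  local conf="${CITUS_DATA:-/opt/citusdb/2.0/data}/pg_worker_list.conf"
  local port
  for port in "$@"; do
    # Append only if an identical line is not already present (grep -x
    # matches whole lines); a missing file is created by the append.
    if ! grep -qx "localhost $port" "$conf" 2>/dev/null; then
      echo "localhost $port" >> "$conf"
    fi
  done
}
```

Running add_workers 9700 9701 produces the two lines shown above, and running it again does not add duplicates.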

Start all databases

Now we start the databases with pg_ctl, giving each one its own data directory (-D) and log file (-l). The master runs on the default port. For the workers, -o "-p 9700" and -o "-p 9701" pass a port option through to each server.

localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data -l logfile start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9700 -o "-p 9700" -l logfile.9700 start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9701 -o "-p 9701" -l logfile.9701 start
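The worker start commands follow the same pattern, so they can also be wrapped in a loop. This sketch also passes -w, so that pg_ctl waits until each server has started, and then runs pg_ctl status as a quick check. start_workers and CITUS_HOME are hypothetical names. As in the commands above, the log files are written to the current directory.

```shell
# Hypothetical helper: start each worker on its own port with its own log
# file, then ask pg_ctl whether that worker is running.
start_workers() {
  # CITUS_HOME is an assumed variable; it defaults to the install prefix.
  local home="${CITUS_HOME:-/opt/citusdb/2.0}"
  local port
  for port in "$@"; do
    # -w makes pg_ctl wait for the server to finish starting before returning.
    "$home/bin/pg_ctl" -D "$home/data.$port" -o "-p $port" \
      -l "logfile.$port" -w start
    # status exits 0 if a server is running in that data directory.
    "$home/bin/pg_ctl" -D "$home/data.$port" status
  done
}
```

For example, start_workers 9700 9701 starts both workers from this guide and reports the status of each one.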

Try with Sample Data

To try things out, we first need to download some example data.

localhost# wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
localhost# gzip -d customer_reviews_1998.csv.gz
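Before loading, it can help to confirm that the file decompressed correctly and to note its absolute path, because the \STAGE command below takes a full path. This is a minimal sketch, and check_sample is a hypothetical helper. The line count equals the number of records only if the file has no header line and no field contains an embedded newline.

```shell
# Hypothetical helper: check that the sample CSV exists and is non-empty,
# print its line count, and print its absolute path for use with \STAGE.
check_sample() {
  local file="${1:-customer_reviews_1998.csv}"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # tr strips the padding some wc implementations add.
  echo "lines: $(wc -l < "$file" | tr -d ' ')"
  # Build the absolute path from the file's directory and base name.
  echo "path: $(cd "$(dirname "$file")" && pwd)/$(basename "$file")"
}
```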

Create a distributed table

We then use psql to connect to the master, specifying localhost and the default 'postgres' database.

localhost# /opt/citusdb/2.0/bin/psql -h localhost -d postgres

We can now create a distributed table. We partition the table on the review_date column by specifying the DISTRIBUTE BY APPEND clause.

postgres=# CREATE TABLE customer_reviews
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date);

Load data

Next, we load data with the \STAGE command, which has the same syntax as PostgreSQL's COPY. Citus DB automatically partitions the data into fixed-size blocks and replicates these blocks across the worker databases. Replace /home/user in the path below with the directory where you decompressed the sample file.

postgres=# \STAGE customer_reviews FROM '/home/user/customer_reviews_1998.csv' (FORMAT CSV)

Query

We are now ready to start issuing queries against the cluster. For additional example queries, please see the documentation page.

postgres=# SELECT count(*) FROM customer_reviews;
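Queries can also be run from the shell, which is convenient in scripts. In this sketch, run_query is a hypothetical wrapper around psql, and CITUS_HOME is an assumed variable defaulting to /opt/citusdb/2.0. The -A and -t flags select unaligned, tuples-only output.

```shell
# Hypothetical wrapper: run one SQL statement against the master from the
# shell, connecting the same way as the interactive psql session above.
run_query() {
  # CITUS_HOME is an assumed variable; it defaults to the install prefix.
  local home="${CITUS_HOME:-/opt/citusdb/2.0}"
  # -A (unaligned) and -t (tuples only) make the output easy to parse.
  "$home/bin/psql" -h localhost -d postgres -A -t -c "$1"
}
```

For example, run_query "SELECT count(*) FROM customer_reviews;" prints just the count.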
