The Citus DB installer places all binaries under /opt/citusdb/2.0 and creates a data subdirectory that holds a newly initialized database. It then makes the current real user the owner of the data directory. To install, run the command for your distribution:
Ubuntu / Debian
localhost# sudo dpkg --install citusdb-2.0.0-1.amd64.deb
Fedora / Red Hat
localhost# sudo rpm --install citusdb-2.0.0-1.x86_64.rpm
Amazon EC2
You can use the AWS Management Console or the ec2-run-instances command to launch ami-cbd741a2. Here we start a single node; launching multiple nodes is covered later in the documentation. Once the instance is running, connect to it over SSH:
localhost# ssh -i <private SSH key file> ec2-user@<external hostname>
In this guide, we demonstrate a setup that runs multiple database instances on the same node. We use the database created by the installer as the master, and then initialize two more instances to act as workers. For a setup with workers on separate nodes, please see the documentation page.
localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9700
localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9701
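If you need more than two workers, the initdb step can be wrapped in a small shell function. This is a sketch that assumes the /opt/citusdb/2.0 layout and the data.<port> naming used above; init_workers is a hypothetical helper, not something CitusDB ships.

```shell
# Install prefix used throughout this guide; override it if yours differs.
CITUS_HOME=${CITUS_HOME:-/opt/citusdb/2.0}

# init_workers PORT...  (hypothetical helper, not part of CitusDB)
# Runs initdb once per worker, creating $CITUS_HOME/data.<port>
# to match the directory naming used in the commands above.
init_workers() {
    for port in "$@"; do
        "$CITUS_HOME/bin/initdb" -D "$CITUS_HOME/data.$port" || return 1
    done
}
```

For example, init_workers 9700 9701 9702 would create three worker data directories. Each new port still needs its own entry in the master's pg_worker_list.conf.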
We now need to tell the master database about the workers. To do this, we append each worker's host name to the master's pg_worker_list.conf file, optionally followed by the port number on which that worker listens.
localhost# emacs -nw /opt/citusdb/2.0/data/pg_worker_list.conf
# HOSTNAME [PORT] [RACK]
localhost 9700
localhost 9701
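If you prefer not to edit the file by hand, the worker entries can also be appended from a script. A minimal sketch, assuming all workers run on localhost and the master's data directory is /opt/citusdb/2.0/data; write_worker_list is a hypothetical helper. It skips ports that already have an entry, so running it twice does not duplicate lines.

```shell
CITUS_HOME=${CITUS_HOME:-/opt/citusdb/2.0}

# write_worker_list PORT...  (hypothetical helper, not part of CitusDB)
# Appends a "localhost PORT" line per worker to the master's worker list,
# following the HOSTNAME [PORT] [RACK] format; existing entries are kept.
write_worker_list() {
    conf="$CITUS_HOME/data/pg_worker_list.conf"
    for port in "$@"; do
        # grep fails (and we append) if the entry or the file is missing.
        grep -qxF "localhost $port" "$conf" 2>/dev/null ||
            printf 'localhost %s\n' "$port" >> "$conf" || return 1
    done
}
```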
Now we can start the databases with pg_ctl, giving each one its own data directory and log file. The master runs on the default port, while the -o "-p PORT" option passes a port number through to each worker's server process.
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data -l logfile start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9700 -o "-p 9700" -l logfile.9700 start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9701 -o "-p 9701" -l logfile.9701 start
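The three pg_ctl commands follow one pattern, so they can be folded into a loop over the worker ports. A sketch under the same layout assumptions; start_cluster is a hypothetical helper. It adds pg_ctl's standard -w option, which waits until each server has finished starting before moving on to the next one.

```shell
CITUS_HOME=${CITUS_HOME:-/opt/citusdb/2.0}

# start_cluster PORT...  (hypothetical helper, not part of CitusDB)
# Starts the master on the default port, then one worker per PORT,
# mirroring the pg_ctl commands above. Logs go to the current directory.
start_cluster() {
    "$CITUS_HOME/bin/pg_ctl" -w -D "$CITUS_HOME/data" -l logfile start || return 1
    for port in "$@"; do
        "$CITUS_HOME/bin/pg_ctl" -w -D "$CITUS_HOME/data.$port" \
            -o "-p $port" -l "logfile.$port" start || return 1
    done
}
```

With the setup in this guide, the call would be start_cluster 9700 9701.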
To try things out, we first need to download some example data.
localhost# wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
localhost# gzip -d customer_reviews_1998.csv.gz
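An interrupted download can leave a truncated archive behind. Since gzip -t tests an archive's integrity without writing any output, one option is to run it before decompressing. A minimal sketch; unpack_sample is a hypothetical helper.

```shell
# unpack_sample FILE.gz  (hypothetical helper)
# Verifies the archive first, so a truncated download fails loudly
# instead of leaving a partial CSV behind; then decompresses in place.
unpack_sample() {
    if ! gzip -t "$1"; then
        echo "corrupt or incomplete download: $1" >&2
        return 1
    fi
    gzip -d "$1"
}
```

Usage: unpack_sample customer_reviews_1998.csv.gz in place of the plain gzip -d step.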
We then use psql to connect to the master, specifying localhost and the default 'postgres' database.
localhost# /opt/citusdb/2.0/bin/psql -h localhost -d postgres
We can now create a distributed table. We partition the table on the review_date column by specifying the DISTRIBUTE BY APPEND clause.
postgres=# CREATE TABLE customer_reviews
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date);
We next load data with the \STAGE command, which uses the same syntax as PostgreSQL's COPY. Citus DB automatically splits the data into fixed-size blocks and replicates these blocks across the worker databases. Replace the path below with the absolute path of the CSV file you downloaded earlier.
postgres=# \STAGE customer_reviews FROM '/home/user/customer_reviews_1998.csv' (FORMAT CSV)
We are now ready to start issuing queries against the cluster. For additional example queries, please see the documentation page.
postgres=# SELECT count(*) FROM customer_reviews;
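As a rough sanity check, the count returned above can be compared with the number of lines in the CSV file. This assumes every review sits on exactly one line and the file has no header row; csv_line_count is a hypothetical helper.

```shell
# csv_line_count FILE  (hypothetical helper)
# Prints the number of lines in FILE. With one record per line and no
# header row, this should equal SELECT count(*) after \STAGE.
csv_line_count() {
    wc -l < "$1" | tr -d ' '
}

# The master's count can be fetched non-interactively for comparison
# (requires the cluster to be running):
#   /opt/citusdb/2.0/bin/psql -h localhost -d postgres -At \
#       -c 'SELECT count(*) FROM customer_reviews;'
```

If the two numbers differ, quoted fields containing newlines are one likely cause, since they make the line count larger than the row count.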