pg_shard

pg_shard is currently deprecated. If you're looking for a sharding solution, please check out the newly released and open source Citus. Citus natively integrates pg_shard and offers a superset of its functionality.

pg_shard is currently deprecated. If you're looking for a sharding solution, please check out the newly released and open source Citus. Citus natively integrates pg_shard and offers a superset of its functionality.

pg_shard

pg_shard is a sharding extension for PostgreSQL. It shards and replicates your PostgreSQL tables for horizontal scale and high availability. The extension also seamlessly distributes your SQL statements, without requiring any changes to your application. Join the mailing list (link is external) to stay on top of the latest developments or to send any questions or feedback.

As a standalone extension, pg_shard addresses many NoSQL use cases. It also enables real-time analytics, and has an easy upgrade path to Citus for complex analytical workloads (distributed joins). Further, the extension provides access to standard SQL tools, and powerful PostgreSQL features, such as diverse set of indexes and semi-structured data types.

building

pg_shard runs on Linux and OS X. The extension works with PostgreSQL 9.3.4+, PostgreSQL 9.4.0+, and CitusDB 3.2+.

Once you have PostgreSQL or CitusDB installed, you're ready to build pg_shard. For this, you will need to include the pg_config directory path in your make command. This path is typically the same as your PostgreSQL installation's bin/ directory path. For example:


# Path when PostgreSQL is compiled from source
PATH=/usr/local/pgsql/bin/:$PATH make
sudo PATH=/usr/local/pgsql/bin/:$PATH make install

# Path when CitusDB package is installed
PATH=/opt/citusdb/4.0/bin/:$PATH make
sudo PATH=/opt/citusdb/4.0/bin/:$PATH make install

pg_shard also includes regression tests. To verify your installation, start your PostgreSQL instance with the shared_preload_libraries setting mentioned below, and run make installcheck.

Upgrading from Previous Versions

To upgrade an existing installation, simply:

  • Build and install the latest pg_shard release (see the Building section)
  • Restart your PostgreSQL server
  • Run ALTER EXTENSION pg_shard UPDATE; on the PostgreSQL server

Note that taking advantage of the new repair functionality requires that you also install pg_shard on all your worker nodes.

Setup

pg_shard uses a master node to store shard metadata. In the simple setup, this node also acts as the interface for all queries to the cluster. As a user, you can pick any one of your PostgreSQL nodes as the master, and the other nodes in the cluster will then be your workers.

An easy way to get started is by running your master and worker instances on the same machine. In that case, each instance will be one PostgreSQL database that runs on a different port. You can simply use localhost as the worker node's name in this setup. Alternatively, you could start up one PostgreSQL database per machine; this is more applicable for production workloads. If you do this, you'll need to configure your PostgreSQL instances so that they can talk to each other. For that, you'll need to update the listen_addresses setting in your postgresql.conf file, and change access control settings in pg_hba.conf.

Whatever you decide, the master must be able to connect to the workers over TCP without any interactive authentication. In addition, a database using the same name as the master's database must already exist on all worker nodes.

Once you decide on your cluster setup, you will need to make two changes on the master node. First, you will need to add pg_shard to shared_preload_libraries in your postgresql.conf:


shared_preload_libraries = 'pg_shard'    # (change requires restart)

Second, the master node in pg_shard reads worker host information from a file called pg_worker_list.conf in the data directory. You need to add the hostname and port number of each worker node in your cluster to this file. For example, to add two worker nodes running on the default PostgreSQL port:

$ emacs -nw $PGDATA/pg_worker_list.conf

# hostname port-number
worker-101  5432
worker-102  5432

Then, you can save these settings and restart the master node.

Table Sharding