Pex Powers Ultra-Fast, Accurate Content Search With Citus and Google Cloud

80B rows updated/day
1280 cores

Movie studios and music labels spend billions of dollars each year to create content. Much of this content reaches consumers through online channels, so understanding the true reach of a viral work beyond the creator’s own social accounts is a major challenge.

That’s why Pex exists: to help content creators understand the true reach of their content and optimize revenue streams, as well as identify and combat copyright infringement. “Pex works like Google image search or Shazam, but for video and audio content,” explains Rasty Turek, founder and CEO of Pex. “Our clients provide us with their original content and ask us to find it anywhere it appears on the web.”

To find content wherever it may be, Pex analyzes immense volumes of data each day. “We process every single video and music track that is published online, in real time—so in terms of the amount of data we process, we are larger than YouTube, Facebook, Instagram, Spotify, Snapchat, and many others combined,” says Turek. “The scale of our data pipeline is just massive.”

Figure: How Pex works

Wanted: The right combination of scale and speed

Performing analytics on multimedia content across multiple platforms requires some heavy lifting within Pex’s technology stack. “We need to run an OLTP workload at a massive scale,” says Turek. “Over the years, we’ve tried just about every database you can think of. We started on Mongo because it was the easiest to start with. It was the one that you don’t have to touch too much for it to work. But we outgrew that within a month.”

Next, the company tried Cassandra. “We were struggling on it for months before we asked DataStax to come and look at the settings,” says Turek. “They basically told us, ‘This is maybe not the right database for you.’”

Pex then tried vanilla Postgres on a single node. However, even factoring in various Postgres optimization techniques, the Pex team found they were pushing the boundaries of scalability.

Next, the company tested a combination of Hadoop and HBase. “It scaled fine for a while,” says Turek. “Then we discovered that companies storing a similar volume of data were using clusters that were ten times larger. We realized there was no way to move forward.”

Figure: The Citus database powers Pex’s unique audio and visual recognition technology

Citus provides horizontal scalability to update 80 billion rows per day

Finally, in May 2016, Pex discovered Citus through a blog post. “I read a comment about Citus for horizontal scaling,” says Turek. “I thought, wait a minute. Postgres can scale horizontally? What the hell? Just hearing ‘Citus’ and ‘Postgres’ and that combination of words kind of woke me up.”

Turek downloaded the open source version of the Citus database and experimented with it for a while before connecting directly with the Citus team. “The team at Citus helped us structure our very first deployment,” says Turek.

Pex started using Citus on Google Cloud as a gateway for its incoming data. The company extracts multimedia content, which can be audio, visual, or both, fingerprints it, and stores those fingerprints. Pex also extracts metadata: view count, author, upload date, number of likes, number of dislikes, and so on.

“Today we update all 9.4 billion videos in our database at least once within 24 hours,” says Turek. “Since some videos are updated several times a day, we ingest about 80 billion rows of data a day. It’s not only massive amounts of data, but it’s massive amounts of traffic.”

Pex also precalculates certain analytics through a series of cascading triggers. “Even today, we’re seeing incredible performance,” says Turek. “We are currently writing 60,000 rows a second on a 20-node cluster with all of those calculations in place. It’s very hard to scale to those numbers in the NoSQL world.”
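The idea of precalculating analytics at write time can be illustrated with a small Python sketch. This is only an in-memory analogy under stated assumptions, not Pex's schema or trigger code: the `RollupStore` class, its fields, and the sample values are all hypothetical. Each write first updates the stored row for a video and then cascades the change into a per-category total, so reads never need to scan the raw rows.

```python
from collections import defaultdict


class RollupStore:
    """Toy stand-in for a table whose writes cascade into a rollup (hypothetical)."""

    def __init__(self):
        self.rows = {}                          # video_id -> (category, views)
        self.category_views = defaultdict(int)  # category -> precalculated total

    def upsert(self, video_id, category, views):
        # Step 1: back out the old contribution of this video, if it exists.
        old = self.rows.get(video_id)
        if old is not None:
            old_category, old_views = old
            self.category_views[old_category] -= old_views
        # Step 2: store the new row, then cascade it into the rollup.
        self.rows[video_id] = (category, views)
        self.category_views[category] += views


store = RollupStore()
store.upsert("a", "music", 100)
store.upsert("b", "music", 50)
store.upsert("a", "music", 130)  # a later update to the same video
print(store.category_views["music"])
```

The trade-off in this style of design is that every write does a little more work so that analytical reads become simple lookups.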

Turek sees Citus as the foundation of smooth continued growth for Pex. The company’s database is currently 80TB, and the company doesn’t have to worry about whether they can add more nodes to the Citus database cluster in the future. “With Citus, we can scale almost infinitely with the benefits of Postgres underneath,” says Turek.

Turek frequently shares his experience with other founders. “Our investors sometimes connect us with companies that are having a hard time figuring out how to scale,” Turek says. “For me, it’s always the same answer: Citus.”

“Without the combination of Citus, Postgres, and Google Cloud, Pex would not be around. We would not be able to fulfill our data processing needs with any other combination of services. And we tried! We really, really tried.”
Rasty Turek, Founder and CEO, Pex

A consistent source of truth supports fast, accurate analytics

In a database with as many nodes as the one Pex maintains, consistency can be a challenge. “Knowing that all the data is correct at all times is harder than it sounds,” says Turek. “When we run an OLTP workflow across multiple nodes, we need to know that all the data we’re working with is fully updated and has been propagated properly.”

Citus provides Pex with the consistent source of truth it needs to deliver fast, accurate analytics for its clients. “Unlike other databases, Postgres is structured to make sure that data is actually fully written before the database calls it written,” says Turek. “There is no in-between. There is no such thing as half of the data being processed, while the other half was not.”

“We bring in 80 billion rows of data a day. It’s not only massive amounts of data, but also massive amounts of traffic. We came to an industry that claimed that what we are doing is impossible. People laughed at us and said, ‘This is BS. It’s just not going to happen.’ To this day, people say that what we do is magic.”
Rasty Turek, Founder and CEO, Pex

Pex delivers better, faster analytics than native content platforms

Customers rely on Pex for answers to complex questions like how many YouTube views were received in a specific category of music during a certain time frame versus the same category on Facebook. Pex calculates the answers within seconds. “This type of cross-platform calculation is something that the individual platforms can’t deliver, since they only have access to their own data,” says Turek. “But Pex searches across all platforms.”
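A cross-platform question of this kind boils down to a filtered, grouped sum. The Python sketch below shows the shape of that calculation on a handful of made-up rows. The row layout, the `views_by_platform` function, and every number in `SAMPLE_ROWS` are illustrative assumptions, not Pex data or Pex's query. In a database such as Postgres, the same logic would normally be written as a `GROUP BY` query.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (platform, category, day, views gained that day).
SAMPLE_ROWS = [
    ("YouTube", "hip-hop", date(2019, 1, 1), 1200),
    ("YouTube", "hip-hop", date(2019, 1, 2), 800),
    ("Facebook", "hip-hop", date(2019, 1, 1), 500),
    ("Facebook", "rock", date(2019, 1, 1), 900),
    ("YouTube", "hip-hop", date(2019, 2, 1), 400),  # outside the window below
]


def views_by_platform(rows, category, start, end):
    """Sum views per platform for one category within [start, end]."""
    totals = defaultdict(int)
    for platform, row_category, day, views in rows:
        if row_category == category and start <= day <= end:
            totals[platform] += views
    return dict(totals)


result = views_by_platform(
    SAMPLE_ROWS, "hip-hop", date(2019, 1, 1), date(2019, 1, 31)
)
print(result)
```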

When Pex launched, the company frequently encountered resistance from skeptics who simply couldn’t believe it was possible to produce analytics at the scale and speed the company was offering. “We came to an industry that claimed that what we are doing is impossible,” says Turek. “People laughed at us and said, ‘This is BS. It’s just not going to happen.’ To this day, people say that what we do is magic.”

Turek attributes Pex’s unique capabilities to its technology stack and application architecture. “Without the combination of Citus, Postgres, and Google Cloud, Pex would not be around,” he says. “We would not be able to fulfill our data processing needs with any other combination of services. And we tried! We really, really tried.”

“Customers ask how many weeks they have to wait for results, but with Citus it takes us roughly three minutes. The difference is so striking compared to our competitors, customers usually struggle to understand how we can do it.”
Rasty Turek, Founder and CEO, Pex

Citus assures worry-free database operations

When asked what advice he has for CTOs and architects who are planning to grow and scale their applications, Turek emphasizes that Citus takes the worry out of database operations. “If you use the Citus Cloud database, even the very smallest cluster, then you don’t have to care about your database anymore,” says Turek. “I think that is a pretty powerful thing where you don’t have to care about the most important thing for you, which is your database. Not caring about it in the sense where you don’t have to be fearful if you’re going to lose data or if it’s going to scale when you need it to.”

The Citus database enables Pex to move hundreds of thousands of rows into Postgres within seconds, then start showing the data almost immediately with pre-calculated results. Says Turek, “Citus is built on top of the most powerful SQL engine that is out there: Postgres. So, if you already know how to utilize very complex and powerful engines like Postgres, including all the Postgres extensions, then I don’t think you can go wrong with Citus ever—and I think Citus is going to give you such an ease of mind. It’s ridiculous, right?”

Teamwork makes the dream work

Although the Citus database is an important contributor to Pex’s success, Turek believes it isn’t the whole story.

“The other half is the incredible team,” he says. “Everyone I’ve interacted with at Citus has been not only the nicest person I ever met, but also smart and incredibly helpful. We’ve gotten so much support over the years. Without that help, I don’t know if Pex would be where it is today.”

“Citus is everything for us. It’s the center of our brain.”

Pex & Citus Data Story Highlights

pex.com

  • 80 billion rows of data updated per day
  • Query results delivered in three minutes—not weeks—thanks to the massive parallelism of Citus
  • Managing a 20-node Citus database cluster on Google Cloud with 2.4TB memory, 1280 cores, & 80TB of data
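To put the cluster figures in per-node terms, the short Python calculation below divides the stated totals by the node count. It assumes the 20 nodes are identically sized and that the data is spread evenly across them, neither of which the story states, and it treats 2.4TB as 2,400 GB (decimal units).

```python
NODES = 20
TOTAL_CORES = 1280
TOTAL_MEMORY_GB = 2400  # 2.4TB, assuming decimal units
TOTAL_DATA_TB = 80

# Assumption: nodes are identical and data is evenly distributed.
cores_per_node = TOTAL_CORES / NODES
memory_gb_per_node = TOTAL_MEMORY_GB / NODES
data_tb_per_node = TOTAL_DATA_TB / NODES

print(f"cores per node:  {cores_per_node}")      # 64.0
print(f"memory per node: {memory_gb_per_node} GB")  # 120.0 GB
print(f"data per node:   {data_tb_per_node} TB")    # 4.0 TB
```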

About Pex

Pex helps content creators, brands, rights owners, and analysts find, track, and monetize their videos and music across the web. Through the company’s powerful audio and video recognition technology, creators can view how copies of their content appear on all major streaming platforms in real time, and discover in-depth analytics on the true reach of their content. Armed with this data, Pex clients can optimize their social presence, understand how content travels on each social media platform, and automatically resolve copyright infringement.
