Summary: How DoorDash Reduced Feature Store Costs by 75% using CockroachDB

Tanmay Deshpande
3 min read · Mar 26

Recently I came across this article from DoorDash: Using CockroachDB to Reduce Feature Store Costs by 75%

The following is a summary of that article.

DoorDash is a food delivery platform that needed a feature store able to keep up with the rapid growth of its machine-learning workloads. The company found that combining different databases could boost efficiency and simplify operations. At first, DoorDash used Redis for its online machine-learning storage, but as the number of ML features increased, it became clear that Redis was neither cost-effective nor easy to maintain. DoorDash therefore decided to supplement Redis with another database, CockroachDB. After adding CockroachDB to its online serving platform, DoorDash reduced its average cloud spend per stored value by 75% with a minimal increase in latency.

DoorDash encountered maintenance overhead with large-scale Redis clusters (more than 100 nodes). Upscaling with native AWS ElastiCache consumed extra CPU, which increased latencies and made the time needed to complete a run unpredictable. DoorDash had to create its own approach to scaling Redis with almost no downtime. Its process for upscaling large Redis clusters with zero downtime involves spinning up a new Redis cluster with the desired number of nodes from the most recent daily backup, replaying all writes from the last day on the new cluster, switching traffic over to the new cluster, and deleting the old one.
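The upscaling steps described above can be sketched in a few lines of Python. This is only a toy illustration that uses in-memory dictionaries instead of real Redis clusters. The Cluster class, the upscale function, and the feature keys are hypothetical names made up for this summary, not DoorDash's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy in-memory stand-in for a Redis cluster (hypothetical, not a Redis client)."""
    name: str
    nodes: int
    data: dict = field(default_factory=dict)


def upscale(old, backup, write_log, new_nodes):
    """Return a larger cluster holding the backup plus all replayed writes."""
    # 1. Spin up a new cluster of the desired size from the latest daily backup.
    new = Cluster(name=old.name + "-scaled", nodes=new_nodes, data=dict(backup))
    # 2. Replay, in order, every write made since the backup was taken.
    for key, value in write_log:
        new.data[key] = value
    # 3. The caller switches traffic to the returned cluster.
    # 4. Empty the old cluster, standing in for deleting it.
    old.data.clear()
    return new


# Hypothetical feature keys and values, for illustration only.
backup = {"store:1:avg_prep_time": 12.0}
write_log = [("store:1:avg_prep_time", 14.5), ("store:2:avg_prep_time", 9.0)]
old = Cluster("features", nodes=100, data=dict(write_log))

serving = upscale(old, backup, write_log, new_nodes=150)
```

The important detail is the order: the backup is loaded first and the writes are replayed on top of it, so the new cluster ends up with the latest values before any traffic is moved to it.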

Even though CockroachDB had higher latency than Redis for various read/write operations, DoorDash decided to use it because it is an excellent alternative for use cases that do not require ultra-low latency and high throughput. CockroachDB also has several attributes that make it very desirable from an operational standpoint: database version upgrades and scaling operations with zero downtime, auto-scaling based on load at both the cluster level and the range level, and disk-based storage, which makes storing high-cardinality features much cheaper.
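One way to picture this split is a small routing rule that keeps latency-critical or very high-throughput features on Redis and sends everything else to CockroachDB. The sketch below is an assumption made for illustration. The FeatureProfile class, the choose_store function, and both thresholds are hypothetical and do not come from the DoorDash article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureProfile:
    name: str
    max_latency_ms: float  # latency budget the consumer can tolerate
    reads_per_second: int  # expected read throughput


# Hypothetical thresholds, chosen only to make the example concrete.
ULTRA_LOW_LATENCY_MS = 5.0
HIGH_THROUGHPUT_RPS = 100_000


def choose_store(feature):
    """Pick a backing store name for a feature based on its serving needs."""
    needs_low_latency = feature.max_latency_ms <= ULTRA_LOW_LATENCY_MS
    needs_high_throughput = feature.reads_per_second >= HIGH_THROUGHPUT_RPS
    if needs_low_latency or needs_high_throughput:
        return "redis"
    # Everything else can tolerate the slower, cheaper disk-based store.
    return "cockroachdb"


# Hypothetical features, for illustration only.
realtime = FeatureProfile("eta_prediction_input", max_latency_ms=2.0, reads_per_second=50_000)
batchy = FeatureProfile("weekly_order_count", max_latency_ms=50.0, reads_per_second=500)
```

With these made-up numbers, choose_store(realtime) returns "redis" and choose_store(batchy) returns "cockroachdb".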

What differentiates CockroachDB from other databases is its unique storage architecture. At a high level, CockroachDB is a Postgres-compatible SQL layer capable of operating across multiple…
