How to Take Prometheus Planet-Scale: Massively Large Scale Metrics Deployments

Wonder Frog Wed 05:20PM - 06:00PM

Observability at eBay has been on an exponential growth curve. What was a modest time-series ingest rate of 2M/sec in 2017 is now roughly 40M/sec, with close to three billion active time series. Our current Cortex-inspired architecture builds sharding and clustering on top of the Prometheus TSDB. It is relatively simple to shard and replicate tenants of data in centralized clusters. However, large clusters with growing cardinality become less useful as query latencies degrade considerably. In 2020, Google published a paper on its time-series database Monarch, dubbed a planet-scale TSDB. The paper gave us some useful hints on how we could decentralize our installations and go fully planet-scale. We started with a prototype that federates queries to TSDBs in different cities. Now, it lets us deploy our TSDBs anywhere using Kubernetes operators and Prometheus. This session focuses on the planet-scale architecture of our metrics platform, how GitOps has helped absorb the complexity of massive deployments, and more.
