Twitter depends heavily on real-time event aggregation. Classic timeseries applications include site traffic, service health, and user engagement monitoring; these are increasingly complemented by a range of products and features that surface aggregated timeseries data directly to end users. Services that power such features need to be resilient enough to ensure a consistent user experience, flexible enough to accommodate a rapidly changing product roadmap, and able to scale to tens of billions of events per day.
Experience has shown that truly robust real-time aggregation services are hard to build; scaling and evolving them gracefully is even harder; and, moreover, many timeseries applications call for essentially the same architecture, with slight variations in the data model. Solving this broad class of problems at Twitter has been a multiyear effort. In previous talks we have introduced Summingbird, a high-level abstraction library for generalized distributed computation, which provides an elegant descriptive framework for complex aggregation problems. In this talk, I will describe how we built a flexible, reusable, end-to-end service architecture on top of Summingbird, called TSAR (the TimeSeries AggregatoR).
TSAR uses Python to provide an service toolkit that integrates with essential services that provide data processing, data warehousing, query capability, observability, and alerting, automatically configuring and orchestrating its components.