
About the analyst and silver bullets

Description

In this talk I will cover the Rambler/top-100 relaunch, the tools available for building a realtime analytics system, and our experience of replacing batch calculations with realtime pipelines. I'll introduce the architecture we used for both data processing approaches and walk through each of its components. Then we'll discuss the properties of Python that matter when implementing data processing algorithms for Hive, and touch upon the fundamental problems of storing aggregates as well as the pros and cons of the alternative approach. I'll go into detail on how we use Spark to calculate changing sessions, and we'll discuss the issues that arise and the existing solutions. Finally, I'll show the results of our performance tests and describe several pitfalls we met along the way.
