
Experiments in data mining, entity disambiguation and how to think data-structures for designing beautiful algorithms


This session is for people who would like to move from pure-play analytics to heavy data lifting and data engineering applications. With a background in computer science and quantitative economics, I blend these two approaches to bring quick wins (and fail cheaper) in solving some of the most interesting problems around us today, for fun and profit.

The four problems I will discuss are:

  1. Scoring your connections to find your net worth in LinkedIn [Data structures, Algorithms, Data mining & Visualization]
  2. Building custom distance metrics (with graphs as the base data structure) and finger keying, with applications to hand-collected data sets, which call for entirely different distance metrics than those traditionally explored [Algorithms]
  3. Building relevant job feeds in LinkedIn (TF-IDF) and (possibly) hacking around job applications [Relevance algorithms, Linguistic pre-processing & Visualization]
  4. Developing a custom batch aggregation platform with competing system goals [Developing scoring metrics, the concept of centrality & formal constructs to measure recency]
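
As a rough illustration of the TF-IDF relevance idea named in problem 3, here is a minimal sketch in plain Python. It is not the method presented in the session: the function names (`tokenize`, `tf_idf_vectors`, `cosine`, `rank_jobs`), the smoothed IDF formula, and the sample texts are all assumptions made for this example, and no LinkedIn API is involved.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # and keep only purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency, one common variant among several.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_jobs(profile, jobs):
    # Score every job text against the profile text, most relevant first.
    vectors = tf_idf_vectors([profile] + jobs)
    profile_vec, job_vecs = vectors[0], vectors[1:]
    scored = [(cosine(profile_vec, v), job) for v, job in zip(job_vecs, jobs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical profile and job descriptions, invented for this sketch.
    profile = "python data mining graph algorithms"
    jobs = [
        "senior python engineer for data mining and graph algorithms",
        "pastry chef for a busy bakery",
        "java developer for banking systems",
    ]
    for score, job in rank_jobs(profile, jobs):
        print(round(score, 3), job)
```

Representing each vector as a dict keeps the sketch dependency-free; a real feed would likely need stemming, stop-word removal, and a proper vectorizer.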
Most of all, you will learn the why and why not of things, which I will keep coming back to as I discuss the problems above. I feel this is the most important differentiator between coding well and coding for scale, and it helps build a structured thought process for problem solving.
By the end of the session, you will have seen a blend of tools ranging from Python (algorithms, data mining & graph theory) to Python, R & Inkscape (graphical representation & visualization).

The session will be heavy on algorithms and on thinking in data structures, so to make the most of it you need a background or interest in computer science, quantitative rigor, some hands-on coding experience, and a mind that wants to learn more.

