Siddharth Singh currently works as an Engineering Manager in the data infrastructure group at LinkedIn, where he is responsible for the Voldemort and Venice projects. Previously at LinkedIn, he worked as an engineer and architected and built Voldemort's multi-data-center strategy. Before LinkedIn, Siddharth worked on the core MongoDB database, where he contributed to replication and sharding technology. Siddharth has an MS from Purdue University with a focus on distributed database systems.
The data-derived recommendations that power LinkedIn's site rely on a continuous chain of three phases: data collection, processing, and serving. Because the professional graph is dynamic, both data collection and processing methods have evolved to quickly incorporate new signals into the derived features. This evolution has created a need for specialized storage infrastructure that can support this dynamic cycle.
This presentation will uncover the technology behind the systems that specialize in serving recommendation datasets at LinkedIn, and its evolution from the early days of serving batch-only derived data (Project Voldemort) to its current, novel way of serving derived data computed from any source, batch or nearline (Project Venice). The presentation will discuss architectural choices and tradeoffs in building these systems and share some thoughts on the future direction of this space.