The data-derived recommendations that power LinkedIn's site are produced by a continuous chain of three phases: data collection, processing, and serving. Because the professional graph is dynamic, both data collection and processing methods have evolved to incorporate new signals into the derived features quickly. This evolution has created a need for specialized storage infrastructure that can support the cycle.
This presentation will uncover the technology behind the systems that specialize in serving recommendation datasets at LinkedIn, and its evolution from the early days of serving batch-only derived data (Project Voldemort) to its current, novel way of serving derived data computed from any source, batch or nearline (Project Venice). The presentation will discuss the architectural choices and tradeoffs made in building these systems and share some thoughts on the future direction of this space.