Quantifying the similarity between items is a fundamental challenge in building recommender systems. However, many of the standard formulas used today are based on heuristics that might work in practice but lack a clear theoretical explanation. This reliance on trial-and-error makes it difficult to understand why certain methods perform better than others or how to improve them systematically. This seminar presents a principled approach to designing and understanding similarity measures by grounding them in the formal theory of probabilistic modelling and parameter estimation.

By adopting this perspective, we show how two new similarity measures can be derived from the Bernoulli and Multinomial distributions. We will discuss the importance of probabilistic smoothing and why choosing the right probability distribution is essential for accuracy. Finally, we demonstrate how the widely used Cosine Similarity can be reframed as the solution to a specific optimization problem. This transition from intuitive shortcuts to formal theory provides a new lens through which to view classical tools and opens a path for more rigorous research in the field.
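As a taste of the ideas above, here is a minimal, illustrative sketch (not the speaker's actual derivation): it computes the classical cosine similarity between two items represented as binary user-interaction vectors, alongside a Laplace-smoothed co-occurrence estimate in the spirit of a Bernoulli-style probabilistic model. All names and the toy data are invented for illustration.

```python
# Illustrative sketch only: cosine similarity between items as binary
# user-interaction vectors, plus a Laplace-smoothed co-occurrence
# estimate under a simple Bernoulli-style view of the data.
import math

# Toy interaction matrix: users as rows, items as columns (1 = interacted).
interactions = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

def column(matrix, j):
    """Extract the interaction vector of item j."""
    return [row[j] for row in matrix]

def cosine(a, b):
    """Cosine similarity between two item vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def smoothed_cooccurrence(a, b, alpha=1.0):
    """Laplace-smoothed estimate of P(user likes b | user likes a):
    (co-occurrences + alpha) / (occurrences of a + 2 * alpha).
    Smoothing keeps the estimate away from 0/1 for sparse data."""
    co = sum(x * y for x, y in zip(a, b))
    return (co + alpha) / (sum(a) + 2 * alpha)

item0, item1 = column(interactions, 0), column(interactions, 1)
print(round(cosine(item0, item1), 4))              # unsmoothed, geometric view
print(round(smoothed_cooccurrence(item0, item1), 4))  # smoothed, probabilistic view
```

The contrast between the two functions hints at the seminar's theme: the first is the familiar geometric heuristic, while the second makes its statistical assumptions (a Bernoulli event model plus a smoothing prior) explicit.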

Speaker

Noah Daniëls is a PhD student in our department of computer science.

Time and Place

Wednesday 25/02/2026 at 13:45 in M.G.006.

Registration

Participation is free, but registration is compulsory.

References and Related Reading

  • https://dl.acm.org/doi/10.1145/963770.963776
  • https://dl.acm.org/doi/10.1145/3308558.3313710
  • https://www.semanticscholar.org/paper/A-comparison-of-event-models-for-naive-bayes-text-McCallum-Nigam/04ce064505b1635583fa0d9cc07cac7e9ea993cc

Optionally, chapters 2 and 3 of *Recommender Systems: The Textbook* (Aggarwal), although the essentials are covered in the first paper linked above.