Building content similarity recommenders at BBC

The BBC World Service publishes news in over 33 languages for a total of 44 services. To surface new and relevant content for our audiences in different countries, we developed a content similarity recommender in-house. Articles in each service are manually annotated with tags and additional metadata. However, tagging is inconsistent across services, which affects the performance of the recommender systems. Furthermore, creating a bespoke recommender for each language is challenging, especially when the team does not speak the language in question.

In this talk, we show how an unsupervised learning algorithm, Latent Dirichlet Allocation (LDA), addresses those challenges better than previously employed strategies. We also present how we build LDA-based content similarity recommender systems for several languages, explain the metrics that allow us to select a good model, and describe how we assess the stability of the model over time.
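
As a rough illustration of the idea, the sketch below ranks articles by how close their topic distributions are. It assumes that a fitted LDA model has already assigned each article a probability vector over topics. The article IDs, the three-topic vectors, and the choice of Hellinger distance are all assumptions made for this example; the abstract does not state which similarity measure the recommender uses.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


def most_similar(article_id, topic_vectors, k=2):
    """Return the k articles whose topic mix is closest to article_id."""
    query = topic_vectors[article_id]
    scored = [
        (other_id, hellinger(query, vector))
        for other_id, vector in topic_vectors.items()
        if other_id != article_id
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical per-article topic distributions, standing in for LDA output.
topic_vectors = {
    "election-results": [0.80, 0.15, 0.05],
    "coalition-talks": [0.70, 0.20, 0.10],
    "cup-final": [0.05, 0.05, 0.90],
    "transfer-news": [0.10, 0.10, 0.80],
}

recommendations = most_similar("election-results", topic_vectors)
for other_id, distance in recommendations:
    print(f"{other_id}: {distance:.3f}")
```

Because the comparison uses only topic proportions, the same ranking code works for any language once a per-language LDA model has produced the vectors, and it does not depend on the manually assigned tags.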

Clara Higuera
Data scientist, BBC

Clara Higuera completed her PhD in Bioinformatics and Artificial Intelligence in Madrid, Spain. She joined the BBC about a year ago and now works as a data scientist in the Audience Engagement team at BBC News, where she helps to better understand the BBC News audience and builds data products using data and ML.

Michel Schammel
Data scientist, BBC

Michel Schammel is an astrophysicist turned data scientist who now focuses on online user journeys for the BBC World Service. He has previously worked on content recommendations based on automatic metadata extraction from audio and video, as well as on computer vision applications for automatic player pose recognition in football.