News recommendation: how to promote human participation in machine learning algorithm testing?

In December 2020, Octopeek took part in the CiML (Challenges in Machine Learning) workshop, held as part of the NeurIPS 2020 conference. This seventh edition of the workshop aimed to identify and formalize best practices in the testing and evaluation of algorithms, as well as to celebrate innovators using AI in this field. We had the honor of presenting the Renewal platform and the results of our research study, carried out in collaboration with CentraleSupélec, on the online evaluation of news recommendation systems.

Why develop news recommendation algorithms?

There are only so many hours in a day, and only so much time readers are willing to devote to catching up on the news. By filtering content to provide users with information that is relevant to them and their interests, recommendation systems aim to maximize the number of clicks and stories read.

News recommendation can be considered a separate and distinct field within recommender systems for a few reasons: the nature of the items being recommended (dynamic popularity, rich textual content, a large and continuously growing catalog, short lifespans), users' consumption habits (the need for fresh news, long-term and short-term interests), and the fact that many articles are time- and location-specific.

So, what about Renewal?

The Renewal platform is to news recommender systems what infrastructure is to a city: we provide the roads and the lights that protect travelers, while users build their own vehicles to get to their destinations. We've created a baseline recommendation algorithm that users can build on to improve and test their own systems, with the ability to train and evaluate those systems automatically. More than that, we've created a mobile application that lets users test the relevance of their recommendation algorithms online and in real time, with actual end users.

We use implicit relevance feedback based on numerous factors, such as click rates and the time spent reading an article, rather than explicit feedback from users: they don't have to rate an article or give it a like for us to recognize whether the recommended story was relevant to them. Renewal thus provides a much more realistic feedback system than the offline ML evaluation that researchers have traditionally relied on.
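As an illustration, here is a minimal sketch of how such an implicit relevance score could be computed from a click and a dwell time. The event schema, weights and dwell-time cap below are assumptions for the example, not Renewal's actual formula:

```python
from dataclasses import dataclass

@dataclass
class InteractionEvent:
    """One user interaction with a recommended article (hypothetical schema)."""
    article_id: str
    clicked: bool
    dwell_seconds: float  # time spent on the article page

def implicit_relevance(event: InteractionEvent,
                       click_weight: float = 0.3,
                       dwell_cap: float = 120.0) -> float:
    """Combine click and reading time into a single relevance score in [0, 1].

    The weights and the dwell-time cap are illustrative assumptions.
    """
    if not event.clicked:
        return 0.0
    # Normalize dwell time so very long sessions do not dominate the score.
    dwell_score = min(event.dwell_seconds, dwell_cap) / dwell_cap
    return click_weight + (1.0 - click_weight) * dwell_score

# Example: a clicked article read for 45 seconds scores 0.5625.
print(implicit_relevance(InteractionEvent("a42", True, 45.0)))
```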

It also differs from other online news recommendation evaluation frameworks, such as NewsREEL, in that it provides results while also allowing for A/B testing of multiple systems simultaneously. Indeed, the same user will receive results from multiple algorithms, thus not only testing the relevance of the articles presented, but also comparing those results against a wholly different algorithm at the same time. Also, unlike NewsREEL, the favorability score isn't based only on whether or not an article was clicked, but also on how long the user spent interacting with it. And finally, the end user's history is stored by the application, allowing for longer-term modeling of their interests.
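A rough sketch of how that works in practice: recommendations from several algorithms can be interleaved into one feed for the same user, and the feedback each article earns is credited back to the system that produced it. The function names, interleaving scheme and toy recommenders below are assumptions for illustration only:

```python
import random
from collections import defaultdict

def serve_mixed_feed(user_id: str, recommenders: dict, k: int = 10) -> list:
    """Interleave recommendations from several algorithms for the same user.

    Each item remembers which algorithm produced it, so later feedback
    (clicks, reading time) can be attributed to the right system.
    """
    feed = []
    for name, recommend in recommenders.items():
        for article_id in recommend(user_id, k // len(recommenders)):
            feed.append({"article_id": article_id, "source": name})
    random.shuffle(feed)  # hide which algorithm produced which item
    return feed

def credit_feedback(feed: list, feedback_scores: dict) -> dict:
    """Aggregate implicit relevance scores per source algorithm."""
    totals = defaultdict(float)
    for item in feed:
        totals[item["source"]] += feedback_scores.get(item["article_id"], 0.0)
    return dict(totals)

# Example with two toy recommenders returning fixed article ids.
recs = {
    "team_a": lambda uid, n: [f"a{i}" for i in range(n)],
    "team_b": lambda uid, n: [f"b{i}" for i in range(n)],
}
feed = serve_mixed_feed("user_1", recs)
print(credit_feedback(feed, {"a0": 0.8, "b1": 0.4}))  # {'team_a': 0.8, 'team_b': 0.4}
```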

How will algorithm tests be organized on Renewal?

Renewal will first be tested in a small-scale competition involving students from CentraleSupélec, a graduate-level engineering school in the greater Paris region of France. The objective of this first test will be to collect feedback on the mobile app, the implementation of recommendation algorithms, and the manner in which end users are involved in the final evaluation of recommendation algorithms in a live competition.

For the first competition, the students at CentraleSupélec will be both the creators and the end users. Building on the backbone of the Renewal platform, which aims to facilitate the process of creating recommendation algorithms, the students will be served results in real time without knowing which algorithm produced them. Simply by using the platform as any regular end user would, they will allow Renewal to evaluate which algorithms served the most pertinent results. At the same time, we will be creating a fun, stimulating, competitive and motivational feedback system that helps students measure the results of their algorithms while incentivizing them to outdo themselves and their fellow competitors, through a daily leaderboard and a grand prize for the team that performs best.

Once we've reviewed the results of these small-scale competitions and optimized the platform, we'll be ready to move on to other recommendation systems, beyond news, and into professional research settings.

How do we convince users to participate in algorithm tests?

Ensuring better participation in competitions with Renewal is based on two main themes: facilitating the creation of algorithms and gamification. 

A lot of time in the early stages of creating recommendation systems is wasted on tasks that have nothing to do with the actual creation of the algorithms: making sure your system is correctly connected to the API, and all the troubleshooting that inevitably follows from that one misplaced semicolon, not least among the time sinks. With Renewal, we've created a platform that completely bypasses these early traps, and we even provide a basic working algorithm that competitors can use as the framework for their own systems. We want to ensure that participants have everything they need to start training and testing their systems as soon as possible.
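To give a feel for what "building on the framework" might look like, here is a hypothetical sketch in which a participant subclasses a provided baseline and only overrides the ranking logic, while the platform handles the API plumbing. The class and method names, fields and scoring rule are illustrative, not Renewal's actual interface:

```python
from typing import List

class BaselineRecommender:
    """Hypothetical baseline provided by the platform: most recent articles first."""

    def rank(self, user_profile: dict, candidates: List[dict]) -> List[str]:
        ordered = sorted(candidates, key=lambda a: a["published_at"], reverse=True)
        return [a["article_id"] for a in ordered]

class MyRecommender(BaselineRecommender):
    """A participant's system: keep the plumbing, change only the ranking."""

    def rank(self, user_profile: dict, candidates: List[dict]) -> List[str]:
        # Illustrative scoring: overlap between article topics and the user's
        # stored interests, with recency as a tie-breaker.
        interests = set(user_profile.get("interests", []))
        def score(article):
            topic_match = len(interests & set(article.get("topics", [])))
            return (topic_match, article["published_at"])
        return [a["article_id"] for a in sorted(candidates, key=score, reverse=True)]

# Example: the personalized ranker prefers the article matching the user's interests.
candidates = [
    {"article_id": "a1", "published_at": 1700000000, "topics": ["sports"]},
    {"article_id": "a2", "published_at": 1690000000, "topics": ["tech", "ai"]},
]
print(MyRecommender().rank({"interests": ["ai"]}, candidates))  # ['a2', 'a1']
```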

In that same vein, when it comes to challenge time, each user will be assigned two algorithms to feed their results. Each algorithm therefore only needs to be trained on a fraction of the participants at any one time, allowing for faster and easier training of the systems: fewer calls mean less bandwidth and less processing time.
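A minimal sketch of that assignment step, assuming users are simply paired with algorithm pairs in a round-robin fashion (the actual scheduling policy may differ):

```python
from itertools import combinations

def assign_algorithms(user_ids: list, algorithm_ids: list) -> dict:
    """Give each user exactly two algorithms, spreading load evenly.

    With round-robin pairing, each algorithm only serves (and trains on)
    a fraction of the user base at any one time.
    """
    pairs = list(combinations(algorithm_ids, 2))
    return {uid: pairs[i % len(pairs)] for i, uid in enumerate(user_ids)}

assignment = assign_algorithms(
    [f"user_{i}" for i in range(6)],
    ["team_a", "team_b", "team_c"],
)
print(assignment)
# {'user_0': ('team_a', 'team_b'), 'user_1': ('team_a', 'team_c'), ...}
```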

As for gamification, we like to think of Renewal like chess, or a fighting game: two systems matched head-to-head in a zero-sum game with an Elo rating system. These systems face off in heads-up competition to see which performs better, and are rated based on the results. However, if a participant runs into an issue and is unable to compete, a baseline algorithm will replace them on the day of the competition. The team isn't penalized for not showing up, and their competitor still keeps a chance of moving up in the rankings by outperforming the stand-in.
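The Elo mechanics themselves are standard; here is a small sketch of how one head-to-head result could update two systems' ratings (the K-factor and starting ratings are illustrative choices):

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update two Elo ratings after one match.

    score_a is 1.0 if system A won, 0.0 if it lost, 0.5 for a draw.
    The expected score follows the standard logistic Elo formula.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: a 1500-rated system beats a 1600-rated one and gains about 20 points.
print(elo_update(1500.0, 1600.0, 1.0))  # approximately (1520.5, 1579.5)
```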

Alongside that, we offer real-time leaderboards and a reward system for the team that creates the best algorithm. And let's not forget the team aspect: not only do you have more than just yourself to rely on and teammates to bounce ideas off, you also get to enjoy the competitive team spirit and celebrate your wins, or lament your losses, together.

We've built an easy-to-use and powerful platform to improve research studies and algorithm training and testing, and we've only just begun. Soon, the Renewal platform will host a competition between students from CentraleSupélec for the creation of the best recommendation algorithm. The platform can then be deployed across the research community, as a complement to traditional evaluation systems.