Hi!

I'll leave comments on model architecture and behavior to others (more expert than me); I'd like to comment on the process/infrastructure parts :)

On Sat, Sep 23, 2023 at 9:03 PM Strainu <strainu10@gmail.com> wrote:
Hi folks,

So glad to see the old and new ML teams have an open discussion about this subject.

I understand that the team might prefer to have several tickets for different issues, but the discussion about the general approach to the different models is of interest to many people and is more easily digested on email. I would suggest continuing to discuss the merits of the current strategy (not necessarily of one model or another) on email.

I proposed Phabricator tasks because I think they better target different broad subjects: it is easier to involve specific teams/people and to define the goal of each conversation. In this big email thread we started by outlining the deprecation of ORES in favor of Lift Wing, and now we are talking about model architectures and strategies for various future use cases. I really like the conversation, but strictly speaking a new email thread (with a different subject) should have been created instead of mixing multiple subjects. People interested in the Lift Wing migration wouldn't be able to add comments, or if they did, it would become difficult to follow all the discussions.

As stated before, I'll clarify the "deprecation" term used on Wikitech for the various revscoring-based models, but it is not related to the Lift Wing migration (since all models present on ORES are also on Lift Wing). It is a longer-term, wider project that will happen over the upcoming months/years and requires a broader discussion.

This is why I propose discussing models on Phabricator rather than on Wikitech-l :)
 
In the long run, I believe a single model good enough for revert bots can be developed. However, it would be great if there were clear quality criteria that the community can verify, and if the old models were maintained for a wiki until we are sure the new model meets those criteria on that wiki.

Definitely. I just want to make it clear that the ML team has no intention of forcing any choice on the community; we are just trying to optimize our infrastructure to serve a wide variety of models, and in the process we have to choose the best strategy to follow. On Lift Wing we require every new model to have a model card that explains how it works, how it was trained, its best use cases, etc. For example, these are the API Portal pages for the two Revert Risk models (they link to the model cards):
https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_language_agnostic_prediction
https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_reverted_risk_multilingual_prediction
 
The guiding force in any team's roadmap should be the needs of its users, not a change in hosting.

I hope we (the ML team) didn't misrepresent our intentions: our aim is absolutely not to impose anything, but to improve our infrastructure to better serve users in the future (both WMF internal use cases and the community). ORES served us well over the years. It was a pioneering project in a field, ML, that at the time was discussed only in research papers and a handful of futuristic libraries. Big players were already working on it internally, but there was no clear guidance or standards; over the years practices like MLOps took shape, and nowadays they are the de facto standard for operating ML systems. We are trying to follow those best practices because we are convinced they will improve and ease the process of building and publishing a model at the WMF.
If you are curious, the ML team has put a lot of work into documentation; see https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing. We tried our best to make the transition smooth and to highlight new features and improvements for every user.

To summarize: we, the ML team, created Lift Wing to serve the community and our internal use cases, and we wouldn't remove support or dramatically change how the community operates without first proposing new solutions and offering a gradual migration path. During the migration to Lift Wing we asked folks to test the Revert Risk models instead of the goodfaith/damaging ones, and they seem to have suited a lot of use cases. Maybe in the future we'll have a mixture of specialized models for certain wikis and more "multi-purpose" ones, but finding the right solution will surely involve community feedback and several iterations.

Thanks for the feedback!

Luca