Hey everybody,
TL;DR: I wanted to let you know about an upcoming experimental Reddit AMA
("ask me anything") chat we have planned. It will focus on artificial
intelligence on Wikipedia and how we're working to counteract vandalism
while also making life better for newcomers.
We plan to hold this chat on June 1st at 21:00 UTC / 14:00 PDT in the /r/IAmA
subreddit[1]. I'd love to answer any questions you have about these topics,
and I'll send a follow-up email to this thread shortly before the AMA begins.
----
For those who don't know who I am, I create artificial intelligences[2]
that support the volunteers who edit Wikipedia[3]. For over ten years, I've
been fascinated by the ways that crowds of volunteers build massive,
high-quality information resources like Wikipedia.
For more background: I research and then design technologies that make it
easier to spot vandalism in Wikipedia, which helps support the hundreds of
thousands of editors who make productive contributions. I also think a lot
about the dynamics between communities and new users, and about ways to make
communities inviting and welcoming to both long-time community members and
newcomers who may not be aware of community norms. For a quick sampling of
my work, check out my most impactful research paper about Wikipedia[3],
some recent coverage of my work in *Wired*[4], the master list of my
projects on my WMF staff user page[5], the documentation for the technology
team I run[9], or the home page for Wikimedia Research[8].
This AMA, which I'm doing with the Foundation's Communications department,
is somewhat of an experiment. The intended audience for this chat is people
who might not currently be part of our community but have questions about
the way we work, as well as potential research collaborators who might want
to work with our data or tools. Many may be familiar with Wikipedia but not
with the work we do as a community behind the scenes.
I'll be talking about my work on the ethics of AI, how we think about
artificial intelligence on Wikipedia, and the ways we're working to
counteract vandalism on the world's largest crowdsourced source of
knowledge, such as the ORES extension[6], which you may have seen
highlighting possibly problematic edits on your watchlist and in
RecentChanges.
I'd love for you to join this chat and ask questions. If you do not use
Reddit, or prefer not to, we will also be taking questions on ORES'
MediaWiki talk page[7] and posting answers to both threads.
1. https://www.reddit.com/r/IAmA/
2. https://en.wikipedia.org/wiki/Artificial_intelligence
3. http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/halfa…
4. https://www.wired.com/2015/12/wikipedia-is-using-ai-to-expand-the-ranks-of-…
5. https://en.wikipedia.org/wiki/User:Halfak_(WMF)
6. https://www.mediawiki.org/wiki/Extension:ORES
7. https://www.mediawiki.org/wiki/Talk:ORES
8. https://www.mediawiki.org/wiki/Wikimedia_Research
9. https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team
-Aaron
Principal Research Scientist @ WMF
User:EpochFail / User:Halfak (WMF)
Hey,
I'm excited to announce a new feature in Wikilabels that allows people to
check the progress of the campaigns they are labeling.
Wikilabels [1] is a platform for gathering human-labeled data for use in
ORES [2]. Its home page is at https://labels.wmflabs.org. These data feed a
variety of AI models, ranging from vandalism fighting to Wikidata item
quality to anti-harassment.
Until now, it was hard to see how many labels had been made in each campaign
or how much work was left. From now on, you can get these data by going to
https://labels.wmflabs.org/stats and then selecting your wiki. For example,
https://labels.wmflabs.org/stats/enwiki/ shows the progress of each
campaign, how many labels are left before it can be considered done, and the
number of unique volunteers who are labeling.
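If you'd rather pull these numbers programmatically than browse them, a rough
sketch like the one below should get you started. Note that I'm guessing that
the stats pages will answer with JSON when asked for it; the response schema
isn't documented in this email, so inspect what actually comes back before
relying on any field names (and see the caveat below about these URLs
possibly changing).

import requests

def campaign_stats(wiki="enwiki"):
    """Fetch campaign progress for one wiki from the Wikilabels stats pages.
    Assumes (unverified) that the endpoint serves JSON when asked for it."""
    url = "https://labels.wmflabs.org/stats/{0}/".format(wiki)
    resp = requests.get(url, headers={"Accept": "application/json"})
    resp.raise_for_status()
    return resp.json()

# Print whatever structure comes back; adjust once you've seen the real schema.
for campaign, info in campaign_stats("enwiki").items():
    print(campaign, info)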
Note that we are overhauling Wikilabels' current URL paths because they are
a little confusing and jump back and forth between the GUI and the API.
These URLs might therefore change in the future [3], but we will announce
any change properly beforehand and make sure redirects are left in place
from the old paths.
Any feedback about this feature is very welcome. Feel free to reach out to
us in #wikimedia-ai on irc://irc.freenode.net or on the AI mailing list [4].
[1]: https://meta.wikimedia.org/wiki/Wiki_labels
[2]: https://www.mediawiki.org/wiki/ORES
[3]: https://phabricator.wikimedia.org/T165046
[4]: https://lists.wikimedia.org/mailman/listinfo/ai
Best
--
Amir Sarabadani Tafreshi, on behalf of the Scoring Platform team
Software Engineer (contractor)
-------------------------------------
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hi,
I was trying to set up revscoring[0] on Arch Linux. I'm using Python 3.6 in
a virtual environment and installed all dependencies from the requirements
file.
However, when I run nosetests in the revscoring directory, it fails with the
error "AttributeError: module 'importlib._bootstrap' has no attribute
'SourceFileLoader'"[1].
Is the error possibly because I'm using Python 3.6, or is it due to
something else?
[0] - https://github.com/wiki-ai/revscoring
[1] - https://dpaste.de/MErp
-Thanks,
Sumit Asthana,
IIT Patna
On Thu, May 4, 2017 at 11:44 AM, Jan Drewniak <jdrewniak(a)wikimedia.org>
wrote:
> Hi Erik
>
> From my understanding, it looks like you're looking to collect relevance
> data "in reverse". Typically, for this type of data collection, I would
> assume that you'd present a query with some search results, and ask users
> "which results are relevant to this query" (which is what discernatron
> does, at a very high effort level).
>
Indeed, this is looking to go in reverse. The problem with asking people
performing a query whether the results are any good is that the specific
queries I'm interested in are not performed by very many people. These
queries see, on average, less than one instance per week. By doing it in
reverse we can sample from a (hopefully) much larger distribution. I still
need to do some analysis, though, to see if these long-tail queries also
return long-tail pages, as in ones that only receive a few tens of hits per
day. If the result pages are also rarely viewed, then this scheme will
likely not work. We do have a particularly large sample of queries (~10
million or so) to draw from, though, so we can likely find queries with
popular enough pages to get information about.
> What I think you're proposing instead is that when a user visits an article,
> we present them with a question that asks "would this search query be
> relevant to the article you are looking at".
>
> I can see this working, provided that the query is controlled and the
> question is *not* phrased like it is above.
>
> I think that for this to work, the question should be phrased in a way
> that elicits a simple "top-level" (maybe "yes" or "no") response. For
> example, the question "*is this page about*: 'hydrostone halifax nova
> scotia' " can be responded to with a thumbs up 👍 or thumbs down 👎, but a
> question like "is this article relevant to the following query: ..." seems
> more complicated 🤔 .
>
Indeed, wordsmithing will be important here. I'm not sure 'is this page
about' will be quite the right question, but I'm also not sure what the
right question is. Relevance is a little more nuanced than what the page is
about; some judgement needs to be made about the intent of the query and
whether the page can satisfy that intent.
>
> On Thu, May 4, 2017 at 6:29 PM, Erik Bernhardson <
> ebernhardson(a)wikimedia.org> wrote:
>
>> On Wed, May 3, 2017 at 12:44 PM, Jonathan Morgan <jmorgan(a)wikimedia.org>
>> wrote:
>>
>>> Hi Erik,
>>>
>>> I've been using some similar methods to evaluate Related Article
>>> recommendations
>>> <https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recomme…>
>>> and the source of the trending article card
>>> <https://meta.wikimedia.org/wiki/Research:Comparing_most_read_and_trending_e…>
>>> in the Explore feed on Android. Let me know if you'd like to sit down and
>>> chat about experimental design sometime.
>>>
>>> - J
>>>
>>>
>> This might be useful. I'll see if I can find a time on both our
>> calendars. I should note, though, that this is explicitly not about
>> experimental design. The data is not going to be used for experimental
>> purposes, but rather to feed into a machine learning pipeline that will
>> re-order search results to provide the best results at the top of the
>> list. For the purpose of ensuring the long tail is represented in the
>> training data for this model, I would like to have a few tens of thousands
>> of labels for (query, page) combinations each month. The relevance of
>> pages to a query does have some temporal aspect, so we would likely want
>> to only use the last N months' worth of data (TBD).
>>
>> On Wed, May 3, 2017 at 12:24 PM, Erik Bernhardson <
>>> ebernhardson(a)wikimedia.org> wrote:
>>>
>>>> At our weekly relevance meeting an interesting idea came up about how
>>>> to collect relevance judgements for the long tail of queries, which make up
>>>> around 60% of search sessions.
>>>>
>>>> We are pondering asking questions on the article pages themselves.
>>>> Roughly we would manually curate some list of queries we want to collect
>>>> relevance judgements for. When a user has spent some threshold of time
>>>> (60s?) on a page we would, for some % of users, check if we have any
>>>> queries we want labeled for this page, and then ask them if the page is a
>>>> relevant result for that query. In this way the amount of work asked of
>>>> individuals is relatively low and hopefully something they can answer
>>>> without too much work. We know that the average page receives a few
>>>> thousand page views per day, so even with a relatively low response rate we
>>>> could probably collect a reasonable number of judgements over some medium
>>>> length time period (weeks?)
>>>>
>>>> These labels would almost certainly be noisy, we would need to collect
>>>> the same judgement many times to get any kind of certainty on the label.
>>>> Additionally we would not be able to really explain the nuances of a
>>>> grading scale with many points, we would probably have to use either a
>>>> thumbs up/thumbs down approach, or maybe a happy/sad/indifferent smiley
>>>> face.
>>>>
>>>> Does this seem reasonable? Are there other ways we could go about
>>>> collecting the same data? How to design it in a non-intrusive manner that
>>>> gets results, but doesn't annoy users? Other thoughts?
>>>>
>>>>
>>>> For some background:
>>>>
>>>> * We are currently generating labeled data using statistical analysis
>>>> (clickmodels) against historical click data. This analysis requires there
>>>> to be multiple search sessions with the same query presented with similar
>>>> results to estimate the relevance of those results. A manual review of the
>>>> results showed queries with clicks from at least 10 sessions had reasonable
>>>> but not great labels, queries with 35+ sessions looked pretty good, and
>>>> queries with hundreds of sessions were labeled really well.
>>>>
>>>> * an analysis of 80 days worth of search click logs showed that 35 to
>>>> 40% of search sessions are for queries that are repeated more than 10 times
>>>> in that 80 day period. Around 20% of search session are for queries that
>>>> are repeated more than 35 times in that 80 day period. (
>>>> https://phabricator.wikimedia.org/P5371)
>>>>
>>>> * Our privacy policy prevents us from keeping more than 90 days worth
>>>> of data from which to run these clickmodels. Practically 80 days is
>>>> probably a reasonable cutoff, as we will want to re-use the data multiple
>>>> times before needing to delete it and generate a new set of labels.
>>>>
>>>> * We currently collect human relevance judgements with Discernatron (
>>>> https://discernatron.wmflabs.org/). This is useful data for manual
>>>> evaluation of changes, but the data set is much too small (low hundreds of
>>>> queries, with an average of 50 documents per query) to integrate into
>>>> machine learning. The process of judging query/document pairs for the
>>>> community is quite tedious, and it doesn't seem like a great use of
>>>> engineer time for us to do this ourselves.
>>>>
>>>>
>>>
>>>
>>> --
>>> Jonathan T. Morgan
>>> Senior Design Researcher
>>> Wikimedia Foundation
>>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>>
>>>
>>>
>>
>>
>
>
> --
> Jan Drewniak
> UX Engineer, Discovery
> Wikimedia Foundation
>
>
At our weekly relevance meeting an interesting idea came up about how to
collect relevance judgements for the long tail of queries, which make up
around 60% of search sessions.
We are pondering asking questions on the article pages themselves. Roughly,
we would manually curate a list of queries we want to collect relevance
judgements for. When a user has spent some threshold of time (60s?) on a
page, we would, for some percentage of users, check whether we have any
queries we want labeled for this page, and then ask them if the page is a
relevant result for that query. In this way the amount of work asked of
individuals is relatively low, and hopefully it is something they can answer
without much effort. We know that the average page receives a few thousand
page views per day, so even with a relatively low response rate we could
probably collect a reasonable number of judgements over some medium-length
time period (weeks?).
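To make the mechanism concrete, here is a rough sketch of the per-page-view
decision. Everything in it (the curated query-list format, the dwell
threshold, the sampling rate, and the names) is made up for illustration; it
is not a design or existing code.

import random

# Hypothetical curated mapping: page title -> queries we want judged for it.
CURATED_QUERIES = {
    "Hydrostone": ["hydrostone halifax nova scotia"],
}

DWELL_THRESHOLD = 60    # seconds on the page before we consider prompting
SAMPLING_RATE = 0.05    # fraction of eligible page views that get a prompt

def maybe_build_prompt(page_title, seconds_on_page):
    """Return (query, prompt_text) to show the user, or None to stay silent."""
    if seconds_on_page < DWELL_THRESHOLD:
        return None
    if random.random() > SAMPLING_RATE:
        return None
    queries = CURATED_QUERIES.get(page_title)
    if not queries:
        return None
    query = random.choice(queries)
    # Keep the question binary so the answer maps to thumbs up / thumbs down.
    return query, "Is this page a relevant result for: '{0}'?".format(query)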
These labels would almost certainly be noisy; we would need to collect the
same judgement many times to get any kind of certainty on the label.
Additionally, we would not really be able to explain the nuances of a
grading scale with many points, so we would probably have to use either a
thumbs up/thumbs down approach or maybe a happy/sad/indifferent smiley face.
Does this seem reasonable? Are there other ways we could go about collecting
the same data? How could we design it in a non-intrusive manner that gets
results but doesn't annoy users? Other thoughts?
For some background:
* We are currently generating labeled data using statistical analysis
(clickmodels) against historical click data. This analysis requires there to
be multiple search sessions with the same query presented with similar
results in order to estimate the relevance of those results (a toy
illustration follows this list). A manual review of the results showed that
queries with clicks from at least 10 sessions had reasonable but not great
labels, queries with 35+ sessions looked pretty good, and queries with
hundreds of sessions were labeled really well.
* An analysis of 80 days' worth of search click logs showed that 35 to 40%
of search sessions are for queries that are repeated more than 10 times in
that 80-day period. Around 20% of search sessions are for queries that are
repeated more than 35 times in that 80-day period. (
https://phabricator.wikimedia.org/P5371)
* Our privacy policy prevents us from keeping more than 90 days' worth of
data from which to run these clickmodels. Practically, 80 days is probably a
reasonable cutoff, as we will want to re-use the data multiple times before
needing to delete it and generate a new set of labels.
* We currently collect human relevance judgements with Discernatron (
https://discernatron.wmflabs.org/). This is useful data for manual
evaluation of changes, but the data set is much too small (low hundreds of
queries, with an average of 50 documents per query) to integrate into
machine learning. Judging query/document pairs is quite tedious for the
community, and it doesn't seem like a great use of engineer time for us to
do it ourselves.
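Here is the toy illustration promised in the first bullet above. It is
deliberately naive: it just aggregates click-through rate per (query, page)
pair and only keeps pairs seen in enough sessions. It is a stand-in for, not
a description of, the actual clickmodel analysis; all names and numbers are
made up.

from collections import defaultdict

def naive_relevance_labels(click_log, min_sessions=10):
    """click_log: iterable of (query, page, clicked) rows, roughly one per
    search session in which that result was shown. Returns a rough CTR-based
    label for each (query, page) pair seen in at least `min_sessions` rows."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for query, page, clicked in click_log:
        shown[(query, page)] += 1
        if clicked:
            clicks[(query, page)] += 1
    return {
        pair: clicks[pair] / shown[pair]
        for pair, count in shown.items()
        if count >= min_sessions
    }

# Example: 12 rows for one pair clears min_sessions and yields a 0.5 label.
log = [("paris", "Paris", True), ("paris", "Paris", False)] * 6
print(naive_relevance_labels(log, min_sessions=10))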
GCP has a number of models-as-a-service
<https://cloud.google.com/products/machine-learning/> that might be useful.
On Mon, Apr 3, 2017 at 6:46 PM Daniel Mietchen <
daniel.mietchen(a)googlemail.com> wrote:
> Hi Jordan,
> can your pipeline help with video or perhaps even audio as well?
> There are lots of such files as well that need categorization.
> Thanks,
> Daniel
>
> On Tue, Apr 4, 2017 at 12:05 AM, Jordan Adler <jmadler(a)google.com> wrote:
> > Looks like some of these images still need categorization. I think
> there's
> > still an unrealized opportunity here to use the results I shared to work
> the
> > backlog of the category on the Commons.
> >
> > On Thu, Aug 11, 2016 at 1:47 PM Pine W <wiki.pine(a)gmail.com> wrote:
> >>
> >> Forwarding.
> >>
> >> Pine
> >>
> >> ---------- Forwarded message ----------
> >> From: "Jordan Adler" <jmadler(a)google.com>
> >> Date: Aug 11, 2016 13:06
> >> Subject: [Commons-l] Programmatically categorizing media in the Commons
> >> with Machine Learning
> >> To: "commons-l(a)wikimedia.org" <commons-l(a)lists.wikimedia.org>
> >> Cc: "Ray Sakai" <rsakai(a)reactive.co.jp>, "Ram Ramanathan"
> >> <ramramanathan(a)google.com>, "Kazunori Sato" <kazsato(a)google.com>
> >>
> >> Hey folks!
> >>
> >>
> >> A few months back a colleague of mine was looking for some unstructured
> >> images to analyze as part of a demo for the Google Cloud Vision API.
> >> Luckily, I knew just the place, and the resulting demo, built by
> Reactive
> >> Inc., is pretty awesome. It was shared on-stage by Jeff Dean during the
> >> keynote at GCP NEXT 2016.
> >>
> >>
> >> I wanted to quickly share the data from the programmatically identified
> >> images so it could be used to help categorize the media in the Commons.
> >> There's about 80,000 images worth of data:
> >>
> >>
> >> map.txt (5.9MB): A single text file mapping id to filename in a "id :
> >> filename" format, one per line
> >>
> >> results.tar.gz (29.6MB): a tgz'd directory of json files representing
> the
> >> output of the API, in the format "${id}.jpg.json"
> >>
> >>
> >> We're making this data available under the CC0 license, and these links
> >> will likely be live for at least a few weeks.
> >>
> >>
> >> If you're interested in working with the Cloud Vision API to tag other
> >> images in the Commons, talk to the WMF Community Tech team.
> >>
> >>
> >> Thanks for your help!
> >>
> >>
> >
>
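A rough parsing sketch for anyone who wants to join the two files described
in the quoted message above: map.txt maps id to filename ("id : filename",
one per line) and each result is stored as "<id>.jpg.json". The file paths
below are placeholders, and nothing is assumed about the structure inside
each Cloud Vision JSON document.

import json
import os

def load_vision_results(map_path="map.txt", results_dir="results"):
    """Return {commons_filename: parsed_json} for every id that has results."""
    id_to_name = {}
    with open(map_path, encoding="utf-8") as f:
        for line in f:
            if ":" not in line:
                continue
            image_id, filename = line.split(":", 1)
            id_to_name[image_id.strip()] = filename.strip()

    labeled = {}
    for image_id, filename in id_to_name.items():
        json_path = os.path.join(results_dir, "{0}.jpg.json".format(image_id))
        if os.path.exists(json_path):
            with open(json_path, encoding="utf-8") as jf:
                labeled[filename] = json.load(jf)
    return labeled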
Hey folks,
In this update, I'm changing some things up to try to make it easier for you
to consume. The biggest change you'll notice is that I've broken up the [#]
references by section. I hope that saves you some scrolling and confusion.
You'll also notice that I have changed the subject line from "Revision
scoring" to "Scoring Platform" because it's now clear that, come July, I'll
be leading a new team with that name at the Wikimedia Foundation. There'll
be an announcement about that once our budget is finalized. I'll try to keep
this subject consistent for the foreseeable future so that your email
clients will continue to group the updates into one big thread.
*Deployments & maintenance:*
In this cycle, we've gotten better at tracking our deployments and noting
what changes go out with each deployment. You can click on the Phabricator
task for a deployment and look over its sub-tasks to find out what was
deployed. We've had three deployments of ORES since mid-March[1,2,3] and two
deployments of Wikilabels[4,5], and we've added a maintenance notice for a
short period of downtime that's coming up on April 21st[6,7].
1. https://phabricator.wikimedia.org/T160279 -- Deploy ores in prod
(Mid-March)
2. https://phabricator.wikimedia.org/T160638 -- Deploy ORES late march
3. https://phabricator.wikimedia.org/T161748 -- Deploy ORES early April
4. https://phabricator.wikimedia.org/T161002 -- Late march wikilabels
deployment
5. https://phabricator.wikimedia.org/T163016 -- Deploy Wikilabels mid-April
6. https://phabricator.wikimedia.org/T162888 -- Add header to Wikilabels
that warns of upcoming maintenance.
7. https://phabricator.wikimedia.org/T162265 -- Manage wikilabels for
labsdb1004 maintenance
*Making ORES better:*
We've been working to make ORES easier to extend and more useful. ORES now
reports its relevant versions at https://ores.wikimedia.org/versions[8].
We've also reduced the complexity of our "precaching" system that scores
edits before you ask for them[9,10]. We're taking advantage of Logstash to
store and query our logs[11]. We've also implemented some nice abstractions
for requests and responses in ORES[12], which allowed us to improve our
metrics tracking substantially[13].
8. https://phabricator.wikimedia.org/T155814 -- Expose version of the
service and its dependencies
9. https://phabricator.wikimedia.org/T148714 -- Create generalized
"precache" endpoint for ORES
10. https://phabricator.wikimedia.org/T162627 -- Switch `/precache` to be a
POST end point
11. https://phabricator.wikimedia.org/T149010 -- Send ORES logs to logstash
12. https://phabricator.wikimedia.org/T159502 -- Exclude precaching
requests from cache_miss/cache_hit metrics
13. https://phabricator.wikimedia.org/T161526 -- Implement
ScoreRequest/ScoreResponse pattern in ORES
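As a quick illustration of the versions report mentioned above, the sketch
below just fetches the endpoint and prints whatever comes back. I'm assuming
the endpoint answers with JSON; if it serves something else, print resp.text
instead.

import requests

# Fetch the version report for ORES and its dependencies (endpoint named above).
resp = requests.get("https://ores.wikimedia.org/versions")
resp.raise_for_status()
print(resp.json())  # assumption: the endpoint returns JSON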
*New functionality:*
In the last month and a half, we've added basic support for Korean
Wikipedia[14,15]. Props to Revi for helping us work through a bunch of
issues with our Korean language support[16,17,18].
We've also gotten the ORES Review tool deployed to Hebrew
Wikipedia[19,20,21,22] and Estonian Wikipedia[23,24,25]. We're also working
with the Collaboration team to implement the threshold test statistics that
they need to tune their new Edit Review interface[26], and we're working
towards making this kind of work self-serve so that product teams and other
tool developers won't have to wait on us to implement these threshold stats
in the future[27].
14. https://phabricator.wikimedia.org/T161617 -- Deploy reverted model for
kowiki
15. https://phabricator.wikimedia.org/T161616 -- Train/test reverted model
for kowiki
16. https://phabricator.wikimedia.org/T160752 -- Korean generated word
lists are in chinese
17. https://phabricator.wikimedia.org/T160757 -- Add language support for
Korean
18. https://phabricator.wikimedia.org/T160755 -- Fix tokenization for Korean
19. https://phabricator.wikimedia.org/T161621 -- Deploy ORES Review Tool
for hewiki
20. https://phabricator.wikimedia.org/T130284 -- Deploy edit quality models
for hewiki
21. https://phabricator.wikimedia.org/T160930 -- Train damaging and
goodfaith models for hewiki
22. https://phabricator.wikimedia.org/T130263 -- Complete hewiki edit
quality campaign
23. https://phabricator.wikimedia.org/T159609 -- Deploy ORES review tool to
etwiki
24. https://phabricator.wikimedia.org/T130280 -- Deploy edit quality models
for etwiki
25. https://phabricator.wikimedia.org/T129702 -- Complete etwiki edit
quality campaign
26. https://phabricator.wikimedia.org/T162377 -- Implement additional
test_stats in editquality
27. https://phabricator.wikimedia.org/T162217 -- Implement "thresholds",
deprecate "pile of tests_stats"
*ORES training / labeling campaigns:*
Thanks to a lot of networking at the Wikimedia Conference and some help from
Ijon (Asaf Bartov), we've found a bunch of new collaborators to help us
deploy ORES to new wikis. A critical step in this process is deploying
labeling campaigns so that Wikipedians can help us train ORES. We've got new
editquality labeling campaigns deployed to the Albanian[28], Finnish[29],
Latvian[30], Korean[31], and Turkish[32] Wikipedias.
We've also been working on a new type of model: "item quality" in Wikidata.
We've deployed, labeled, and analyzed a pilot[33], fixed some critical bugs
that came up[34,35], and we've finally launched a 5k-item campaign, which is
already 17% done[36]! See
https://www.wikidata.org/wiki/Wikidata:Item_quality_campaign if you'd like
to help us out.
28. https://phabricator.wikimedia.org/T161981 -- Edit quality campaign for
Albanian Wikipedia
29. https://phabricator.wikimedia.org/T161905 -- Edit quality campaign for
Finnish Wikipedia
30. https://phabricator.wikimedia.org/T162032 -- Edit quality campaign for
Latvian Wikipedia
31. https://phabricator.wikimedia.org/T161622 -- Deploy editquality
campaign in Korean Wikipedia
32. https://phabricator.wikimedia.org/T161977 -- Start v2 editquality
campaign for trwiki
33. https://phabricator.wikimedia.org/T159570 -- Deploy the pilot of
Wikidata item quality campaign
34. https://phabricator.wikimedia.org/T160256 -- Wikidata items render
badly in Wikilabels
35. https://phabricator.wikimedia.org/T162530 -- Implement "unwanted pages"
filtering strategy for Wikidata
36. https://phabricator.wikimedia.org/T157493 -- Deploy Wikidata item
quality campaign
*Bug fixing:*
As usual, a few weird bugs got in our way. We needed to move to a bigger
virtual machine in "Beta Labs" because our models take up a bunch of hard
drive space[37]. We found that Wikilabels wasn't removing expired tasks
correctly and that this was making it difficult to finish labeling
campaigns[38]. We also had a lot of right-to-left issues when we did an
upgrade of OOjs UI[39]. And we fixed an old bug we had with
https://translatewiki.net in one of our message keys[40].
37. https://phabricator.wikimedia.org/T160762 -- deployment-ores-redis
/srv/ redis is too small (500MBytes)
38. https://phabricator.wikimedia.org/T161521 -- Wikilabels is not cleaning
up expired tasks for Wikidata item quality campaign
39. https://phabricator.wikimedia.org/T161533 -- Fix RTL issues in
Wikilabels after OOjs UI upgrade
40. https://phabricator.wikimedia.org/T132197 -- qqq for a wiki-ai message
cannot be loaded
-Aaron
Principal Research Scientist
Head of the Scoring Platform Team
Something about PLURAL just doesn't strike me. MjoLniR, on the other hand,
doesn't seem too bad, if a little esoteric. And sorry, but I think using
non-ASCII in the name of a git repository is just asking for trouble
somewhere :P. I'm also not opposed to being very boring and calling it
cirrussearch-mlr or cirrussearch-ltrank.
On Thu, Apr 6, 2017 at 9:46 AM, Mikhail Popov <mpopov(a)wikimedia.org> wrote:
> OH I JUST GOT WHY WE CAN CAPITALIZE THE FINAL R.
>
> Okay, so MjöLniR => the hammer used for _M_achine _L_earning & _R_anking,
> with the added benefit of the pronunciation being "myol-near" => ML-near =>
> learning to rank articles _near_ the query.
>
> BOOM! *mic drop*
>
> On Thu, Apr 6, 2017 at 9:40 AM, Trey Jones <tjones(a)wikimedia.org> wrote:
>
>> Got to capitalize the final R or don't capitalize the L!
>>
>> Plus, whatever are the two main components that go into building MjöLniR
>> would be, somewhat opaquely, Sindri and Brokkr
>> <https://en.wikipedia.org/wiki/Mj%C3%B6lnir>.
>>
>> Trey Jones
>> Software Engineer, Discovery
>> Wikimedia Foundation
>>
>> On Thu, Apr 6, 2017 at 12:32 PM, Mikhail Popov <mpopov(a)wikimedia.org>
>> wrote:
>>
>>> MjöLnir?
>>>
>>> P.S. I like PLURAL.
>>>
>>>
>>> On Thu, Apr 6, 2017 at 7:30 AM, David Causse <dcausse(a)wikimedia.org>
>>> wrote:
>>>
>>>> I don't have good suggestions, I like PLURAL.
>>>>
>>>> On Thu, Apr 6, 2017 at 6:01 AM, Pine W <wiki.pine(a)gmail.com> wrote:
>>>>
>>>>> +1 for PLURAL.
>>>>>
>>>>> Pine
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Cross-posting from:
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)#WikiPr…
Hey folks, I have been collaborating with some researchers who are
publishing a dataset of externally reviewed Wikipedia articles (the sample
was taken back in 2006). I'd like to take the opportunity to compare the
prediction quality of ORES' article quality model against these external
reviewers, but in order to get a good picture of the situation, it would
also be very helpful to have a set of Wikipedian assessments[1] for the same
dataset. So, I have gathered all of the versions of the externally reviewed
articles at User:EpochFail/ORES_audit[2], and I'm asking for your help to
gather assessments. There are 90 old revisions of articles that I need your
help assessing. I don't think this will take long, but I need to borrow your
judgement here to make sure I'm not biasing things.
To help out, see User:EpochFail/ORES_audit[2].
The more we know about how ORES performs against important baselines, the
better use we can make of it to measure Wikipedia and direct wiki work.
1. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment
2. https://en.wikipedia.org/wiki/User:EpochFail/ORES_audit
Thanks!
-Aaron