I made a suggestion in the ongoing discussion about the Wikimedia
Developer Summit in January that AI should be a major topic. I am not
exactly an expert on it, but my impression is that the Wikimedia movement is
largely failing to notice the beginnings of a huge shift in user
expectations towards smarter tools and interfaces. While there is some
attention to it (as the existence of this list proves), I don't think it is
proportional to the importance of the topic, and the summit might be a good
chance to raise attention.
Input from people who, unlike me, actually know what they are talking about
would be very welcome on the wiki page :)
1. If someone is paid to do captioning and/or categorization work, such as
by a GLAM institution or a Wikimedia affiliate with a budget that supports
this kind of work, then integrating this research into Wikimedia workflows
could significantly increase that person's cost-effectiveness.
2. If volunteers are uploading large quantities of photos, this may make
captioning and categorization much less time consuming and therefore
volunteers may be more likely to do substantial captioning and
categorization work instead of doing the minimum amount of work necessary.
On Wed, Sep 28, 2016 at 12:19 AM, Jan Dittrich <jan.dittrich(a)wikimedia.de> wrote:
> I find it interesting what impact this could have on the sense of
> achievement for volunteers, if captions are autogenerated or suggested and
> then possibly affirmed or corrected.
> On one hand one could assume a decreased sense of ownership;
> on the other hand, it might be easier to comment/correct than to
> write from scratch, and it might feel much more efficient.
> 2016-09-27 23:08 GMT+02:00 Dario Taraborelli <dtaraborelli(a)wikimedia.org>:
>> I forwarded this separately internally at WMF a few days ago. Clearly
>> – before thinking of building workflows for human contributors to generate
>> captions or rich descriptors of media files in Commons – we should look at
>> what's available in terms of off-the-shelf machine learning services and
>> #1 rule of sane citizen science/crowdsourcing projects: don't ask humans
>> to perform tedious tasks machines are pretty good at, get humans to curate
>> inputs and outputs of machines instead.
>> On Mon, Sep 26, 2016 at 5:55 PM, Pine W <wiki.pine(a)gmail.com> wrote:
>>> Perhaps of interest: "...We’re making the latest version of our image
>>> captioning system available as an open source model in TensorFlow."
>>> Wiki-research-l mailing list
>> *Dario Taraborelli *Head of Research, Wikimedia Foundation
>> wikimediafoundation.org • nitens.org • @readermeter
> Jan Dittrich
> UX Design/ User Research
I've been looking at some recent work that used Probabilistic Context-free
Grammars[1,2] to detect vandalism in Wikipedia. I wanted to send a quick
message to share some progress.
I've built a Python library that implements a really simple PCFG training
and scoring strategy and written a quick demo of how it can work. In the
following demo, I show how we can build a probabilistic grammar using the
"I'm a Little Teapot" song. Note how sentences that are not
characteristic of the song score lower. Note that scores are log-scaled.
>>> sentences = [
... "I am a little teapot",
... "Here is my handle",
... "Here is my spout",
... "When I get all steamed up I just shout tip me over and pour me out",
... "I am a very special pot",
... "It is true",
... "Here is an example of what I can do",
... "I can turn my handle into a spout",
... "Tip me over and pour me out"]
>>> teapot_grammar = TreeScorer.from_tree_bank(bllip_parse(s) for s in sentences)
>>> teapot_grammar.score(bllip_parse("Here is a little teapot"))
>>> teapot_grammar.score(bllip_parse("It is my handle"))
>>> teapot_grammar.score(bllip_parse("I am a spout"))
>>> teapot_grammar.score(bllip_parse("Your teapot is gay"))
>>> teapot_grammar.score(bllip_parse("Your mom's teapot is
This work is inspired by work that Arthur Tilley did on our team last
year. The 'kasami' library represents a narrow slice of Arthur's work.
Next, I'm working on building out revscoring to implement some features
that use the scoring strategy on sentences modified in an edit. I'm hoping
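For readers who want to see the scoring idea without a parser installed, here is a minimal sketch. It is not the kasami API: `ToyTreeScorer`, `productions`, the hand-written trees, and the flat penalty for unseen productions are all assumptions made for illustration. It counts the productions seen in a small tree bank and scores a new tree as the sum of log-probabilities of its productions, so trees built from unfamiliar productions score lower.

```python
import math
from collections import Counter


def productions(tree):
    """Yield (parent_label, child_labels) for every node of a tree.

    A tree is a tuple (label, child, ...); a leaf child is a plain string (a word).
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


class ToyTreeScorer:
    """Hypothetical stand-in for a PCFG scorer; not the kasami implementation."""

    def __init__(self, trees, unseen_logprob=math.log(1e-6)):
        self.rule_counts = Counter()
        self.lhs_counts = Counter()
        for tree in trees:
            for lhs, rhs in productions(tree):
                self.rule_counts[(lhs, rhs)] += 1
                self.lhs_counts[lhs] += 1
        # Assumed flat penalty for productions never seen in training.
        self.unseen_logprob = unseen_logprob

    def score(self, tree):
        """Return the log-probability of the tree under the counted productions."""
        total = 0.0
        for lhs, rhs in productions(tree):
            count = self.rule_counts.get((lhs, rhs), 0)
            if count == 0:
                total += self.unseen_logprob
            else:
                total += math.log(count / self.lhs_counts[lhs])
        return total


# Hand-written toy parses (assumed shapes, not real parser output).
t1 = ("S", ("NP", "I"), ("VP", ("V", "am"), ("NP", ("DT", "a"), ("NN", "teapot"))))
t2 = ("S", ("NP", "I"), ("VP", ("V", "am"), ("NP", ("DT", "a"), ("NN", "pot"))))
odd = ("S", ("NP", "You"), ("VP", ("V", "am"), ("NP", ("DT", "a"), ("NN", "teapot"))))

scorer = ToyTreeScorer([t1, t2])
```

Here `scorer.score(t1)` comes out higher than `scorer.score(odd)`, because the production for "You" was never observed in the tree bank.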
that this type of feature engineering will allow us to catch edits that
make articles more/less notable. I'm also targeting spammy language and
Today ORES in production was sending out an unreasonable number of timeout
errors, causing Icinga to scream and a 14% average failure rate for ORES
review tool jobs. It turned out that the ORES workers were logging too much,
causing the nodes to run out of disk space. I suspect we had a similar
issue on our labs nodes.
I made changes for prod and labs and deployed it today. You can find more
details in the phab card
This is the 22nd weekly update from the revision scoring team that we have
sent to this mailing list.
- We configured the default threshold for the ORES review tool on
Wikidata to be more strict (higher recall, lower precision)
- We fixed a display issue on Special:Contributions where the filters
would not wrap
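To illustrate the recall/precision trade-off mentioned for the Wikidata threshold, here is a small sketch. It assumes the scores are probabilities that an edit is damaging and that edits at or above a cutoff get flagged; the function name, scores, and labels are made up for illustration and are not ORES output.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when edits scoring >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, labels))
    fp = sum(f and not y for f, y in zip(flagged, labels))
    fn = sum((not f) and y for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented example data: score per edit, and whether it really was damaging.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

lenient = precision_recall(scores, labels, threshold=0.7)
strict = precision_recall(scores, labels, threshold=0.35)
```

With this toy data the lower cutoff flags more edits, so it catches every damaging edit but also flags one good edit.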
Increasing model fitness:
- We finished demonstrating model fitness gains using hash-vector
features. Next, we'll be working to get the hash-vector features
implemented in revscoring/ORES.
- We implemented a new strategy for training and testing on all data
using cross-validation. This will both increase the fitness of the
models and make the statistics reported more robust.
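The "train on all data, report statistics from cross-validation" idea can be sketched roughly as follows. This is not the revscoring implementation; `k_fold_indices`, `cross_validate`, and the toy threshold trainer are hypothetical names used only to show the shape of the approach.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(observations, labels, train, k=5, seed=0):
    """Estimate accuracy with k-fold CV, then train the final model on all data."""
    correct = 0
    for held_out in k_fold_indices(len(observations), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(observations) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        model = train(train_x, train_y)
        # Every observation is tested exactly once, by a model that never saw it.
        correct += sum(model(observations[i]) == labels[i] for i in held_out)
    accuracy = correct / len(observations)
    final_model = train(observations, labels)
    return final_model, accuracy


def train_threshold(xs, ys):
    """Toy trainer: classify by the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > cut


xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
ys = [x >= 10 for x in xs]
final_model, accuracy = cross_validate(xs, ys, train_threshold, k=5)
```

The reported statistic comes only from held-out predictions, while the model that would be deployed still benefits from every labeled observation.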
Maintenance and robustness:
- We fixed an indexing issue in ores_model that prevented the
deployment of updated models.
- We did a minor investigation into a short period of degraded service
quality on WMF Labs
1. https://phabricator.wikimedia.org/T144784 -- Change default threshold
for Wikidata to high
2. https://phabricator.wikimedia.org/T143518 -- Filter on user contribs has
nowrap, causing issues
3. https://phabricator.wikimedia.org/T128087 -- [Spike] Investigate
4. https://phabricator.wikimedia.org/T145812 -- Implement ~100 most
important hash vector features in editquality models
5. https://phabricator.wikimedia.org/T142953 -- Train on all data, Report
test statistics on cross-validation
6. https://phabricator.wikimedia.org/T144432 -- oresm_model index should
not be unique
7. https://phabricator.wikimedia.org/T145353 -- Investigate short period of
Aaron from the Revision Scoring team
One of ORES's applications is determining article quality: for example,
what would be the best assessment of an article at a given revision?
Users in wikiprojects use ORES data to check if articles need
re-assessment, e.g. if an article is at "Start" level and is now good
enough to be a "B" article.
As part of Q4 goals, we made a dataset of article quality scores of all
articles in English Wikipedia (here's the link to download the dataset)
and we are publishing it on figshare as something you can cite.
We are also working on publishing monthly data so researchers can track
how article quality changes over time.
As a pet project of mine, I always wanted to put these data in a database
so we can query it and get much more useful data, for example the
quality of articles in category 'History_of_Essex'. The weighted sum
is a measure of quality: a decimal number between 0 (a real stub)
and 5 (definitely a featured article). We also have a prediction column, which
is a number in this map; for example, if the prediction is 5, it means ORES
thinks it should be a featured article.
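As a rough illustration of how such a weighted score can be computed, here is a sketch. It assumes the model gives a probability for each English Wikipedia assessment class and that classes map to 0 through 5 in the usual Stub-to-FA order; the `WEIGHTS` mapping and the function name are assumptions, not taken from the dataset itself.

```python
# Assumed class-to-number mapping (0 = Stub ... 5 = Featured Article).
WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}


def weighted_sum(probabilities):
    """Collapse per-class probabilities into one score between 0 and 5."""
    return sum(WEIGHTS[cls] * p for cls, p in probabilities.items())


# Invented example: an article the model places between B and GA.
example = {"Stub": 0.0, "Start": 0.0, "C": 0.0, "B": 0.5, "GA": 0.5, "FA": 0.0}
score = weighted_sum(example)
```

Because the score is continuous, small quality changes show up even when the single predicted class stays the same.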
I leave more use cases to your imagination :)
I'm looking for a more permanent place to put these data, please tell me if
it's useful for you.
 ORES is not an anti-vandalism tool, it's an infrastructure to use AI in
 (117 MBs)
Thanks for this. I mean WOW for lack of better words.
I'm especially impressed with the inclusion of the weighted scores
which allows the observation of small changes in quality. I was going
to suggest that you could do the same thing for the end of every year,
say for the last 5 years, so that we can see the improvement in
articles - but that would be too much to ask. But then I noticed you
are planning on doing this monthly. Double WOW.
Minor quibble - there are lots of disambiguation pages included. I'd
delete those if possible.
PS - WOW