Hi everyone,
I've been hacking on a new tool and I thought I'd share what (little) I have so far to get some comments and learn of related approaches from the community.
The basic idea is a browser extension that tells the user whether the page they're currently viewing looks like a good reference for a Wikipedia article, for a whitelist of domains such as news websites. This would hopefully prompt casual/opportunistic edits, especially to articles that might otherwise be overlooked.
As a proof of concept for a backend, I built a simple bag-of-words model over the TextExtracts of articles in enwiki's Category:All_articles_needing_additional_references. I then set up a tool [1] that receives HTML input and retrieves the 5 articles most similar to that input. You can try it out in your browser [2] or on the command line [3]. The results could definitely be better, but having tried it on a few different articles over the past few days, I think there's some potential here.
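To make that a bit more concrete, here's a simplified sketch of the kind of pipeline I mean. It is not the actual code in [1]; using scikit-learn and these exact API parameters is just one way to do it. It fetches plain-text extracts via the TextExtracts API and ranks them against an input text with TF-IDF cosine similarity:

    # Simplified sketch, not the code in [1]. Assumes requests and scikit-learn.
    import requests
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    API = "https://en.wikipedia.org/w/api.php"

    def get_extracts(titles):
        # Plain-text intro extracts for a batch of titles (TextExtracts
        # extension); the API caps intro extracts at about 20 titles per request.
        params = {
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,
            "exintro": 1,
            "titles": "|".join(titles),
            "format": "json",
        }
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return {p["title"]: p.get("extract", "") for p in pages.values()}

    def top_similar(input_text, extracts, n=5):
        # Vectorize the input together with the extracts and rank the
        # extracts by cosine similarity to the input.
        titles = list(extracts)
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform([input_text] + [extracts[t] for t in titles])
        scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
        return sorted(zip(titles, scores), key=lambda x: x[1], reverse=True)[:n]

A real backend would presumably pre-compute the vectors for the whole category rather than fetching extracts on every request, and would strip the HTML from the input first.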
I'd be interested in hearing your thoughts on this. Specifically:
* If such a backend/API were available, would you be interested in using it for other tools? If so, what functionality would you expect from it?
* I'm thinking of just throwing away the above proof of concept and using ElasticSearch, though I don't know a lot about it (see the rough sketch after this list for what I have in mind). Is anyone aware of a similar dataset that already exists there, by any chance? Or any reasons not to go that way?
* Any other comments on the overall idea or implementation?
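For what it's worth, this is roughly what I imagine the ElasticSearch version looking like, talking to its REST API directly. It's untested, and the index and field names are made up, but more_like_this seems to cover the "find similar documents" part:

    # Rough, untested sketch of the ElasticSearch idea; index/field names are
    # made up. One document is indexed per article in the category.
    import requests
    from urllib.parse import quote

    ES = "http://localhost:9200"
    INDEX = "articles-needing-refs"

    def index_article(title, extract):
        # "extract" is the plain text from TextExtracts.
        requests.put(f"{ES}/{INDEX}/_doc/{quote(title, safe='')}",
                     json={"title": title, "extract": extract})

    def similar_articles(page_text, n=5):
        # more_like_this builds its query terms from the page text itself,
        # which is more or less the bag-of-words comparison done server-side.
        query = {
            "size": n,
            "query": {
                "more_like_this": {
                    "fields": ["extract"],
                    "like": page_text,
                    "min_term_freq": 1,
                    "min_doc_freq": 1,
                }
            },
        }
        hits = requests.post(f"{ES}/{INDEX}/_search", json=query).json()
        return [h["_source"]["title"] for h in hits["hits"]["hits"]]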
Thanks!
1- https://github.com/eggpi/similarity
2- https://tools.wmflabs.org/similarity/
3- Example:
curl https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminaliz... | curl -X POST http://tools.wmflabs.org/similarity/search --form "text=<-"