Hi Ludovic,
This work sounds interesting; I'm looking forward to learning more about it
as your papers come out!
I read through the post on LinkedIn, and as I interpret it, you are
only looking at two quality classes (Featured Articles vs. other articles).
This seems somewhat odd to me, and I'd like to know why you made that choice.
Current work on predicting article quality in the English
Wikipedia does not limit the prediction problem to just FAs vs. the rest;
instead, it uses the whole quality scale[1]. See the list below for some
papers along this line of research.
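To make the difference between the two setups concrete, here is a minimal
Python sketch. It assumes the usual English Wikipedia assessment classes,
ordered from lowest to highest (Stub, Start, C, B, GA, FA), with A-class
dropped as described in footnote 1. The names QUALITY_SCALE, binary_label,
and ordinal_label are hypothetical, for illustration only:

```python
# Hypothetical illustration: English Wikipedia assessment classes, lowest to
# highest. A-class is left out, as is typical, because so few articles have it.
QUALITY_SCALE = ["Stub", "Start", "C", "B", "GA", "FA"]


def binary_label(assessment: str) -> int:
    """Two-class setup: 1 for a Featured Article, 0 for everything else."""
    return 1 if assessment == "FA" else 0


def ordinal_label(assessment: str) -> int:
    """Full-scale setup: position on the scale (0 = Stub ... 5 = FA).

    Raises ValueError for classes not on the scale (e.g. "A").
    """
    return QUALITY_SCALE.index(assessment)


if __name__ == "__main__":
    examples = ["Stub", "B", "GA", "FA"]
    print([binary_label(a) for a in examples])
    print([ordinal_label(a) for a in examples])
```

In the two-class setup a Stub and a Good Article both get label 0, whereas
the full scale keeps them four classes apart, which is the information a
FA-vs-rest model throws away.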
I'm also curious about what "standardize the cognitive accessibility
of Wikipedia" means. It might cover more than just "article quality",
hence my question.
All that being said, I think the approach sounds interesting and probably
adds some signal, so I'm curious to learn more about how it works and how
well it performs.
References:
- Warncke-Wang, M., Cosley, D., & Riedl, J. Tell me more: an actionable
quality model for Wikipedia. OpenSym/WikiSym 2013. [We argue that metadata
isn't useful because contributors can't change it]
- Warncke-Wang, M., Ayukaev, V. R., Hecht, B., & Terveen, L. G. The
success and failure of quality improvement projects in peer production
communities. CSCW 2015. [See the Appendix for details of the improved model
and how to get good training data]
- ORES <https://www.mediawiki.org/wiki/ORES> builds upon the 2015 paper and
is a readily accessible API. Reference datasets are available on figshare
<https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406>
and in the GitHub repository <https://github.com/wikimedia/articlequality>.
ORES is now the benchmark to compare against, as in the three other papers
listed below.
- Dang, Q. V., & Ignat, C. L. Measuring quality of collaboratively
edited documents: the case of Wikipedia. CIC 2016. [Shows that adding
readability features can improve predictions]
- Dang, Q. V., & Ignat, C. L. An end-to-end learning solution for
assessing the quality of Wikipedia articles. OpenSym 2017. [Shows the
performance of RNNs; also contains an important discussion of performance,
interpretability, etc.]
I also came across this recent paper by Schmidt and Zangerle that reports
significant improvements, but I haven't yet had time to read it
closely:
- Schmidt, M., & Zangerle, E. Article quality classification on
Wikipedia: introducing document embeddings and content features. OpenSym
2019.
Footnotes:
1. Typically without A-class articles, because there are so few of them.
Cheers,
Morten
On Mon, 23 Sep 2019 at 13:09, Ludovic Bocken <lbocken(a)gmail.com> wrote:
Hello,
I am finishing my PhD, and I think you may be interested in my latest
major work on the quality of Wikipedia:
https://www.linkedin.com/pulse/standardization-wikipedia-articles-according…
and in a future collaboration.
I would be very grateful for your feedback! Several publications are in
preparation... Let me know if you are interested in following this
thread.
Have a nice week,
Ludovic BOCKEN
lbocken(a)gmail.com
www.ludovicbocken.com
Skype: ludovic.bocken
http://www.linkedin.com/in/ludovicbocken
2222 Rue Hochelaga,
Montréal, QC H2K 4N8
+1 (514) 649 0755
*Confidentiality notice*
This message, transmitted by fax, is confidential, and its contents may be
protected by professional privilege. It is intended solely for its
addressee. Any other person is hereby notified that they are strictly
prohibited from disseminating, distributing, or reproducing it. If the
addressee cannot be reached or is unknown to you, please notify the sender
immediately and destroy this message and any copy of it.
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l