Thank you, Aiko! This is excellent work. Thank you for helping us offer this
valuable new data service to the Wikimedia Movement.
Best,
Jonathan
On Sat, Mar 7, 2020 at 6:03 AM Ai-Jou Chou <qwanqwanro(a)gmail.com> wrote:
Hi all,
I’m happy to announce the outcome of an Outreachy internship
<https://phabricator.wikimedia.org/T233707> that I’m finishing up. It is a
new tool and public dataset named Citation Detective, which tool developers
and researchers can now use for their projects.
Citation Detective <https://meta.wikimedia.org/wiki/Citation_Detective>
contains sentences that have been identified as needing a citation using a
machine learning-based classifier published last year
<https://arxiv.org/pdf/1902.11116.pdf> by WMF researchers and
collaborators. As part of Outreachy, I developed a tool
<https://github.com/AikoChou/citationdetective> (hosted on Toolforge
<https://tools.wmflabs.org>) to run through Wikipedia and extract
high-scoring sentences along with contextual information.
As an example use case for this data, I also created a proof of concept for
integrating Citation Detective and Citation Hunt
<https://tools.wmflabs.org/citationhunt>. Check out my prototype Citation
Hunt <https://tools.wmflabs.org/aiko-citationhunt>, which uses Citation
Detective to import sentences that would not normally be featured in
Citation Hunt. The repository for that is here
<https://github.com/AikoChou/citationhunt>.
This dataset currently includes sentences from ~120,000 randomly selected
articles from the English Wikipedia. In future work, we hope to expand this
to more Wikipedia language editions and a greater number of articles. The
database could also be expanded with more fields in a future
version, based on feedback from tool developers and researchers. More
use cases for this type of data were identified in a design research
project
<https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statem…>
conducted last year by Jonathan Morgan.
You can find more information in our Wiki Workshop submission
<https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020…>
and on my blog <https://rollingmist.home.blog/>, which documents the whole
journey.
Thank you very much!
Kind regards,
Aiko
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
(Uses He/Him)
*Please note that I do not expect a response from you on evenings or
weekends*