Hi all,
I’m happy to announce the outcome of an Outreachy internship <https://phabricator.wikimedia.org/T233707> that I’m finishing up: Citation Detective, a new tool and public dataset that tool developers and researchers can now use in their projects.
Citation Detective <https://meta.wikimedia.org/wiki/Citation_Detective> contains sentences identified as needing a citation by a machine-learning classifier published early last year <https://arxiv.org/pdf/1902.11116.pdf> by WMF researchers and collaborators. As part of Outreachy, I developed a tool <https://github.com/AikoChou/citationdetective> (hosted on Toolforge <https://tools.wmflabs.org>) that runs through Wikipedia and extracts high-scoring sentences along with contextual information.
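To make the extraction step concrete, here is a minimal Python sketch of keeping high-scoring sentences together with their context. It is not the code from the citationdetective repository: the `ScoredSentence` fields, the 0.5 threshold, the naive sentence splitter, and the `toy_score` heuristic are all assumptions made for illustration, and `toy_score` merely stands in for the published classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ScoredSentence:
    sentence: str   # sentence flagged as likely needing a citation
    paragraph: str  # surrounding paragraph, kept as context
    section: str    # title of the section the paragraph came from
    score: float    # scorer output, assumed to lie in [0, 1]


def split_sentences(paragraph: str) -> List[str]:
    # Naive splitter for illustration only: break after ., ! or ? plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def extract_high_scoring(
    sections: Iterable[Tuple[str, List[str]]],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the tool
) -> List[ScoredSentence]:
    """Score every sentence and keep those at or above the threshold."""
    results = []
    for title, paragraphs in sections:
        for paragraph in paragraphs:
            for sentence in split_sentences(paragraph):
                score = score_fn(sentence)
                if score >= threshold:
                    results.append(ScoredSentence(sentence, paragraph, title, score))
    # Highest scores first; ties keep their original article order.
    return sorted(results, key=lambda r: r.score, reverse=True)


def toy_score(sentence: str) -> float:
    # Placeholder heuristic, NOT the published model: treat sentences
    # containing a number as more likely to need a citation.
    return 0.9 if re.search(r"\d", sentence) else 0.1


article = [
    ("History", ["The town was founded in 1850. It lies on a river."]),
    ("Economy", ["About 40 percent of residents work in fishing."]),
]
hits = extract_high_scoring(article, toy_score)
for hit in hits:
    print(f"{hit.score:.1f} [{hit.section}] {hit.sentence}")
```

Keeping the paragraph and section next to each sentence is what lets a downstream tool show the sentence in context; any real scorer with the same signature can replace `toy_score`.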
As an example use case for this data, I also created a proof of concept that integrates Citation Detective with Citation Hunt <https://tools.wmflabs.org/citationhunt>. Check out my prototype Citation Hunt <https://tools.wmflabs.org/aiko-citationhunt>, which uses Citation Detective to import sentences that would not normally be featured in Citation Hunt. The repository for the prototype is at <https://github.com/AikoChou/citationhunt>.
The dataset currently includes sentences from ~120,000 randomly selected articles from the English Wikipedia. In future work, we hope to expand it to more Wikipedia language editions and a greater number of articles. A future version of the database could also include more fields, depending on feedback from tool developers and researchers. More use cases for this type of data were identified in a design research project <https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research> conducted last year by Jonathan Morgan.
You can find more information in our Wiki Workshop submission <https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020.pdf> and on my blog <https://rollingmist.home.blog/>, which documents the whole journey.
Thank you very much!
Kind regards, Aiko
Thank you Aiko! This is excellent work. Thank you for helping us offer this valuable new data service to the Wikimedia Movement.
Best, Jonathan
On Sat, Mar 7, 2020 at 6:03 AM Ai-Jou Chou qwanqwanro@gmail.com wrote:
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l