Hello wiki-research community!
I'm sharing a call for papers for a workshop that I'm helping to organize at EMNLP 2024 (https://2024.emnlp.org/). The workshop will focus on celebrating Wikimedia's contributions to the NLP community and highlighting approaches to ensuring the sustainability of this relationship for years to come. Our website for the workshop is on Meta (and I've copied the relevant content below as well): https://meta.wikimedia.org/wiki/NLP_for_Wikipedia_(EMNLP_2024)
The workshop will be hybrid (virtual and in-person components). We have not been assigned a date yet, but it will be either November 15th or 16th. To get a sense of potential costs, you can see last year's EMNLP conference registration: https://2023.emnlp.org/registration/#virtual-pricing
== Workshop Details ==
Co-located with EMNLP 2024 (the 2024 Conference on Empirical Methods in Natural Language Processing)
Date: 15. or 16. November 2024 (TBA)
In Miami, Florida (hybrid event)
The workshop will be a hybrid event, i.e., we aim to facilitate online participation.
== Important Dates ==
Papers due: Thursday, *29. August 2024*
Notification of accepted papers: Friday, 27. September 2024
Camera-ready papers due: Friday, 4. October 2024
Workshop date: 15. or 16. November 2024 (TBA)
All deadlines are midnight Anywhere on Earth (AoE).
== Overview ==
Wikipedia is a uniquely important resource for the NLP community: it is multilingual, can be freely reused under its open license, and is edited and maintained by a dedicated community of editors who have earned its status as a very high-quality dataset for many applications. With this value come many tensions, however. Despite Wikipedia's presence in over 300 language editions, much focus in language modeling remains on the high-resource languages. Despite the openness of Wikipedia and its role in many advances in natural language modeling, there are concerns that some of these advances, such as generative text models, could undermine Wikipedia and threaten its sustainability as a community and ultimately as a data resource. Despite the heavy usage of Wikimedia data among the NLP community, few researchers work on developing tools that can contribute back to the Wikimedia community.
The goal of this workshop is both to celebrate Wikimedia's contributions to the NLP community and to highlight approaches to ensuring the sustainability of this relationship for years to come. We will invite researchers to contribute novel uses of Wikimedia data or studies of the impact of Wikimedia data within the NLP community. We will also discuss successful approaches to developing tooling that can assist the Wikimedia community in maintaining and improving the breadth of the Wikimedia projects.
== Topics ==
We invite contributions on a wide range of topics related to NLP and Wikipedia, including but not limited to:
* Wikipedia text analysis and understanding
* Text generation and summarization for Wikipedia articles
* Multilingual and cross-lingual approaches for Wikipedia content
* Quality assessment and vandalism detection in Wikipedia
* Recommendation systems for Wikipedia content
* Semantic enrichment and entity linking in Wikipedia
* Applications of NLP for structured data in Wikimedia projects
* Misinformation detection for Wikipedia
* Ethical considerations and biases in NLP for Wikipedia
* Impact of LLMs on Wikipedia's communities
* Human-AI collaboration for improving Wikipedia content
* Benchmark datasets and evaluation metrics
* Knowledge-intensive NLP over Wikipedia content
We also encourage papers that include the creation of new datasets relevant to NLP tasks to support the Wikimedia communities. For example:
* References across languages by topic
* Edit summaries and associated diffs
* Talk page discussions and outcomes
* Edits that inserted new facts along with the text from the supporting reference
While we encourage use of Wikipedia content, NLP work from other Wikimedia platforms such as Wikisource or Wikidata labels is also welcome. If you have questions about potential research ideas or existing resources in a given topical area, feel free to reach out to the workshop organizers at nlp4wikipedia@googlegroups.com and we will do our best to help out.
== Submission Guidelines ==
We welcome the following types of contributions.
=== Track 1: Novel Works ===
The papers in this track will be peer-reviewed by at least three researchers using a single-blind review process and, if accepted, published in the workshop proceedings. We invite the following types of papers (page limits excluding references):
- Full research paper: Novel research contributions (8 pages)
- Short research paper: Novel research contributions of smaller scope than full papers (4 pages)
- Resource paper: New dataset or other resources directly relevant to Wikimedia research, including the publication of that resource (8 pages)
- Demo paper: New system supporting the Wikipedia community (4 pages)
Submissions must be PDFs that use the ACL template, available here: https://github.com/acl-org/acl-style-files
Papers have to be submitted through OpenReview: https://openreview.net/group?id=EMNLP/2024/Workshop/NLP_for_Wikipedia
=== Track 2: Published Works ===
This track welcomes papers previously published at a peer-reviewed research venue to be presented and discussed in the workshop. They do not have to follow the formatting and page limit instructions from Track 1 and can instead be submitted in the original format.
Previously published papers will be reviewed by the organizing committee for topical fit and the prominence of the publication venue. They will not be published as part of the proceedings. We invite the following types of papers:
- Full research paper: Previously published research contributions
- Resource paper: Previously published datasets or other resources that are important or interesting to the community
- Demo paper: Presenting a previously published system supporting the Wikipedia community
Papers have to be submitted through OpenReview (please add “[PUBLISHED]” at the beginning of the title on the submission page so we know that you are submitting to this track): https://openreview.net/group?id=EMNLP/2024/Workshop/NLP_for_Wikipedia
Best,
Isaac Johnson, Wikimedia Foundation
On behalf of the rest of the organizing committee:
Lucie-Aimée Kaffee, Hugging Face
Tajuddeen Gwabade, Masakhane
Fabio Petroni, Samaya AI
Angela Fan, Meta
Daniel van Strien, Hugging Face