Dear All,
A small reminder that we will meet today here
<https://meet.google.com/yum-qabj-szx> at 4pm Warsaw/Berlin time to
present and finalize the working groups.
Also, if you have any recommendations of people we could support, especially
from the Target Inclusion Countries (page 2
<https://docs.google.com/document/d/19nqb-5WDQYSpvBf1620BuWHQ0746fGdRK48Q2mG…>),
that would be great. I need an email address to invite them, and they just
have to write a few sentences on how the Action could help them and vice versa.
Thank you,
Brett, Matt, Iolanda, et al.,
Hello wiki-research community!
I'm sharing a call-for-papers for a workshop that I'm helping to organize
at EMNLP 2024 <https://2024.emnlp.org/> that will be focused on celebrating
Wikimedia's contributions to the NLP community and highlighting approaches
to ensuring the sustainability of this relationship for years to come. Our
website for the workshop is on Meta (and I've copied the relevant content
below as well):
https://meta.wikimedia.org/wiki/NLP_for_Wikipedia_(EMNLP_2024)
The workshop will be hybrid (virtual and in-person components). We have not
been assigned a date yet but it will either be November 15th or 16th. To
get a sense of potential costs, you can see last year's EMNLP conference
registration: https://2023.emnlp.org/registration/#virtual-pricing
== Overview ==
Co-located with the EMNLP 2024 (The 2024 Conference on Empirical Methods in
Natural Language Processing)
Date: 15. or 16. November 2024 (TBA)
In Miami, Florida (hybrid event)
The workshop will be a hybrid event, i.e., we aim to facilitate online
participation.
== Important Dates ==
Papers due: Thursday, *29. August 2024*
Notification of accepted papers: Friday, 27. September 2024
Camera-ready papers due: Friday, 4. October 2024
Workshop date: 15. or 16. November 2024 (TBA)
All deadlines are midnight anywhere on earth (AOE).
== Overview ==
Wikipedia is a uniquely important resource for the NLP community; it is
multilingual, can be freely reused under its open license, and is edited
and maintained by a dedicated community of editors who have earned its
status as a very high-quality dataset for many applications. With this
value come many tensions, however: despite Wikipedia's presence in over 300
language editions, much focus in language modeling remains on the
high-resource languages; despite the openness of Wikipedia and its role in
many advances in natural language modeling, there are concerns that some of
these advances such as generative text models could undermine Wikipedia and
threaten its sustainability as a community and ultimately data resource;
despite the heavy usage of Wikimedia data among the NLP community, few
researchers work on developing tools that can contribute back to the
Wikimedia community.
The goal of this workshop is both to celebrate Wikimedia's contributions to
the NLP community and highlight approaches to ensuring the sustainability
of this relationship for years to come. We will invite researchers to
contribute novel uses of Wikimedia data or studies of the impact of
Wikimedia data within the NLP community. We will also discuss successful
approaches to developing tooling that can assist the Wikimedia community in
maintaining and improving the breadth of the Wikimedia projects.
== Topics ==
We invite contributions on a wide range of topics related to NLP and
Wikipedia, including but not limited to:
* Wikipedia text analysis and understanding
* Text generation and summarization for Wikipedia articles
* Multilingual and cross-lingual approaches for Wikipedia content
* Quality assessment and vandalism detection in Wikipedia
* Recommendation systems for Wikipedia content
* Semantic enrichment and entity linking in Wikipedia
* Applications of NLP for structured data in Wikimedia projects
* Misinformation detection for Wikipedia
* Ethical considerations and biases in NLP for Wikipedia
* Impact of LLMs on Wikipedia's communities
* Human-AI collaboration for improving Wikipedia content
* Benchmark datasets and evaluation metrics
* Knowledge-intensive NLP over Wikipedia content
We also encourage papers that include the creation of new datasets relevant
to NLP tasks to support the Wikimedia communities. For example:
* References across languages by topic
* Edit summaries and associated diffs
* Talk page discussions and outcomes
* Edits that inserted new facts along with the text from the supporting
reference
While we encourage use of Wikipedia content, NLP work from other Wikimedia
platforms such as Wikisource or Wikidata labels is also welcome. If you
have questions about potential research ideas or existing resources in a
given topical area, feel free to reach out to the workshop organizers at
nlp4wikipedia(a)googlegroups.com and we will do our best to help out.
== Submission Guidelines ==
We welcome the following types of contributions.
= Track 1: Novel Works =
The papers in this track will be peer-reviewed by at least three
researchers using a single-blind review process and published as the
workshop proceedings if accepted. We invite the following types of papers
(page limits excluding references):
- Full research paper: Novel research contributions (8 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (4 pages)
- Resource paper: New dataset or other resources directly relevant to
Wikimedia research, including the publication of that resource (8 pages)
- Demo paper: New system supporting the Wikipedia community (4 pages)
Submissions must be in PDF format using the ACL template, available here:
https://github.com/acl-org/acl-style-files Papers have to be submitted
through OpenReview:
https://openreview.net/group?id=EMNLP/2024/Workshop/NLP_for_Wikipedia
= Track 2: Published Works =
This track welcomes papers previously published at a peer-reviewed research
venue to be presented and discussed in the workshop. They do not have to
follow the formatting and page limit instructions from Track 1 and can
instead be submitted in the original format.
Previously published papers will be reviewed by the organising committee in
terms of the topical fit and prominence of the publication venue. They will
not be published as part of the proceedings. We invite the following types
of papers:
- Full research paper: Previously published research contributions
- Resource paper: Previously published datasets or other resources that are
important or interesting to the community
- Demo paper: Presenting a previously published system supporting the
Wikipedia community
Papers have to be submitted through OpenReview (please add “[PUBLISHED]” at
the beginning of the title on the submission page so we know that you are
submitting to this track):
https://openreview.net/group?id=EMNLP/2024/Workshop/NLP_for_Wikipedia
Best,
Isaac Johnson, Wikimedia Foundation
On behalf of the rest of the organizing committee:
Lucie-Aimée Kaffee, Hugging Face
Tajuddeen Gwabade, Masakhane
Fabio Petroni, Samaya AI
Angela Fan, Meta
Daniel van Strien, Hugging Face
--
Isaac Johnson <https://meta.wikimedia.org/wiki/User:Isaac_(WMF)> (he/him)
-- Senior Research Scientist -- Wikimedia Foundation
Please feel free to forward to anyone you know who might be interested.
Thank you,
Brett
---------- Forwarded message ---------
From: Brett Buttliere <b.buttliere(a)uw.edu.pl>
Date: Mon, Aug 26, 2024 at 4:55 PM
Subject: Cost Action Meeting on Friday August 30th, at 4pm
To: <wiki-research-l(a)lists.wikimedia.org>
Dear All,
The COST Action application
<https://docs.google.com/document/d/19nqb-5WDQYSpvBf1620BuWHQ0746fGdRK48Q2mG…>
is moving along well in our opinion. Thank you for all of your involvement;
we were happy to have met so many of you at Wikimania.
The grant is due October 26th, and given that it is about 2 months away, we
want to get feedback on the grant as it is, and to talk about how you can
join and support the Action, also in terms of creating the ECOST profile so
you can participate formally.
To do this, we will meet this Friday, August 30th, at 4pm, Warsaw Time, at
this Google Meet <https://meet.google.com/yum-qabj-szx>.
At this stage we are formulating working groups (~p.20 here
<https://docs.google.com/document/d/19nqb-5WDQYSpvBf1620BuWHQ0746fGdRK48Q2mG…>)
and trying to identify the working group leaders. If you think you would
like to be a working group leader, or if you have a suggestion for a working
group (especially if you are in a target country, early career, or female),
please come, or at least send an email with a few paragraphs on the general
idea and how you might like to be involved. There are some examples of working
group descriptions on pages 20-22.
The working groups are defined right now approximately as:
- Measuring and Demonstrating the Impact of Wikimedia for Open Knowledge.
- Identifying and sharing best practices for research using the wikimedia
  ecosystem.
- Development of materials to make it easier to contribute to Wikimedia.
- Training trainers and educators in using the Wikimedia ecosystem in the
  classroom.
- Establishing TOPS style guidelines for organizations to implement and
  sign onto.
- Integrating the Wikimedia ecosystem with EOSC, presenting solutions
  e.g., WikiData.
- Building disciplinary organizations to contribute/ review science
  content in their area.
- Arming Universities and GLAMs with tools to make their content available
  i.e., open.
The 4-year plan is: Year 1, develop materials at Wikimania Paris; Year 2,
go together to an EOSC/EU conference to present materials and bring them
on board; Year 3, go in groups to present at disciplinary conferences; and
Year 4, come back all together for a big conference in one of the
inclusiveness nations.
*We are looking for* group leaders, participants, and people who
might be interested in the more leadership-oriented roles, e.g., Grant Awarding
Coordinator, Science Communication Coordinator, and general members of the
Action Management Committee. If you know an administrator who
has handled such a project before and could consult with us, that would be
especially welcome.
Hopefully, see you Friday at this link
<https://meet.google.com/yum-qabj-szx>!
Brett Buttliere, Matt Vetter, Iolanda Pensa, et al.
University of Warsaw
Hello everyone,
It’s been just over a week since returning from Wikimania, and I wanted to
share some reflections on the experience.
This year marked my first in-person Wikimania since joining the Wikimedia
Foundation. It also coincided with the completion of a year in the role of
Research Community Officer, and came at a time when many initiatives for
the research community, including the Research Fund and Wiki Workshop, are
being planned for the year ahead.
My primary goals at Wikimania were to share more about the work we’re doing
within the research community at WMF, understand the needs of community
members, and build connections with them. Here are a few highlights from
the week that I’d like to share:
- I enjoyed attending many great sessions as part of the Research track
  and other tracks led by research community members, such as Wikimedia
  and Public AI: a tale of two cultural technologies
  <https://wikimania.eventyay.com/2024/talk/8BVHMP/>, Abstract Wikipedia
  and the dream of a Universal Language
  <https://wikimania.eventyay.com/2024/talk/NUXQAC/>, Opening the Academia
  keynote <https://wikimania.eventyay.com/2024/talk/RA8R3V/>, to name just
  a few. We also had excellent sessions from current and former Research Fund
  recipients like Thank you for the flowers but I would like a Wikimartisor
  <https://wikimania.eventyay.com/2024/talk/CEJCLL/> and Codifying Digital
  Behavior: A Socio-Legal Study of the Wikimedia Universal Code of Conduct
  <https://wikimania.eventyay.com/2024/talk/GNDQ39/>. All these sessions,
  along with many others, underscored the dedication of the community to our
  projects and movement.
- I had a lot of side discussions with many of you about your needs from
  the Research Team and the Foundation. For instance, several attendees
  shared challenges in accessing certain Wikimedia Research resources and
  expressed a desire for more updates on ongoing initiatives like Wiki
  Workshop and the Research Showcase. These conversations have given us a
  clearer understanding of how we can enhance communication and resources for
  the community moving forward.
- One of the week’s highlights was organizing the first-ever Research
  Meetup at Wikimania. The turnout exceeded expectations, with attendees
  staying beyond the scheduled time. This gathering fostered new connections,
  sparked conversations about mutual research interests, and opened up
  possibilities for future collaborations.
Overall, the week was filled with positive and encouraging exchanges. The
energy from these discussions will help guide ongoing efforts to strengthen
Wikimedia Research. In the near future, updates on initiatives inspired by
these conversations will be shared here on the mailing list. Meanwhile, if
you'd like to continue the dialogue, feel free to schedule an office hour
<https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours> with me or
a member of the team. Additionally, if you know others who might benefit
from engaging with the Wikimedia Research Community, please encourage them
to subscribe to this list and follow @wikiresearch
<https://x.com/wikiresearch> on Twitter/X for the latest updates.
Looking forward to continuing this important work together.
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
If you are at Wikimania, we would like to invite you to join us at a *Wiki
Research Meetup* tomorrow. There is no set agenda, so no need to prepare
anything in advance. Just come hang out, enjoy, and get together with
other Wiki Research community members!
*Date:* Tomorrow, August 8, 2024
*Time:* 6:00 PM
*Location:* Room 1, ICC (main Wikimania venue)
Looking forward to seeing many of you!
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
Open science aims to make research results and materials freely accessible to everyone, with the goal of increasing knowledge circulation, increasing transparency, and providing the means to reproduce and generalize published findings.
Venue: Max Planck Institute for Demographic Research (MPIDR), Rostock, Germany
Dates: March 17-18, 2025
Website: https://www.demogr.mpg.de/en/news_events_6123/calendar_1921/second_rostock_…
Hi all,
The next Research Showcase will be live-streamed next Wednesday, July 24,
at 9:30 AM PDT / 16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1721838600>. The theme for this showcase is
*Machine Translation on Wikipedia*.
You are welcome to watch via the YouTube stream:
https://www.youtube.com/live/O7AqvHgqUVk. As usual, you can join the
conversation in the YouTube chat as soon as the showcase goes live.
This month's presentations:
The Promise and Pitfalls of AI Technology in Bridging Digital Language Divide
By *Kai Zhu, Bocconi University*
Machine translation technologies have
the potential to bridge knowledge gaps across languages, promoting more
inclusive access to information regardless of native languages. This study
examines the impact of integrating Google Translate into Wikipedia's
Content Translation system in January 2019. Employing a natural experiment
design and difference-in-differences strategy, we analyze how this
translation technology shock influenced the dynamics of content production
and accessibility on Wikipedia across over a hundred languages. We find
that this technology integration leads to a 149% increase in content
production through translation, driven by existing editors becoming more
productive as well as an expansion of the editor base. Moreover, we observe
that machine translation enhances the propagation of biographical and
geographical information, helping to close these knowledge gaps in the
multilingual context. However, our findings also underscore the need for
continued efforts to mitigate the preexisting systemic barriers. Our study
contributes to our knowledge on the evolving role of artificial
intelligence in shaping knowledge dissemination through enhanced language
translation capabilities.
Implications of Using Inorganic Content in Arabic Wikipedia Editions
By *Saied Alshahrani and Jeanna Matthews, Clarkson University*
Wikipedia articles (content pages) are one of the widely
utilized training corpora for NLP tasks and systems, yet these articles are
not always created, generated, or even edited organically by native
speakers; some are automatically created, generated, or translated using
Wikipedia bots or off-the-shelf translation tools like Google Translate
without human revision or supervision. We first analyzed the three Arabic
Wikipedia editions, Arabic (AR), Egyptian Arabic (ARZ), and Moroccan Arabic
(ARY), and found that these Arabic Wikipedia editions suffer from a few
serious issues, like large-scale automatic creations and translations from
English to Arabic, all without human involvement, generating content
(articles) that lack not only linguistic richness and diversity but also
content that lacks cultural richness and meaningful representation of the
Arabic language and its native speakers. We second studied the performance
implications of using such inorganic, unrepresentative articles to train
NLP tasks or systems, where we intrinsically evaluated the performance of
two main NLP upstream tasks, namely word representation and language
modeling, using word analogy and fill-mask evaluations. We found that most
of the models trained on the organic and representative content
outperformed or, at worst, performed on par with the models trained with
inorganic content generated using bots or translated using templates
included, demonstrating that training on unrepresentative content not only
impacts the representation of native speakers but also impacts the
performance of NLP tasks or systems. We recommend avoiding utilizing the
automatically created, generated, or translated articles on Wikipedia when
the task is a representation-based task, like measuring opinions,
sentiments, or perspectives of native speakers, and also suggest that when
registered users employ automated creation or translation, their
contributions should be marked differently than “registered user” for
better transparency; perhaps “registered user (automation-assisted)”.
Best,
Kinneret
Dear colleagues,
I am writing to you on behalf of Jing Lu, an MSc student specialising in
Human-Computer Interaction at the School of Computer Science, University of
St Andrews. Jing is researching a collaborative Wiki editing tool as part
of her dissertation project.
Jing is currently seeking Wikipedians in the UK to evaluate an
early prototype of her tool. Due to time constraints and the challenges
Jing faced in finding participants, your prompt response would be greatly
appreciated! Below is the detailed invitation from Jing:
------------------------------
Hello everyone,
My name is Jing Lu, and I am a postgraduate student specialising in
Human-Computer Interaction at the School of Computer Science, University of
St Andrews. I am currently working on my dissertation project titled
*"WikiSync:
A New Wikipedia Onboarding Tool: Improving Wikipedia Editor Retention."*
This tool aims to enhance the training experience for new editors by
providing synchronised editing capabilities. I am seeking participants to
help evaluate the interface design of this tool and to test its current
functionalities.
The evaluation will be conducted in three parts, all done remotely, and you
will be working with a group of other participants. Please note that the
evaluation is not difficult and requires no preparation or training. You
simply need to interact with the tool based on your intuition and share
your thoughts.
1. Part One: Using a computer or tablet (not a mobile phone), you will
   collaborate with other participants to edit content using the provided URL.
   During this session, your screen activity will be recorded.
2. Part Two: You will join other participants in a group interview to share
   your experiences and feedback on using the tool.
3. Part Three: You will complete an online questionnaire to provide your
   overall impressions of the interface design.
Duration: The entire evaluation process will take approximately 1.5 to 2
hours.
*Reward:* *£15 Amazon voucher for each participant*.
*If you are interested in participating, please click the link to fill out
a survey:*
https://qualtricsxmzl6txwqr6.qualtrics.com/jfe/form/SV_6GtmFXZVnGpFYxM
Very soon after, I will contact you via the email you provided to discuss
your availability and schedule the evaluation. Given the time-sensitive
nature of an MSc project, your prompt participation would be greatly
appreciated. If you have any questions, feel free to contact me directly.
Contact details: Jing Lu (jl402(a)st-andrews.ac.uk)
------------------------------
Additionally, if you could also share this invitation with your newly
trained Wikipedians, it would be incredibly helpful for Jing’s research.
Thank you for considering this opportunity to support Jing’s research. Your
participation would be really appreciated!
Please let me or Jing know if you have any questions.
Best regards,
Abd
----
*Dr Abd Alsattar Ardati*
*Lecturer*
School of Computer Science
University of St Andrews
St Andrews, KY16 9SX
Contact: +44 (0)1334 461861 <+441334461861> / abd.ardati(a)st-andrews.ac.uk