Hi all,
As part of our efforts to better serve the Wikimedia research community, we
are happy to share that we are collaborating with the Security team at WMF
to help prioritize the release of data that can be useful for your
research. The Security team is working to release more datasets publicly in
privatized form, avoiding the need for non-disclosure agreements. You can learn more
here: https://meta.wikimedia.org/wiki/Differential_privacy.
Over the next 12 months, the Security team plans to release 5 datasets:
- country-language-pageview ongoing (end of 2022)
- country-language-pageview historical (March 2023)
- geo-aggregated grants data back to 2009 (Feb 2023)
- geoeditors monthly (June 2023)
- dataset informed by research community priorities identified in this
survey (second half of 2023)
The released datasets need to meet certain privacy requirements:
- They cannot include any natural language (e.g. specific search queries
or deletion logs) so as to avoid the release of personally identifiable
information;
- They need to be sufficiently large (at least thousands of entries,
preferably more) so as to reduce noise;
- The data cannot be so sensitive that an individual user will be harmed
by disclosure of the data (e.g. IP addresses, content containing personally
identifying information).
We invite you to complete a brief survey
<https://docs.google.com/forms/d/e/1FAIpQLSe_LAt6V2Q1GUf3Z8lnt8uAOZnHTO5rNgF…>
to help us identify and prioritize the types of datasets that you would
find useful for your work. Results of this survey will inform the fifth
dataset, scheduled to be released in late 2023. This survey is conducted
via a third-party service, which may subject it to additional terms. For
more information on privacy and data-handling, see the survey privacy
statement:
https://foundation.wikimedia.org/wiki/Legal:Data_Release_Priorities_Survey_…
The survey will remain open until November 3, 2022. After that time,
members of the Research and Security teams will review the data and report
on the suggestions that were received and how the work will proceed.
If you prefer not to respond via the Google form, you can email your
feedback to us or set up a time to discuss. You can also leave questions
and comments on the Talk page:
https://meta.wikimedia.org/wiki/Differential_privacy
Thanks for your help!
Emily Lescak, WMF Research team
Hal Triedman, WMF Security team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hello everyone,
The next Research Showcase, focused on Editor Retention, will be
live-streamed Wednesday, January 18. Find your local time here
<https://zonestamp.toolforge.org/1674063059>.
YouTube stream: https://www.youtube.com/watch?v=gS8ELcVZ8Q4
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Vital Signs: Measuring Wikipedia Communities’ Health
By *Cristian Consonni, Eurecat - Centre Tecnològic de Catalunya, Barcelona*
Community health in
Wikipedia is a complex topic that has been at the center of discussion for
Wikipedia and the scientific community for years. Researchers observed that
the number of active editors for the largest Wikipedias started declining
after an initial phase of exponential growth. Some media outlets took
this fact as a death announcement for the project, but the news of
Wikipedia's death turned out to be greatly exaggerated. However, it remains
true that researchers and community activists need to understand how to
measure community health and describe it more accurately. In this
presentation, we would like to go beyond the traditional metrics used to
describe the status of the community. We propose the creation of 6 sets of
language-independent indicators that we call "Vital Signs." We borrow the
analogy from the medical field, as these indicators represent a first step
in defining the health status of a community; they can constitute a
valuable reference point to foresee and prevent future risks. We present
our analysis for several Wikipedia language editions, showing that
communities renew their productive force even with stagnating absolute
numbers; we observe a general need for renewal in positions related to
particular functions or administratorship. We created a dashboard to
visualize all the indicators we have computed and hope that the communities
will find it helpful for improving their health.
- Paper: Community Vital Signs: Measuring Wikipedia Communities’
Sustainable Growth and Renewal
<https://meta.wikimedia.org/wiki/File:Community_Vital_Signs_Research_Paper_-…>
Learning to Predict the Departure Dynamics of Wikidata Editors
By *Guangyuan Piao, Maynooth University*
Wikidata as one of the largest open collaborative
knowledge bases has drawn much attention from researchers and practitioners
since its launch in 2012. As it is collaboratively developed and maintained
by a community of a great number of volunteer editors, understanding and
predicting the departure dynamics of those editors are crucial but have not
been studied extensively in previous work. In this paper, we investigate
the synergistic effect of two different types of features, statistical and
pattern-based, using DeepFM as our classification model, which has not
been explored in a similar context, to predict whether a Wikidata editor
will stay or leave the platform. Our experimental results
show that using the two sets of features with DeepFM provides the best
performance regarding AUROC (0.9561) and F1 score (0.8843), and achieves
substantial improvement compared to using either of the sets of features
and over a wide range of baselines.
- Paper: Learning to Predict the Departure Dynamics of Wikidata Editors
<https://parklize.github.io/publications/ISWC2021.pdf>
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
ACM/IEEE JCDL 2023 – June 26 - 30, 2023, Santa Fe, New Mexico
https://2023.jcdl.org/
*Rethinking Digital Records*
/Exploring new perspectives, challenges, and opportunities for
libraries, archives, museums, and galleries/
The notion of what constitutes a digital library has evolved over time.
In recent years, the ability to retain a digital record of the volatile
world has been of critical importance for ensuring that our collective
history is available for posterity. Progressing research and
disseminating state-of-the-art advancements in digital libraries are
thus of supreme importance to humanity.
The annual ACM/IEEE Joint Conference on Digital Libraries (JCDL) is
the primary international event for the inter- and multi-disciplinary
community of academics and practitioners in digital libraries coming
from computer, information and social sciences, and other related
disciplines. JCDL encompasses the many meanings of the term digital
libraries, including notions of managing, operating, developing,
curating, evaluating, or utilizing collections of
data/information/knowledge in various domains.
*Topics*
Topics of interest, as they relate to digital libraries, include, but
are not limited to:
Users and Interactions
- Collaborative and participatory information environments
- Crowdsourcing and human computation
- Human-information interaction
- Information visualization
- Social networks, virtual organizations and networked information
- Social media, community building, and applications
- User behavior and modeling
Search and Recommendation
- AI / Machine learning / Data mining for DLs
- Dataset retrieval
- Information and knowledge systems
- Information retrieval
- Knowledge discovery
- Natural language processing
- Navigational and exploratory search
- Personalization and contextualization
Digital Libraries in Practice
- Digital archiving and preservation
- Digital humanities and heritage
- Knowledge organization systems in practice
- Personal digital information management
- Performance evaluation
- Policy and law
- Privacy and intellectual property
- Scientific data management
Content and Structures
- Data curation and stewardship
- Document genres
- Extracting semantics, entities, and patterns from large collections
- Infrastructure and service design
- Linked data and its applications
- Research data management
- Web and network science
*Paper Types and Formats*
JCDL 2023 offers two paper submission deadlines: Submissions for
research papers (long or short) are due on January 29, 2023. The
deadline for late-breaking results and datasets is February
12, 2023. The submission formats are outlined below.
*Research Papers* (deadline: January 29, 2023)
Authors may choose between two formats:
- /Full papers/ have at most 10 pages and report on mature work, or
efforts that have reached an important milestone. They will get
presentation slots of 20 to 30 minutes.
- /Short papers/ have at most 4 pages and highlight efforts that might
be in an early stage, but are important for the community to be made
aware of; they can also present theories or systems that can be
described concisely in the limited space. Short papers will get
presentation slots of 10 to 15 minutes.
*Late Breaking Results and Datasets* (deadline: February 12, 2023)
This comprises submissions falling into the following categories:
- /Late breaking results/ present new insights or information about
research that was completed after the research paper submission
deadline.
- /Dataset submissions/ are a new category that allows the description of
relevant research datasets. These need to be either fully publicly
available or have to contain a publicly available subset.
Late Breaking Results and Datasets submissions should be 2-4 pages and
will be allotted a 5-minute presentation slot at the conference.
*Submission Guidelines*
All submissions must be original works, not previously published or
under review for publication elsewhere, in English, in PDF format, and
in the current ACM two-column conference format. Suitable LaTeX, Word,
and Overleaf templates are available from the ACM Website (use
"sigconf" proceedings template for LaTeX and the Interim Template for
Word, https://www.acm.org/publications/proceedings-template).
Complete papers are required; submissions consisting solely of an
abstract or those that are otherwise incomplete will not be reviewed.
For all formats, references do not count toward the page
limit. Submissions are to be made via
https://easychair.org/conferences/?conf=jcdl2023
*All submissions will be rigorously peer-reviewed in a double-blind
reviewing process.*
*Submissions must be anonymous* and all references to authors' works
have to be anonymized. We recommend using services like
https://anonymous.4open.science/ to anonymously share code or
data. Anonymized works that are available as preprints (e.g., on arXiv
or SSRN) may be submitted without citing them. Reviewers will be
instructed not to actively look for such preprints, and finding such a
preprint does not clash with our submission policies.
All accepted papers will be included in the proceedings and will be
presented at the conference. At least one author of each accepted
paper is required to register for, and present the work at the
conference on-site in Santa Fe. In case of travel restrictions
(COVID-related or otherwise), an exception may be made to allow
authors to present the work remotely.
*Calls for workshops and tutorials, posters and demos, and panels will
be published separately.*
*Submission Deadlines*
All dates are Anywhere on Earth (AoE)
- January 29, 2023 – Research paper submissions
- February 12, 2023 – Late breaking results, preliminary works,
datasets submissions
- Mid-March, 2023 – Notification of acceptance
- April 2, 2023 – Final camera-ready deadline for all submissions
*Program Chairs*
- Anat Ben-David, Open University of Israel
- Robert Jäschke, Humboldt-Universität zu Berlin
- Mat Kelly, Drexel University
*General Chair*
- Martin Klein, Los Alamos National Laboratory
*Contact*
For any questions about paper submissions you may contact the program
chairs by email to jcdl2023@easychair.org.
--
Prof. Dr. Robert Jäschke
Humboldt-Universität zu Berlin & L3S Research Center Hannover
< https://amor.cms.hu-berlin.de/~jaeschkr/ >< +49 (0)30 2093-70960 >
< https://weltliteratur.net/ >>>>><<<<< https://dev.bibsonomy.org/ >