Hello,
We are excited to announce that we will open the next request
for proposals to the Research Fund in the coming months. We encourage you
to review the Meta page [1] for general information about the Fund,
eligibility criteria, and previous submissions, and to begin planning your
proposals. Stay tuned for further instructions, office hour dates, and an
announcement of the first round of grantees. You can reach out to
research_fund(a)wikimedia.org with questions.
Best,
Emily, on behalf of the Research Fund Organizing Committee
[1]
https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hi all,
If you interact with the Wikimedia Foundation's Research team [1] or
rely on some of the services that we offer to the Wikimedia research
community, I'd like to share an update with you regarding a change on
our team's end for the period of July 2022 until June 30, 2023 in
terms of how we invest our time on the Wikimedia Research Community
front.
==Context for the change==
We want to build a course that can help researchers (or potential
future researchers) learn how to contribute their research expertise
to the Wikimedia research community and Wikimedia Movement most
effectively and joyfully. :) Developing the course material will
require significant time investment from our small team. As a result,
we are pausing a couple of the existing team initiatives/activities or
are reducing time investment on some fronts. We may pause or lower
time investments on some other fronts as we progress towards
developing the course.
==What are we changing?==
1. We are pausing Monthly Research Office Hours [2] for 6 months with
a possibility of extending the pause to 12 months. Emily Lescak, our
Senior Research Community Officer, had iterated on the format of the
office hours and we were looking forward to launching with the new
format in August. We will pick up experimenting with the changes that
she had planned once we start investing in this space again. :)
2. We are going to reduce our public speaking work (talks, tutorials,
keynotes, ...) and we may reduce some of the research community
service work we normally offer unless we have already committed to
them (PC member, track chair or PC chair roles and responsibilities,
etc.). We may continue doing the research service work in our
volunteer time.
3. We will generally be more conservative about picking up new
initiatives or actions/activities for our team.
==What will not change in the coming 6 months?==
* We are continuing to invest in three programs that our team has led
over the years: Address Knowledge Gaps, Improve Knowledge Integrity,
and Building the Foundations. [3]
* On the Research Community front, our team is currently planning to
continue to maintain these initiatives/activities:
** Research Fund
** WMF Research Award of the Year
** Wiki Workshop
** Monthly Research Showcases
** bi-annual Research Report
** Formal Collaborations program and mentoring interns
==Should you expect more changes in our services for you?==
We may need to reduce investments on more fronts. We will be able to
say this more accurately after we make some progress towards
developing the course. If we make some relatively major choices in
terms of time investment, we intend to continue notifying you through
this mailing list.
As a general approach: If there is a need for further prioritization
of activities, we will continue pausing or reducing time investment
considering the impact on the Wikimedia Research community as well as
estimated time needed for doing the activity.
==How can I be involved in developing the course?==
Thank you for considering joining forces on this front! :) Please
write to Emily Lescak (cc-ed) with your proposal about contributing to
the development of the course. We will reach out to you based on the
specifics of your proposal and the needs of the course.
And of course, there will be a page on MetaWiki about the course once
we know slightly more than "we want to develop a course".
==What can we look forward to?==
For years we have discussed developing a clear entry point for
researchers to learn how to contribute research to the Wikimedia
projects. We have pursued several initiatives within our team and in
collaboration with other teams: developing tutorials for the
resources that researchers can use, offering office hours to help
folks get started, investing in documentation, Research Funds, and
more. The course can act as a unifying effort or a force-multiplier
that can help us reach more researchers from across the globe with
more diverse backgrounds and experiences and show them the possibility
and joy of contributing research to the Wikimedia projects.
If you have questions about this prioritization, you can reach out to
me directly or write on this thread.
Best,
Leila
[1] https://research.wikimedia.org/team.html
[2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[3] https://research.wikimedia.org/projects.html
--
Leila Zia
Head of Research
Wikimedia Foundation
Hi all,
The next Research Showcase, featuring the recipients of this year's
Wikimedia Foundation Research Awards of the Year, will be live-streamed
Wednesday, July 20, at 9:30 AM PDT/16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1658334607>.
YouTube stream: https://www.youtube.com/watch?v=KMvXOQU5fX4
You are welcome to ask questions via YouTube chat or on IRC at
#wikimedia-research.
This month's presentations:
Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine
Learning
By *Krishna Srinivasan (Google)*
The milestone improvements brought
about by deep representation learning and pre-training techniques have led
to large performance gains across downstream NLP, IR and Vision tasks.
Multimodal modeling techniques aim to leverage large high-quality
visio-linguistic datasets for learning complementary information across
image and text modalities. In this talk, I introduce the Wikipedia-based
Image Text (WIT) Dataset to better facilitate multimodal, multilingual
learning. WIT is composed of a curated set of 37.5 million entity-rich
image-text examples with 11.5 million unique images across 108 Wikipedia
languages.
WIT’s unique advantages include:
* WIT is the largest multimodal dataset by the number of image-text
examples by 3x (at the time of writing).
* WIT is massively multilingual (first of its kind) with coverage over
100+ languages.
* WIT represents a more diverse set of concepts and real world entities
relative to what previous datasets cover.
WIT Dataset is available for download and use via a Creative Commons
license here: https://github.com/google-research-datasets/wit
I conclude the talk with future directions to expand and extend the WIT
dataset. Link to paper: https://arxiv.org/pdf/2103.01913.pdf
Assessing the Quality of Sources in Wikidata Across Languages
By *Gabriel Amaral (King's College London)*
Wikidata is one of the most important
sources of structured data on the web, built by a worldwide community of
volunteers. As a secondary source, its contents must be backed by credible
references; this is particularly important as Wikidata explicitly
encourages editors to add claims for which there is no broad consensus, as
long as they are corroborated by references. Nevertheless, despite this
essential link between content and references, Wikidata’s ability to
systematically assess and assure the quality of its references remains
limited. To this end, we carry out a mixed-methods study to determine the
relevance, ease of access, and authoritativeness of Wikidata references, at
scale and in different languages, using online crowdsourcing, descriptive
statistics, and machine learning. The findings help us ascertain the
quality of references in Wikidata, and identify common challenges in
defining and capturing the quality of user-generated multilingual
structured data on the web. Link to paper:
https://dl.acm.org/doi/abs/10.1145/3484828
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Emily, on behalf of the Research team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
[Apologies for cross-postings]
Final Call for Applications to the Doctoral Programme
Deadline: July 15, 2022
15th Conference on Intelligent Computer Mathematics
- CICM 2022 -
September 19-23, 2022
Tbilisi, Georgia (hybrid event)
http://www.cicm-conference.org/2022
--------------------------------------------------------------------------------
Digital and computational solutions are becoming the prevalent means
for the generation, communication, processing, storage and curation of
mathematical information.
CICM brings together the many separate communities that have developed
theoretical and practical solutions for mathematical applications such
as computation, deduction, knowledge management, and user interfaces.
It offers a venue for discussing problems and solutions in each of
these areas and their integration.
CICM 2022 Invited Speakers:
* Erika Abraham (RWTH Aachen University)
* Deyan Ginev (FAU Erlangen-Nürnberg and NIST)
* Sébastien Gouëzel (IRMAR, Université de Rennes 1)
CICM 2022 Programme committee:
see https://www.cicm-conference.org/2022/cicm.php?event=&menu=pc
The Doctoral Programme provides a dedicated forum for PhD students to
present and discuss their ideas, ongoing or planned research, and
achieved results in an open atmosphere. It will consist of
presentations by the PhD students to get constructive feedback,
advice, and suggestions from the research advisory board, researchers,
and other PhD students. Each PhD student will be assigned to an
experienced researcher from the research advisory board who will act
as a mentor and who will provide detailed feedback and advice on their
intended and ongoing research.
Application
Students at any stage of their PhD can apply and should submit the
following documents:
* A two-page abstract of your thesis describing your research
questions, research plans, completed and remaining research,
evaluation plans and publication plans;
* A two-page CV that includes background information (name,
university, supervisor), education (degree sought, year/status of
degree, previous degrees), employment, relevant research experience
(publications, presentations, attended conferences or workshops,
etc.)
- Deadline: July 15, 2022 (not a hard cut-off; late submissions may still be considered)
- Notification of acceptance: July 29, 2022
All submissions should be made via EasyChair at
https://easychair.org/conferences/?conf=cicm2022
+++ apologies for cross-postings +++
The Max Planck Institute for Demographic Research<http://www.demogr.mpg.de/en> (MPIDR) is seeking to appoint a full-time post-doctoral researcher to join the Research Group on Labor Demography<http://www.demogr.mpg.de/en/research_6120/labor_demography_4733/>.
We welcome applications from researchers with a PhD in demography, sociology, economics, statistics, epidemiology, or a similar field. The successful candidate will work on a project aimed at understanding which factors are shaping the length of working life, and they will develop their own agenda within this project. We are seeking creative, self-driven, and collaborative scholars. Strong quantitative analysis skills are required. Knowledge of R or Stata is an advantage, as is experience with longitudinal data analysis, causal inference, or labor market research.
We provide a stimulating research-oriented community, an excellent infrastructure, and opportunities to work with exciting datasets. The successful applicant will be offered a contract for up to 4 years with remuneration commensurate to experience (starting from approx. 57,000 EUR gross per year for researchers who have just completed their PhD, up to approx. 71,000 EUR gross per year for more senior scientists), based on the salary structure of the German public sector (Öffentlicher Dienst, TVöD Bund). It is expected that successful applicants will be in residence at the MPIDR. Support for relocation costs is available.
Please apply online via this portal<http://survey.demogr.mpg.de/index.php/846559?lang=en> and include in a single PDF file:
1. Letter of interest (max. 1 page)
2. Curriculum Vitae (max. 3 pages, focusing on your most relevant achievements)
3. A writing example (e.g., one of your publications)
4. Contact information for up to 2 academic referees
In order to receive full consideration, please apply by August 15. Interviews are planned for September. The exact starting date is flexible. Applicants should have completed their doctoral degree; however, PhD students who expect to obtain their degree in 2022 or early 2023 may apply.
For inquiries about the position, please contact Christian Dudel<http://www.demogr.mpg.de/en/about_us_6113/staff_directory_1899/christian_du…> at dudel(a)demogr.mpg.de<mailto:dudel@demogr.mpg.de?subject=Post-doc%3A%20Labor%20Demography>.
The MPIDR is one of the leading demographic research centers in the world. It is part of the Max Planck Society<http://www.mpg.de/en>, a network of 86 institutes that form Germany's premier basic-research organization. Max Planck Institutes have an established record of world-class, foundational research in the sciences, technology, social sciences and the humanities. They offer a unique environment that combines the best aspects of an academic setting and a research laboratory.
The Max Planck Society offers a broad range of measures to support the reconciliation of work and family. These are complemented by the MPIDR's own initiatives. For more information, see: demogr.mpg.de/go/work-family<https://www.demogr.mpg.de/go/work-family>.
In addition, there are a range of central initiatives and measures primarily geared towards helping young female researchers and mothers to advance their career. See the link below for some examples: demogr.mpg.de/go/career-development<https://www.demogr.mpg.de/go/career-development>.
We value diversity and are keen to employ individuals from minorities and under-represented groups.
The Max Planck Society is committed to increasing the number of individuals with disabilities in its workforce and therefore encourages applications from such qualified individuals. Furthermore, the Max Planck Society seeks to increase the number of women in those areas where they are underrepresented and therefore explicitly encourages women to apply.
Dear all,
Due to several requests, we have decided to extend the deadlines for the
Poster & Demo track.
The new dates are as follows:
Paper Submission Deadline: July 11, 2022 (11:59 pm, Hawaii time
- originally July 4)
Notification of Acceptance: July 22, 2022 (11:59 pm, Hawaii time
- originally July 4)
Camera-Ready Paper: August 15, 2022 (11:59 pm, Hawaii time)
SEMANTiCS 2022 especially invites contributions that target the
intersections between computational semantics and neighboring research
areas such as machine learning, language technologies, sensor
technologies, distributed ledgers and beyond.
For details please go to: https://2022-eu.semantics.cc/cfp
Looking forward to your submissions! Stay tuned and stay safe!
Please also pay attention to our early bird registration discounts
available till August 6!
We are looking forward to meeting you in Vienna!
With kind regards,
Umutcan Simsek & David Chavez-Fraga
-- P&D Track Chairs --
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office Hours on Tuesday, 2022-07-05. Find your local time here
<https://zonestamp.toolforge.org/1657036800>.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g., about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves available to answer
research-related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research-related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of questions can sometimes be
  harder to answer during an office hour. However, discussing
  them can help us understand your challenges better and we may find ways to
  work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally and we would love to collaborate more
  closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Emily, on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
The Third Wikidata Workshop
Second Call for Papers
Co-located with the 21st International Semantic Web Conference (ISWC
2022).
Date: October 23 or 24, 2022
The workshop will be held online, afternoon European time.
Website: https://wikidataworkshop.github.io/2022/
== Important dates ==
Papers due: Friday, July 29, 2022
Notification of accepted papers: Friday, September 23, 2022
Camera-ready papers due: Monday, October 3, 2022
Workshop date: October 23/24, 2022
== Overview ==
Wikidata is an openly available knowledge base, hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines and
acts as a common structured-data repository for several Wikimedia projects,
including Wikipedia, Wiktionary, and Wikisource. It is used in a variety of
applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the broader
Wikidata community to meet, none of them focuses on publishing original,
peer-reviewed research. This workshop fills this gap - we hope to provide a
forum to build this fledgling scientific community and promote novel work
and resources that support it.
The workshop primarily seeks original contributions that address the
opportunities and challenges of creating, contributing to, and using a
global, collaborative, open-domain, multilingual knowledge graph such as
Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources, which are naturally
linked to Wikidata and its ecosystem or enabled by it. What we are less
interested in are works that use Wikidata alongside or in lieu of other
resources to carry out some computational task - unless the work feeds back
into the Wikidata ecosystem, for instance by improving or commenting on
some Wikidata aspect, or suggesting new design features, tools, and
practices.
This year, we also added a track for already published work. To foster
conversations around the topic of Wikidata, we invite authors of papers
published at other conferences to submit their papers to present at the
workshop. These will not be included in the proceedings, but presenting
them gives authors a chance to interact with the community.
We welcome interdisciplinary work, as well as interesting applications that
shed light on the benefits of Wikidata and discuss areas of improvement.
The workshop is planned as an interactive half-day event, in which most of
the time will be dedicated to discussions and exchange rather than oral
presentations. For this reason, all accepted papers will be presented in
short talks and accompanied by a poster. All works will be presented
online.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects in Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots, and datasets for improving or evaluating Wikidata
- Participation, diversity, and inclusivity aspects in the Wikidata
ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
- Abstract Wikipedia
== Submission guidelines ==
We welcome the following types of contributions.
= Track 1: Novel Works =
The papers in this track will be peer-reviewed by at least three
researchers. Accepted papers will be published as open access papers on
CEUR (authors can also waive this). We invite the following types of papers:
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resources directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the style of the Springer
Publications format for Lecture Notes in Computer Science (LNCS). For
details on the LNCS style, see Springer’s Author Instructions.
Papers have to be submitted through EasyChair (please add “[NOVEL]” at the
beginning of the title on the submission page so we know that you are
submitting to this track):
https://easychair.org/my/conference?conf=wikidataworkshop2022
= Track 2: Published works =
This track welcomes papers previously published at a peer-reviewed research
venue, to be presented and discussed in the workshop. They do not have to
follow the formatting and page limit instructions from Track 1, and can
instead be submitted in the original format.
Previously published papers will be reviewed by the organising committee in
terms of topical fit and prominence of the publication venue. They will not
be published as part of the proceedings. We invite the following types of
papers:
- Full research paper: Previously published research contributions
- Resource paper: Previously published datasets or other resources that are
important or interesting to the community
- Demo paper: Presenting a previously published system critically enabled
by Wikidata
Papers have to be submitted through EasyChair (please add “[PUBLISHED]” at
the beginning of the title on the submission page so we know that you are
submitting to this track):
https://easychair.org/my/conference?conf=wikidataworkshop2022
== Proceedings ==
The complete set of papers from the Novel Works Track will be published
with the CEUR Workshop Proceedings (CEUR-WS.org).
== Organizing committee ==
Lucie-Aimée Kaffee, University of Copenhagen, lucie.kaffee[[(a)]]gmail.com
Simon Razniewski, Max Planck Institute for Informatics, srazniew[[@]]mpi-inf.mpg.de
Kholoud Alghamdi, King's College London, kholoud.alghamdi[[(a)]]kcl.ac.uk
Gabriel Maia Rocha Amaral, King's College London, gabriel.amaral[[@]]kcl.ac.uk
== Programme committee ==
Seyed Amir Hosseini Beghaeiraveri, Heriot-Watt University
Houcemeddine Turki, Data Engineering and Semantics Research Unit,
University of Sfax, Tunisia
Filip Ilievski, Information Sciences Institute, University of Southern
California, Marina del Rey, CA, USA
Mahir Morshed, University of Illinois at Urbana-Champaign
Daniel Garijo, Universidad Politécnica de Madrid
Niel Chah, University of Toronto & Microsoft
Alasdair Gray, Heriot-Watt University
Thomas Pellissier Tanon, Lexistems
John Samuel, CPE Lyon
Dennis Diefenbach, The QA Company
Heiko Paulheim, University of Mannheim
Cristina Sarasua, University of Zurich
Pavlos Vougiouklis, Huawei
Pierre-Henri Paris, Télécom Paris
Lydia Pintscher, Wikimedia Deutschland
Isaac Johnson, Wikimedia Foundation
Alessandro Piscopo, BBC
Luis Galárraga, Inria
Danai Symeonidou, INRAE
Andrew D. Gordon, Microsoft Research and University of Edinburgh
David Abián, King’s College London
Elisavet Koutsiana, King’s College London
--
Lucie-Aimée Kaffee
Hi all,
The next Research Showcase, *Wikipedia's Languages*, will be live-streamed
Wednesday, June 15, at 4:00 AM PDT/11:00 AM UTC. View your local time here
<https://zonestamp.toolforge.org/1655290800>.
YouTube stream: https://www.youtube.com/watch?v=AZQM1dtn3g0
You are welcome to ask questions via YouTube chat or on IRC at
#wikimedia-research.
This month's presentations:
Quantifying knowledge synchronisation in the 21st century
By *Jisung Yoon (Pohang University of Science and Technology)*
Humans acquire and accumulate
knowledge through language usage and eagerly exchange their knowledge for
advancement. Although geographical barriers had previously limited
communication, the emergence of information technology has opened new
avenues for knowledge exchange. However, it is unclear which communication
pathway is dominant in the 21st century. Here, we explore the dominant path
of knowledge diffusion in the 21st century using Wikipedia, the largest
communal dataset. We evaluate the similarity of shared knowledge between
population groups, distinguished based on their language usage. When
population groups are more engaged with each other, their knowledge
structure is more similar, where engagement is indicated by socio-economic
connections, such as cultural, linguistic, and historical features.
Moreover, geographical proximity is no longer a critical requirement for
knowledge dissemination. Furthermore, we integrate our data into a
mechanistic model to better understand the underlying mechanism and suggest
that the knowledge "Silk Road" of the 21st century is based online.
The Language Geography of Wikipedia
By *Martin Dittus*
Every language is a
system of being, doing, knowing, and imagining. With over 7,000 active
languages in the world, how many languages are fully represented online? To
answer this question, digital non-profit Whose Knowledge? initiated the
first ever report on the State of the Internet's Languages. As part of this
report, Martin Dittus and Mark Graham have investigated the languages of
Wikipedia. Wikipedia began with a single English-language edition more than
two decades ago, and now offers more than 300 language editions, which
places it at the forefront of digital language support. However, this does
not mean that speakers of these languages get access to the same content:
Wikipedia’s language editions vary widely in scale. We further find that
this inequality is also reflected in Wikipedia’s geographic coverage: not
all places are captured in every language. Wikipedia's coverage often
follows the global distribution of speakers of the respective language. Yet
even when we account for the distribution of language populations, certain
language communities are much more strongly represented on Wikipedia than
others. As a consequence, we find that for many countries in Africa,
Central and South America, and South Asia, most of the content about those
countries is in a foreign language, often a European-colonial language. In
other words, in many of these places, people may need to be able to speak a
second (possibly foreign) language in order to access Wikipedia information
about their own places. Why do we see these differences? And what can be
done to improve things?
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Emily, on behalf of the Research team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation