Hi everyone,
We’re preparing for the upcoming issue of the research newsletter
(https://meta.wikimedia.org/wiki/Research:Newsletter) and looking for
contributors. If you are interested in reviewing or summarizing recently
published research for our audience of Wikimedians and academic
researchers, please take a look at
https://etherpad.wikimedia.org/p/WRN202310 and add your name next to any
paper you are interested in covering. This issue (for October 2023) is
scheduled for publication on November 5, 2023, at 20:00 UTC; texts should be in
a day before that. If you can't make this deadline but would like to cover
a particular paper in the subsequent issue, leave a note next to the
paper's entry. As usual, short notes and one-paragraph reviews are most
welcome, too.
Alhaji Darajaati on behalf of the Newsletter team
[Moving research-wmf to Bcc.]
Dear Hanxuan Sun,
Thank you for reaching out.
*Some tips for increasing the chances of success for your project*
- *Reduce the chance of surprising existing Wikipedia volunteers.* For
example:
  - If your project involves recruiting existing Wikipedia editors or
    changing content in a Wikipedia language edition, please make sure you
    communicate that to the relevant Wikipedia language community and engage
    in any follow-up conversations they may want to have with you. On English
    Wikipedia, you can do it at
    https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)
    . (The Village pump is a place that many Wikipedia language communities
    maintain for this type of conversation. You can find other languages'
    Village pump pages by clicking on the languages menu on the top-right
    side of https://en.wikipedia.org/wiki/Wikipedia:Village_pump .)
  - Create a research page for your project on Meta-Wiki
    (https://meta.wikimedia.org/wiki/Research:New_project) and link to it in
    your communications. This is the place where others can learn more about
    your research. You can find sample projects at
    https://meta.wikimedia.org/wiki/Research:Index . (Reference your IRB
    approval from this research page if you can.)
- *Understand the context.* We can give you tips to improve your work;
however, the relevant Wikimedia project (in your case, a Wikipedia language
community) is the community you'll primarily need to work with.
- *Survey privacy and data retention.* With Wikimedians, sometimes less
is more. :) I highly recommend you think hard about what data you actually
need, how long you will keep it, and why. Keeping sensitive data in
perpetuity can raise alarms because privacy is something many Wikipedians
value.
- *Survey questions.* There are at least a few folks on this list who
have expertise on this front. They may choose to leave feedback for you.
Thanks for being proactive and asking for feedback. :) I had a quick look
at the first few pages. One question that immediately caught my attention
was the gender question: it asks about gender but offers options about
sex. That question needs a fix, please. You can find sample survey
questions (including gender-related ones) at
https://meta.wikimedia.org/wiki/Community_Insights/2022_Survey_Questions .
Given this example of something you can improve, I highly recommend that
you seek specific input on your survey questions from a survey specialist
before running the survey, to make sure the questions can help you answer
the questions your research sets out to address and that they follow the
latest best practices in survey design as closely as possible.
Good luck!
Best,
Leila
--
Leila Zia
Head of Research
Wikimedia Foundation
On Wed, Oct 18, 2023 at 11:51 AM Hanxuan Sun <hanxuan.sun(a)unsw.edu.au>
wrote:
> Dear Wikipedia organization members,
>
> Researchers at UNSW are conducting a project exploring why translators
> revise their translations on Wikipedia, a popular online encyclopedia that
> uses a ‘crowdsourcing’ approach to attract volunteer translators to
> translate its content. The research will investigate the factors that
> influence translators’ decisions to revise existing translations, such as
> translation quality (e.g., of machine-translated text), personal beliefs,
> and discussions with peers in online communities.
>
> The research study is looking to recruit people who meet the following
> criteria:
>
> 1. 18 years of age or older;
> 2. Live in Australia;
> 3. Proficient in both Chinese and English;
> 4. Active Wikipedia volunteers engaged in revision.
>
> Participants will be asked to complete the following research activities
> if they agree to participate:
>
> - An online survey of 34 questions, which takes approximately 20 to
> 25 minutes to complete; and/or
> - Follow-up one-on-one interviews via Zoom, which take around
> 30 minutes; and/or
> - An observational study of an active group; and/or
> - A focus group discussion via Zoom, which takes around 2 hours.
>
> A full description of all research activities, including any risks,
> harms or discomforts that you may experience while participating in this
> research, is included in the attached Participant Information Statement and
> Consent Form.
>
> Please contact the following person via email or phone to register your
> interest in taking part in the research:
>
> *Name*
>
> Hanxuan Sun
>
> *Position*
>
> Student Investigator
>
> *Email*
>
> hanxuan.sun(a)unsw.edu.au
>
> If you have questions about the research and would like to contact the
> Chief Investigator, please contact the following person:
>
> *Chief Investigator*
>
> *Name*
>
> Stephen Doherty
>
> *Position*
>
> Chief Investigator
>
> *Telephone*
>
> (02)9385 1681
>
> *Email*
>
> s.doherty(a)unsw.edu.au
>
> This project has been approved by the UNSW ethics committee; the approval
> is attached. The participant consent form is on the cover page of the
> surveys. The survey links are: English version:
> https://unsw.au1.qualtrics.com/jfe/form/SV_3fNVlLeMgM3BNzg ; Chinese
> version: https://unsw.au1.qualtrics.com/jfe/form/SV_0AjTai9lMOf48FE .
> Would you please check the content of the surveys at your earliest
> convenience? Thank you so much! If you have any questions, feel free to
> contact me.
>
> Best,
>
> Hanxuan Sun.
> _______________________________________________
> Research-wmf mailing list -- research-wmf(a)lists.wikimedia.org
> To unsubscribe send an email to research-wmf-leave(a)lists.wikimedia.org
>
Hi all,
The next Research Showcase, focused on *Data Privacy*, will be
live-streamed on Wednesday, October 18, at 9:30 AM PDT / 16:30 UTC. Find
your local time here <https://zonestamp.toolforge.org/1697646641>.
YouTube stream: https://www.youtube.com/watch?v=ntgRsMaDlsw. As usual, you
can join the conversation in the YouTube chat as soon as the showcase goes
live.
This month's presentations:
Wikipedia Reader Navigation: When Synthetic Data Is Enough
By *Akhil Arora, EPFL*
Every day, millions of people read Wikipedia. When navigating the vast
space of available topics using hyperlinks, readers describe trajectories
on the article network. Understanding these navigation patterns is crucial
to better serve readers’ needs and address structural biases and knowledge
gaps. However, systematic studies of navigation on Wikipedia are hindered
by a lack of publicly available data due to the commitment to protect
readers' privacy by not storing or sharing potentially sensitive data. In
this paper, we ask: How well can Wikipedia readers' navigation be
approximated by using publicly available resources, most notably the
Wikipedia clickstream data <https://wikinav.toolforge.org/>? We
systematically quantify the differences between real navigation sequences
and synthetic sequences generated from the clickstream data, in 6 analyses
across 8 Wikipedia language versions. Overall, we find that the differences
between real and synthetic sequences are statistically significant, but
with small effect sizes, often well below 10%. This constitutes
quantitative evidence for the utility of the Wikipedia clickstream data as
a public resource: clickstream data can closely capture reader navigation
on Wikipedia and provides a sufficient approximation for most practical
downstream applications relying on reader data. More broadly, this study
provides an example of how clickstream-like data can generally enable
research on user navigation on online platforms while protecting users’
privacy.
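To make the idea of "synthetic sequences generated from the clickstream data" concrete, here is a minimal Python sketch. It assumes a first-order Markov model, in which the next article is drawn in proportion to the clickstream count recorded for each (source, target) pair; this is an illustrative assumption, not necessarily the exact procedure used in the paper. The sample rows and the function names `build_transitions` and `synthetic_sequence` are hypothetical; "other-empty" stands for readers who arrive without a referrer.

```python
import random
from collections import defaultdict

# Hypothetical (prev, curr, n) rows modeled on the public clickstream dumps;
# "other-empty" marks readers who arrived without a referrer.
ROWS = [
    ("other-empty", "Paris", 100),
    ("other-empty", "Berlin", 50),
    ("Paris", "France", 60),
    ("Paris", "Eiffel_Tower", 40),
    ("France", "Europe", 30),
    ("Berlin", "Germany", 25),
]


def build_transitions(rows):
    """Group rows by source page: source -> (list of targets, list of counts)."""
    table = defaultdict(lambda: ([], []))
    for prev, curr, n in rows:
        targets, counts = table[prev]
        targets.append(curr)
        counts.append(n)
    return dict(table)


def synthetic_sequence(table, start="other-empty", max_len=5, rng=random):
    """Random walk under a first-order Markov assumption: the next page
    depends only on the current page and is drawn with probability
    proportional to its clickstream count. Stops at max_len or at a page
    with no recorded outgoing clicks."""
    path = []
    node = start
    while len(path) < max_len and node in table:
        targets, counts = table[node]
        node = rng.choices(targets, weights=counts, k=1)[0]
        path.append(node)
    return path


table = build_transitions(ROWS)
rng = random.Random(42)  # fixed seed so the sampled sequences are reproducible
for _ in range(3):
    print(synthetic_sequence(table, rng=rng))
```

Because the dump records only pairs, each step forgets how the reader reached the current page. The walk here simply stops at a page with no recorded outgoing clicks; a fuller model would also need a probability of ending the session.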
How to tell the world about data you cannot show them: Differential privacy
at the Wikimedia Foundation
By *Hal Triedman, Wikimedia Foundation*
The Wikimedia Foundation (WMF), by virtue of its centrality on the internet,
collects lots of data about platform activities. Some of that data is made
public (e.g. global daily pageviews); other data types are not shared (or
are pseudonymized prior to sharing), largely due to privacy concerns.
Differential privacy is a statistical definition of privacy that has gained
prominence in academia, but is still an emerging technology in industry. In
this talk, I share the story of how we put differential privacy into
production at the WMF, through looking at the case study of geolocated
daily pageview counts.
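As a rough illustration of what a "statistical definition of privacy" means in practice, the textbook Laplace mechanism for releasing counts is sketched below in Python. This is not the WMF's production pipeline: the (country, page) keys are hypothetical, it assumes each reader contributes at most `max_contrib` views to the whole table and that the set of keys is fixed in advance, and it uses naive floating-point noise, which is known to be unsafe for real releases.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponential
    draws with rate 1/scale."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_counts(true_counts, epsilon, max_contrib=1, rng=None):
    """Release a table of counts with the Laplace mechanism.

    Assumptions (illustrative only):
    - each reader contributes at most `max_contrib` views to the whole
      table, so the L1 sensitivity is `max_contrib`;
    - the keys form a public, data-independent set (e.g. all
      (country, page) pairs of interest), so releasing them leaks nothing.
    Noise scale = sensitivity / epsilon. Rounding and clamping at zero are
    post-processing and do not weaken the privacy guarantee.
    """
    rng = rng or random.Random()
    scale = max_contrib / epsilon
    return {
        key: max(0, round(count + laplace_noise(scale, rng)))
        for key, count in true_counts.items()
    }


# Hypothetical geolocated pageview counts.
counts = {("FR", "Paris"): 1200, ("DE", "Paris"): 0, ("FR", "Berlin"): 35}
print(dp_counts(counts, epsilon=1.0, rng=random.Random(7)))
```

A smaller `epsilon` means more noise and stronger privacy; large counts stay useful while small ones are heavily perturbed, which is why real deployments typically add further safeguards such as release thresholds.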
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
The September 2023 issue of the Wikimedia Research Newsletter is out:
https://meta.wikimedia.org/wiki/Research:Newsletter/2023/September
In this issue:
1. In blind test, readers prefer ChatGPT output over Wikipedia articles in
terms of clarity, and see both as equally credible
2. FlaggedRevs study finds that concerns about limiting Wikipedia's "anyone
can edit" principle "may be overstated"
3. Briefly
4. Other recent publications
*** 9 recent publications were covered or listed in this issue ***
Alhaji Darajaati on behalf of the Newsletter team
---
Wikimedia Research Newsletter
* https://meta.wikimedia.org/wiki/Research:Newsletter/
* Follow us on Twitter/X: https://twitter.com/WikiResearch
* Follow us on Mastodon: https://mastodon.social/@wikiresearch
* Like us on Facebook: Facebook.com/WikiResearch/
* Receive this newsletter by mail: Research-newsletter Mailing List
Hello everyone!
I’m part of a research group that is working on developing a framework for
the ethical conduct of research with online communities; the framework is
intended for use by online communities, researchers, and institutional
review boards. To help develop the framework, we're looking for members of
the Wikipedia community interested in joining us for one or more
participatory workshops.
(1) If this research piques your interest, you can learn more at our Meta-Wiki
page
<https://meta.wikimedia.org/wiki/Research:Beyond_the_Individual:_Community-E…>.
Consider leaving any feedback you have on the research talk page
<https://meta.wikimedia.org/wiki/Research_talk:Beyond_the_Individual:_Commun…>;
your questions, concerns, and ideas are greatly appreciated and will only
make this work better in the long run.
(2) We are looking for 6-12 participants for our first workshop. If you're
interested in joining, please leave us your email here
<https://umn.qualtrics.com/jfe/form/SV_cBiSeHJoJe0pjlY>, and we'll reach
out personally with more information. Or, if you prefer, leave a comment on
the research talk page
<https://meta.wikimedia.org/wiki/Research_talk:Beyond_the_Individual:_Commun…>,
and we can follow up on-wiki.
Cheers,
Matthew
--
Matthew Zent
GroupLens Research
https://zentavious.github.io
He/Him
Hello (Semantic) MediaWiki maintainers, software developers, consultants, researchers!
SMWCon Fall 2023 (https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2023) will be held in person in Paderborn, Germany, and online. Over three days, there will be talks, tutorials and hackathons.
Registration
----------------
Registration is open on Eventbrite <https://www.eventbrite.com/e/smwcon-fall-2023-tickets-719554987337>.
Go there and take advantage of the early bird rates!
Call for Contributions
----------------------------
This conference addresses everybody interested in wikis and open knowledge, especially in Semantic MediaWiki, e.g. users, developers, consultants, business or government representatives, and researchers.
This conference aims to:
* inspire/onboard new users,
* inform on where and how MediaWiki is used,
* convey and consolidate best practices,
* initiate/foster/integrate application and development and
* strengthen the community of stakeholders and its service portfolio.
Learn how to "do" MediaWiki in order to assume your responsibilities regarding your organization's knowledge management.
Your experience is valuable for all of us! So please share and propose a talk, tutorial or other contribution.
Go to the Conference Page (https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2023)
and hit the 'Propose a talk here' button.
Please propose a contribution if you plan to have one, even if you don't have the details yet. For us it is important to know what we can expect.
We look forward to your contribution!
Best,
Bernhard and Tobias
on behalf of https://mwstake.org/ - the MediaWiki Stakeholders' Group
As part of Wikimedia Germany's work around reference reuse, we wrote a
tool which processes the HTML dumps of all articles and produces
detailed information about how Cite references (and Kartographer maps)
are used on each page.
I'm writing to this list for advice on how to publish the results so that
the data can be easily discovered and consumed by researchers.
Currently, the data is contained in 3,100 JSON and NDJSON files hosted
on a Wikimedia Cloud VPS server, with a total size of 3.4GB. The
outputs can be split or merged into whatever form will make them more
useable.
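Since NDJSON stores one JSON object per line, consumers can stream these files record by record instead of loading 3.4GB into memory, and merging or splitting them reduces to line-level copying. A minimal Python sketch, assuming plain or gzip-compressed NDJSON files; the helper names `iter_ndjson` and `merge_ndjson` are hypothetical and not part of the scraper.

```python
import gzip
import json
from pathlib import Path


def iter_ndjson(path):
    """Yield one parsed JSON object per non-empty line, streaming the file so
    large dumps never have to fit in memory. Files ending in .gz are
    decompressed on the fly."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def merge_ndjson(paths, out_path):
    """Concatenate several NDJSON files into one; return the record count."""
    n = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for p in paths:
            for record in iter_ndjson(p):
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                n += 1
    return n
```

Keeping one record per line also lets standard line-oriented tools (split, head, wc) work on the published files directly.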
For an overview of the columns and sample rows, please see this task:
https://phabricator.wikimedia.org/T341751
We plan to run the scraper again in the future, and its modular
architecture makes it simple to include or exclude additional
information if anyone has suggestions about what else we might want to
extract from rendered articles. To read more about the tool itself and
why we decided to process HTML dumps directly, see this post:
https://mw.ludd.net/wiki/Elixir/HTML_dump_scraper
-Adam Wight
[[mw:Adamw]]
https://meta.wikimedia.org/wiki/WMDE_Technical_Wishes
Tidskrift för ABM invites submissions for a thematic issue on information
futures. This special issue aims to explore and examine the transformative
role of digital technologies in shaping the landscape of cultural heritage,
knowledge access and information futures.
Topics of interest include, but are not limited to:
* The future of access to knowledge: Trends, challenges, and opportunities.
* AI and machine learning applications in preserving and disseminating
cultural heritage.
* Digital technologies and their impact on knowledge discovery and access.
* Ethical considerations in the use of AI for information retrieval and
curation.
* User experience and human-computer interaction in accessing knowledge in
a digital world.
* The role of libraries, archives, and museums in facilitating knowledge
access in the digital era.
* Social and cultural implications of the evolving information landscape.
We warmly welcome submissions outside of the specific theme as well.
Tidskrift för ABM publishes scientific papers, travelogues, reviews and
notes. We accept submissions in both Swedish and English.
DEADLINE 1ST OF OCTOBER.
Find the author guidelines and submit your work HERE
<https://journals.uu.se/tabm/about/submissions>.
View and please share our call for papers poster.
<https://www.canva.com/design/DAFs61fyXiY/Z59L3-xOjguExvTbabsoNw/view?utm_co…>
Best,
Biyanto R.
(biyanto.com)
Disclaimer: please do not feel obligated to respond to my email during weekends
or holidays.
Dear list members,
I am happy to share my new paper which was just published OA in the Journal of Computational Social Science: https://doi.org/10.1007/s42001-023-00225-8
In this paper, I describe a new dataset which covers (almost) all offline meetings within the German-language version of Wikipedia from its launch in 2001 to March 2020. The dataset is published on OSF: https://osf.io/eha4r/
I would be delighted to see other researchers make use of this dataset - be it for substantive research on the intricacies of Wikipedia (I'm always up for collaborations, too!) or as a neat example dataset for teaching (be it on networks, temporal developments, spatial plotting, or on how to combine and work with small and big data).
A thank you goes out to the Wikimedia Foundation for supporting me with a project grant, and to the editor and reviewers handling the publication (you might well be on this list as well :-))!
Best
Nicole