Dear Cristina,
You are likely to find more researchers and people who regularly work with
our metadata on the research mailing list.
Send Wiki-research-l mailing list submissions to
wiki-research-l(a)lists.wikimedia.org
To subscribe or unsubscribe, please visit
https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.wik…
Regards
WSC
On Thu, 16 Sept 2021 at 14:04, Gava, Cristina via Wikimedia-l <wikimedia-l(a)lists.wikimedia.org> wrote:
Hello everyone,
This is my first time posting to this mailing list, so I would be happy to receive feedback on how best to interact with the community :)
I am trying to access Wikipedia metadata in a streaming and time/resource-sustainable manner. By metadata I mean many of the items that appear in the statistics of a wiki article, such as edits, the list of editors, page views, etc.
I would like to do this for an online-classifier type of pipeline: retrieve the data from a large number of wiki pages at regular intervals and use it as input for predictions.
I tried to use the Wikipedia API; however, it is time- and resource-expensive, both for me and for Wikipedia.
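(For illustration, here is a minimal sketch of the kind of per-article polling I mean, using the public Wikimedia pageviews REST endpoint; the article name, date range, and user-agent string are placeholders, and the cost is one HTTP request per article.)

```python
# Illustrative sketch only: one request per article against the public
# Wikimedia pageviews REST API. Article and dates are placeholders.
import json
import urllib.request

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build the REST URL for pageview counts of a single article."""
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"

def fetch_pageviews(url):
    """One HTTP round-trip per article -- this is what becomes expensive
    when polling a large number of pages at regular intervals."""
    req = urllib.request.Request(url, headers={"User-Agent": "metadata-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]

# Usage (one request per article):
#   items = fetch_pageviews(pageviews_url(
#       "en.wikipedia.org", "Albert_Einstein", "20210901", "20210915"))
#   for item in items:
#       print(item["timestamp"], item["views"])
```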
My preferred choice now would be to query the specific tables in the Wikipedia database, the same way the Quarry tool does. The problem with Quarry is that I would like to build a standalone script, without depending on a user interface like Quarry. Do you think this is possible? I am still fairly new to all of this and don't know which direction is best.
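(To show what I mean by a standalone script instead of Quarry's UI: a sketch that queries the Wiki Replicas directly from a Toolforge account, using the documented `<wiki>.analytics.db.svc.wikimedia.cloud` / `<wiki>_p` naming and the `~/replica.my.cnf` credentials file. The exact SQL is only an assumption of the kind of metadata query I'd run.)

```python
# Sketch of a standalone Quarry-style query against the Wiki Replicas.
# Assumes it runs on Toolforge, where credentials live in ~/replica.my.cnf.

def replica_host(wiki):
    """Analytics replica host for a given wiki database name, per the
    documented <wiki>.analytics.db.svc.wikimedia.cloud convention."""
    return f"{wiki}.analytics.db.svc.wikimedia.cloud"

# Recent mainspace-edit metadata from the last day, as classifier input.
# Table/column names follow the MediaWiki schema; treat the exact query
# as an assumption.
RECENT_EDITS_SQL = """
SELECT rc_title, rc_timestamp, actor_name
FROM recentchanges JOIN actor ON rc_actor = actor_id
WHERE rc_namespace = 0
  AND rc_timestamp > DATE_FORMAT(NOW() - INTERVAL 1 DAY, '%Y%m%d%H%i%s')
LIMIT 100;
"""

def fetch_recent_edits(wiki="enwiki"):
    import pymysql  # third-party driver, available on Toolforge
    conn = pymysql.connect(
        host=replica_host(wiki),
        database=f"{wiki}_p",                   # public replica database
        read_default_file="~/replica.my.cnf",   # Toolforge credentials
    )
    with conn.cursor() as cur:
        cur.execute(RECENT_EDITS_SQL)
        return cur.fetchall()
```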
I saw [1]<https://meta.wikimedia.org/wiki/Research:Data> that I could access the Wiki Replicas through both Toolforge and PAWS, but I didn't understand which one would serve me better. Could I ask you for some feedback?
Also, as far as I understood [2]<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake>, directly accessing the DB through Hive is too technical for what I need, right? Especially because it seems I would need an account with production shell access, and I honestly don't think I would be granted one. Also, I am not interested in accessing sensitive or private data.
A last resort is parsing the analytics dumps; however, this seems a less organic way of retrieving and cleaning the data. It would also be strongly decentralised and tied to a physical machine, unless I upload the cleaned data somewhere every time.
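(For reference, a sketch of what parsing the hourly pageview dumps from dumps.wikimedia.org/other/pageviews/ would look like; each line of those files is "domain_code page_title count_views total_response_size". The file path and domain filter below are placeholders.)

```python
# Sketch: streaming one hourly pageviews dump file without loading it
# fully into memory. Line format per the pageview dump documentation:
#   domain_code page_title count_views total_response_size
import gzip

def parse_line(line):
    """Parse one dump line; return (domain, title, views) or None if malformed."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    domain, title, views, _size = parts
    return domain, title, int(views)

def stream_pageviews(path, wanted_domain="en"):
    """Yield (title, views) for a single wiki from a gzipped dump file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            rec = parse_line(line)
            if rec and rec[0] == wanted_domain:
                yield rec[1], rec[2]

# Usage (placeholder path):
#   for title, views in stream_pageviews("pageviews-20210915-160000.gz"):
#       update_classifier_input(title, views)
```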
Sorry for the long message, but I thought it was better to give you a clearer picture (hoping this is clear enough). Even a small hint would be highly appreciated.
Best,
Cristina
[1] https://meta.wikimedia.org/wiki/Research:Data
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake
Hello all,
The September Wikimedia Research Showcase will be on September 15 at 16:30
UTC (9:30 am PT / 12:30 pm ET / 18:30 CEST). The theme will be "socialization
on Wikipedia" with speakers Rosta Farzan and J. Nathan Matias.
Livestream: https://www.youtube.com/watch?v=YVqabVvLIZU
Talk 1
Speaker: Rosta Farzan (School of Computing and Information, University of
Pittsburgh)
Title: Unlocking the Wikipedia clubhouse to newcomers: results from two
studies
Abstract: It is no news to any of us that the success of online production
communities such as Wikipedia relies heavily on a continuous stream of
newcomers to replace the inevitably high turnover and to bring on board new
sources of ideas and labor. However, these communities have been struggling
to attract newcomers, especially from a diverse population of users, and to
retain them. In this talk, I will present two different approaches to
engaging new editors on Wikipedia: (1) newcomers joining through the Wiki Ed
program, an online program in which college students edit Wikipedia articles
as class assignments; and (2) newcomers joining through a Wikipedia
Art+Feminism edit-a-thon. I will discuss how each approach incorporated
techniques for engaging newcomers and how the approaches succeeded in
attracting and retaining them.
More information:
- Bring on Board New Enthusiasts! A Case Study of Impact of Wikipedia Art + Feminism Edit-A-Thon Events on Newcomers <https://link.springer.com/chapter/10.1007/978-3-319-47880-7_2>, SocInfo 2016 (pdf <http://saviaga.com/wp-content/uploads/2016/06/socinfo_ediathons.pdf>)
- Successful Online Socialization: Lessons from the Wikipedia Education Program <https://dl.acm.org/doi/abs/10.1145/3392857>, CSCW 2020 (pdf <https://www.cc.gatech.edu/~dyang888/docs/cscw_li_2020_wiki.pdf>)
Talk 2
Speaker: J. Nathan Matias <http://natematias.com/> (Citizens and Technology
Lab <http://citizensandtech.org/>, Cornell University Departments of
Communication and Information Science)
Title: The Effect of Receiving Appreciation on Wikipedias: A Community
Co-Designed Field Experiment
Abstract: Can saying “thank you” make online communities stronger & more
inclusive? Or does thanking others for their voluntary efforts have little
effect? To answer this question, the Citizens and Technology Lab (CAT Lab)
organized 344 volunteers to send thanks to Wikipedia contributors across
the Arabic, German, Polish, and Persian languages. We then observed the
behavior of 15,558 newcomers and experienced contributors to Wikipedia. On
average, we found that organizing volunteers to thank others increases
two-week retention of newcomers and experienced accounts. It also caused
people to send more thanks to others. This study was a field experiment, a
randomized trial that sent thanks to some people and not to others. These
experiments can help answer questions about the impact of community
practices and platform design. But they can sometimes face community
mistrust, especially when researchers conduct them without community
consent. In this talk, learn more about CAT Lab's approach to community-led
research and discuss open questions about best practices.
More information:
- Volunteers Thanked Thousands of Wikipedia Editors to Learn the Effects of Receiving Thanks <https://citizensandtech.org/2020/06/effects-of-saying-thanks-on-wikipedia/>, blog post (in EN, DE, AR, PL, FA) <https://osf.io/ueq5f/>
- The Diffusion and Influence of Gratitude Expressions in Large-Scale Cooperation: A Field Experiment in Four Knowledge Networks <https://osf.io/ueq5f/>, paper preprint
More information: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Of course I am not from WMCUG, but they issued a statement regarding the
series of serious office actions:
https://qiuwen.wmcug.org.cn/archives/390/on-wmf-office-action-zh-1/
Here is also an archive.is link, in case they take the statement down:
https://archive.is/EE6AD
It will be interesting to see whether they really translate this into English.
Regards,
William
Dear Wikimedians,
Below you can find the report of the Wikimedia Community User Group
Albania for 2020: https://meta.wikimedia.org/wiki/Wikimedia_Community_User_Group_Albania…
If you have any questions or suggestions, please let us know.
Best regards,
Nafie
On behalf of Wikimedia Community User Group Albania members
_______________________________________________
Please note: all replies sent to this mailing list will be immediately directed to Wikimedia-l, the public mailing list of the Wikimedia community. For more information about Wikimedia-l:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
_______________________________________________
WikimediaAnnounce-l mailing list -- wikimediaannounce-l(a)lists.wikimedia.org
To unsubscribe send an email to wikimediaannounce-l-leave(a)lists.wikimedia.org
Translations can be found on
Meta: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2021/2021-09…
Thank you to everyone who participated in the 2021 Board election. The
Elections Committee has reviewed the votes of the 2021 Wikimedia Foundation
Board of Trustees election, organized to select four new trustees. A record
6,873 people from across 214 projects cast their valid votes. The following
four candidates received the most support:
1. Rosie Stephenson-Goodknight
2. Victoria Doronina
3. Dariusz Jemielniak
4. Lorenzo Losa
Waiting for the Board’s appointment
While these candidates have been ranked through the community vote, they
are not yet appointed to the Board of Trustees. They still need to pass a
successful background check and meet the qualifications outlined in the
Bylaws. This process can be longer depending on the country of residence of
the candidates. The Board has set a tentative date to appoint new trustees
at the end of this month. The Board also has approved
<https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…>
a short extension to the terms of the exiting trustees to allow a smooth
transition.
Thanks to all the candidates
Thanks to all candidates for their participation. This election set records
for the number of candidates and for regional diversity, with more than half
of the 19 candidates coming from regions outside North America and Western
Europe. This election used Single Transferable Voting
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/Single_Trans…>
for the first time. This system does not indicate a number of votes or
percentage of support. Rather, it shows in which round each candidate was
eliminated. You can review the full results on Meta-Wiki
<https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_ele…>,
which document the order in which the candidates were mathematically
eliminated.
Thanks to all the election volunteers
The Board of Trustees stressed
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_noticeboard/2021…>
the importance of increasing diversity on the Board. Dozens of volunteers
supported by a team of multilingual facilitators promoted the election in
up to 61 languages. They hosted many conversations about the Board election
in more than 50 languages and encouraged community members to participate
in all areas of the election.
Statistics
The 2021 Board of Trustees election broke new ground in several areas. The
Movement Strategy and Governance team will publish a report with the most
remarkable metrics soon. In the meantime, some statistics
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/2021/Stats>
can be found on Meta-Wiki. Some highlights:
- Participation increased by 1,753 voters over 2017. Overall turnout was 10.13%, 1.1 percentage points more than in 2017.
- The highest participation among wikis with at least 5 eligible voters was seen on the Hausa and Igbo Wikipedias. Both wikis had a participation of 75% (6 of 8 eligible voters). Other high participation numbers were seen on the Telugu, Nepalese, and Punjabi Wikipedias.
- The largest increase in participation among wikis with at least 50 eligible voters was on the Catalan Wikipedia, where 36.3% of eligible voters voted (28 percentage points higher than in 2017).
- There were 214 wikis represented in the election. This is determined by the wiki on which the account was originally created.
- A total of 74 wikis that did not participate in 2017 produced voters in this election.
- A total of 226 wikis had at least one eligible voter but produced no voters. The largest electorate in this group was the Cantonese Wikipedia, with 25 eligible voters.
In the upcoming days, an anonymized list of votes will be released that
will allow deeper inspection and publication of more metrics.
2022 election
The next Board of Trustees election is planned to take place in 2022.
Interested community members can watch the Wikimedia Foundation elections
page for updates
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections>. Four
community seats will be selected at that time. It is not too early to
consider and prepare for candidacy. Community members may like to
check out Candidate
Resources
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/Candidate_Re…>
to learn more about what to expect and how to prepare for this role.
--
*Jackie Koerner*
*she/her*
Facilitator, Movement Strategy and Governance
*English language communities and Meta-Wiki*