Hello everyone,
We have completed an interview study to learn about the values of Wikipedia
stakeholders around the ORES ecosystem. You can find the full study
description here
<https://meta.wikimedia.org/wiki/Research:Applying_Value-Sensitive_Algorithm…>
.
After analyzing our interview data, we were intrigued to find that all
stakeholders' values seem to converge on five major principles for how
algorithms ought to operate on Wikipedia:
1. Algorithmic systems should reduce the effort of community maintenance
work.
2. Algorithmic systems should maintain human judgement as the final
authority.
3. Algorithmic systems should support the workflows of individual people
with different priorities at different times.
4. Algorithmic systems should encourage positive engagement with diverse
editor groups, such as newcomers, women, and minorities.
5. Algorithmic systems should establish the trustworthiness of both people
and algorithms within the community.
We are inviting everyone to share feedback on our interpretation of the
data by reviewing the preliminary results of our study. Please leave
comments in this Google Doc
<https://docs.google.com/document/d/17AByGDxS2n9Cfon6vtgDGO3lUu0lQhe3eWNdAVZ…>,
or reply directly to this thread. We also posted about this at the Village Pump
<https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Share_your…>.
Thanks,
Estelle (aka FauxNeme on Wikipedia)
--
C. Estelle Smith
Graduate Research Fellow
University of Minnesota, Department of Computer Science
Keller Hall, 200 Union St. SE, Minneapolis, MN 55455
Cell: 612.226.7789 | Twitter: @memyselfandHCI
https://colleenestellesmith.com/
Pronouns: she/her/hers
Hi,
There is a phenomenon in Wikipedias in smaller languages: the activity
level of people who actually know the language of the wiki and make
meaningful text contributions is relatively low, while the activity of people
from other wikis who make various technical edits that don't require
knowledge of the language is relatively high.
I call the latter group "helpful strangers". They can do things such as
fixing categories, fixing invalid wiki syntax, editing templates, adding
images, etc.—things that don't require knowing the language well, and can
be achieved by copying and pasting, by guessing things from interlanguage
links, or by writing language-neutral things, such as numbers or filenames.
Now, I've written "relatively low" and "relatively high", but these are
just my anecdotal impressions. Has anyone thought of a way to quantify this
more precisely?
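One rough way to quantify it: for each recent editor on a small wiki, check
where their global edits actually live. Editors whose contributions are
concentrated on other wikis are candidates for the "helpful stranger" group.
Below is a minimal, untested sketch in Python; the choice of wiki (Faroese),
the 20% threshold, and the bot/anon filtering are arbitrary illustration
choices, and a fuller version would count only content edits rather than all
edits, which is closer to the distinction I'm describing.

import requests

WIKI = "https://fo.wikipedia.org/w/api.php"  # example small wiki, picked arbitrarily
META = "https://meta.wikimedia.org/w/api.php"

def recent_editors(limit=500):
    """Distinct registered, non-bot editors from recent main-namespace changes."""
    data = requests.get(WIKI, params={
        "action": "query", "list": "recentchanges", "rcnamespace": 0,
        "rcprop": "user", "rcshow": "!bot|!anon", "rclimit": limit,
        "format": "json"}).json()
    return {rc["user"] for rc in data["query"]["recentchanges"]}

def local_share(user, localwiki="fowiki"):
    """Fraction of a user's global edit count made on the local wiki (CentralAuth)."""
    data = requests.get(META, params={
        "action": "query", "meta": "globaluserinfo", "guiuser": user,
        "guiprop": "merged", "format": "json"}).json()
    accounts = data["query"]["globaluserinfo"].get("merged", [])
    total = sum(a.get("editcount", 0) for a in accounts) or 1
    local = sum(a.get("editcount", 0) for a in accounts if a["wiki"] == localwiki)
    return local / total

for user in sorted(recent_editors()):
    share = local_share(user)
    label = "possible helpful stranger" if share < 0.2 else "mostly local"
    print("%s\t%.0f%%\t%s" % (user, 100 * share, label))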
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Kiril,
I wrote something a while back in Java that was able to get the number of
contributions per user for a given language in Wikipedia. It could be
altered for your purposes if the data structure of the namespaces is the
same or similar.
https://github.com/hachacha/wikiParticipants
particularly this file
https://github.com/hachacha/wikiParticipants/blob/master/src/wikipediansbyn…
It could also be altered to save only contributions within a specific date
range.
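If a date-range filter is all you need, the MediaWiki API can also return
contributions between two timestamps directly. A minimal, untested sketch
(in Python rather than Java for brevity; the wiki URL and username are
placeholders):

import requests

API = "https://mk.wikipedia.org/w/api.php"  # any Wikimedia wiki's api.php

def contributions(user, newest, oldest):
    """Yield a user's edits between two ISO 8601 timestamps, newest first."""
    params = {"action": "query", "list": "usercontribs", "ucuser": user,
              "ucstart": newest, "ucend": oldest, "uclimit": "max",
              "format": "json"}
    while True:
        data = requests.get(API, params=params).json()
        for contrib in data["query"]["usercontribs"]:
            yield contrib
        if "continue" not in data:  # follow API continuation until exhausted
            break
        params.update(data["continue"])

# Example: count one user's edits made during 2018.
n = sum(1 for _ in contributions("Example",
                                 "2018-12-31T23:59:59Z",
                                 "2018-01-01T00:00:00Z"))
print(n)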
God Bless,
Jonathan
On Fri, Jun 7, 2019 at 8:00 AM <wiki-research-l-request(a)lists.wikimedia.org>
wrote:
>
> From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> Date: Fri, 7 Jun 2019 10:53:30 +0300
> Subject: Re: [Wiki-research-l] Database of all users
>
> Kiril Simeonovski, 07/06/19 09:57:
> > with their contributions to
> > the Wikimedia projects
>
> Do you mean the *number* of their contributions, or literally all their
> contributions? Filtering the stub dumps would be one systematic way to
> get all the metadata about edits.
>
> If you just need aggregate numbers with some filter by date, namespace
> or other, the fastest way is probably to write a script which loops
> through all the databases on Labs. For instance I made this to list the
> users who contribute in a certain language, to find translators for very
> small languages:
> <
> https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/lists/+/master/sc…
> >
>
> Federico
>
> From: Kiril Simeonovski <kiril.simeonovski(a)gmail.com>
> Date: Fri, 7 Jun 2019 09:57:45 +0200
> Subject: Re: [Wiki-research-l] Database of all users
>
> Hi Federico,
>
> Thanks for the straightforward answer. My idea is to extract the number of
> contributions across projects and namespaces.
>
> Best,
> Kiril
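Federico's suggestion above (a script that loops through the replica
databases on Labs) might look roughly like this in outline. This is an
untested sketch: it assumes Toolforge access with the usual replica.my.cnf
credentials, and the host/database naming may need adjusting to your
environment. Note that user_editcount is a lifetime total; for counts up to
a fixed point in time you would aggregate from the revision table instead.

import pymysql

WIKIS = ["mkwiki", "fowiki"]  # in practice, read the full list from meta_p.wiki

def editor_counts(dbname):
    """Return (user_name, user_editcount) rows from one wiki's replica."""
    conn = pymysql.connect(
        host="%s.analytics.db.svc.wikimedia.cloud" % dbname,  # assumed naming
        database=dbname + "_p",
        read_default_file="replica.my.cnf",  # Toolforge credentials file
        charset="utf8mb4")
    with conn.cursor() as cur:
        cur.execute("SELECT user_name, user_editcount FROM user "
                    "WHERE user_editcount > 0")
        return cur.fetchall()

totals = {}
for db in WIKIS:
    for name, count in editor_counts(db):
        if isinstance(name, bytes):  # user_name is a binary column on the replicas
            name = name.decode("utf-8")
        totals[name] = totals.get(name, 0) + (count or 0)

print(sorted(totals.items(), key=lambda kv: -kv[1])[:20])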
Dear all,
I was wondering if there is a way to extract a database of all users (or a
selection of users according to some criteria) with their contributions to
the Wikimedia projects up to a fixed point in time from XTools.
Thank you.
Best regards,
Kiril
Forwarding in case this is of interest.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Guillaume Lederrey <glederrey(a)wikimedia.org>
Date: Thu, Jun 6, 2019 at 7:33 PM
Subject: [Wikidata] Scaling Wikidata Query Service
To: Discussion list for the Wikidata project. <wikidata(a)lists.wikimedia.org>
Hello all!
A number of concerns have been raised about the performance and
scaling of the Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:
In an ideal world, WDQS should:
* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability
Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.
Realistically, addressing all of the above constraints is unlikely to
ever happen. Some of the constraints are non-negotiable: if we can't
keep up with Wikidata in terms of data size or number of edits, it does
not make sense to address query performance. On some constraints, we
will probably need to compromise.
For example, the update process is asynchronous. It is by nature
expected to lag. In the best case, this lag is measured in minutes,
but can climb to hours occasionally. This is a case of prioritizing
stability and correctness (ingesting all edits) over update latency.
And while we can work to reduce the maximum latency, this will still
be an asynchronous process and needs to be considered as such.
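As a rough illustration of that lag (a sketch, not one of our monitoring
tools): the data's last-modified timestamp is exposed through SPARQL, so
the current lag of the public endpoint can be estimated by comparing it
with the wall clock. This assumes the timestamp comes back in the plain
"...Z" ISO format:

import datetime
import requests

QUERY = "SELECT ?t WHERE { <http://www.wikidata.org> schema:dateModified ?t }"
resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "wdqs-lag-example/0.1 (example only)"})
stamp = resp.json()["results"]["bindings"][0]["t"]["value"]
last = datetime.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
lag = datetime.datetime.utcnow() - last
print("WDQS update lag: %d seconds" % lag.total_seconds())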
We currently have one Blazegraph expert working with us to address a
number of performance and stability issues. We are planning to hire an
additional engineer to help us support the service in the long term. You
can follow our current work in Phabricator [2].
If anyone has experience with scaling large graph databases, please
reach out to us; we're always happy to share ideas!
Thanks all for your patience!
Guillaume
[1]
https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
[2] https://phabricator.wikimedia.org/project/view/1239/
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi Wiki Research List!
Are you interested in making discoveries that could help your Wiki project
thrive? Let's talk! Or join us for a one-day research summit after
Wikimania on August 19. Funding is available.
CivilServant <http://civilservant.io/> is a research nonprofit that works
collaboratively with Wikipedians to test tools and practices designed to
reach the goals those communities care about most. We are currently looking
to add partnerships that start in 2020.
If you are an experienced Wikipedian who has helped lead initiatives in
your language community - or another Wiki project - and you want to learn
how to make your community stronger, we would love to hear from you.
Are you curious about what tools are effective in reaching your Wikipedia's
goals? Is your community thinking about using a new or existing tool or
practice? We can work with your community to design and run A/B tests to
discover which practices are effective. Along the way, we also hope to
contribute to science!
If you are interested in speaking with us, please introduce yourself here
<https://docs.google.com/forms/d/e/1FAIpQLSd2BwuVH34pw244D4ZoEE6MS06ZRj7XWDx…>.
We also invite you to participate in a one-day Research Summit to be held
in Stockholm on August 19th
<https://meta.wikimedia.org/wiki/CivilServant%27s_Wikimedia_studies/Summit_S…>,
the day after Wikimania. Space permitting, we hope to welcome any
Wikipedian who is interested in learning about CivilServant and how to
design A/B tests.
Funding: We have 10 full scholarships to pay for attendance at the Research
Summit and Wikimania (including airfare & accommodations) as well as 15
partial scholarships to cover the costs of staying in Stockholm two
additional nights to attend the summit. We invite you to apply for either
scholarship here
<https://docs.google.com/forms/d/e/1FAIpQLSd2BwuVH34pw244D4ZoEE6MS06ZRj7XWDx…>
.
We look forward to hearing from you. If you have questions about
CivilServant's work, please visit our Meta page
<https://meta.wikimedia.org/wiki/CivilServant%27s_Wikimedia_studies> or our
website <https://civilservant.io/>, where you can read about some of our
previous studies with Reddit communities
<https://civilservant.io/moderation_experiment_r_science_rule_posting.html>
<https://civilservant.io/do_downvotes_cause_bad_behavior_jan_2018.html>,
or feel free to reach out directly to CivilServant's research manager,
Juliakamin(cs) <https://meta.wikimedia.org/wiki/User:Juliakamin(cs)>.
Finally, if you know other Wikipedians who may be interested in talking
with us, please feel free to forward this email to them.
Thanks!
--
J. Nathan Matias <http://natematias.com/> : Princeton University :
CivilServant <http://civilservant.io> : MIT Media Lab : Cornell University
Fall 2019
<https://medium.com/@natematias/im-joining-the-cornell-university-department…>
: @natematias <http://twitter.com/natematias> : blog
<https://natematias.com/external-posts/>
Hi,
Does anybody know whom to contact to request access to Wikipedia search
logs?
I am aware of the previous effort to make this information public and of
the privacy problems involved [1]. Nonetheless, I think there is a lot of
room for research without publicly disclosing the dataset, namely through
NDAs.
Thanks in advance for your help.
[1]
http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/
--
Sérgio Nunes
====================
ECIR 2020 - 42nd European Conference on Information Retrieval
Call for Workshops and Tutorials
Lisbon, Portugal - April 14-17, 2020
http://www.ecir2020.org/
====================
The deadline for ECIR 2020 Workshop and Tutorial proposals is September
1st, 2019 - we welcome your submissions! Please find further information
below.
Workshops
The purpose of workshops is to provide a platform for presenting novel
ideas and research results in a focused and more interactive way.
Workshops can be either half-day (3 hours plus breaks) or full-day (6
hours plus breaks). Workshops are encouraged to be as dynamic
and interactive as possible and should lead to a concrete outcome,
such as the publication of a summary paper and/or workshop
proceedings. The information required for a workshop proposal is on
the conference website. Workshop proposals will be reviewed by the
workshop committee. A summary paper of the workshop will be published
in the conference proceedings.
Please find more information at
http://www.ecir2020.org/call-for-workshops/
Tutorials
Tutorials inform the community on recent advances in core IR research,
related research, or on novel application areas related to IR. They
may focus on specific problems or specific domains in which IR
research may be applied. Tutorials can be either half-day (3 hours plus
breaks) or full-day (6 hours plus breaks). Tutorials are encouraged to be
as interactive as possible. Please follow the tutorial
proposal instructions on the conference website (link below). Tutorial
proposals will be reviewed by the tutorial committee. A summary of the
tutorial will be published in the conference proceedings.
Further information can be found at
http://www.ecir2020.org/call-for-tutorials/
Important dates:
1 September 2019 – Workshop and Tutorial submission
1 October 2019 – Workshop and Tutorial notification
14 April 2020 – Workshops and Tutorials
Hope to see you in Lisbon!