Hello everyone,
Wikimedia is participating in the winter edition of this year's Outreachy <
https://www.outreachy.org/> [1] (December 2021–February 2022)! The deadline
to submit projects on the Outreachy website is *September 30th, 2021*.
If you would like to share an idea for a project that you would like to
mentor, or if you are not familiar with the program and want to learn more
about it, feel free to reply to this email or leave a note on <
https://phabricator.wikimedia.org/T289893> [2].
*About the Outreachy program*:
Outreachy offers three-month internships working remotely on coding and
non-coding projects in Free and Open Source Software (FOSS) with
experienced mentors. These internships run twice a year: from May to August
and from December to March. Interns are paid a stipend of USD 5,500 for the
three months of work. They also receive a USD 500 stipend to travel to
conferences and events. Interns often find employment after their
internship with Outreachy sponsors or in jobs that use the skills they
learned during their internship.
This program is open to both students and non-students. Outreachy expressly
invites the following people to apply:
- Women (both cis and trans), trans men, and genderqueer people.
- Anyone who faces under-representation, systematic bias, or
discrimination in the technology industry in their country of residence.
- Residents and nationals of the United States of any gender who are
Black/African American, Hispanic/Latinx, Native American/American Indian,
Alaska Native, Native Hawaiian, or Pacific Islander.
See a blog post highlighting experiences and outcomes of interns who
participated in a previous round of Outreachy with Wikimedia <
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…>
[3].
Some tips for mentors on proposing projects:
- Follow this task description template when you propose a project in
Phabricator: <
https://phabricator.wikimedia.org/tag/outreach-programs-projects/> [4].
Add the #Outreachy (Round 23) tag to it.
- Remember, the project should require an experienced developer ~15 days
to complete and a newcomer ~3 months.
- Each project should have at least two mentors, at least one of whom
should have a technical background.
- When it comes to picking a project, you could propose one that is:
   - Relevant for your language community or likely to bring impact to the
   Wikimedia ecosystem in the future.
   - Welcoming and newcomer-friendly, with a moderate learning curve.
   - A new idea you are passionate about, with no deadlines attached:
   something you always wanted to see happen but couldn't due to a lack of
   resources or help.
   - About developing a standalone tool (possibly hosted on Wikimedia
   Toolforge) with fewer dependencies on Wikimedia's core infrastructure;
   it doesn't necessarily require a specific programming language.
See the roles and responsibilities of an Outreachy mentor <
https://www.mediawiki.org/wiki/Outreachy/Mentors> [5].
We look forward to your participation!
Cheers,
Srishti
(On behalf of the organization team)
[1] https://www.outreachy.org/
[2] https://phabricator.wikimedia.org/T289893
[3]
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…
[4] https://phabricator.wikimedia.org/tag/outreach-programs-projects/
[5] https://www.mediawiki.org/wiki/Outreachy/Mentors
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
As you may know, Wikibase currently does not normalize pagenames/filenames
on save (e.g. underscores are allowed in the input for properties of
datatype Commons media). At the same time, Wikidata’s quality constraints
extension
<https://www.mediawiki.org/wiki/Extension:WikibaseQualityConstraints>
triggers a constraint violation after saving if underscores are used. This
is by design, in line with long-established
<https://www.wikidata.org/wiki/Template:Constraint:Commons_link> community
practices. As a result, this inconsistency leaves users with unnecessary
manual work.
We will update Wikibase so that when a new edit is saved via the UI or API,
and a pagename/filename is added or changed in that edit, that
pagename/filename will be normalized on save ("My file_name.jpg" -> "My
file name.jpg").
More generally, the breaking change is that a user of the Wikibase API may
send one data value when saving an edit, and get back a slightly different
(normalized) data value after the edit was made: it is no longer the case
that data values are either saved unmodified or totally rejected (e.g. if a
file doesn’t exist on Commons). Since this guarantee is being removed with
this breaking change announcement, we may introduce further normalizations
in the future and only announce them as significant changes, not breaking
changes.
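As an illustration, here is a minimal sketch (Python + requests) of how the
new behavior can surface through the API on test.wikidata.org, where the
change is already available. The item ID Q1 and property ID P1 are
hypothetical placeholders, and login and error handling are elided, so
treat this as a sketch rather than a production client:

    import json
    import requests

    API = "https://test.wikidata.org/w/api.php"
    session = requests.Session()

    # Fetch a CSRF token (anonymous tokens are assumed to suffice for
    # this sketch; a real client would log in first).
    token = session.get(API, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    # Send a Commons media value containing an underscore. Q1 and P1 are
    # placeholders; substitute an item and a Commons-media property that
    # exist on the wiki you are testing against.
    sent = "My file_name.jpg"
    resp = session.post(API, data={
        "action": "wbcreateclaim",
        "entity": "Q1",
        "property": "P1",
        "snaktype": "value",
        "value": json.dumps(sent),  # string values are JSON-encoded
        "token": token,
        "format": "json",
    }).json()

    # After this change, the stored value may come back normalized
    # ("My file name.jpg"); clients must no longer assume it is
    # byte-identical to what they sent.
    stored = resp["claim"]["mainsnak"]["datavalue"]["value"]
    print(repr(sent), "->", repr(stored))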
The change is currently available on test.wikidata.org and
test-commons.wikimedia.org. It will be deployed on Wikidata on or shortly
after September 6th. If you have any questions or feedback, please feel
free to let us know in this ticket
<https://phabricator.wikimedia.org/T251480>.
Cheers,
Lucas Werkmeister
--
Lucas Werkmeister (he/er)
Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Wikidata community members,
Thank you for all of your work helping Wikidata grow and improve over the
years. In the spirit of better communication, we would like to take this
opportunity to share some of the current challenges Wikidata Query Service
(WDQS) is facing, and some strategies we have for dealing with them.
WDQS currently risks failing to provide acceptable service quality for the
following reasons:

1. Blazegraph scaling

   1. Graph size. WDQS uses Blazegraph as its graph backend. While
   Blazegraph can theoretically support 50 billion edges
   <https://blazegraph.com/>, in reality Wikidata is the largest graph
   we know of running on Blazegraph (~13 billion triples
   <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=…>),
   and there is a risk that we will reach a size limit
   <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
   of what it can realistically support
   <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is maxed
   out, WDQS can no longer be updated. This will also break Wikidata tools
   that rely on WDQS.

   2. Software support. Blazegraph is end-of-life software that is no
   longer actively maintained, making it an unsustainable backend to
   continue moving forward with long term.
Blazegraph maxing out in size poses the greatest risk of catastrophic
failure, as it would effectively prevent WDQS from being updated further,
and the service would inevitably fall out of date. Our long-term strategy
to address this is to move to a new graph backend that best meets our WDQS
needs and is actively maintained, and to begin the migration off of
Blazegraph as soon as a viable alternative is identified
<https://phabricator.wikimedia.org/T206560>.
In the interim, we are exploring disaster mitigation options for reducing
Wikidata’s graph size in case we hit this upper size limit: (i) identifying
and deleting lower-priority data (e.g. labels, descriptions, aliases,
non-normalized values, etc.); (ii) separating out certain subgraphs (such
as Lexemes and/or scholarly articles). This would be a last-resort scenario
to keep Wikidata and WDQS running with reduced functionality until we are
able to deploy a more long-term solution.
2. Update and access scaling

   1. Throughput. WDQS currently tries to provide fast updates and fast,
   unlimited queries for all users. As the number of SPARQL queries grows
   over time
   <https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon>
   alongside graph updates, WDQS is struggling to keep up
   <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=…>
   in every dimension of service quality without compromising somewhere.
   For users, this often leads to timed-out queries.

   2. Equitable service. We are currently unable to adjust system behavior
   per user/agent. As such, it is not possible to provide equitable service
   to users: for example, a heavy user could swamp WDQS enough to hinder
   usability for community users. (A sketch of a well-behaved client
   follows this list.)
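Until such per-user/agent controls exist, client behavior matters. Below is
a minimal sketch (Python + requests) of a well-behaved WDQS client: it
identifies itself and backs off when throttled. The endpoint and Accept
header are the standard ones; the User-Agent string is a hypothetical
example, and the retry policy is illustrative, not an official
recommendation:

    import time
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    HEADERS = {
        # Identify your tool so operators can reach you if it misbehaves.
        "User-Agent": "MyWikidataTool/0.1 (https://example.org; me@example.org)",
        "Accept": "application/sparql-results+json",
    }

    def run_query(query, retries=3):
        for _ in range(retries):
            # Client-side timeout slightly above the server-side limit.
            resp = requests.get(ENDPOINT, params={"query": query},
                                headers=HEADERS, timeout=65)
            if resp.status_code == 429:
                # Honor Retry-After if present, otherwise back off briefly.
                time.sleep(int(resp.headers.get("Retry-After", 5)))
                continue
            resp.raise_for_status()
            return resp.json()["results"]["bindings"]
        raise RuntimeError("WDQS kept throttling this request")

    # A small query that stays well under the server-side timeout.
    rows = run_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
    print(rows)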
In addition to being a querying service for Wikidata, WDQS is also part of
the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to
update the data there). While deploying the new Flink-based Streaming
Updater <https://phabricator.wikimedia.org/T244590> will help with
increasing throughput of Wikidata updates, there is a substantial risk that
WDQS will be unable to keep up with the combination of increased querying
and updating, resulting in more tradeoffs between update lag and querying
latency/timeouts.
In the near term, we would like to work more closely with you to determine
what acceptable trade-offs would be for preserving WDQS functionality while
we scale up Wikidata querying. In the long term, we will be conducting more
user research to better understand your needs so we can (i) optimize
querying via SPARQL and/or other methods, (ii) explore better user
management that will allow us to prevent heavy use of WDQS that does not
align with the goals of our movement and projects, and (iii) make it easier
for users to set up and run their own query services.
Though this information about the current state of WDQS may not surprise
many of you, we want to be as transparent as possible, to minimize
surprises in the case of any potential service disruptions or catastrophic
failures, and to accommodate your work as best we can in the future
evolution of WDQS. We plan on doing a session on WDQS scaling challenges
during WikidataCon this year at the end of October.
Thanks for your understanding of these scaling challenges, and for any
feedback you have already been providing. If you have new concerns,
comments or questions, the best place to reach us is this talk page
<https://www.wikidata.org/wiki/Wikidata_talk:Query_Service_scaling_update_Au…>.
Additionally, if you have not had a chance to fill out our survey
<https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWB…>
yet, please tell us how you use the Wikidata Query Service (see the privacy
statement
<https://foundation.wikimedia.org/wiki/WDQS_User_Survey_2021_Privacy_Stateme…>)!
Whether you are an occasional user or a tool creator, your feedback is
needed to guide our future development.
Best,
WMF Search + WMDE
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, September 1st, 2021
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
*NOTE: We have a new Google Meet link as of August 2021, which offers
international calling options.*
Hope to talk to you in a week!
—Trey
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC–4 / EDT
Dear all,
We're doing a Scholia hackathon right now, and in the context of
internationalization, one of the issues that came up was how AUTO_LANGUAGE
can be used outside the WDQS GUI.
Our intuitive assumption would have been that AUTO_LANGUAGE gets inserted
into the query, that the query then runs on WDQS, and that the results are
displayed/available via JSON. However, it seems that the query is being
sent to WDQS with AUTO_LANGUAGE as a variable.
Specifically, the JSON response to a query containing AUTO_LANGUAGE does
not seem to contain the language information. Is this correct/intended?
More details via
https://github.com/WDscholia/scholia/issues/1640#issuecomment-908575849
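For context, the workaround we are sketching substitutes the placeholder
client-side before sending the query, on the assumption that
[AUTO_LANGUAGE] is a magic word replaced by the WDQS GUI rather than by the
endpoint itself (Python + requests; the query and User-Agent are just toy
examples):

    import requests

    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q5 .
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
      }
    } LIMIT 3
    """

    # Pick the language ourselves (e.g. from the user's browser settings),
    # since nothing on the server side resolves [AUTO_LANGUAGE] for us.
    user_lang = "de"
    resolved = QUERY.replace("[AUTO_LANGUAGE]", user_lang)

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": resolved, "format": "json"},
        headers={"User-Agent": "ScholiaSketch/0.1 (hypothetical example)"},
    )
    print(resp.json()["results"]["bindings"])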
Thanks for any pointers,
Daniel
Dear all,
[Apologies for cross-posting]
I'm posting here because there is an open DevOps position at the Open
Science Lab at TIB Hannover, where I work, and it might be of interest to
people on this list.
>> https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/job-advertisement-no-62-2021
We are looking for someone with (ideally) experience in OSS / MediaWiki /
Wikibase software, hence I'm posting here. Please feel free to spread
the word if you know anyone who might be interested, and feel free to
reach out to me directly at lozana.rossenova@tib.eu if you have any
questions and want to learn more.
The position is in Germany, but remote work is also possible.
Cheers,
Lozana Rossenova
--
Research Associate
Open Science Lab
Hello all,
The *Data Quality Days
<https://www.wikidata.org/wiki/Wikidata:Events/Data_Quality_Days_2021>*, a
series of community-powered events on the topic of data quality, will take
place online from *September 8th to 15th*. Together, we hope to start some
interesting discussions about data quality and to highlight this topic from
various angles: exploring what data quality means in different areas of
Wikidata, bringing together people who work on data quality on Wikidata and
people who want to contribute, and highlighting and creating tools that can
be useful when working on data quality.
The sessions are taking place online, at any time during the 8 days of the
event, and the program is built by and for the Wikidata community. If you
have an idea for a presentation, workshop, or discussion that you would
like to facilitate during the Data Quality Days, feel free to add it
directly to the schedule. If you have any questions or need some help with
preparing a
session, feel free to ask on the talk page
<https://www.wikidata.org/wiki/Wikidata_talk:Events/Data_Quality_Days_2021> or
to reach out to me directly.
Looking forward to talking about data quality with you all!
Cheers,
--
Léa Lacroix
Community Engagement Coordinator
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
This breaking change is relevant for anyone who consumes Wikidata RDF data
through Special:EntityData (rather than the dumps) without using the “dump”
flavor.
When an Item references other entities (e.g. the statement P31:Q5), the
non-dump RDF output of that Item (i.e. without ?flavor=dump) includes the
labels and descriptions of the referenced entities (e.g. P31 and Q5) in all
languages. That bloats the output drastically and causes performance
issues. See Special:EntityData/Q1337.rdf
<https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf> as an example.
We will change this so that for referenced entities, only labels and
descriptions in the request language (set e.g. via ?uselang=) and its
fallback languages are included in the response. For the main entity being
requested, labels, descriptions and aliases are still included in all
languages available, of course.
If you don’t actually need this “stub” data of referenced entities at all,
and are only interested in data about the main entity being requested, we
encourage you to use the “dump” flavor instead (include flavor=dump in the
URL parameters). In that case, this change will not affect you at all,
since the dump flavor includes no stub data, regardless of language.
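If you want to check the impact on your own usage, here is a small sketch
(Python + requests) comparing the two outputs. Q1337 is the example item
mentioned above, uselang and flavor are the URL parameters described in
this announcement, and the byte-count comparison and User-Agent string are
merely illustrative:

    import requests

    # The example item from this announcement, in RDF/XML form.
    URL = "https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf"
    headers = {"User-Agent": "EntityDataCheck/0.1 (hypothetical example)"}

    # Default output: after the change, referenced entities carry labels
    # and descriptions only in the request language and its fallbacks.
    default = requests.get(URL, params={"uselang": "de"}, headers=headers)

    # Dump flavor: no stub data for referenced entities at all.
    dump = requests.get(URL, params={"flavor": "dump"}, headers=headers)

    print(len(default.content), "bytes (default, uselang=de)")
    print(len(dump.content), "bytes (dump flavor)")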
This change is currently available for testing at test.wikidata.org. It
will be deployed on Wikidata on August 23rd. You are welcome to give us
general feedback by leaving a comment in this ticket
<https://phabricator.wikimedia.org/T285795>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Keep up to date! Current news and exciting stories about Wikimedia,
Wikipedia and Free Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.