Hello everyone,
We are delighted to share the fantastic news of our interns'
accomplishments during GSoC 2023 and Outreachy Round 26, under the guidance
of their mentors. Please join us in congratulating these remarkable
individuals on their unwavering commitment and hard work:
*GSoC interns*:
1. Ahmed Amine Hassou (Morocco)
Project: Wiki Education Dashboard - Refactoring and upgrading React
Mentors: Sage Ross, Will Kent
2. Chenhao Liu (United States of America)
Project: Wiki Farm Support for Canasta
Mentors: Yaron Koren, Jeffrey Wang
3. Nikhil Mahajan (India)
Project: End-to-end test coverage for Abstract Wikipedia's Wikifunctions
Mentors: Stef Dunlap, James Forrester, Cory Massaro, Denny Vrandečić
4. Zexi Gong (China)
Project: Wikidocumentaries to import images from the web to Structured Data
on Commons
Mentors: TuukkaH, Susannaanas
5. Shashwat Khanna (India)
Project: Improve Programs & Events Dashboard UX for Article Scoped Programs
Mentor: Sage Ross
6. Saurabh Jamadagni (India)
Project: Adding a menu to Scribe-iOS application and expanding keyboard
languages
Mentor: Andrew McAllister
7. Ritika Pahwa (India)
Project: Commons Android app - Make upload more reliable
Mentors: Nicolas Raoul, Kaartic Sivaraam
8. Punith Nayak (India)
Project: Improve the functionality of VideoCutTool - New features
Mentors: Gopa Vasanth, Sohom Datta
9. Varun Shrivastava (India)
Project: Improve the functionality of VideoCutTool - Code Quality/Code
Health
Mentors: Gopa Vasanth, Sohom Datta
Additionally, we have the following *Outreachy interns*:
1. Abhishek Bhardwaj (India)
Project: Research imbalances in translation between languages on Wikipedia
Mentors: Adam Wight, Simulo, Kavitha A
2. Nathaly Toledo (Venezuela)
Project: Research imbalances in translation between languages on Wikipedia
Mentors: Adam Wight, Simulo, Kavitha A
3. Sulagna Saha (Bangladesh)
Project: Write a Ruby gem for analyzing Wikidata edits
Mentors: Sage Ross, Will Kent
These diligent interns have made significant contributions to our
community, and we encourage you to explore their reports and blogs to learn
more about their exceptional work. You can find additional information
about these projects on the following pages:
- https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023
- https://m.mediawiki.org/wiki/Outreachy/Round_26
Furthermore, we are thrilled to inform you that Srishti Sethi and I (Gopa)
recently attended the Google Summer of Code Mentor Summit held at Google
Headquarters in Sunnyvale, California. We represented the outstanding work
of our nine GSoC 2023 interns on Wikimedia projects and had the privilege
of learning from peers in various open-source software fields, including
arts, education, music, robotics, and more. This event, conducted in an
unconference, participant-driven style, provided a platform for attendees
to discuss the challenges of running open-source projects and share
valuable tips and advice for mentoring interns. It's worth noting that
Wikimedia has been an active participant in Google Summer of Code since
2006.
Once again, we extend our heartfelt congratulations to all our interns,
mentors, and everyone who supported them throughout this journey. We
eagerly anticipate their continued contributions to Wikimedia projects.
Cheers,
Wikimedia Organization Administrators (Srishti Sethi, Sohom Datta, Sheila
and Gopa)
--
Regards,
Gopa Vasanth <https://gopavasanth.github.io/>
Twitter <https://twitter.com/gopavasanth1999> | LinkedIn
<https://www.linkedin.com/in/gopa-vasanth/> | GitHub
<https://github.com/gopavasanth> | Gerrit
<https://gerrit.wikimedia.org/r/#/q/gopavasanth>
“Yesterday is not ours to recover, but tomorrow is ours to win or lose.”
Hello everyone,
We have a quick update regarding the Mismatch Finder tool. In the next
deployment scheduled for November 1, the tool will let you report
mismatches on qualifiers in addition to the main part of a statement.
A Quick Overview
For those who might not be familiar with the Mismatch Finder, it's a tool
that helps spot potential mismatches between Wikidata Items and external
databases, presenting them to editors for review and correction. The tool
is also used to suggest new statements that should be part of Wikidata but
need human review before they are added. You can learn more about it
at Wikidata:Mismatch Finder
<https://www.wikidata.org/wiki/Wikidata:Mismatch_Finder>.
What's on the Horizon?
The Mismatch Finder currently allows mismatch providers to report
mismatches in data that is stored in the main part of a statement. In the
upcoming deployment, mismatch providers will have the option to report
mismatches found within qualifiers, as important data is stored there as
well. This change will require mismatch uploaders to adjust their workflow,
as we will introduce a new 'type' column to the accepted CSVs. The CSV
upload will look as follows:
CSV for a Q42 statement with qualifier, where both the statement and the
qualifier are mismatched:
item_id,statement_guid,property_id,wikidata_value,external_value,external_url,type
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P3373,Q14623673,"Shoshanna Adams",example.com,statement
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P1039,Q10943095,"cousin",example.com,qualifier
CSV for a Q42 statement with qualifier, where only the qualifier is
mismatched:
item_id,statement_guid,property_id,wikidata_value,external_value,external_url,type
Q42,Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9,P1039,Q10943095,"cousin",example.com,qualifier
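For uploaders adjusting their workflow, rows like the examples above can be generated with Python's standard `csv` module. This is a minimal sketch, not part of the Mismatch Finder itself: the helper name `build_mismatch_csv` is hypothetical, and treating `statement` and `qualifier` as the only accepted `type` values is an assumption based on the examples in this message.

```python
import csv
import io

# Column order as given in the announcement, with the new 'type' column last.
FIELDS = [
    "item_id", "statement_guid", "property_id",
    "wikidata_value", "external_value", "external_url", "type",
]
# Assumption: the two values shown in the examples are the only valid ones.
ALLOWED_TYPES = ("statement", "qualifier")


def build_mismatch_csv(rows):
    """Hypothetical helper: serialize mismatch dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected type: {row['type']!r}")
        writer.writerow(row)
    return buf.getvalue()


guid = "Q42$A3B1288B-67A9-4491-A3AA-20F881C292B9"
rows = [
    # Mismatch on the main value of the statement (P3373).
    {"item_id": "Q42", "statement_guid": guid, "property_id": "P3373",
     "wikidata_value": "Q14623673", "external_value": "Shoshanna Adams",
     "external_url": "example.com", "type": "statement"},
    # Mismatch on a qualifier (P1039) of the same statement: same GUID,
    # but property_id names the qualifier property.
    {"item_id": "Q42", "statement_guid": guid, "property_id": "P1039",
     "wikidata_value": "Q10943095", "external_value": "cousin",
     "external_url": "example.com", "type": "qualifier"},
]
print(build_mismatch_csv(rows))
```

`csv.DictWriter` only quotes a field when it has to (for example when it contains a comma), so `Shoshanna Adams` is written without the quotes used in the examples; both forms are valid CSV, but it is worth checking the Mismatch Finder documentation for its exact expectations.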
This new feature is currently being developed under T313467
<https://phabricator.wikimedia.org/T313467>, where you can find more
detailed information and follow the progress of this enhancement. If you
have questions or concerns or want to provide feedback, please use the
linked Phabricator ticket or leave us a note at Wikidata talk:Mismatch
Finder <https://www.wikidata.org/wiki/Wikidata_talk:Mismatch_Finder>.
Cheers,
--
Mohammed S. Abdulai
*Community Communications Manager, Wikidata*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0) 30 577 116 2466
https://wikimedia.de
Grab a spot in my calendar for a chat: calendly.com/masssly.
A lot is happening around Wikidata - Keep up to date!
<https://www.wikidata.org/wiki/Wikidata:Status_updates>
Current news and exciting stories about Wikimedia, Wikipedia and Free
Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Charlottenburg
(Charlottenburg District Court), VR 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207. Executive Board: Franziska Heine, Dr. Christian Humborg
Hi everyone,
Our next Wikidata+Wikibase office hours
<https://www.wikidata.org/wiki/Wikidata:Events#Office_hours> will be held
at 17:00 UTC on Wednesday, 18th October 2023 (19:00 Berlin) in the Wikidata
Telegram <https://t.me/joinchat/IeCRo0j5Uag1qR4Tk8Ftsg> group.
*The Wikidata and Wikibase office hours are online events where the
development team presents what we have been working on over the past
quarter, and the community is welcome to ask questions and discuss
important issues related to the development of Wikidata and Wikibase.*
See you there.
--
Mohammed S. Abdulai
*Community Communications Manager, Wikidata*
Dear Wikidata community members,
The Search Platform team has been busy on the Search side improving how we
ingest documents into Elasticsearch
<https://wikitech.wikimedia.org/wiki/Search/Update_Pipeline>. This work is
ramping down, and our next priority is exploring options to scale the
Wikidata Query Service, as mentioned in our annual plan.
What are we trying to address?
We are convinced that the highest current risk to the Wikidata Query
Service is data size and data growth.
Wikidata is growing at a rate of roughly 1 billion triples per year and is
already one of the largest public SPARQL endpoints on the internet. This is
already causing visible issues, such as queries that used to run in a
reasonable amount of time that are now timing out. It is also creating less
visible issues, both in managing the infrastructure (it took us ~3 months
to reload data from scratch last time we tried) and in the overall
stability of the system (see the Blazegraph failure playbook
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_up…>).
We have strong evidence that increased data size could lead to a hard
scaling limit of the service.
What are we NOT trying to address?
Write load: Wikidata sees around 1 million edits per day. Wikidata Query
Service used to be a bottleneck, and was preventing bots from editing via
maxlag <https://www.mediawiki.org/wiki/Manual:Maxlag_parameter>. This has
been addressed with the Wikidata Streaming Updater and does not need
further work at the moment.
Query load / query optimization: We know there are issues with queries
timing out, and that Wikidata Query Service is sometimes overloaded to the
point where we are dropping queries. The stability of the system is
imperfectly addressed by throttling queries, and more servers have been
added to handle additional load. While this is certainly inconvenient to
WDQS users, we think this is manageable and does not have as much impact as
complete failure of the system due to data size.
Replacing Blazegraph: Blazegraph is unmaintained and will eventually need
to be replaced. Our analysis of alternative backends
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_up…>
shows that there are limited options for a graph the size of Wikidata.
Addressing the graph size first will open more options when we work on
replacing Blazegraph.
What is the plan?
We want to experiment with splitting the Wikidata Query Service graph and
using federation for queries that need access to all subgraphs. This is a
breaking change, which will require a number of queries to be rewritten,
either to access a new SPARQL endpoint, or to use federation. We want to
have a good understanding of the trade-offs before we commit to any
long-term solution.
We’ve identified separation of scholarly articles as a good first
experiment: Scholarly articles represent roughly half of Wikidata triples
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis…>,
affect only about 2% of queries
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Article…>
(many of which are done as part of the data imports), and such a split
would be easy to understand.
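To make the idea of a split concrete, here is a toy, in-memory Python sketch of partitioning triples into a "scholarly" and a "main" subgraph, plus a two-step lookup standing in for a federated query that needs both. It is an illustration under stated assumptions, not how the Search Platform team will implement the split: the rule "subject is an instance of scholarly article", the `ex:` identifiers, the toy data, and the function names are all hypothetical.

```python
# Toy, in-memory sketch of splitting a graph into two subgraphs.
# Real Wikidata IDs used as constants: P31 = "instance of",
# Q13442814 = "scholarly article", P50 = "author", Q5 = "human".
INSTANCE_OF = "wdt:P31"
SCHOLARLY_ARTICLE = "wd:Q13442814"
AUTHOR = "wdt:P50"

Triple = tuple[str, str, str]


def split_graph(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Partition triples into a 'scholarly' and a 'main' subgraph.

    Assumed rule (a simplification): a triple goes to the scholarly subgraph
    when its subject is an instance of scholarly article; all others stay in main.
    """
    scholarly_subjects = {
        s for s, p, o in triples if p == INSTANCE_OF and o == SCHOLARLY_ARTICLE
    }
    subgraphs: dict[str, list[Triple]] = {"main": [], "scholarly": []}
    for triple in triples:
        key = "scholarly" if triple[0] in scholarly_subjects else "main"
        subgraphs[key].append(triple)
    return subgraphs


def articles_by_author_label(subgraphs: dict[str, list[Triple]], label: str) -> list[str]:
    """Stand-in for a federated query that needs both subgraphs."""
    # Step 1, "main" side: resolve author items by their label.
    authors = {s for s, p, o in subgraphs["main"] if p == "rdfs:label" and o == label}
    # Step 2, "scholarly" side: join on the author IDs found in step 1.
    return sorted(s for s, p, o in subgraphs["scholarly"] if p == AUTHOR and o in authors)


# Hypothetical toy data; "ex:" identifiers are made up for illustration.
triples = [
    ("ex:article1", INSTANCE_OF, SCHOLARLY_ARTICLE),
    ("ex:article1", AUTHOR, "ex:person1"),
    ("ex:article2", INSTANCE_OF, SCHOLARLY_ARTICLE),
    ("ex:person1", INSTANCE_OF, "wd:Q5"),
    ("ex:person1", "rdfs:label", "Ada Example"),
]
sub = split_graph(triples)
share = len(sub["scholarly"]) / len(triples)
print(f"scholarly share of triples: {share:.0%}")   # 60%
print(articles_by_author_label(sub, "Ada Example"))  # ['ex:article1']
```

In a real deployment each step would be a SPARQL query against a separate endpoint, and the second step would depend on intermediate results passed between services, which is why some queries could need rewriting or become more expensive under federation.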
We did consider other potential splits, but they don’t seem as promising.
For example:
- Truthy vs fully reified graph
  <https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Statement_…>:
  the truthy graph would be smaller, but we would still need to maintain the
  full graph
- Labels and descriptions: only amount to ~20% of the graph, and would
  require most queries to use federation
- Astronomical objects: only amount to ~10% of the graph
To better understand the impact of such a split, we will create a test
environment based on a split of a static dump, and create a test plan based
on a subset of queries that we see in the current WDQS production
environment. Eventually, we will expose this test environment to gather
more feedback from all of you.
What is the approximate schedule?
We anticipate the following, but may need to adjust if there are unforeseen
challenges.
- By end of January 2024: availability of a somewhat stable testing
  environment
- By end of January 2024: testing of the split on a subset of existing
  queries, feedback from all of you about how this split is functioning for
  different workloads
- By end of March 2024: reflection on this experiment and next steps,
  either experimenting with a different split or productionizing the
  current one
What is NOT part of the plan?
- Splits other than scholarly articles. Other experiments might come up in
  the future, but we want to focus on scholarly articles first.
- Real-time updates. To reduce the complexity of the experiment, we will
  focus on a static dump. If the experiment is successful, more work will be
  done to ensure that those split graphs can be updated in real time.
- Production implementation of multiple graphs: we will only commit to a
  production implementation if the experiment is successful.
Success criteria
Part of the experimentation is understanding the impacts of this split, so
we only have imperfect metrics at this time.
- Blazegraph stability is not threatened by the size of the graph. Our
  expectation is that a size reduction of 25% will give us leeway. A proxy
  metric for stability is our ability to reload the data from scratch in less
  than 10 days.
- Query time is not increased for most queries.
- The number of queries requiring rewrite due to federation is minimal.
- The number of queries rendered too expensive by federation is minimal.
How to learn more?
We will create a wiki page for the project shortly; it will be the main
focal point for discussions. You are always welcome to join the Search
Platform Office Hours
<https://wikitech.wikimedia.org/wiki/Search_Platform/Contact#Office_Hours>
(first Wednesday of every month) to ask more questions and have a direct
discussion with the team.
This communication is also available on wiki
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_up…>.
Thank you all for your help and support!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all,
As you may know, the WikidataCon 2023
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2023>, the conference
dedicated to the Wikidata community, will take place on October 28-29.
People around the world can join online
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2023/Attend_online>,
access the content of the conference live or as a replay, and discuss with
other participants on the interactive and friendly platform Gather Town.
People living in Taiwan and neighbouring regions can join the
onsite event
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2023/Attend_in_Taipei>
taking place at National Taipei University.
Whether you join online or onsite, we strongly encourage you to sign
up by filling
out the registration form <https://pretix.eu/wikidatatw/Wikidatacon2023/>
before October 22, so you can receive all the information you need to
attend the event.
While waiting for the WikidataCon to start, you can already have a look at
the exciting program <https://pretalx.com/wikidatacon2023/schedule/> put
together by community members from the ESEAP region and all around the
world.
Finally, you can also consider preparing a present for the Wikidata
birthday and presenting it during the birthday presents lightning talks
session
<https://www.wikidata.org/wiki/Wikidata:Eleventh_Birthday#Birthday_presents_…>,
and joining or organizing a satellite event
<https://www.wikidata.org/wiki/Wikidata:Eleventh_Birthday/Events> with your
local community.
Questions? Feel free to contact the organizing team by writing on this
talk page <https://www.wikidata.org/wiki/Wikidata_talk:WikidataCon_2023> or
at contact(a)wikidatacon.org.
We are looking forward to seeing you at the WikidataCon 2023!
—
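Subject: Register to join WikidataCon 2023 online or onsite
Hello everyone,
WikidataCon 2023, the conference dedicated to the Wikidata community, will take place on October 28-29.
People from around the world can join online, watch the conference content live or as a replay, and discuss with other participants on Gather Town, an interactive and friendly platform.
People living in or near Taiwan will have the opportunity to join the onsite event held at National Taipei University.
Whether you attend online or onsite, we strongly recommend that you fill out the registration form before October 22, so that you can receive all the information you need to attend the event.
While waiting for WikidataCon to start, you can look at the exciting program already put together by community members from the ESEAP region and around the world.
Finally, you can also consider preparing a present for the Wikidata birthday and offering it during the birthday presents lightning talks session, and joining or organizing a satellite event with your local community.
Any other questions? Feel free to contact the organizing team by writing on this talk page or by emailing contact(a)wikidatacon.org.
We look forward to seeing you at WikidataCon 2023!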
--
Léa Lacroix
Community Engagement & Events Consultant
Contractor for Wikimedia Deutschland e.V.
Hello all,
Following the past events dedicated to data quality and data reuse, the
Wikidata team wanted to host a new gathering dedicated to data modelling.
The Data Modelling Days
<https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023>
will take place online over three days and will host a variety of
discussions, workshops and practical sessions on the topics of Wikidata
ontologies, EntitySchemas, modelling issues and various other challenges.
The event is open to everyone, regardless of your experience with modelling
data on Wikidata. We particularly encourage people who are working on
specific topics to join the event and present their modelling challenges.
If you know people or groups who are already discussing modelling issues on
Wikidata, or would have something interesting to contribute, please share
this message with them!
You can find more information on the dedicated page
<https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023>,
sign up
<https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023/Part…>,
and let us know what you are interested in. You can already propose
discussions and workshops on the talk page
<https://www.wikidata.org/wiki/Wikidata_talk:Events/Data_Modelling_Days_2023>
until November 19th.
If you cannot attend, don’t worry: most sessions will be recorded, notes
will be taken, and slides will be shared.
We are looking forward to seeing you and learning more about your modelling
challenges during the Data Modelling Days! If you have any questions, feel
free to reach out to me.
Best,
--
Léa Lacroix
Community Engagement & Events Consultant
Contractor for Wikimedia Deutschland e.V.
Hello all,
As you may know, Wikidata was launched on October 29, 2012, and every year
in October, we celebrate the anniversary of the project with birthday
wishes, events and cake. As we will celebrate the 11th birthday of Wikidata
this year, we wanted to share with you how you can get involved in this
event!
1. Prepare a birthday present
Every year for the birthday, people prepare some presents for the Wikidata
community. These presents can be useful, fun or interesting: a new Wikidata
tool, a new WikiProject, a logo or another piece of art, a blog post, an
important community discussion… They can be worked on alone or in
collaboration with other people.
If you have ideas for a Wikidata birthday present, now is the perfect time
to start thinking about them and finding other people to work with you! Once the
present is ready to be announced, you can add it to the Presents page
<https://www.wikidata.org/wiki/Wikidata:Eleventh_Birthday/Presents>. If
you’re looking for inspiration, you can check what was done for the
previous anniversaries on the Eleventh birthday page
<https://www.wikidata.org/wiki/Wikidata:Eleventh_Birthday>.
2. Attend the WikidataCon 2023
The WikidataCon 2023
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2023> will take place
on October 28-29 in a hybrid format: while people from Taiwan and the
neighbouring regions can join the onsite event in Taipei, people from all
around the world will participate online. Most sessions will be broadcast
and recorded in English and Chinese.
If you would like to participate, you can register for the event
<https://pretix.eu/wikidatatw/Wikidatacon2023/>, and check out the program
<https://pretalx.com/wikidatacon2023/schedule/>.
3. Present your birthday gift during the WikidataCon 2023
You prepared a present for Wikidata's birthday and would like to present it
to the community? You can sign up for the birthday presents lightning talks
session that will take place online during the WikidataCon 2023
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2023>, on Day 2.5, October
29, at 14:15 UTC.
To register for a slot, please read the instructions and add your project
to this page
<https://etherpad.wikimedia.org/p/WikidataCon2023-BirthdayLTsession>.
Please make sure that you are registered for the WikidataCon 2023
<https://pretix.eu/wikidatatw/Wikidatacon2023/> in order to access the
session.
4. Join or organize other events
You can also independently organize a distributed birthday event with your
community or in your area: when your event is ready to be announced, please add
details to the Distributed events page
<https://www.wikidata.org/wiki/Wikidata:Eleventh_Birthday/Events>.
To connect with other people organizing Wikidata-related events, feel free
to join the Wikidata Events Telegram group
<https://t.me/joinchat/HGjGexK8LA2wJZEk1x1p_A>.
If you have any questions or need support, feel free to contact me or to
write a message on this talk page
<https://www.wikidata.org/wiki/Wikidata_talk:Eleventh_Birthday>.
Looking forward to interacting with you all around the Wikidata birthday!
Best,
--
Léa Lacroix
Community Engagement & Events Consultant
Contractor for Wikimedia Deutschland e.V.