Hello all!
We have been hard at work on our Graph Split experiment [1], and we now
have a working graph split that is loaded onto 3 test servers. We are
running tests on a selection of queries from our logs to help understand
the impact of the split. We need your help to validate the impact of
various use cases and workflows around Wikidata Query Service.
**What is the WDQS Graph Split experiment?**
We want to address the growing size of the Wikidata graph by splitting it
into 2 subgraphs of roughly half the size of the full graph, which should
support the growth of Wikidata for the next 5 years. This experiment is
about splitting the full Wikidata graph into a scholarly articles subgraph
and a “main” graph that contains everything else.
See our previous update for more details [2].
**Who should care?**
Anyone who uses WDQS through the UI or programmatically should check the
impact on their use cases, scripts, bots, code, etc.
**What are those test endpoints?**
We expose 3 test endpoints, for the full, main and scholarly articles
graphs. Those graphs are all created from the same dump and are not live
updated. This allows us to compare queries between the different endpoints,
with stable / non changing data (the data are from the middle of October
2023).
The endpoints are:
* https://query-full-experimental.wikidata.org/
* https://query-main-experimental.wikidata.org/
* https://query-scholarly-experimental.wikidata.org/
Each of the endpoints is backed by a single dedicated server of performance
similar to the production WDQS servers. We don’t expect performance to be
representative of production due to the different load and to the lack of
updates on the test servers.
**What kind of feedback is useful?**
We expect queries that don’t require scholarly articles to work
transparently on the “main” subgraph. We expect queries that require
scholarly articles to need to be rewritten with SPARQL federation between
the “main” and scholarly subgraphs (federation is supported for some
external SPARQL servers already [3], this just happens to be for internal
server-to-server communication). We are doing tests and analysis based on a
sample of query logs.
**We want to hear about:**
General use cases or classes of queries which break under federation
Bots or applications that need significant rewrite of queries to work with
federation
And also about use cases that work just fine!
Examples of queries and pointers to code will be helpful in your feedback.
**Where should feedback be sent?**
You can reach out to us using the project’s talk page [1], the Phabricator
ticket for community feedback [4] or by pinging directly Sannita (WMF) [5].
**Will feedback be taken into account?**
Yes! We will review feedback and it will influence our path forward. That
being said, there are limits to what is possible. The size of the Wikidata
graph is a threat to the stability of WDQS and thus a threat to the whole
Wikidata project. Scholarly articles is the only split we know of that
would reduce the graph size sufficiently. We can work together on providing
support for a migration, on reviewing the rules used for the graph split,
but we can’t just ignore the problem and continue with a WDQS that provides
transparent access to the full Wikidata graph.
Have fun!
Guillaume
[1]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split
[2]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_up…
[3]
https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Federation
[4] https://phabricator.wikimedia.org/T356773
[5] https://www.wikidata.org/wiki/User:Sannita_(WMF)
--
Guillaume Lederrey (he/him)
Engineering Manager
Wikimedia Foundation
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, February 7, 2024
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
Community Wishathon is coming next month: March 15th-17th, 2024! So far, 78
participants have signed up to attend this online event since its first
announcement in December 2023 🎉
You can check out the project ideas and schedule on the event page at <
https://meta.wikimedia.org/wiki/Event:WishathonMarch2024>. These ideas come
from the wishes people share in Community Wishlist Survey <
https://meta.wikimedia.org/wiki/Community_Wishlist_Survey>. If you're a
user, developer, designer, or product lead, you are welcome to join! You
can sign up for the event on the wiki page and add your name, time zone,
and preferred contact method to a project by *February 15th, 2024*.
Before the event, you can learn about the projects and connect with others
who are interested. You can contact them using their preferred contact
method and consider forming project groups on Telegram.
During the event, there will be an opening session to introduce the event
and project ideas, async time for working on projects, self-organized
virtual project group meetings, and a showcase ceremony to share what each
group accomplished during the event. If you want to help with the event or
have questions, let us know on the event's talk page.
Cheers,
Srishti
On behalf of the Wishathon organizing committee
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The second edition of the Language & Internationalization newsletter
(January 2024) is available at this link: <
https://www.mediawiki.org/wiki/Wikimedia_Language_engineering/Newsletter/20…
>.
This newsletter is compiled by the Wikimedia Language team. It provides
updates from October–December 2023 quarter on new feature development,
improvements in various language-related technical projects and support
efforts, details about community meetings, and contributions ideas to get
involved in projects.
To stay updated, you can subscribe to the newsletter on its wiki page. If
you have any feedback or ideas for topics to feature in the newsletter,
please share them on the discussion page, accessible here: <
https://www.mediawiki.org/w/index.php?title=Talk:Wikimedia_Language_enginee…
>.
Cheers,
Srishti
On behalf of the WMF Language team
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>