Dropping my two cents here: I'm wondering how much use the Wikidata
Linked Data Fragments (LDF) service [1] is getting.
LDF [2] is nice because it shifts the computation burden to the client,
at the cost of less expressive SPARQL queries, IIRC.
I think it would be a good idea to forward simple queries to that
service, instead of WDQS.
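To make "simple query" a bit more concrete, here is a minimal sketch of a
single triple pattern expressed as a request URL for the LDF endpoint [1].
I'm assuming the endpoint takes `subject`, `predicate` and `object` query
parameters, as Triple Pattern Fragments servers commonly do; the helper
name `fragment_url` and the Q146/P31 example are only mine, for
illustration. It only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Endpoint from reference [1]; the parameter names below are an assumption.
LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"


def fragment_url(subject="", predicate="", obj=""):
    """Build the URL for one triple pattern; empty positions act as wildcards."""
    params = {"subject": subject, "predicate": predicate, "object": obj}
    # Leave out wildcard positions so the URL stays short.
    params = {name: value for name, value in params.items() if value}
    if not params:
        return LDF_ENDPOINT
    return LDF_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Pattern "?item wdt:P31 wd:Q146", i.e. everything that is an instance of Q146.
    print(fragment_url(
        predicate="http://www.wikidata.org/prop/direct/P31",
        obj="http://www.wikidata.org/entity/Q146",
    ))
```

Anything beyond a single pattern (joins, filters, aggregates) would have to
be evaluated on the client side, which is the trade-off mentioned above.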
Cheers,
Marco
[1] https://query.wikidata.org/bigdata/ldf
[2] https://linkeddatafragments.org/
On 8/19/21 12:48 AM, Imre Samu wrote:
> > (i) identify and delete lower priority data (e.g. labels,
> > descriptions, aliases, non-normalized values, etc);
>
> Ouch.
> For me
> - as a native Hungarian: the labels, descriptions and aliases are
> extremely important
> - as a data user: I am using "labels" and "aliases" in my
> concordance tools (mapping Wikidata IDs to external IDs)
>
> So please clarify the practical meaning of "delete".
>
> Thanks in advance,
> Imre
>
>
>
> Mike Pham <mpham(a)wikimedia.org> wrote
> (on Wed, 18 Aug 2021, 23:08):
>
> Wikidata community members,
>
>
> Thank you for all of your work helping Wikidata grow and improve
> over the years. In the spirit of better communication, we would like
> to take this opportunity to share some of the current challenges
> Wikidata Query Service (WDQS) is facing, and some strategies we have
> for dealing with them.
>
>
> WDQS currently risks failing to provide acceptable service quality
> due to the following reasons:
>
> 1. Blazegraph scaling
>
>    1. Graph size. WDQS uses Blazegraph as our graph backend. While
>       Blazegraph can theoretically support 50 billion edges
>       <https://blazegraph.com/>, in reality Wikidata is the
>       largest graph we know of running on Blazegraph (~13 billion
>       triples
>       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
>       and there is a risk that we will reach a size
>       <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
>       limit of what it can realistically support
>       <https://phabricator.wikimedia.org/T213210>. Once Blazegraph
>       is maxed out, WDQS can no longer be updated. This will also
>       break Wikidata tools that rely on WDQS.
>
>    2. Software support. Blazegraph is end of life software, which
>       is no longer actively maintained, making it an unsustainable
>       backend to continue moving forward with long term.
>
>
> Blazegraph maxing out in size poses the greatest risk of
> catastrophic failure, as it would effectively prevent WDQS from
> being updated further, so WDQS would inevitably fall out of date. Our long
> term strategy to address this is to move to a new graph backend that
> best meets our WDQS needs and is actively maintained, and begin the
> migration off of Blazegraph as soon as a viable alternative is
> identified <https://phabricator.wikimedia.org/T206560>.
>
>
> In the interim period, we are exploring disaster mitigation options
> for reducing Wikidata’s graph size in the case that we hit this
> upper graph size limit: (i) identify and delete lower priority data
> (e.g. labels, descriptions, aliases, non-normalized values, etc);
> (ii) separate out certain subgraphs (such as Lexemes and/or
> scholarly articles). This would be a last-resort scenario to keep
> Wikidata and WDQS running with reduced functionality until we are
> able to deploy a more long-term solution.
>
>
>
> 2. Update and access scaling
>
>    1. Throughput. WDQS is currently trying to provide fast
>       updates, and fast unlimited queries for all users. As the
>       number of SPARQL queries grows over time
>       <https://www.mediawiki.org/wiki/User:MPopov_(WMF)/Wikimania_2021_Hackathon>
>       alongside graph updates, WDQS is struggling to sufficiently keep up
>       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&from=now-6M&to=now&refresh=1d>
>       in each dimension of service quality without compromising
>       anywhere. For users, this often leads to timed out queries.
>
>    2. Equitable service. We are currently unable to adjust system
>       behavior per user/agent. As such, it is not possible to
>       provide equitable service to users: for example, a heavy
>       user could swamp WDQS enough to hinder usability by
>       community users.
>
>
> In addition to being a querying service for Wikidata, WDQS is also
> part of the edit pipeline of Wikidata (every edit on Wikidata is
> pushed to WDQS to update the data there). While deploying the new
> Flink-based Streaming Updater
> <https://phabricator.wikimedia.org/T244590> will help with increasing
> throughput of Wikidata updates, there is a substantial risk that
> WDQS will be unable to keep up with the combination of increased
> querying and updating, resulting in more tradeoffs between update
> lag and querying latency/timeouts.
>
>
> In the near term, we would like to work more closely with you to
> determine what acceptable trade-offs would be for preserving WDQS
> functionality while we scale up Wikidata querying. In the long term,
> we will be conducting more user research to better understand your
> needs so we can (i) optimize querying via SPARQL and/or other
> methods, (ii) explore better user management that will allow us to
> prevent heavy use of WDQS that does not align with the goals of our
> movement and projects, and (iii) make it easier for users to set up
> and run their own query services.
>
>
> Though this information about the current state of WDQS may not be a
> total surprise to many of you, we want to be as transparent with you
> as possible to ensure that there are as few surprises as possible in
> the case of any potential service disruptions/catastrophic failures,
> and that we can accommodate your work as best as we can in the
> future evolution of WDQS. We plan on doing a session on WDQS scaling
> challenges during WikidataCon this year at the end of October.
>
>
> Thanks for your understanding with these scaling challenges, and for
> any feedback you have already been providing. If you have new
> concerns, comments and questions, you can best reach us at this talk
> page
> <https://www.wikidata.org/wiki/Wikidata_talk:Query_Service_scaling_update_Aug_2021>.
> Additionally, if you have not had a chance to fill out our survey
> <https://docs.google.com/forms/d/e/1FAIpQLSe1H_OXQFDCiGlp0QRwP6-Z2CGCgm96MWBBmiqsMLu0a6bhLg/viewform?usp=sf_link> yet,
> please tell us how you use the Wikidata Query Service (see privacy
> statement
> <https://foundation.wikimedia.org/wiki/WDQS_User_Survey_2021_Privacy_Statement>)!
> Whether you are an occasional user or create tools, your feedback is
> needed to decide our future development.
>
>
> Best,
>
>
> WMF Search + WMDE
>
> _______________________________________________
> Wikidata mailing list -- wikidata(a)lists.wikimedia.org
> To unsubscribe send an email to wikidata-leave(a)lists.wikimedia.org
>