On 25 August 2021 at 09:41:28, Samuel Klein (meta.sj(a)gmail.com) wrote:
Aha, hello Jerven :) I should have remembered your earlier comment; delighted you are here.
Thank you again for sharing your promising experience + benchmarks + suggestions, and for highlighting both similarities and differences.
SJ
On Tue, Aug 24, 2021 at 2:18 AM Jerven Bolleman <jerven.bolleman(a)sib.swiss> wrote:
Hi Samuel, All,
I am the software engineer responsible for sparql.uniprot.org. I already offered to help in https://phabricator.wikimedia.org/T206561, so no need to ask Andra or Egon ;)
We are happy users of Virtuoso and strongly suggest that it be evaluated, as it is in general a good product that does scale.[1]
One of the things we did differently from WDQS is to introduce a controlled layer between the "public" and the "database". This allows things like query rewriting/redirection upon data model changes, as well as rewriting some schema rediscovery queries to a known faster query. We also parse the queries with RDF4J before handing them to Virtuoso. This ensures that the queries we accept are only valid SPARQL 1.1, which keeps users from getting used to almost-SPARQL dialects (i.e. we retain the flexibility to move to a different endpoint). We are in the process of updating this code and contributing it to RDF4J, with the first contribution in the develop/4.0.0 branch.
I think a number of the current customizations in WDQS could be moved to a front RDF4J layer. The RDF4J sail/repository layer can then be used to preserve flexibility, so that WDQS can more easily switch between backend databases in the future.
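A minimal sketch of the kind of controlled front layer described above, in Python for illustration only. It assumes a pluggable `validate` callable (UniProt uses RDF4J's SPARQL 1.1 parser for this step; the regex-based `toy_validate` here is a crude, hypothetical stand-in that only rejects a leading Virtuoso `DEFINE` pragma), a hypothetical `KNOWN_REWRITES` table mapping a known slow query to a faster one the operator has verified as equivalent, and a `backend` callable standing in for the HTTP call to the actual store. None of these names come from UniProt's or RDF4J's code.

```python
import re
from typing import Callable, Dict

# Hypothetical rewrite table: the normalized text of a known slow
# "schema discovery" query -> a faster query that the operator has checked
# returns the same answer on their data. The entry below is illustrative only.
KNOWN_REWRITES: Dict[str, str] = {
    "select distinct ?g where { graph ?g { ?s ?p ?o } }":
        "SELECT ?g WHERE { VALUES ?g { <http://example.org/graph/a> "
        "<http://example.org/graph/b> } }",
}


def normalize(query: str) -> str:
    """Collapse whitespace runs and lowercase; used only as a lookup key."""
    return re.sub(r"\s+", " ", query).strip().lower()


def toy_validate(query: str) -> None:
    """Crude, hypothetical stand-in for a real SPARQL 1.1 parser.

    It only rejects queries that start with a Virtuoso-specific DEFINE
    pragma; a real deployment would run a full parser instead.
    """
    if re.match(r"\s*define\b", query, re.IGNORECASE):
        raise ValueError("non-standard SPARQL extension rejected")


def front_layer(query: str,
                validate: Callable[[str], None],
                backend: Callable[[str], str]) -> str:
    """Validate, optionally rewrite, then forward a query to the backend store."""
    validate(query)  # raises before anything reaches the database
    rewritten = KNOWN_REWRITES.get(normalize(query), query)
    return backend(rewritten)


if __name__ == "__main__":
    echo = lambda q: q  # stand-in backend that just returns the query it received
    print(front_layer("SELECT DISTINCT ?g WHERE {\n  GRAPH ?g { ?s ?p ?o } }",
                      toy_validate, echo))
```

Keeping validation and rewriting in a layer the operator controls is what makes a later backend swap cheaper: clients only ever send standard SPARQL 1.1, and known slow queries can be redirected without changing client code.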
One large difference between UniProt and WDQS is that Wikidata is continually updated, while UniProt is released in batches a few times a year. Because of that, WDQS is somewhat easier in some areas and more difficult in others.
Regards,
Jerven
[1] No database is perfect, but it does scale a lot better than Blazegraph, which we also evaluated in the past. There is still a lot of potential in Virtuoso to scale even better in the future.
On 23/08/2021 21:36, Samuel Klein wrote:
Ah, that's lovely. Thanks for the update, Kingsley! UniProt is a good parallel to keep in mind.
For Egon, Andra, and others who work with them: is there someone you'd recommend chatting with at UniProt?
"Scaling alongside UniProt", or at least engaging them on how to solve shared and comparable issues (they also offer authentication-free SPARQL querying), sounds like a compelling option.
S.
On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata <wikidata(a)lists.wikimedia.org> wrote:
On 8/18/21 5:07 PM, Mike Pham wrote:
>
> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the years. In the spirit of better communication, we would like to take this opportunity to share some of the current challenges the Wikidata Query Service (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality for the following reasons:
>
> 1. Blazegraph scaling
>
>    1. Graph size. WDQS uses Blazegraph as our graph backend. While Blazegraph can theoretically support 50 billion edges <https://blazegraph.com/>, in reality Wikidata is the largest graph we know of running on Blazegraph (~13 billion triples <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>), and there is a risk that we will reach a size limit <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29> of what it can realistically support <https://phabricator.wikimedia.org/T213210>. Once Blazegraph is maxed out, WDQS can no longer be updated. This will also break Wikidata tools that rely on WDQS.
>
>    2. Software support. Blazegraph is end-of-life software that is no longer actively maintained, making it an unsustainable backend to continue moving forward with in the long term.
>
> Blazegraph maxing out in size poses the greatest risk of catastrophic failure, as it would effectively prevent WDQS from being updated further, and WDQS would inevitably fall out of date. Our long-term strategy to address this is to move to a new graph backend that best meets our WDQS needs and is actively maintained, and to begin the migration off of Blazegraph as soon as a viable alternative is identified <https://phabricator.wikimedia.org/T206560>.
>
Hi Mike,
Do bear in mind that, both before and after Blazegraph was selected for Wikidata, we've always offered an RDF-based DBMS that can handle current and future requirements for Wikidata, just as we do for DBpedia.
At the time of our first rendezvous, handling 50 billion triples would typically have required our Cluster Edition, which is a commercial-only offering; basically, that was the deal breaker back then.
Anyway, in recent times our Open Source Edition has evolved to handle some 80 billion+ triples (exemplified by the live UniProt instance), where performance and scale are primarily a function of available memory.
I hope this helps.
Related:
[1] https://wikidata.demo.openlinksw.com/sparql -- Our live Wikidata SPARQL query endpoint
[2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0 -- Google Spreadsheet about various Virtuoso configurations associated with some well-known public endpoints
[3] https://t.co/EjAAO73wwE -- this query doesn't complete with the current Blazegraph-based Wikidata endpoint
[4] https://t.co/GTATPPJNBI -- the same query completing when applied to the Virtuoso-based endpoint
[5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets into a Virtuoso instance
[6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live -- various demos regarding Wikidata shared via Twitter over the years
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
https://community.openlinksw.com
https://medium.com/openlink-software-blog
https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
http://www.openlinksw.com/blog/~kidehen/
http://kidehen.blogspot.com
https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
https://twitter.com/kidehen
https://plus.google.com/+KingsleyIdehen/about
http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
_______________________________________________
Wikidata mailing list -- wikidata(a)lists.wikimedia.org
To unsubscribe send an email to wikidata-leave(a)lists.wikimedia.org
--
Samuel Klein @metasj w:user:sj +1 617 529 4266
--
*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
Jerven.Bolleman(a)sib.swiss - www.sib.swiss
--
Samuel Klein @metasj w:user:sj +1 617 529 4266