On 1/12/23 3:39 AM, Larry Gonzalez wrote:
Dear Kingsley,
Let me start by saying that I appreciate the effort of loading the
complete Wikidata into a graph database and making a SPARQL endpoint
available. I know it is not an easy task.
I just tried out the new Virtuoso-hosted SPARQL endpoint with some
queries. My experiments are not exhaustive at all, but I wanted
to raise two concerns that I noticed.
Consider a (very simple) query that counts all humans:
'''
SELECT (count(?human) as ?c)
WHERE
{
?human wdt:P31 wd:Q5 .
}
'''
I get a result of 10396057, which is OK considering the dataset that
you are using.
But if we try to export all instances of human (to a TSV file) with
the following query:
'''
SELECT ?human
WHERE
{
?human wdt:P31 wd:Q5 .
}
'''
Then I only get 100000 results. Is there a limit on the number of
results that a query can return?
Yes, because these services are primarily for ad-hoc querying rather
than wholesale data exports. If you want to export massive amounts of
data, you can do so in pages using OFFSET and LIMIT.
Alternatively, you can instantiate your own instance in the Azure or AWS
cloud and use it as you see fit.
As with our DBpedia service, there's a server-side
configuration in place for enforcing a "fair use" policy :)
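To illustrate the OFFSET and LIMIT approach, here is a minimal Python sketch that only builds the paged queries and request URLs; it does not contact the endpoint. The names `paged_queries` and `page_url` are hypothetical helpers made up for this example, the page size of 100000 is taken from the limit Larry observed, and the `format` URL parameter is an assumption about how the endpoint selects TSV output.

```python
from urllib.parse import urlencode

# Endpoint URL from the announcement quoted further down in this thread.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"

# Larry's export query with LIMIT/OFFSET appended; the wdt:/wd: prefixes
# are assumed to be predefined on the server, as in his original queries.
QUERY_TEMPLATE = """SELECT ?human
WHERE
{{
  ?human wdt:P31 wd:Q5 .
}}
LIMIT {limit}
OFFSET {offset}"""


def paged_queries(total, page_size):
    """Yield one query per page until `total` rows are covered."""
    for offset in range(0, total, page_size):
        yield QUERY_TEMPLATE.format(limit=page_size, offset=offset)


def page_url(query, fmt="text/tab-separated-values"):
    """Build a GET URL for one page (the `format` parameter is an assumption)."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


if __name__ == "__main__":
    # 10396057 is the count reported earlier in this message.
    queries = list(paged_queries(10396057, 100000))
    print(len(queries), "pages")
    print(page_url(queries[0]))
```

Note that SPARQL does not guarantee a stable row order without ORDER BY, so pages fetched this way may overlap or miss rows; whether an ordered variant completes within the server's limits would have to be tested.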
Furthermore, if we want to get all humans ordered by id, then the
endpoint times out. The following is the query:
'''
SELECT ?human
WHERE
{
?human wdt:P31 wd:Q5 .
}
ORDER BY DESC(?human)
'''
If you set the query timeout to a value over 1000 msecs, the Virtuoso
Anytime Query feature will provide you with a partial solution, which you
can use in conjunction with OFFSET and LIMIT to create an interactive
cursor (or scrollable cursor). Beyond that, it's back to the "fair use"
policy and the option to instantiate your own service-specific instance
using our cloud offerings.
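A rough Python sketch of such a cursor step follows. It only prepares the HTTP request; nothing is sent. The helper names `anytime_request` and `is_partial` are hypothetical, and two details are assumptions rather than something stated in this thread: that the endpoint takes the timeout as a `timeout` URL parameter in milliseconds, and that a partial Anytime result is flagged with an `X-SQL-State: S1TAT` response header.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL from the announcement quoted further down in this thread.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"


def anytime_request(query, timeout_ms, limit, offset):
    """Prepare a request for one cursor window of `query`.

    `timeout` as a URL parameter (in msecs) is an assumption; the reply
    above only says the timeout must be over 1000 msecs.
    """
    if timeout_ms <= 1000:
        raise ValueError("timeout must be over 1000 msecs for Anytime Query")
    paged = f"{query}\nLIMIT {limit}\nOFFSET {offset}"
    params = urlencode({"query": paged, "timeout": timeout_ms})
    return Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "text/tab-separated-values"},
    )


def is_partial(headers):
    """Return True if the response headers mark the result as partial.

    Assumes Virtuoso reports this as `X-SQL-State: S1TAT`.
    """
    return headers.get("X-SQL-State") == "S1TAT"


if __name__ == "__main__":
    ordered = (
        "SELECT ?human\nWHERE\n{\n  ?human wdt:P31 wd:Q5 .\n}\n"
        "ORDER BY DESC(?human)"
    )
    req = anytime_request(ordered, timeout_ms=5000, limit=1000, offset=0)
    print(req.full_url)
```

To scroll, a client would pass the request to `urllib.request.urlopen`, check the response headers with `is_partial`, and then increase `offset` by `limit` for the next window.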
Regards,
Kingsley
Thank you again for all your efforts. I am looking forward to seeing how
this new endpoint works :)
Are you planning to update the dataset regularly?
All the best!
Larry
https://iccl.inf.tu-dresden.de/web/Larry_Gonzalez
On 11.01.23 21:51, Kingsley Idehen via Wikidata wrote:
All,
We are pleased to announce the immediate availability of a new
Virtuoso-hosted Wikidata instance based on the most recent datasets.
This instance comprises 17 billion+ RDF triples.
Host Machine Info:

Item     Value
CPU      2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Cores    24
Memory   378 GB
SSD      4x Crucial M4 SSD 500 GB
Cloud-related costs for a self-hosted variant, assuming:
* dedicated machine for 1 year without upfront costs
* 128 GiB memory
* 16 cores or more
* 512GB SSD for the database
* 3T outgoing internet traffic (based on our DBpedia statistics)

vendor  machine type  memory   vCPUs  monthly machine  monthly disk  monthly network  monthly total
Amazon  r5a.4xlarge   128 GiB  16     $479.61          $55.96        $276.48          $812.05
Google  e2highmem-16  128 GiB  16     $594.55          $95.74        $255.00          $945.30
Azure   D32a          128 GiB  32     $769.16          $38.40        $252.30          $1,060.06

SPARQL Query and Full Text Search service endpoints:
* https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint
* https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing
Additional Information
* Loading the Wikidata dataset 2022/12 into Virtuoso Open Source - Announcements - OpenLink Software Community (openlinksw.com)
  <https://community.openlinksw.com/t/loading-the-wikidata-dataset-2022-12-into-virtuoso-open-source/3580>
Happy New Year!
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
              http://kidehen.blogspot.com
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
_______________________________________________
Wikidata mailing list -- wikidata(a)lists.wikimedia.org
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/me…
To unsubscribe send an email to wikidata-leave(a)lists.wikimedia.org