Announce: New OpenLink Wikidata Snapshot released to AWS Cloud

Kingsley Idehen

18 Jul 2022

3:29 p.m.

All,

We've released a copy of the Wikidata Snapshot that we host to the AWS Cloud.

Our hosted Wikidata Snapshot access points include:

* https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint
* https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing

The AWS snapshot release enables the following:

1. Immediate instantiation of a preloaded and pre-configured Wikidata instance for personal-, project-, or service-specific use
2. SPARQL (via SPARQL Query Service Endpoint, Jena or RDF4J providers) and SQL (via iSQL, ODBC, or JDBC) Query Access
3. Native Faceted Search & Exploration
4. Built-in DBpedia cross-references via owl:sameAs relations

*Additional Information*

* AWS Marketplace Page for Wikidata Snapshot Virtual Machine <https://aws.amazon.com/marketplace/pp/prodview-fi6y6lnzvs6vc>
* Wikidata Snapshot (Virtuoso PAGO) EBS-backed EC2 AMI <https://community.openlinksw.com/t/wikidata-snapshot-virtuoso-pago-ebs-backed-ec2-ami/3243>
* Twitter Announcement Thread <https://twitter.com/OpenLink/status/1547992642364903428>
* LinkedIn Post <https://www.linkedin.com/posts/kidehen_wikidata-knowledgegraph-virtuosordbms-activity-6953752893724254208-Oj4u>

-- 
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com

Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
              http://kidehen.blogspot.com

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this

Kingsley Idehen

11 Jan 2023

5:51 p.m.

New subject: Announce: New OpenLink Virtuoso hosted Wikidata Knowledge Graph Release

All,

We are pleased to announce immediate availability of a new Virtuoso-hosted Wikidata instance based on the most recent datasets. This instance comprises 17 billion+ RDF triples.

Host Machine Info:

Item    Value
CPU     2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Cores   24
Memory  378 GB
SSD     4x Crucial M4 SSD 500 GB

Cloud-related costs for a self-hosted variant, assuming:

* dedicated machine for 1 year without upfront costs
* 128 GiB memory
* 16 cores or more
* 512 GB SSD for the database
* 3 TB outgoing internet traffic (based on our DBpedia statistics)

vendor  machine type  memory   vCPUs  monthly machine  monthly disk  monthly network  monthly total
Amazon  r5a.4xlarge   128 GiB  16     $479.61          $55.96        $276.48          $812.05
Google  e2highmem-16  128 GiB  16     $594.55          $95.74        $255.00          $945.30
Azure   D32a          128 GiB  32     $769.16          $38.40        $252.30          $1,060.06

SPARQL Query and Full Text Search service endpoints:

* https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint
* https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing

Additional Information

* Loading the Wikidata dataset 2022/12 into Virtuoso Open Source - Announcements - OpenLink Software Community (openlinksw.com) <https://community.openlinksw.com/t/loading-the-wikidata-dataset-2022-12-into-virtuoso-open-source/3580>

Happy New Year!

-- 
Regards,
Kingsley Idehen
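
As a quick cross-check of the cost table in the announcement above, the sketch below adds up the three monthly components per vendor and projects the listed monthly total over the one-year term. The figures are copied from the table; the names `COSTS` and `summarize` are illustrative only. The component sums come to $812.05, $945.29 and $1,059.86, so the Google and Azure rows differ from their listed totals by one cent and twenty cents respectively; the thread does not explain the difference.

```python
from decimal import Decimal

# Monthly figures in USD, copied from the cost table in the announcement:
# (machine, disk, network, listed total).
COSTS = {
    "Amazon": ("479.61", "55.96", "276.48", "812.05"),
    "Google": ("594.55", "95.74", "255.00", "945.30"),
    "Azure": ("769.16", "38.40", "252.30", "1060.06"),
}


def summarize(costs):
    """Return {vendor: (component_sum, listed_total, yearly_from_listed)}."""
    summary = {}
    for vendor, (machine, disk, network, listed) in costs.items():
        # Decimal keeps the cents exact; floats would introduce rounding noise.
        component_sum = Decimal(machine) + Decimal(disk) + Decimal(network)
        listed_total = Decimal(listed)
        summary[vendor] = (component_sum, listed_total, listed_total * 12)
    return summary


SUMMARY = summarize(COSTS)

if __name__ == "__main__":
    for vendor, (component_sum, listed, yearly) in SUMMARY.items():
        print(f"{vendor}: components {component_sum}, listed {listed}, per year {yearly}")
```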

Samuel Klein

6:33 p.m.

A real New Year's treat. Thank you for this. 🌍🌏🌎🌑

On Wed, Jan 11, 2023, 2:52 PM Kingsley Idehen via Wikidata <wikidata(a)lists.wikimedia.org> wrote:

...


Dan Brickley

9:33 p.m.

Really cool! :)

If anyone has, e.g., student project possibilities, it would be great to see some work on Wikidata SPARQL query portability, e.g. working through the list at query.wikidata.org, whose queries tend to look like this:

    SELECT ?item ?itemLabel
    WHERE
    {
      ?item wdt:P31 wd:Q146. # Must be of a cat
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
    }

which won’t work as-is outside of the current Wikidata SPARQL Blazegraph endpoint. Something like this is needed (with a filter for lang too):

    SELECT ?item ?itemLabel
    WHERE
    {
      ?item wdt:P31 wd:Q146;
            rdfs:label ?itemLabel
    }

I don’t recall where the Wikidata sample queries live (GitHub? A wiki somewhere?) but it would be lovely to hear if they could all run on an alternative backend…

Dan

On Wed, 11 Jan 2023 at 15:52, Kingsley Idehen via Wikidata <wikidata(a)lists.wikimedia.org> wrote:

...

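
The rewrite Dan describes can be sketched as a small text transformation. The Python below is a minimal illustration, not a general SPARQL rewriter: it assumes the query has a single `SERVICE wikibase:label { ... }` clause without nested braces, that label variables follow the `?xLabel` naming convention, that the last `}` in the text closes the WHERE block, and that the target endpoint already knows the `wdt:`, `wd:` and `rdfs:` prefixes. The function name `make_portable` is hypothetical. It uses OPTIONAL so that items without a label in the chosen language are still returned, which is a design choice rather than something stated in the thread.

```python
import re

# Matches the Blazegraph-specific label service clause (no nested braces expected).
SERVICE_RE = re.compile(r"SERVICE\s+wikibase:label\s*\{[^{}]*\}", re.IGNORECASE)


def make_portable(query: str, lang: str = "en") -> str:
    """Swap SERVICE wikibase:label for explicit rdfs:label patterns."""
    stripped = SERVICE_RE.sub("", query)
    # Variables named ?xLabel in the projection (the text before the first "{").
    projection = stripped.split("{", 1)[0]
    label_vars = re.findall(r"\?(\w+)Label\b", projection)
    patterns = "".join(
        f"  OPTIONAL {{ ?{v} rdfs:label ?{v}Label . "
        f'FILTER(LANG(?{v}Label) = "{lang}") }}\n'
        for v in label_vars
    )
    # Insert the new patterns just before the final closing brace.
    end = stripped.rindex("}")
    return stripped[:end].rstrip() + "\n" + patterns + stripped[end:]


CAT_QUERY = """SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}"""

PORTABLE = make_portable(CAT_QUERY)

if __name__ == "__main__":
    print(PORTABLE)
```

Unlike the label service, this sketch does not reproduce the language fallback chain or the fallback to the bare item identifier.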

Larry Gonzalez

12 Jan 2023

5:39 a.m.

Dear Kingsley,

Let me start by saying that I appreciate the effort of loading the complete Wikidata into a graph database and making a SPARQL endpoint available. I know it is not an easy task.

I just tried out the new Virtuoso-hosted SPARQL endpoint with some queries. My experiments are not exhaustive at all, but I wanted to raise two concerns that I detected.

Consider a (very simple) query that counts all humans:

'''
SELECT (count(?human) as ?c)
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
'''

I get a result of 10396057, which is OK considering the dataset that you are using. But if we try to export all instances of human (to a TSV file) with the following query:

'''
SELECT ?human
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
'''

then I only get 100000 results. Is there a limit on the number of results that a query can return?

Furthermore, if we want to get all humans ordered by id, then the endpoint times out. The following is the query:

'''
SELECT ?human
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
ORDER BY DESC(?human)
'''

Thank you again for all your efforts. I am looking forward to seeing how this new endpoint works :)

Are you planning to update the dataset regularly?

All the best!
Larry
https://iccl.inf.tu-dresden.de/web/Larry_Gonzalez

On 11.01.23 21:51, Kingsley Idehen via Wikidata wrote:

...

Kingsley Idehen

7:45 p.m.

On 1/12/23 3:39 AM, Larry Gonzalez wrote:

...

Yes, because these services are primarily for ad-hoc querying rather than wholesale data exports. If you want to export massive amounts of data, you can do so using OFFSET and LIMIT. Alternatively, you can instantiate your own instance in the Azure or AWS cloud and use it as you see fit.

As with what we provide for DBpedia, there's a server-side configuration in place for enforcing a "fair use" policy :)
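
The OFFSET and LIMIT approach mentioned above can be sketched as a generator that emits one query per page. This is a minimal Python illustration; the helper name `paged_queries` is hypothetical, the total is the count Larry reported, and the page size matches the 100000-row cap he observed. One caveat worth stating: SPARQL does not guarantee a stable row order without ORDER BY, so pages from an unordered query may overlap or miss rows, and the ordered variant is the one reported to time out on this endpoint.

```python
from typing import Iterator

# Larry's export query, without any paging clauses.
HUMANS_QUERY = "SELECT ?human WHERE { ?human wdt:P31 wd:Q5 . }"


def paged_queries(base_query: str, total: int, page_size: int) -> Iterator[str]:
    """Yield one query per page, each with its own LIMIT and OFFSET."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        yield f"{base_query} LIMIT {page_size} OFFSET {offset}"


# 10396057 is the count reported in the thread; 100000 is the observed cap.
PAGES = list(paged_queries(HUMANS_QUERY, total=10_396_057, page_size=100_000))

if __name__ == "__main__":
    print(len(PAGES), "pages")
    print(PAGES[0])
    print(PAGES[-1])
```

Each generated string would then be sent to the endpoint in turn, ideally with a pause between requests to stay within the fair-use policy.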

...

Furthermore, if we want to get all humans ordered by id, then the endpoint times out. The following is the query: ''' SELECT ?human WHERE { ?human wdt:P31 wd:Q5 . } ORDER BY DESC(?human) '''

If you set the query timeout to a value over 1000 msecs, the Virtuoso Anytime Query feature will provide you with a partial solution, which you can use in conjunction with OFFSET and LIMIT to create an interactive cursor (or scrollable cursor). Beyond that, it's back to the "fair use" policy and the option to instantiate your own service-specific instance using our cloud offerings.

Regards,
Kingsley
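
A request that sets the query timeout might be built as in the Python sketch below. It rests on several assumptions that should be checked against the Virtuoso documentation for the deployed version: that the endpoint accepts `timeout` (in milliseconds) and `format` as request parameters, that `text/tab-separated-values` is an accepted format value, and that a partial Anytime Query result is flagged with an `X-SQL-State: S1TAT` response header. The helper names `request_url` and `looks_partial` are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint URL taken from the announcement.
ENDPOINT = "https://wikidata.demo.openlinksw.com/sparql"


def request_url(query: str, timeout_ms: int = 30_000,
                fmt: str = "text/tab-separated-values") -> str:
    """Build a GET URL carrying the query, result format, and timeout.

    Assumption: `timeout` is in milliseconds and `format` selects the
    serialization; neither parameter name is confirmed by the thread.
    """
    params = {"query": query, "format": fmt, "timeout": timeout_ms}
    return f"{ENDPOINT}?{urlencode(params)}"


def looks_partial(headers: dict) -> bool:
    """Guess whether a response holds partial (Anytime Query) results.

    Assumption: the server marks such responses with X-SQL-State: S1TAT.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-sql-state", "") == "S1TAT"


if __name__ == "__main__":
    print(request_url("SELECT ?human WHERE { ?human wdt:P31 wd:Q5 . } LIMIT 10"))
```

Keeping URL construction separate from the HTTP call makes it easy to combine with any client library and to inspect the response headers before trusting a result as complete.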


Jerven Tjalling Bolleman

13 Jan 2023

6:01 a.m.

Hi All,

Regarding these "fair use" settings: they are tunable and may be turned off, so the specific values that OpenLink uses would not necessarily apply if Wikidata were to host a Virtuoso instance itself.

For example, on sparql.uniprot.org you are unlikely to run into these limits (as the values are set very high indeed) and are more likely to suffer from settings around the HTTP layer that limit query run time due to connection issues.

Regards,
Jerven

On 1/12/23 11:45 PM, Kingsley Idehen via Wikidata wrote:

...

Sandra Fauconnier

8 a.m.

Samuel Klein

18 Jan 2023

3:55 p.m.

Ah, thanks Jerven. How do you deal with http-layer query timeouts? Are you able to predict them for certain common queries rather than waiting for the timeout to hit?

S

On Fri, Jan 13, 2023 at 4:02 AM Jerven Tjalling Bolleman <jerven.bolleman(a)sib.swiss> wrote:

...

-- Samuel Klein @metasj w:user:sj +1 617 529 4266

Jerven Tjalling Bolleman

19 Jan 2023

5:04 a.m.

Hi Samuel,

The current approach is not great and could be improved a lot. We just set a very high server-side HTTP timeout. This helps some but not all clients.

We don't predict anything; we just let it happen. Virtuoso can predict this kind of timeout, but like all query planning this is not perfect, and we found it more helpful to just try to run the queries and see how far we get. This is because sparql.uniprot.org is aimed at helping people answer hard analytical-style queries, and these just take time and resources to run. Also, we can't predict load, as demand is external, so a simple query might time out because a different, complicated query is using too many of the server's resources.

What we would like to do instead is more complicated. We should take the query and send it to a queue for running, generating a query-id. Then send a 303 redirect to the new location with the query-id, where the query results are paged to disk or a named pipe. Then just do the same with Retry-After headers of 10 seconds, increasing to one minute. Then we would know if people are actually interested in waiting. Once we have results we can reply with a 200 OK and start sending results.

For us this involves a lot of engineering work, so we never went that way. But it is the optimal solution, in that it avoids doing work that is not wanted: if the client stops asking for the new location within the time limit plus a margin, we know they went away and we can stop the query. This can also play nicely with an HTTP cache layer.

The main limitation here was the requirement for a distributed key-value store to hold these query-ids and results/named pipes, and the need for communication between SPARQL servers. We did not have time to implement this in the past, but maybe in the future :)

Regards,
Jerven

On 1/18/23 19:55, Samuel Klein wrote:

...

Ah, thanks Jerven. How do you deal with http-layer query timeouts? Are you able to predict them for certain common queries rather than waiting for the timeout to hit?

S
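
The queued-query flow Jerven outlines (submit, 303 redirect with a query-id, repeated Retry-After polling, 200 once results exist, cancellation when the client stops polling) can be sketched as a small in-memory state machine. The Python below is illustrative only and is not how sparql.uniprot.org or Virtuoso works: an ordinary dict stands in for the distributed key-value store, and the `/queries/<id>` path, the doubling back-off, the 5-second margin, the 410 reply for cancelled jobs, and all class and method names are assumptions made for the example.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Job:
    query: str
    last_seen: float                    # time of the client's latest request
    retry_after: int = 10               # seconds the client was last told to wait
    results: Optional[List[str]] = None
    cancelled: bool = False


class QueryQueue:
    MARGIN = 5      # grace period in seconds (hypothetical value)
    MAX_RETRY = 60  # back-off ceiling: one minute

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock
        self._jobs: Dict[str, Job] = {}  # stand-in for a distributed key-value store
        self._ids = itertools.count(1)

    def submit(self, query: str) -> Tuple[int, Dict[str, str]]:
        """Queue the query and redirect the client to its status location."""
        query_id = f"q{next(self._ids)}"
        self._jobs[query_id] = Job(query, last_seen=self._clock())
        return 303, {"Location": f"/queries/{query_id}", "Retry-After": "10"}

    def poll(self, query_id: str) -> Tuple[int, Dict[str, str], Optional[List[str]]]:
        """Answer a client asking for the status location."""
        job = self._jobs[query_id]
        if job.cancelled:
            return 410, {}, None
        job.last_seen = self._clock()
        if job.results is not None:
            return 200, {}, job.results
        # Not ready: redirect again with a longer wait, capped at one minute.
        job.retry_after = min(job.retry_after * 2, self.MAX_RETRY)
        headers = {"Location": f"/queries/{query_id}",
                   "Retry-After": str(job.retry_after)}
        return 303, headers, None

    def complete(self, query_id: str, rows: List[str]) -> None:
        """Called by the query runner when results are available."""
        self._jobs[query_id].results = rows

    def reap(self) -> List[str]:
        """Cancel unfinished jobs whose client missed its window plus margin."""
        now = self._clock()
        gone = [qid for qid, job in self._jobs.items()
                if not job.cancelled and job.results is None
                and now - job.last_seen > job.retry_after + self.MARGIN]
        for qid in gone:
            self._jobs[qid].cancelled = True
        return gone
```

Injecting the clock keeps the sketch testable without real waiting; a periodic call to `reap()` is where a real server would stop the underlying query.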
