This query times out:
SELECT ?item ?label
WHERE
{
  ?item wdt:P31 ?instance ;
        rdfs:label ?label .
  FILTER(CONTAINS(LCASE(?label), "soriano")) .
  FILTER(?instance != wd:Q5) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
I have a feeling that it's not actually using an index, or even asking the
right question, which is why it's slow and times out.
However, the MediaWiki wbsearchentities API does seem to use an index and
performs well for label searching:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=soriano&l…
How can I make my SPARQL query more performant, or ask the right
question?
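For reference, WDQS can call that same search index from inside SPARQL via the MWAPI service, so the label match is done by the API rather than by scanning every rdfs:label. A sketch, not a definitive rewrite (parameters may need tuning for your case; "soriano" stands in for the search term):

```sparql
SELECT ?item ?itemLabel WHERE {
  # Let the wbsearchentities-style index find candidate items first...
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "soriano" ;
                    mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  # ...then filter the small candidate set with ordinary triple patterns.
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```

This turns the expensive part (substring matching over all labels) into an indexed API call, which is usually the difference between a timeout and a sub-second query.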
(BTW, once I have some answers, I intend to make this page a bit more
useful with that information for other users ...
https://www.wikidata.org/wiki/Wikidata:Data_access)
Thad
https://www.linkedin.com/in/thadguidry/
Hello all,
I'm writing at the recommendation of Mairelys Lemus-Rojas after I
approached her with the below inquiry and exchanged some emails about it.
I was wondering if anyone was familiar with a semantic/linked data capable
content management system or blog that has autofill or nanotation
capabilities. What I mean by that is, say I'm writing a blog post about
Paris, I'm looking for something that would autofill linked data 'under the
hood' by either a dropdown (a la Omeka's Value Suggest
<https://omeka.org/s/modules/ValueSuggest/>), an autofill (a la
Wikidata/Wikipedia), or something that creates semantic blog tags.
I've seen a (very) bleeding-edge technology/proof of concept called
nanotation <http://kidehen.blogspot.com/2014/07/nanotation.html> that looks
about right, but might be completely different than what I actually want,
which is to find something that incorporates linked data, autofills URIs,
and works like a blog/content management system.
So far I've explored:
- *Recogito* (https://recogito.pelagios.org/) is lovely but focused on
annotating images/maps/preexisting items.
- *Catma* (https://catma.de/) is lovely looking but builds off preexisting
texts rather than creating new ones (i.e. you'd have to write the text and
then annotate it all). It seems to be a Voyant on steroids. Nonetheless, if
I could combine Recogito and Catma, that'd be neat. The same project also
puts out forText (https://fortext.net/), which I just include here as it's
also nice.
- *dokie.li* (https://dokie.li/) seems the closest, as it's focused on
article publishing, annotations, and social interactions, but
unfortunately, setting up a Solid server remains quite the technical
hurdle for me.
- *Atomgraph* (https://atomgraph.com/) is knowledge-graph oriented and
installed on top of previously existing data, not focused on content
management. Gephi on steroids.
- *WebAnno* (https://webanno.github.io/webanno/) is specifically targeted
at linguistically annotating the internet, not really at creating content.
- *Wikibase*: A heavily modified Wikibase might be what I'm left with. In
this scenario I'd set up a MediaWiki, turn it into Wikibase, and kinda hack
a blog out of it. Less than satisfying, but it would work if needed.
- I also tried *wiki.js* (SUCH A NICE INTERFACE, but it doesn't support
linked data yet) and *OntoWiki* (which looks like it also builds off a
preexisting knowledge graph).
- *Anthologize* (https://anthologize.org/) also looks very close as a
WordPress plugin, but it is not linked-data specific, so I didn't explore
ways to make it so.
- I've also explored *WordPress*
<https://wordpress.org/plugins/wp-linked-data/> and Drupal plugins (one
<https://www.drupal.org/project/ldp>, two
<https://www.drupal.org/project/linked_data>, three
<https://www.drupal.org/project/ldt>) that are all obsolete or no longer
maintained.
My long-term goal with this is to create semantic libguides and blogs. I
really do think semantic libguides are nearly possible: maybe an API that
pulls in knowledge graphs and Wikidata visualizations, along with some
blog-type software. I think it could be done, and I have some bits and
pieces of it, but not quite the whole sandwich (so to speak).
I'm partially doing this with an ALA grant I got for www.histsex.com (soon
to be www.histsex.org just in case you're clicking that in a week or so!).
This "bibliography" is all in Omeka, and it works effectively *like* a
libguide, but it will need further plugins to make it all work as desired,
so I continue to investigate alternatives.
Perhaps this is something that a grant will be needed to do in a broader
way? Or is there something obvious I've missed here?
Thank you all for your time!
--
*BRIAN M. WATSON *they/them
twitter <https://twitter.com/brimwats> - website <https://brimwats.com/>
PhD: UBC SLAIS <https://slais.ubc.ca/>
Director: HistSex.org <https://histsex.com/>
Editorial Board: Homosaurus <http://homosaurus.org/about>
Hi,
I have to run 2 million queries against a Wikidata instance.
I have loaded Wikidata into Virtuoso 7 (512 GB RAM, 32 cores, SSD disks
with RAID 0).
The queries are simple; there are just two types:
select ?s ?p ?o {
  ?s ?p ?o .
  filter (?s = ?param)
}
select ?s ?p ?o {
  ?s ?p ?o .
  filter (?o = ?param)
}
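Before changing infrastructure, two query-shape tweaks may help (a sketch, untested against your setup; wd:Q42 etc. stand in for your parameters): bind the known term directly into the pattern instead of using FILTER, so the engine can go straight to its index, and batch many parameters into one query with VALUES to cut the number of round trips from 2 million to a few thousand.

```sparql
# Direct binding: no full-scan-then-filter, the engine uses its SPO index
SELECT ?p ?o WHERE { wd:Q42 ?p ?o }

# Batching: one round trip answers many lookups at once
SELECT ?s ?p ?o WHERE {
  VALUES ?s { wd:Q42 wd:Q64 wd:Q90 }   # e.g. a few hundred params per batch
  ?s ?p ?o .
}
```

Whether the planner already rewrites the FILTER form into a direct lookup depends on the engine, but the batched form reduces per-query overhead (parsing, HTTP, thread scheduling) regardless.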
With a Java ThreadPoolExecutor it takes 6 hours.
How can I speed up query processing even more?
I was thinking:
a) to implement a Virtuoso cluster to distribute the queries or
b) to load Wikidata in a Spark dataframe (since Sansa framework is
very slow, I would use my own implementation) or
c) to load Wikidata in a Postgresql table and use Presto to distribute
the queries or
d) to load Wikidata in a PG-Strom table to use GPU parallelism.
What do you think? I am looking for ideas.
Any suggestion will be appreciated.
Best,
Hello all!
We are happy to announce the availability of Wikimedia Commons Query
Service (WCQS): https://wcqs-beta.wmflabs.org/.
This is a beta SPARQL endpoint exposing the Structured Data on Commons
(SDoC) dataset. This endpoint can federate with WDQS. More work is needed
as we iterate on the service, but feel free to begin using the endpoint.
Known limitations are listed below:
* The service is a beta endpoint that is updated via weekly dumps. Some
caveats include limited performance, expected downtimes, and no interface,
naming, or backward compatibility stability guarantees.
* The service is hosted on Wikimedia Cloud Services, with limited
resources and limited monitoring. This means there may be random unplanned
downtime.
* The data will be reloaded weekly from dumps, and the service will be
down during each reload. With the current amount of SDoC data, downtime
will last approximately 4 hours, but this may increase as SDoC data grows.
* Due to an issue with the dump format, the data currently only dates
back to July 5th. We’re working on getting more up-to-date data and hope to
have a solution soon. (https://phabricator.wikimedia.org/T258507 and
https://phabricator.wikimedia.org/T258474)
* The MediaInfo concept URIs (e.g.
http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may
change these to HTTPS in the near future. Please comment on T258590 if you
have concerns about this change.
* The service is restricted behind OAuth authentication, backed by Commons.
You will need an account on Commons to access the service. This is so that
we can contact abusive bots and/or users and block them selectively as a
last resort if needed.
* Please note that to correctly logout of the service, you need to use
the logout link in WCQS - logging out of just Wikimedia Commons will not
work for WCQS. This limitation will be lifted once we move to production.
* No documentation on the service is available yet. In particular, no
examples are provided yet. You can add your own examples at
https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exa…
following the format at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
.
* Please use the SPARQL template. Note that while there is currently a
bug that doesn’t allow us to change the “Try it!” link endpoint, the
examples will be displayed correctly on the WCQS GUI.
* WCQS is a work in progress and some bugs are to be expected, especially
related to generalizing WDQS to fit SDoC data. For example, current bugs
include:
* URI prefixes specific for SDoC data don’t yet work - you need to use
full URIs if you want to query using them. Relations and Q items are
defined by Wikidata’s URI prefixes, so they work correctly.
* Autocomplete for SDoC items doesn’t work - without prefixes they’d be
unusable anyway, but additional work will be required after we inject SDoC
URI prefixes into WCQS GUI.
* If you find any additional bugs or issues, please report them via
Phabricator with the tag wikidata-query-service.
* We do plan to move the service to production, but we don’t have a
timeline on that yet. We want to emphasize that while we do expect a SPARQL
endpoint to be part of a medium to long-term solution, it will only be part
of that solution. Even once the service is production-ready, it will still
have limitations in terms of timeouts, expensive queries, and federation.
Some use cases will need to be migrated, over time, to better solutions -
once those solutions exist.
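To illustrate the federation mentioned above, here is a minimal sketch of a WCQS query that joins against WDQS. P180 ("depicts") is just an example property; per the limitations above, Wikidata prefixes like wdt: work, while SDoC-specific prefixes still need full URIs.

```sparql
# Sketch: find files whose "depicts" (P180) statement points to a
# Wikidata item, then fetch that item's English label from WDQS.
SELECT ?file ?item ?itemLabel WHERE {
  ?file wdt:P180 ?item .
  SERVICE <https://query.wikidata.org/sparql> {
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "en")
  }
}
LIMIT 10
```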
Have fun!
Guillaume
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
Hello!
Following up on my previous email request for information on linked data
content managers (collected here https://www.wikidata.org/wiki/User:Sj/LDCM
).
A number of folks expressed interest in the possibility of working on a
plugin for wiki.js that would use JavaScript and such. Additionally, I
discovered that the MSUL digital repository created an AJAX script to fetch
wiki information for the subject headings in their content manager.
MSUL has graciously provided me with a copy.
Li Song (copied here) and I have been laying out some ideas and development
plans for a wiki.js plugin that would use JavaScript to fetch that
information. I thought I would follow up here for anyone who is interested
in contributing/collaborating on code or ideas.
Feel free to reach me at bmwatson1989(a)gmail.com!
b
--
*BRIAN M. WATSON *they/them
twitter <https://twitter.com/brimwats> - website <https://brimwats.com/>
PhD: UBC SLAIS <https://slais.ubc.ca/>
Director: HistSex.org <https://histsex.com/>
Editorial Board: Homosaurus <http://homosaurus.org/about>
Hi all,
We experienced WDQS service disruptions on 2020/07/23. As a result there
was a full outage (inability to respond to all queries) for a period of
several minutes, and a more extended period of intermittently degraded
service (inability to respond to a subset of queries) for 1-2 hours.
The full incident report is available here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20200723-wdqs-ou…
Ultimately, we traced the proximate cause to a series of non-performant
queries, which caused a deadlock in Blazegraph, the backend for WDQS. We
have placed a temporary block on the IP address in question and are taking
steps to better define service availability expectations as well as
processes to make detection of these events more streamlined going forward.
Greetings everyone,
The PCC<https://www.loc.gov/aba/pcc/>, an international cooperative cataloging effort for library collections, is launching a Wikidata Pilot to further advance the movement toward identity management. Stated broadly in its Strategic Directions document, the PCC hopes to “Accelerate the movement toward ubiquitous identifier creation and identity management at the network level … attain an environment where identity management work activity is characterized by much greater proportions and numbers of entities receiving identifiers … strategic partnerships and collaboration existing among cultural heritage organizations, rights management agencies, Wikidata, and others … collaborate with other identity management communities to facilitate and promote the use of unique identifiers.”
More specifically, this Pilot is anticipated to involve
• Comparing ease of use and benefits of Wikidata to other registries (LCNAF, ISNI)
• Assessing the productivity and quality assurance tools that exist (or should exist)
• Learning about the culture of the Wikidata community
The upcoming Pilot was featured in the LD4 Wikidata Affinity Group meeting of June 16 and more background information and discussion can be found in the presentation recording<https://stanford.zoom.us/rec/share/_eAtNuzb_HNLcK_97GzcBJ95MN2-T6a8hHRI-PYO…>, slides<https://docs.google.com/presentation/d/1NpkAQdGGft1Wi2vX0zgMtIxwXWjPq96NtXx…>, and notes<https://docs.google.com/document/d/1z1SSAp4c4tftOGW3BbJ6Fxfd8oRIhfzveh0zjeb…>.
Participants can choose to experiment in a range of focus areas based on what is of interest to their own institution, sharing their findings without each being required to delve into all the areas that are covered by the pilot. Projects of any size, however small or large, and at any stage of progress are welcome. The PCC invites interested institutions (both PCC and non-PCC) to participate by completing a short survey<https://forms.gle/5VEHS8sbQbG1JyQa9> describing their project and the issues of interest to them. Initial expressions of interest by the end of July will allow the Pilot to get underway with a kick off meeting in early August. We will solicit firm commitments for ongoing participation at a later date.
The pilot is anticipated to last about 12 months. If you have questions, please write to John Riemer<mailto:jriemer@library.ucla.edu> or Michelle Durocher<mailto:durocher@fas.harvard.edu>.
Hilary Thorsen, on behalf of the PCC Task Group on Identity Management in NACO
Hilary Thorsen
Resource Sharing Librarian
Stanford Libraries
thorsenh(a)stanford.edu
650-285-9429