Wikidata-tech November 2015

wikidata-tech@lists.wikimedia.org

6 participants
8 discussions

Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 8

by Christopher Johnson

OK, I will try to make this clear. The use case is to be able to simply identify "unreferenced statements" with a SPARQL query and/or count them using fastRangeCount. How do we do this with the current implementation? What this gains is a useful method for measuring data quality that does not exist now (as far as I can understand). It could also provide a stable URI (UUID) for references, which would fix the questionable use of the "unstable" reference hash as a resource URI and facilitate the implementation of reusability for them.

Thanks,
Christopher

On 30 November 2015 at 13:00, <wikidata-tech-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. Re: Wikidata-tech Digest, Vol 31, Issue 5 (Stas Malyshev)
>
> Message: 1
> Date: Sun, 29 Nov 2015 16:54:29 -0800
> From: Stas Malyshev <smalyshev(a)wikimedia.org>
> To: Wikidata technical discussion <wikidata-tech(a)lists.wikimedia.org>
> Subject: Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5
>
> Hi!
>
> > In Blazegraph, this could be supported by Quads or RDR (Reification Done
> > Right).
>
> We considered using RDR but decided against it because RDR is not
> standard and existing tools and libraries would not understand it. So in
> the interest of better data integration we decided to use regular RDF
> representation that can be queried by standard SPARQL.
>
> > One possible approach using triples for the use case could be to assign
> > a blank node to a reference placeholder and introduce the valid range
> > class for prov:wasDerivedFrom (prov:entity) with the canonical reference
> > UUID like this:
>
> I'm not sure I understand - what would doing this earn us? This looks
> like just adding one more join to the lookups.
> --
> Stas Malyshev
> smalyshev(a)wikimedia.org
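The difficulty raised in this message is that "unreferenced" is defined by the absence of a prov:wasDerivedFrom triple, so there is no single predicate or object whose occurrences can simply be counted. A minimal sketch of that set-difference logic over in-memory triples follows. It assumes that statement nodes are typed as wikibase:Statement; the prefixed names are plain strings and the sample identifiers are made up for illustration, not taken from Wikidata.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
STATEMENT_CLASS = "wikibase:Statement"  # assumption: class that marks statement nodes
DERIVED_FROM = "prov:wasDerivedFrom"    # links a statement to a reference node


def unreferenced_statements(triples: Iterable[Triple]) -> Set[str]:
    """Return statement nodes that have no prov:wasDerivedFrom triple."""
    statements: Set[str] = set()
    referenced: Set[str] = set()
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj == STATEMENT_CLASS:
            statements.add(subject)
        elif predicate == DERIVED_FROM:
            referenced.add(subject)
    # "Unreferenced" is a difference of two sets, not a count of one pattern.
    return statements - referenced


# Hypothetical sample data: three statements, two of which share a reference.
SAMPLE = [
    ("wds:stmt-a", RDF_TYPE, STATEMENT_CLASS),
    ("wds:stmt-a", DERIVED_FROM, "wdref:ref-1"),
    ("wds:stmt-b", RDF_TYPE, STATEMENT_CLASS),
    ("wds:stmt-c", RDF_TYPE, STATEMENT_CLASS),
    ("wds:stmt-c", DERIVED_FROM, "wdref:ref-1"),
]

print(sorted(unreferenced_statements(SAMPLE)))
```

The placeholder-node proposal would, in effect, turn the right-hand side of that difference into something that exists in the data and can be counted directly, at the cost of the extra join Stas mentions.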

8 years, 4 months

Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5

by Christopher Johnson

Thank you for the explanation. The content negotiation for an Item IRI is clear. Any request for http://www.wikidata.org/entity/Q... requires an Accept: application/rdf+xml header in order to get the RDF. The default response is JSON, and Accept: text/html returns a 200 response delivering the UI page.

For statement resolution in the Item RDF, is this not a fragment? So in the Item context, the IRI for a statement resource would be http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the statement IRI http://www.wikidata.org/entity/statement/Statement_UUID could just return the statement as a separate entity.

On the topic of references, a use case is to measure data quality by counting the number of "unreferenced statements". At https://phabricator.wikimedia.org/T117234#1834728, I propose the possibility of using blank reference nodes to identify these "bad" statements. Having an object to count greatly expedites the query process because of the estimated cardinality feature of Blazegraph. The only alternative to this is to count distinct statements with the prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may not be possible without a huge amount of memory).

I do not know what would be involved in implementing blank reference nodes or what performance consequences might follow. It seems to me that the pairing of statements and references is a core feature of the data model, and it is odd that there can exist statements that have no associated reference node in the RDF.

Cheers,
Christopher

On 27 November 2015 at 13:00, <wikidata-tech-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. RDF Item, Statement and Reference IRI Resolution?
>    (Christopher Johnson)
> 2. Re: RDF Item, Statement and Reference IRI Resolution?
>    (Markus Krötzsch)
>
> Message: 1
> Date: Fri, 27 Nov 2015 07:21:10 +0100
> From: Christopher Johnson <christopher.johnson(a)wikimedia.de>
> To: wikidata-tech(a)lists.wikimedia.org, wikimedia-de-tech
>     <wikimedia-de-tech(a)wikimedia.de>
> Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
>     Resolution?
>
> Hi,
>
> After looking at the RDF format closely, I am asking if the item, statement
> and reference IRIs could/should be directly resolvable to XML/JSON
> formatted resources.
>
> It seems that currently http://www.wikidata.org/entity/.... redirects to
> the UI at https://www.wikidata.org/wiki/ which is not what a machine
> reader would expect.
> Without a simple method to resolve the IRIs (perhaps a RESTful API?), these
> RDF data objects are opaque for parsers.
>
> Of course, with wbgetclaims, it is possible to get the statement like this:
>
> https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q209…
>
> but the API expected GUID format does not match the RDF UUID representation
> (there is a $ or "%24" after the item instead of a -) and it returns both
> the statement and the references.
>
> Since the reference is its own node in the RDF, it can be queried
> independently. For example, to ask "return all of the statements where
> reference R is bound." But then, the return value is a list of statement
> IDs and a subquery or separate query is then required to return the
> associated statement node.
>
> I am also wondering why item, statement and reference "UUIDs" are not in
> canonical format in the RDF. This is a question of compliance with IETF
> guidelines, which may or may not be relevant.
>
> Item: Q20913766
> Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
> Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
>
> See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
> See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
> and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
> guidelines.
>
> Thanks for your feedback,
> Christopher

8 years, 4 months
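The content negotiation and the two statement IRI shapes described in the reply above can be sketched with the Python standard library. This is only an illustration under assumptions: the helper names are hypothetical, `Statement_UUID` is the placeholder used in the message, and the request is built but never sent, so nothing here asserts how the server actually responds.

```python
from urllib.parse import urldefrag
from urllib.request import Request

ENTITY_BASE = "http://www.wikidata.org/entity/"


def rdf_request(item_id: str) -> Request:
    """Build (but do not send) a request asking for the RDF/XML form of an item."""
    return Request(ENTITY_BASE + item_id, headers={"Accept": "application/rdf+xml"})


def fragment_statement_iri(item_id: str, statement_id: str) -> str:
    """Statement addressed as a fragment of the item document."""
    return f"{ENTITY_BASE}{item_id}#{statement_id}"


def standalone_statement_iri(statement_id: str) -> str:
    """Statement addressed as a resource of its own."""
    return f"{ENTITY_BASE}statement/{statement_id}"


def document_iri(iri: str) -> str:
    """Strip the fragment: this is the part a client actually sends to the server."""
    return urldefrag(iri).url


req = rdf_request("Q16521")
print(req.full_url, req.get_header("Accept"))
print(document_iri(fragment_statement_iri("Q16521", "Statement_UUID")))
print(standalone_statement_iri("Statement_UUID"))
```

The practical difference between the two shapes is that a fragment is resolved on the client side, so dereferencing the fragment form always fetches the whole item document, whereas the standalone form could in principle return only the statement.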

RDF Item, Statement and Reference IRI Resolution?

by Christopher Johnson

Hi,

After looking at the RDF format closely, I am asking if the item, statement and reference IRIs could/should be directly resolvable to XML/JSON formatted resources.

It seems that currently http://www.wikidata.org/entity/.... redirects to the UI at https://www.wikidata.org/wiki/ which is not what a machine reader would expect. Without a simple method to resolve the IRIs (perhaps a RESTful API?), these RDF data objects are opaque for parsers.

Of course, with wbgetclaims, it is possible to get the statement like this:

https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q209…

but the API expected GUID format does not match the RDF UUID representation (there is a $ or "%24" after the item instead of a -) and it returns both the statement and the references.

Since the reference is its own node in the RDF, it can be queried independently. For example, to ask "return all of the statements where reference R is bound." But then, the return value is a list of statement IDs and a subquery or separate query is then required to return the associated statement node.

I am also wondering why item, statement and reference "UUIDs" are not in canonical format in the RDF. This is a question of compliance with IETF guidelines, which may or may not be relevant.

Item: Q20913766
Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9

See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml and http://tools.ietf.org/html/rfc4122 for information on urn:uuid guidelines.

Thanks for your feedback,
Christopher
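The identifier mismatch described in this message can be made concrete with a short sketch. It assumes, as the message states, that the only difference between the RDF statement ID and the GUID the API expects is the separator after the item ID. The function names are hypothetical, and the example values are the ones given in the message.

```python
import uuid


def rdf_id_to_api_guid(rdf_statement_id: str) -> str:
    """Swap the first '-' (the one after the item ID) for the '$' the API expects."""
    item_id, sep, rest = rdf_statement_id.partition("-")
    if not sep or not rest:
        raise ValueError(f"no item/UUID separator in {rdf_statement_id!r}")
    return f"{item_id}${rest}"


def statement_uuid_urn(rdf_statement_id: str) -> str:
    """Return the RFC 4122 urn:uuid form of the UUID part of a statement ID."""
    _, _, rest = rdf_statement_id.partition("-")
    return uuid.UUID(rest).urn  # raises ValueError if the part is not a UUID


def is_uuid(value: str) -> bool:
    """True if the value parses as a UUID (32 hex digits, hyphens optional)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


STATEMENT = "Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"
REFERENCE = "39f3ce979f9d84a0ebf09abe1702bf22326695e9"

print(rdf_id_to_api_guid(STATEMENT))
print(statement_uuid_urn(STATEMENT))
print(is_uuid(REFERENCE))
```

The statement ID contains a well-formed UUID once the item prefix is removed. The reference ID is 40 hexadecimal digits, so it does not parse as a UUID at all, which is consistent with it being described elsewhere in the thread as a hash.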

8 years, 5 months

Using the Role Object Pattern to represent derived information in the data model

by Daniel Kinzler

Hi all!

For weeks and months now, we have been discussing how to best represent "extra" information in (or associated with) the Wikibase data model. After some more discussion and a bit of research, I think I have found what we need: the Role Object Pattern, aka Role Class Model, see <https://en.wikipedia.org/wiki/Role_Class_Model>.

Please have a look at https://phabricator.wikimedia.org/T118860 and let me know if you have any objections. If not, let's use this sprint to discuss the details of the implementation, and do a task breakdown.

PS: I came across quite a few famous names during my research. Looks like we are not the first to have this need...

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
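For readers unfamiliar with the pattern named above, here is a generic sketch of a Role Object arrangement: a core object keeps its own data and carries optional role objects that can be attached and looked up by type. It is not the design proposed in the linked Phabricator task; the class names `Core` and `DerivedUrl`, the entity ID and the URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Type, TypeVar

R = TypeVar("R")


@dataclass
class Core:
    """A core object that carries optional role objects alongside its own data."""

    entity_id: str
    _roles: Dict[type, object] = field(default_factory=dict, repr=False)

    def add_role(self, role: object) -> None:
        # One role object per role type; adding again replaces the old one.
        self._roles[type(role)] = role

    def has_role(self, role_type: type) -> bool:
        return role_type in self._roles

    def get_role(self, role_type: Type[R]) -> Optional[R]:
        role = self._roles.get(role_type)
        return role if isinstance(role, role_type) else None


@dataclass
class DerivedUrl:
    """Hypothetical role: a URL derived from some identifier value."""

    url: str


core = Core("Q-hypothetical")
print(core.has_role(DerivedUrl))
core.add_role(DerivedUrl("https://example.org/resource"))
print(core.get_role(DerivedUrl))
```

The point of the arrangement is that derived or "extra" information lives in role objects rather than in the core class, so code that only needs the core data is unaffected when new kinds of extra information are introduced.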

8 years, 5 months

EntityStore 1.0 and TermStore 1.0 released

by Jeroen De Dauw

Hey all, I've created two new small PHP libraries that provide persistence and lookup services for Wikibase data. You can read about them here http://www.bn2vs.com/blog/2015/11/14/entitystore-and-termstore-for-wikibase… Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate ~=[,,_,,]:3

8 years, 5 months

Dispatch Lag dashboard

by Addshore

Hi all! I posted this on twitter and IRC already! But I made a lovely new dashboard showing all things dispatchy on Wikidata. https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch Enjoy! And if you want anything added let me know! -- Addshore

8 years, 5 months

Re: [Wikidata-tech] [Wikidata] how to map other identifiers to Wikidata entity IDs

by Daniel Kinzler

On 09.11.2015 at 03:26, S Page wrote:
> I think these other identifiers are all "Wikidata property representing a unique
> identifier" and there are about 350 of them [2] But surprisingly, I couldn't
> find an easy way to look up a Wikidata item using these other identifiers.

We discussed some loose plans for implementing this in Cirrus when Stas was in Berlin a few weeks ago. On Special:Search, you would ask for property:P212:978-2-07-027437-6, and that would find the item with that ISBN.

Stas: do we have a ticket for this somewhere? All I can find are the notes in the etherpad.

> Also, is this a temporary thing? Will Wikidata eventually have items for every
> book published, every musical recording, etc. and become a superset of all those
> unique identifiers?

It's highly unlikely that Wikidata will become a superset of any and all vocabularies in existence.

Better integration of external identifiers is high on our priority list right now. The first step, however, will be to properly expose URIs for them, so we are no longer a dead end in the linked data web. But since we need to work on Cirrus integration anyway, I expect that we will have search-by-property soonish, too. I certainly hope so.

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
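The planned search-by-property lookup can be illustrated with a small sketch of the idea: split a query of the form property:P212:978-2-07-027437-6 into a property ID and a value, then look the pair up in a reverse index. This says nothing about how Cirrus would implement it. The function names and the in-memory index are assumptions, and the item ID in the sample index is a made-up placeholder, not the real item for that ISBN.

```python
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, str], List[str]]


def parse_property_query(query: str) -> Tuple[str, str]:
    """Split 'property:<property id>:<value>' into (property id, value)."""
    prefix, _, rest = query.partition(":")
    property_id, sep, value = rest.partition(":")
    if prefix != "property" or not sep or not property_id or not value:
        raise ValueError(f"not a property query: {query!r}")
    return property_id, value  # the value may itself contain ':'


def find_items(index: Index, query: str) -> List[str]:
    """Return the item IDs whose property has exactly the queried value."""
    return list(index.get(parse_property_query(query), []))


# Hypothetical reverse index from (property, value) to item IDs.
INDEX: Index = {
    ("P212", "978-2-07-027437-6"): ["Q_HYPOTHETICAL_BOOK"],
}

print(find_items(INDEX, "property:P212:978-2-07-027437-6"))
print(find_items(INDEX, "property:P212:no-such-value"))
```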

8 years, 5 months

Wikidata JSON Dump Reader

by Jeroen De Dauw

Hey all, I've created a small PHP library for reading from the JSON dumps. http://www.bn2vs.com/blog/2015/11/08/wikidata-wikibase-json-dump-reader/ https://github.com/JeroenDeDauw/JsonDumpReader Cheers -- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate ~=[,,_,,]:3

8 years, 5 months
