Re: [Wikidata] ntriples dump?

27 Aug 2016

I also support the creation of .nt dumps as they are far easier to process than .ttl
ones.

If compressed .nt dumps are less than 20% bigger than the .ttl ones, I don't see the point
of keeping the .ttl dumps, since N-Triples is a subset of Turtle and any Turtle parser can
read N-Triples files.
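
To illustrate the ease-of-processing point, here is a minimal sketch of counting predicates
in an N-Triples dump with nothing but the standard library. It assumes well-formed lines with
one triple each. The regular expression and the name count_predicates are only my
illustration, and the pattern is a simplification, not the full N-Triples grammar.

```python
import re
from collections import Counter

# Simplified pattern: subject (IRI or blank node), predicate IRI, then the
# object as everything up to the terminating " ." at the end of the line.
# This is an illustrative shortcut, not a complete N-Triples grammar.
TRIPLE_RE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')

def count_predicates(lines):
    """Count how often each predicate IRI occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

sample = [
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
    '# a comment line',
]

for predicate, n in count_predicates(sample).items():
    print(n, predicate)
```

Because each line is a complete statement, the same function can be fed a file opened with
gzip.open(path, 'rt', encoding='utf-8') and will stream the dump without loading it into
memory. Doing the same with Turtle requires a real parser, since prefixes and nested
statements span several lines.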

Having provenance information, as suggested by Aidan, is definitely a good idea. For
triples shared by multiple pages (probably only the descriptions of complex data values,
because property descriptions could use the document describing the property as their
context), there are two possible approaches:
- Use the context of the entity that uses the value. This yields as many quads as there
are entities using the value.
- Use a "special" context. As the description of a value should not change, there
is no need to be able to retrieve new content about it in the future. But this leads to
the creation of a context IRI that is probably not dereferenceable.
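
The two options can be sketched roughly as follows. This is only an illustration under my
own assumptions, not how the dump script works: to_quads, SHARED_CONTEXT and the short
"ex:" terms are hypothetical placeholders, and a triple counts as "shared" when it appears
in more than one document.

```python
from collections import defaultdict

# Hypothetical IRI for the "special" context; it is not an existing Wikidata
# IRI and would probably not be dereferenceable.
SHARED_CONTEXT = '<http://example.org/context/shared-values>'

def to_quads(documents, shared_context=None):
    """Turn {context: [(s, p, o), ...]} into a list of (s, p, o, context).

    With shared_context=None (first option), a triple found in several
    documents is emitted once per document. With a shared_context (second
    option), such a triple is emitted once, under the special context.
    """
    seen_in = defaultdict(set)
    for context, triples in documents.items():
        for triple in triples:
            seen_in[triple].add(context)

    quads = []
    for triple, contexts in seen_in.items():
        if shared_context is not None and len(contexts) > 1:
            quads.append(triple + (shared_context,))
        else:
            for context in sorted(contexts):
                quads.append(triple + (context,))
    return quads

# Two entity documents that both describe the same value node (placeholders).
documents = {
    'ex:doc/Q1': [('ex:Q1', 'ex:P1', 'ex:value/abc'),
                  ('ex:value/abc', 'rdf:type', 'ex:TimeValue')],
    'ex:doc/Q2': [('ex:Q2', 'ex:P1', 'ex:value/abc'),
                  ('ex:value/abc', 'rdf:type', 'ex:TimeValue')],
}

per_entity = to_quads(documents)                    # first option
special = to_quads(documents, SHARED_CONTEXT)       # second option
print(len(per_entity), len(special))
```

With the first option the value description is repeated for every entity that uses it, so
the dump grows with the number of users of a value. With the second option it is written
once, at the price of a context that does not point back to any single document.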

Cheers,

Thomas

...
  On 27 Aug 2016 at 11:05, Dimitris Kontokostas <jimkont(a)gmail.com> wrote:

 Hi Stas,

 Out of curiosity, can you give an example of triples that do not originate from a single
Wikidata item / property?

 For me, Turtle dumps can be processed only by RDF tools, while nt-like dumps can be
processed both by RDF tools and by other kinds of scripts, so I find the former redundant.

 On Fri, Aug 26, 2016 at 11:52 PM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:
 Hi!

  Of course if providing both is easy, then there's no reason not to
  provide both.
 Technically it's quite easy - you just run the same script with
 different options. So the only question is what is useful.

  It is useful in such applications to know the online RDF documents in
 which a triple can be found. The document could be the entity, or it
 could be a physical location like:

 http://www.wikidata.org/entity/Q13794921.ttl  
 That's where the tricky part is: many triples won't have specific
 document there since they may appear in many documents. Of course, if
 you merge all these documents in a dump, the triple would appear only
 once (we have special deduplication code to take care of that) but it's
 impossible to track it back to a specific document then. So I understand
 the idea, and see how it may be useful, but I don't see a real way to
 implement it now.

 --
 Stas Malyshev
 smalyshev(a)wikimedia.org

 _______________________________________________
 Wikidata mailing list
 Wikidata(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

 --
 Kontokostas Dimitris
