Hi Tom,
On 10/08/13 15:55, Tom Morris wrote:
Given your "educating" people about software engineering principles, this may fall on deaf ears, but I too have a strong preference for the format with an independent line per triple.
No worries. The eventual RDF export of Wikidata will most certainly have this (and any other standard format one could want). If you need NTriples export earlier, but do not want to use a second tool for this, then you could modify the triple writing methods in the python script as I suggested a few emails ago.
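For illustration only, here is a minimal sketch of what one-triple-per-line output could look like. It is in Python because the export script is. The helper names (`escape_literal`, `iri`, `literal`, `write_ntriple`) are hypothetical and are not the script's actual methods. Only IRIs and plain string literals are covered, and the Wikidata IRI in the demo line is illustrative.

```python
import sys
from typing import Optional, TextIO


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples does not allow raw inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def iri(value: str) -> str:
    """Render an absolute IRI as an N-Triples term (hypothetical helper)."""
    return "<" + value + ">"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Render a plain string literal, optionally with a language tag (hypothetical helper)."""
    term = '"' + escape_literal(value) + '"'
    return term + "@" + lang if lang else term


def write_ntriple(out: TextIO, subject: str, predicate: str, obj: str) -> None:
    """Write one complete triple on its own line, terminated by ' .' as N-Triples requires."""
    out.write(f"{subject} {predicate} {obj} .\n")


if __name__ == "__main__":
    # Illustrative IRIs; the real export may use a different IRI scheme.
    write_ntriple(
        sys.stdout,
        iri("http://www.wikidata.org/entity/Q42"),
        iri("http://www.w3.org/2000/01/rdf-schema#label"),
        literal("Douglas Adams", "en"),
    )
```

Because every triple carries its full subject and predicate and ends in its own line, no state (prefixes, `;`/`,` abbreviations) has to be tracked between writes.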
On Sat, Aug 10, 2013 at 8:35 AM, Markus Krötzsch <markus.kroetzsch@cs.ox.ac.uk> wrote:
On 10/08/13 12:18, Sebastian Hellmann wrote:
By the way, you can always convert it to turtle easily:
curl http://downloads.dbpedia.org/3.8/ko/mappingbased_properties_ko.ttl.bz2 | bzcat | head -100 | rapper -i turtle -o turtle -I - - file
If conversion is so easy, it does not seem worthwhile to have much of a discussion about this at all.
The point of the discussion is to advocate for a format that is most useful to the maximum number of people out of the box. Rapper isn't installed by default on systems. A file format with independent lines can be processed using grep and other simple command line tools without having to find and install additional software.
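To make the point about independent lines concrete, here is a small sketch of a predicate filter that needs no RDF library. It does roughly what `grep` or `awk` would do on the command line. The function `triples_with_predicate` is made up for this example. It assumes the input is already N-Triples with one triple per line.

```python
from typing import Iterable, Iterator


def triples_with_predicate(lines: Iterable[str], predicate_iri: str) -> Iterator[str]:
    """Yield the N-Triples lines whose predicate is predicate_iri.

    Works line by line with no RDF parser: in N-Triples the subject and the
    predicate never contain whitespace, so the first two whitespace-separated
    fields of a line are exactly those two terms.
    """
    target = "<" + predicate_iri + ">"
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = stripped.split(None, 2)  # subject, predicate, rest of line
        if len(fields) == 3 and fields[1] == target:
            yield stripped
```

A compressed dump could be streamed through it with `bz2.open(path, "rt", encoding="utf-8")` without unpacking it first. Comparing the second field, rather than searching the whole line as plain `grep` does, avoids false hits when the same IRI appears as an object.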
I think the rapper command you refer to was only for expanding prefixes, not for making line-by-line syntax. Prefixes should not create any grepping inconveniences.
Anyway, if you restrict yourself to tools that are installed by default on your system, then it will be difficult to do many interesting things with a 4.5G RDF file ;-) Seriously, the RDF dump is meant specifically for tools that take RDF inputs. It is not very straightforward to encode all of Wikidata in triples, and it leads to some inconvenient constructions (especially a lot of reification). If you don't actually want to use an RDF tool and you are just interested in the data, then there would be easier ways of getting it.
Out of curiosity, what kind of use do you have in mind for the RDF (or for the data in general)?
Cheers,
Markus