On 27.08.2016 07:18, Sumit Asthana wrote:
Hi,
I'm trying to use an offline Wikidata dump
<https://dumps.wikimedia.org/wikidatawiki/entities/20160822/> but when I
run an example from Wikidata Toolkit - EntityStatisticsProcessor
<https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/EntityStatisticsProcessor.java>,
I hit the following error -
https://dpaste.de/TNpd.
Apparently it is unable to parse the dump but I can't seem to figure it
out. Help would be appreciated :)
This happens if your dump download was incomplete. It seems that
(recently) the download is sometimes interrupted and needs to be resumed
to get the whole file. Our implementation is not smart enough to fix
this and ends up with an incomplete dump.
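One way to tell whether a downloaded dump is complete is to decompress it all the way to the end: a truncated gzip file fails before its end-of-stream marker is reached. The following is only a minimal Python sketch of that check, not something WDTK provides; the helper name is_complete_gzip is made up for illustration.

```python
import gzip
import os
import zlib


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` can be decompressed to its end.

    Hypothetical helper for illustration; not part of Wikidata Toolkit.
    """
    # An empty file cannot be a complete dump.
    if os.path.getsize(path) == 0:
        return False
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the data chunk by chunk to keep memory use flat.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early (typical for an interrupted download).
        # OSError / zlib.error: the data is not valid gzip or is corrupted.
        return False
    return True
```

Running this on a full dump reads the entire file, so it takes a while, but it separates a broken download from a genuine parsing problem.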
You can download the dump in any way you like, including using a browser
with "save as". I prefer to use wget. You just need to put the file into
the directory where WDTK also puts its dumps. When you start WDTK, it
reports the file to be downloaded and the place where it puts the
download, so this is one way to find out.
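If a download was interrupted, it can usually be resumed instead of restarted (with wget, the -c option does this). The same idea can be sketched in Python with an HTTP Range request. This is an illustration only: the function names are invented here, and it assumes the server honours Range requests, which the sketch checks via the 206 status code.

```python
import os
import shutil
import urllib.request


def resume_request(url, path):
    """Build a request that asks only for the bytes missing from `path`."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    # Only send a Range header when there is a partial file to continue.
    headers = {"Range": f"bytes={have}-"} if have else {}
    return urllib.request.Request(url, headers=headers)


def resume_download(url, path):
    """Append the missing part of `url` to `path`.

    Hypothetical helper; falls back to a full download if Range is ignored.
    """
    req = resume_request(url, path)
    with urllib.request.urlopen(req) as resp:
        # 206 means the server sent only the requested range; 200 means it
        # sent the whole file again, so the partial file is overwritten.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            shutil.copyfileobj(resp, out)
```

Note that if the local file is already complete, a server will typically answer such a request with an error (416), which urlopen raises as an HTTPError.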
Dump files are the ones found at
https://dumps.wikimedia.org/other/wikidata/ (with the file names used
there). They go into a directory named like
./dumpfiles/wikidatawiki/json-20160801 (for the dump
https://dumps.wikimedia.org/other/wikidata/20160801.json.gz). The
dumpfiles directory is located under the directory from which you run your program.
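To make the naming scheme concrete, here is a small Python sketch that derives the download URL and the local target path from a dump date, following the 20160801 example above. The function name dump_locations is invented for illustration; WDTK itself reports the actual location at startup, which remains the authoritative source.

```python
from pathlib import Path


def dump_locations(date, base_dir="."):
    """Return (url, local_path) for the JSON dump of the given date.

    `date` is a string such as "20160801"; `base_dir` is the directory
    from which the program is run. Hypothetical helper for illustration.
    """
    file_name = f"{date}.json.gz"
    url = f"https://dumps.wikimedia.org/other/wikidata/{file_name}"
    local_path = (
        Path(base_dir) / "dumpfiles" / "wikidatawiki" / f"json-{date}" / file_name
    )
    return url, local_path
```

For example, dump_locations("20160801") pairs the URL quoted above with a file inside dumpfiles/wikidatawiki/json-20160801.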
Best,
Markus
Thanks,
Sumit
On Sat, Aug 27, 2016 at 1:18 AM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:
Hi!
For example "I want to know the number of statements on an average with
dead external reference links".
Since there are over a million links in references, you will probably
want to use a dump, either JSON or RDF, and look for the references there.
It would be relatively easy to find the links in reference statements.
However, checking a million links might require some careful planning :)
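As a rough illustration of the JSON route, the sketch below walks one entity from the dump and collects the URLs found in statement references. It is only a sketch under assumptions: it relies on the dump holding one entity per line inside a JSON array, on the usual claims / references / snaks nesting, and on URL-valued snaks carrying "datatype": "url". The function names are invented here. It only extracts the links; checking whether they are dead is a separate step.

```python
import json


def iter_entities(lines):
    """Yield entity dicts from the text lines of a JSON dump.

    Assumes one entity per line, wrapped in "[" and "]" lines, with a
    trailing comma after each entity.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


def reference_urls(entity):
    """Yield (statement_id, url) for every URL-valued snak in a reference."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            for reference in statement.get("references", []):
                for snaks in reference.get("snaks", {}).values():
                    for snak in snaks:
                        # Skip "novalue" / "somevalue" snaks and non-URL values.
                        if snak.get("snaktype") != "value":
                            continue
                        if snak.get("datatype") != "url":
                            continue
                        yield statement.get("id"), snak["datavalue"]["value"]
```

The lines can come from gzip.open(path, "rt", encoding="utf-8"), so the dump is streamed rather than loaded into memory.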
--
Stas Malyshev
smalyshev(a)wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata