On 27.08.2016 07:18, Sumit Asthana wrote:
Hi,
I'm trying to use an offline Wikidata dump https://dumps.wikimedia.org/wikidatawiki/entities/20160822/ but when I run an example from Wikidata Toolkit, EntityStatisticsProcessor https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/EntityStatisticsProcessor.java, I hit the following error: https://dpaste.de/TNpd.
Apparently it is unable to parse the dump but I can't seem to figure it out. Help would be appreciated :)
This happens if your dump download was incomplete. It seems that (recently) the download is sometimes interrupted and needs to be resumed to get the whole file. Our implementation is not smart enough to fix this and ends up with an incomplete dump.
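One way to confirm that a downloaded dump is truncated is to decompress it to the end before handing it to WDTK. The following is a minimal sketch in Python, not part of WDTK; the function name `gzip_is_complete` is made up for illustration, and it assumes Python 3.8 or later (for `gzip.BadGzipFile`).

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` can be decompressed to its end.

    A download that was cut off typically fails with EOFError because the
    end-of-stream marker is missing.
    """
    if os.path.getsize(path) == 0:
        # An empty file is not a usable dump either.
        return False
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data; only the success of decoding matters.
            while stream.read(chunk_size):
                pass
    except (EOFError, zlib.error, gzip.BadGzipFile):
        return False
    return True
```

Note that this reads the whole file, so for a full Wikidata dump it will take a while.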
You can download the dump in any way you like, including using a browser with "save as". I prefer to use wget. You just need to put it into the directory where WDTK also puts its dumps. When you start WDTK, it reports the file to be downloaded and the place where it puts the download, so this is one way to find out.
Dump files are the ones found at https://dumps.wikimedia.org/other/wikidata/ (with the file names used there). They go into the directory named like ./dumpfiles/wikidatawiki/json-20160801 (for the dump https://dumps.wikimedia.org/other/wikidata/20160801.json.gz). The dumpfiles directory is under the directory from where you run your program.
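The directory layout described above, together with a resumable download, could be sketched in Python roughly as follows. This is only an illustration under assumptions: `wdtk_dump_path` and `resume_download` are hypothetical helper names, not WDTK functions, and the resume logic assumes the server honours HTTP `Range` requests (which is also what `wget -c` relies on).

```python
import urllib.error
import urllib.request
from pathlib import Path


def wdtk_dump_path(date, base_dir="."):
    """Build the path described above, e.g. for date "20160801":
    <base_dir>/dumpfiles/wikidatawiki/json-20160801/20160801.json.gz
    """
    return (Path(base_dir) / "dumpfiles" / "wikidatawiki"
            / f"json-{date}" / f"{date}.json.gz")


def resume_download(url, target, chunk_size=1 << 20):
    """Download `url` to `target`, continuing a partial file if one exists."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    offset = target.stat().st_size if target.exists() else 0
    request = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range not satisfiable: assumed here to mean the local file
            # already covers the whole remote file.
            return
        raise
    with response:
        # 206 means the server honoured the Range header; otherwise the
        # server sent the full file and we start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(target, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the connection drops, calling `resume_download` again with the same arguments continues from the bytes already on disk, for example with the URL https://dumps.wikimedia.org/other/wikidata/20160801.json.gz and the target `wdtk_dump_path("20160801")`.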
Best,
Markus
-Thanks, Sumit
On Sat, Aug 27, 2016 at 1:18 AM, Stas Malyshev <smalyshev@wikimedia.org> wrote:
Hi!

> For example "I want to know the number of statements on an average with
> dead external reference links".

Since there are over a million links in references, you will probably want to use a dump, either JSON or RDF, and look for references there. It would be relatively easy to find those in reference statements. However, checking a million links might require some careful planning :)

--
Stas Malyshev
smalyshev@wikimedia.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata