The issue is now fixed in master. I also added some more INFO-level
messages that report which dump files were found online and locally.
Cheers,
Markus
On 18.01.2015 14:26, Markus Krötzsch wrote:
On 18.01.2015 10:58, Egon Willighagen wrote:
On Sat, Jan 17, 2015 at 11:04 PM, Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
It is easy to fix this (though I will not fix it tonight, but tomorrow) by
just adjusting the HTML strings we parse for.
Sure! I have subscribed to the bug report.
As an intermediate workaround for me, what file name pattern is used
in the local cache?
I had manually downloaded a file (and made it available as a torrent
because the download ran at only about 1 MB/s, [0]) and put this in the folder,
but it was not recognized... the file on the server is:
http://dumps.wikimedia.org/other/wikidata/20150112.json.gz
But as 20150112.json.gz it is not detected... I noted the json-*
pattern in the code, but json-20150112.json.gz didn't work either...
The dump files are put into subdirectories of the current directory
("."), for example:
./dumpfiles/wikidatawiki/json-20150105/20150105.json.gz
(JSON dump)
./dumpfiles/wikidatawiki/current-20141009/wikidatawiki-20141009-pages-meta-current.xml.bz2
(current revision XML dump)
If you create a directory of this form and put a file in there with the
file name as found online, then the tool will find it.
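To make the layout concrete, here is a minimal plain-Java sketch (not part of WDTK) that builds the expected path for a JSON dump and copies a manually downloaded file there. The class and method names (LocalDumpPath, jsonDumpPath, install) are made up for illustration, and the sketch assumes only the directory pattern shown in the examples above:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not a WDTK class.
final class LocalDumpPath {

    // Expected location of a JSON dump, assuming the layout
    // <base>/dumpfiles/<project>/json-<date>/<date>.json.gz
    static Path jsonDumpPath(Path baseDir, String project, String dateStamp) {
        return baseDir.resolve("dumpfiles")
                .resolve(project)
                .resolve("json-" + dateStamp)
                .resolve(dateStamp + ".json.gz");
    }

    // Copies a manually downloaded dump to the expected location,
    // creating the intermediate directories if they do not exist yet.
    static Path install(Path downloaded, Path baseDir, String project, String dateStamp) {
        Path target = jsonDumpPath(baseDir, project, dateStamp);
        try {
            Files.createDirectories(target.getParent());
            return Files.copy(downloaded, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints where a dump dated 20150112 would be expected under ".".
        System.out.println(jsonDumpPath(Paths.get("."), "wikidatawiki", "20150112"));
    }
}
```

With this, a call such as install(Paths.get("20150112.json.gz"), Paths.get("."), "wikidatawiki", "20150112") would place the downloaded file in a json-20150112 subdirectory, matching the pattern of the first example path.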
BTW, a second question, is there a way to list all local (JSON) dumps
using the WDTK api?
Yes, though it's not very convenient right now. To restrict to local
files, you can use the DumpProcessingController in offline mode (then it
only looks at local files):
    DumpProcessingController dumpProcessingController =
        new DumpProcessingController("wikidatawiki");
    dumpProcessingController.setOfflineMode(true);
    List<MwDumpFile> localJsonDumps =
        dumpProcessingController
            .getWmfDumpFileManager()
            .findAllDumps(DumpContentType.JSON);
This gives you a list of MwDumpFile objects that you can access to get
their date (getDateStamp()) and also to access the file contents.
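If you only need the dates of the local JSON dumps and do not want to go through the API, a rough alternative is to scan the directory layout directly. The following is a plain-Java sketch, not WDTK code: the class name LocalJsonDumpLister and its method are hypothetical, and it assumes the json-<date>/<date>.json.gz layout from the example paths in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a WDTK class.
final class LocalJsonDumpLister {

    // Returns the sorted date stamps of all JSON dumps found under
    // <base>/dumpfiles/<project>/json-<date>/<date>.json.gz
    static List<String> localJsonDumpDates(Path baseDir, String project) {
        Path projectDir = baseDir.resolve("dumpfiles").resolve(project);
        List<String> dates = new ArrayList<>();
        if (!Files.isDirectory(projectDir)) {
            return dates; // nothing downloaded yet
        }
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(projectDir, "json-*")) {
            for (Path dir : dirs) {
                String date = dir.getFileName().toString().substring("json-".length());
                // Only count directories that actually contain the dump file.
                if (Files.isRegularFile(dir.resolve(date + ".json.gz"))) {
                    dates.add(date);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(dates);
        return dates;
    }

    public static void main(String[] args) {
        for (String date : localJsonDumpDates(Paths.get("."), "wikidatawiki")) {
            System.out.println(date);
        }
    }
}
```

This only checks file names and does not validate the dump contents, so the DumpProcessingController route remains the better choice when you also want to read the files.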
I think we should log some additional messages about the files that are
found and used.
Cheers,
Markus
We should also improve our error reporting for this case, obviously.
Yeah, that's an art no software I ever worked with mastered... it's
hard! But it's important... I was completely looking in the wrong
place... mind you, monitoring logging messages can be hard too, when
WDTK is used in other environments, such as Bioclipse, and you cannot
rely on those messages to show up :(
Thanks for immediately looking into it and looking forward to pointers
for my two questions,
greetings,
Egon