Hi Markus et al.
Thank you for the answer. I have a few follow-up questions as I'm not quite
grasping the toolkit.
Alternative 1:
So, if I'd like to do (1), I need a dump file. I've downloaded a *-current
dump (
)
and am trying to process it using the DumpProcessingController class,
which I'm assuming is the wrong way to go about this.
Is there a guide on how to parse local dumps?
Alternative 2:
I've been looking at FetchOnlineDataExample, and it seems to do pretty
much what I need, except that I want to retrieve the interlanguage links
for a page given its title, not its entity id. Is this possible, or is
there a way to get the entity id for a page title in a given language?
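(For what it's worth, the development version of the toolkit appears to
support exactly this kind of lookup; the following is only a sketch, assuming
WikibaseDataFetcher offers a getEntityDocumentByTitle method as in the
current master branch, and that the no-argument constructor targets
www.wikidata.org:)

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class TitleLookupSketch {
    public static void main(String[] args) throws Exception {
        // Fetcher talking to www.wikidata.org (assumed default target)
        WikibaseDataFetcher fetcher = new WikibaseDataFetcher();
        // Resolve an English Wikipedia page title to its Wikidata item
        EntityDocument doc = fetcher.getEntityDocumentByTitle("enwiki",
                "Douglas Adams");
        if (doc instanceof ItemDocument) {
            ItemDocument item = (ItemDocument) doc;
            System.out.println("Entity id: " + item.getItemId().getId());
        }
    }
}
```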
Thanks
Alan
--
Alan Said
Recorded Future
e: alansaid(a)acm.org
t: @alansaid
w:
On Fri, Apr 17, 2015 at 5:17 PM, Markus Krötzsch <
markus(a)semantic-mediawiki.org> wrote:
Hi Alan,
The SitelinksExample shows how to get the basic language-links data. In
Wikidata, sites are encoded by IDs such as "enwiki" or
"frwikivoyage". To
find out what they mean in terms of URLs, you need to get the sites
information first. The example shows you how to do this.
The site link information for a particular item can be found in the
ItemDocument for that item. There are two ways of getting an ItemDocument:
(1) You process the dump file, visiting all items one by one (in the
order in which they appear in the dump). This is best if you want to look
at very many items, or if you must work completely offline.
(2) You fetch individual items from the Web API (random access). This is
best if you only need the links for a few selected items (fetching
hundreds from the API is quick; fetching millions is infeasible).
You can find many examples for doing things along the lines of (1) with
WDTK. For (2), see the example FetchOnlineDataExample (this is only part of
the development version of v0.5.0 so far, which you can find on github).
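(For option (1), a minimal sketch along the lines of the bundled examples
might look as below; the class and method names follow the WDTK examples,
but the exact setup may differ between versions, so treat this as an
outline rather than a definitive recipe. Note also that WDTK processes
Wikidata dumps, such as the JSON dump, rather than Wikipedia's
pages-meta-current files:)

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;
import org.wikidata.wdtk.dumpfiles.MwRevision;

public class DumpSketch {
    static class SiteLinkPrinter implements EntityDocumentProcessor {
        @Override
        public void processItemDocument(ItemDocument itemDocument) {
            // Sitelinks are keyed by site id, e.g. "enwiki"
            System.out.println(itemDocument.getItemId().getId() + ": "
                    + itemDocument.getSiteLinks().keySet());
        }

        @Override
        public void processPropertyDocument(PropertyDocument propertyDocument) {
            // not interested in properties here
        }
    }

    public static void main(String[] args) {
        DumpProcessingController controller =
                new DumpProcessingController("wikidatawiki");
        // Only pass (current) item revisions to our processor
        controller.registerEntityDocumentProcessor(new SiteLinkPrinter(),
                MwRevision.MODEL_WIKIBASE_ITEM, true);
        // Downloads (or reuses a local copy of) the most recent dump
        controller.processMostRecentJsonDump();
    }
}
```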
In either case, you can directly read out any sitelink from the
ItemDocument object. It will give you the article title, the site id
("enwiki" etc.), and the list of badges (if any). To turn this into a URL,
you would use code as in the SitelinksExample.
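(Concretely, turning a sitelink into a URL boils down to substituting the
page title into the site's URL pattern, which the Sites object obtained as
in SitelinksExample does for you. The helper below is a hypothetical,
stand-alone illustration of that substitution, not the toolkit's actual
code:)

```java
public class SiteLinkUrlSketch {
    // Hypothetical helper mirroring the substitution a Sites object
    // performs: insert the page title (spaces become underscores) into
    // the site's page URL pattern.
    static String siteLinkUrl(String urlPattern, String pageTitle) {
        return urlPattern.replace("$1", pageTitle.replace(' ', '_'));
    }

    public static void main(String[] args) {
        // "enwiki" resolves to the English Wikipedia URL pattern
        System.out.println(
                siteLinkUrl("https://en.wikipedia.org/wiki/$1", "Douglas Adams"));
        // prints https://en.wikipedia.org/wiki/Douglas_Adams
    }
}
```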
Cheers,
Markus
On 17.04.2015 15:18, Alan Said wrote:
Hi all,
I am trying to use the Wikidata Toolkit to extract interlanguage links
for certain pages from Wikipedia.
So far, I've tried different attempts based on the code provided in
SiteLinksExample
(
https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/…
)
without any success. I've realized that this is likely not the correct
approach.
Optimally, I'd like to do this while processing a local file. I've
downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
head around how to proceed.
Any pointers are appreciated.
Best,
Alan
--
Alan Said
Recorded Future
e: alansaid(a)acm.org <mailto:alansaid@acm.org>
t: @alansaid
w:
www.alansaid.com <http://www.alansaid.com>
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l