Hi Alan,
The SitelinksExample shows how to get the basic language-links data. In Wikidata, sites are identified by IDs such as "enwiki" or "frwikivoyage". To find out what these IDs mean in terms of URLs, you first need to get the sites (interlanguage) information. The example shows you how to do this.
The site link information for a particular item can be found in the ItemDocument for that item. There are two ways of getting an ItemDocument:
(1) You process the dump file and visit all items one by one (in the order in which they appear in the dump). This is best if you want to look at a very large number of items, or if you must work completely offline.
(2) You fetch individual items from the Web API (random access). This is best if you need the links for only a few selected items (fetching hundreds from the API is quick; fetching millions is infeasible).
You can find many examples of doing things along the lines of (1) with WDTK. For (2), see the FetchOnlineDataExample (so far this is only part of the development version of v0.5.0, which you can find on GitHub).
In either case, you can directly read out any sitelink from the ItemDocument object. It will give you the article title, the site ID ("enwiki" etc.), and the list of badges (if any). To turn this into a URL, you would use code as in the SitelinksExample.
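To give a rough idea of what "turning a sitelink into a URL" amounts to, here is a minimal, library-independent sketch. It is not WDTK code: the class name SitelinkUrlSketch, the pageUrl method and the hard-coded URL patterns are all made up for illustration, and the "$1" placeholder convention is an assumption about how the sites information stores page URLs. In real code the patterns would come from the sites information, as in the SitelinksExample, and the title encoding there may differ in detail.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; none of these names belong to WDTK. */
final class SitelinkUrlSketch {

    // Assumed URL patterns per site ID; "$1" marks where the page title goes.
    // A real program would load these from the sites information instead.
    static final Map<String, String> PAGE_URL_PATTERNS = new HashMap<>();
    static {
        PAGE_URL_PATTERNS.put("enwiki", "https://en.wikipedia.org/wiki/$1");
        PAGE_URL_PATTERNS.put("dewiki", "https://de.wikipedia.org/wiki/$1");
        PAGE_URL_PATTERNS.put("frwikivoyage", "https://fr.wikivoyage.org/wiki/$1");
    }

    /**
     * Builds a page URL from the two values a sitelink provides for this
     * purpose: the site ID and the article title. Returns null if the
     * site ID is not known.
     */
    static String pageUrl(String siteKey, String pageTitle) {
        String pattern = PAGE_URL_PATTERNS.get(siteKey);
        if (pattern == null) {
            return null;
        }
        try {
            // Spaces become underscores; everything else is percent-encoded.
            String encoded = URLEncoder.encode(pageTitle.replace(' ', '_'), "UTF-8");
            // String.replace is a literal replacement, so "$1" is not a regex here.
            return pattern.replace("$1", encoded);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints https://en.wikipedia.org/wiki/Douglas_Adams
        System.out.println(pageUrl("enwiki", "Douglas Adams"));
    }
}
```

The point of the sketch is only that the site ID alone does not tell you the URL: you need the per-site pattern as well, which is why the sites information has to be loaded before sitelinks can be resolved.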
Cheers,
Markus
On 17.04.2015 15:18, Alan Said wrote:
Hi all,
I am trying to use the Wikidata Toolkit to extract interlanguage links
for certain pages from Wikipedia.
So far, I've made several attempts based on the code provided in
SitelinksExample
(https://github.com/Wikidata/Wikidata-Toolkit/blob/master/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/SitelinksExample.java)
without any success. I've realized that this is likely not the correct
approach.
Ideally, I'd like to do this while processing a local file. I've
downloaded a pages-meta-current.xml.bz2 file, but I can't really get my
head around how to proceed with this.
Any pointers are appreciated.
Best,
Alan
--
Alan Said
Recorded Future
t: @alansaid
w: www.alansaid.com
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l