On 04/24/2014 05:24 AM, Daan Kuijsten wrote:
On 23-Apr-14 21:29, wikitech-l-request@lists.wikimedia.org wrote:
Re: API attribute ID for querying wikipedia pages
@Matma Rex: This is way too general; I think it would be much better if this were more detailed. For example, when I want to fetch a table with all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I would make an API call like this: https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circula.... This returns 5 sections with "numbers" which I can use as reference points, but I would rather have a "number" for the table in the section. A section can have multiple tables.
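To make the limitation concrete, here is a rough Python sketch of the section-level lookup described above. The URL in the message is truncated, so the `prop`, `section` and `format` parameters are my assumptions about what such a call might look like, not the original query; `build_parse_url` and `count_tables` are hypothetical helper names. No request is sent: the sketch only builds the URLs and counts `<table>` tags in whatever HTML a section would return.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, **extra):
    """Build an action=parse URL; extra parameters are assumptions, see above."""
    params = {"action": "parse", "page": page, "format": "json"}
    params.update(extra)
    return API_ENDPOINT + "?" + urlencode(params)


class TableCounter(HTMLParser):
    """Counts every <table> start tag, including nested tables."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_tables(html):
    counter = TableCounter()
    counter.feed(html)
    return counter.count


# Step 1: ask for the list of sections (assumed prop=sections).
sections_url = build_parse_url("List_of_circulating_currencies", prop="sections")

# Step 2: ask for the HTML of one section by its number (assumed section=1).
section_url = build_parse_url(
    "List_of_circulating_currencies", prop="text", section=1
)
```

Even with the section HTML in hand, a client still has to pick the right table by position, since `count_tables` can report more than one table for a single section.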
Querying specific (structured) data from Wikipedia is still very difficult in my opinion. My suggestion is that every paragraph, image, link and table gets a unique, identifiable number. This way Wikipedia becomes more machine-readable.
We (the Parsoid team) are actually working on this; see https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs
Besides making it possible to reference content, our goal is to use these ids as a key that lets us associate additional metadata with each element in the DOM.
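As a rough illustration of using ids as a key, the Python sketch below indexes the elements of an HTML fragment by their `id` attribute and keeps metadata in a separate table keyed by the same id. It is not Parsoid code: the id values `e1` and `e2`, the sample markup, and the helper names `index_by_id` and `lookup` are all made up for the example, and the actual id format is whatever the spec linked above defines.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records the tag name of every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id is not None:
            self.elements[element_id] = tag


def index_by_id(html):
    collector = IdCollector()
    collector.feed(html)
    return collector.elements


# Hypothetical fragment with hypothetical ids.
sample = '<p id="e1">Euro</p><table id="e2"><tr><td>EUR</td></tr></table>'
elements = index_by_id(sample)

# Metadata lives outside the DOM and is joined to it through the id.
metadata = {"e2": {"note": "currency table"}}


def lookup(element_id):
    """Return (tag name, metadata) for an id; either part may be None."""
    return elements.get(element_id), metadata.get(element_id)
```

Because the metadata table is keyed only by id, it stays valid for as long as the ids themselves stay stable across re-parses, which is why stability matters here.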
We expect stable element ids to be available in Parsoid output by this summer.
Gabriel