On Wed, 23 Apr 2014 09:48:17 +0200, Daan Kuijsten daankuijsten@gmail.com wrote:
A better way for fetching content via an API is to assign a unique ID to a section, a paragraph, a table, an image etc. This way we could simply fetch a part of the content of wikipedia via this ID.
Such IDs already exist: they are present in the page HTML as 'id' attributes on the headings. They are constructed from the heading text, with a distinguishing suffix appended when duplicates occur. You can also get them via the API, using action=parse&prop=sections [1] (the 'anchor' property), and then map them to the numerical section identifiers other API modules use (the 'index' property; 'number' is the table-of-contents numbering, such as "2.1").
[1] https://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&pro...
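
For illustration, here is a rough Python sketch of that mapping. The helper names (sections_query_url, anchor_to_index) and the sample data are made up for this example, and I am assuming the 'index' field is the value the 'section' parameter of other modules expects. It does no network access; it only builds the query URL and maps anchors from an already-decoded 'sections' list.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def sections_query_url(page):
    """Build an action=parse&prop=sections query URL for one page."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "sections",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def anchor_to_index(sections):
    """Map each heading anchor to the 'index' value of the same entry.

    `sections` is the list found under parse.sections in the decoded
    JSON response. Using 'index' as the section identifier is an
    assumption of this sketch.
    """
    return {entry["anchor"]: entry["index"] for entry in sections}


# Hypothetical sample data shaped like a prop=sections result; it was
# not fetched from any real page.
sample_sections = [
    {"line": "History", "anchor": "History", "number": "1", "index": "1"},
    {"line": "References", "anchor": "References", "number": "2", "index": "2"},
    {"line": "References", "anchor": "References_2", "number": "2.1", "index": "3"},
]

if __name__ == "__main__":
    print(sections_query_url("Main Page"))
    print(anchor_to_index(sample_sections))
```

Something like anchor_to_index(sample_sections)["References_2"] would then give you the value to pass as the section identifier in a follow-up request.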