On 23-Apr-14 21:29, wikitech-l-request@lists.wikimedia.org wrote:
Re: API attribute ID for querying wikipedia pages
@Matma Rex: This is way too general; I think it would be a lot better if it were more detailed. For example, when I want to fetch the table of all currencies on https://en.wikipedia.org/wiki/List_of_circulating_currencies, I would make an API call like this: https://en.wikipedia.org/w/api.php?action=parse&page=List%20of%20circulating%20currencies&prop=sections&format=jsonfm. This returns 5 sections with "numbers" that I can use as reference points, but I would rather have a "number" for each table within a section, since a section can contain multiple tables.
Querying specific (structured) data from Wikipedia is still very difficult in my opinion. My suggestion is that every paragraph, image, link and table should get a unique identifying number. That way Wikipedia would become more machine-readable.
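The workflow described above can be sketched in Python. This is a minimal sketch, assuming the standard action=parse module: one URL with prop=sections lists the sections, and a second one with prop=text and section=N returns that section's HTML. Because the API offers no per-table identifier, the sketch falls back to numbering the top-level tables in a section's HTML by position (nested tables are skipped). The helper names parse_url and list_tables are hypothetical. The sketch uses format=json, the machine-readable counterpart of jsonfm, and leaves the HTTP request and JSON decoding to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlencode

API = "https://en.wikipedia.org/w/api.php"

def parse_url(page, **params):
    """Hypothetical helper: build an action=parse URL; extra params (prop, section) are appended."""
    query = {"action": "parse", "page": page, "format": "json"}
    query.update(params)
    # quote_via=quote encodes spaces as %20 (as in the URL above) instead of '+'.
    return API + "?" + urlencode(query, quote_via=quote)

class TableLister(HTMLParser):
    """Give each top-level <table> in an HTML fragment a positional index."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # current <table> nesting depth
        self.tables = []  # (index, class attribute) per top-level table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth == 0:
                css = dict(attrs).get("class") or ""
                self.tables.append((len(self.tables), css))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1

def list_tables(html):
    """Hypothetical helper: return [(index, class)] for the top-level tables in html."""
    lister = TableLister()
    lister.feed(html)
    lister.close()
    return lister.tables
```

For example, parse_url("List of circulating currencies", prop="sections") reproduces the call above with format=json, and parse_url("List of circulating currencies", prop="text", section=1) requests one section's HTML to pass to list_tables. Such positional numbers are only valid for the revision that was parsed.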
On Thu, Apr 24, 2014 at 2:24 PM, Daan Kuijsten <daankuijsten@gmail.com> wrote:
I see where you are coming from, but this implies that these are stable properties across multiple revisions, which they aren't. If I have a table in revision 1, remove it in revision 2, and add it back in revision 3, is it still the same table? What if I change it slightly? How much do I have to change it before its identity changes?
A wiki(pedia) page is by its very nature a dynamic construct, and assigning stable identifiers to its elements would be extremely impractical at best.
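To make the identity question concrete, here is a toy Python sketch with hypothetical revisions, where plain strings stand in for tables. It compares two naive ID schemes, neither of which is proposed in this thread: numbering tables by position, and hashing their content. Positional IDs get reassigned when a table is removed, while content hashes change on any edit, however small.

```python
import hashlib

def positional_ids(tables):
    # ID = the table's position on the page in this revision.
    return {i: t for i, t in enumerate(tables)}

def content_ids(tables):
    # ID = first 8 hex digits of the SHA-1 of the table's content.
    return {hashlib.sha1(t.encode("utf-8")).hexdigest()[:8]: t for t in tables}

# Hypothetical revisions of one page; each string stands in for a table.
rev1 = ["Currency table", "Coin table"]
rev2 = ["Coin table"]                                     # currency table removed
rev3 = ["Currency table (one row fixed)", "Coin table"]   # re-added, slightly changed
```

Position 0 means "Currency table" in rev1 but "Coin table" in rev2, and by content hash only "Coin table" keeps its ID from rev1 to rev3. Neither scheme answers whether the re-added table is "the same" one.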
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, 24 Apr 2014 14:24:08 +0200, Daan Kuijsten <daankuijsten@gmail.com> wrote:
You want Semantic MediaWiki[1] then (which the Wikipedias don't use) or Wikidata[2], which is one of Wikipedia's sister projects and has been growing very fast. Wikipedia was never intended to be machine-readable in the way you propose (although it does provide access to MediaWiki's awesome API).
[1] https://www.mediawiki.org/wiki/Extension:Semantic_MediaWiki
[2] https://www.wikidata.org/
Hoi,
I totally agree that you should be able to do this. However, would it not make more sense to get structured information from Wikidata?
Thanks,
GerardM
On 24 April 2014 14:24, Daan Kuijsten <daankuijsten@gmail.com> wrote:
On 04/24/2014 05:24 AM, Daan Kuijsten wrote:
We (the Parsoid team) are actually working on this, see https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs
Besides making it possible to reference content, our goal is to use these ids as a key that lets us associate additional metadata with each element in the DOM.
We expect stable element ids to be available in Parsoid output by this summer.
Gabriel