On 19 January 2011 14:29, Rajarshi Guha rajarshi.guha@gmail.com wrote:
> Hi, I was trying to extract some information from the protein target infobox on protein target pages (e.g. http://en.wikipedia.org/wiki/Calreticulin or http://en.wikipedia.org/wiki/Hsp90).
> However when I export the page via http://en.wikipedia.org/w/api.php?action=query&pageids=7120&export=&... the XML page does not seem to contain the information that I can see when viewing the page in the browser. For example, the XML export for Calreticulin does not contain the links to the rendering of the structure or the PDB identifiers and so on.
> Is my export URL wrong? Or is there a reason that the infobox information is not exported and if so, is there a way to access it via export?
The XML output is mainly the "plain" wikitext source of the page rather than the rendered version. As a result, you don't get the rendered infobox; you just get the snippet of code that calls it:
{{PBB|geneid=811}}
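Because only this call survives in the export, the geneid can be recovered from the wikitext with a regular expression. A minimal Python sketch follows; the pattern and the `find_geneids` helper are hypothetical, the sample text is made up, and it assumes the call has exactly the shape shown above (a single numeric `geneid` parameter).

```python
import re

# Hypothetical pattern for the infobox call as it appears in exported
# wikitext, e.g. {{PBB|geneid=811}}. Assumes geneid is numeric and is the
# only parameter; whitespace around '|' and '=' is tolerated.
PBB_CALL = re.compile(r"\{\{\s*PBB\s*\|\s*geneid\s*=\s*(\d+)\s*\}\}")


def find_geneids(wikitext: str) -> list:
    """Return every geneid passed to {{PBB|...}} in a page's wikitext."""
    return PBB_CALL.findall(wikitext)


# Usage: feed it the page text taken from the XML export.
sample = "'''Calreticulin''' is a protein...\n{{PBB|geneid=811}}\n"
print(find_geneids(sample))  # ['811']
```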
The PBB template is surprisingly simple: it takes the "geneid" number and pulls in (transcludes) a pre-generated, gene-specific subpage, in this case
http://en.wikipedia.org/wiki/Template:PBB/811
The gallery box at the bottom works in the same way:
{{PDB Gallery|geneid=811}}
directs you to
http://en.wikipedia.org/wiki/Template:PDB_Gallery/811
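Both boxes follow the same naming rule, which can be sketched as below. This is an illustration under the assumption that the rule holds for any geneid; the helper names `subpage_title` and `wiki_url` are hypothetical, and only PBB and PDB Gallery are known to work this way.

```python
from urllib.parse import quote

# Naming rule described above: {{NAME|geneid=N}} displays the pre-generated
# subpage Template:NAME/N. Only PBB and PDB Gallery are known to follow it.
TEMPLATES = ("PBB", "PDB Gallery")


def subpage_title(template_name, geneid):
    """Title of the pre-generated subpage behind {{template_name|geneid=...}}."""
    return f"Template:{template_name}/{geneid}"


def wiki_url(title, base="http://en.wikipedia.org/wiki/"):
    """Page URL for a title: spaces become underscores, ':' and '/' stay as-is."""
    return base + quote(title.replace(" ", "_"), safe=":/")


for name in TEMPLATES:
    print(wiki_url(subpage_title(name, 811)))
# http://en.wikipedia.org/wiki/Template:PBB/811
# http://en.wikipedia.org/wiki/Template:PDB_Gallery/811
```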
I am not immediately sure why these are separate rather than an integral part of the article, which is normal for infoboxes; perhaps it discourages well-meaning but erroneous passing alterations to the data, or perhaps it simplifies maintenance. As you've noticed, while it's transparent to the user, it's a little confusing to work with!
It should be possible for you to pick the geneid number out of your export and then run an additional export on Template:PBB/$number and Template:PDB_Gallery/$number. Would that be sufficient?
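The follow-up export can reuse the same `action=query` and `export` parameters as the URL in the question, passing page titles via `titles=` instead of `pageids=`. A sketch, assuming a hypothetical `template_export_url` helper and that both subpages exist for the given geneid; several titles are joined with `|`, the API's separator for multi-value parameters.

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"


def template_export_url(geneid):
    """Build an API query that exports both template subpages for one geneid.

    Uses the same action=query / export parameters as the original request;
    the two titles are joined with '|' so one request covers both pages.
    """
    titles = [f"Template:PBB/{geneid}", f"Template:PDB_Gallery/{geneid}"]
    params = {"action": "query", "titles": "|".join(titles), "export": ""}
    # urlencode percent-escapes ':', '/' and '|'; the API decodes them again.
    return API + "?" + urlencode(params)


print(template_export_url(811))
# http://en.wikipedia.org/w/api.php?action=query&titles=Template%3APBB%2F811%7CTemplate%3APDB_Gallery%2F811&export=
```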