Re: [WikiEN-l] extracting protein target infobox information via page export

19 Jan 2011


      On 19 January 2011 14:29, Rajarshi Guha rajarshi.guha@gmail.com wrote:
> Hi, I was trying to extract some information from the protein target
> infobox on protein target pages (eg
> http://en.wikipedia.org/wiki/Calreticulin or
> http://en.wikipedia.org/wiki/Hsp90).
> However when I export the page via
> http://en.wikipedia.org/w/api.php?action=query&pageids=7120&export=&...
> the XML page does not seem to contain the information that I can see
> when viewing the page in the browser. For example, the XML export for
> Calreticulin does not contain the links to the rendering of the
> structure or the PDB identifiers and so on.
> Is my export URL wrong? Or is there a reason that the infobox
> information is not exported and if so, is there a way to access it via
> export?

The XML output is mainly the "plain" wikitext code of the page, rather
than the rendered text version. As a result, you don't get the
rendered version of the infobox, you just get the snippet of code
calling it:
{{PBB|geneid=811}}
This template is surprisingly simple - it takes the "geneid" number
and directs to a pre-generated specific subpage, in this case
http://en.wikipedia.org/wiki/Template:PBB/811
The gallery box at the bottom works in the same way:
{{PDB Gallery|geneid=811}}
directs you to
http://en.wikipedia.org/wiki/Template:PDB_Gallery/811
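
As a rough illustration of that mapping, here is a minimal Python sketch that picks the two template calls out of a page's wikitext and works out the subpage each one points to. The function name subpage_titles and the regular expression are my own assumptions for illustration; it only handles the simple {{Name|geneid=NNN}} form shown above, not every way a template can be called.

```python
import re

# Matches calls such as {{PBB|geneid=811}} or {{PDB Gallery|geneid=811}}.
# Assumption: the call has exactly one parameter, geneid, with a numeric value.
TEMPLATE_CALL = re.compile(
    r"\{\{\s*(PBB|PDB Gallery)\s*\|\s*geneid\s*=\s*(\d+)\s*\}\}"
)


def subpage_titles(wikitext):
    """Return the subpage title behind each PBB / PDB Gallery call."""
    titles = []
    for name, geneid in TEMPLATE_CALL.findall(wikitext):
        # Page titles use underscores where the template name has spaces.
        titles.append("Template:%s/%s" % (name.replace(" ", "_"), geneid))
    return titles


sample = "{{PBB|geneid=811}}\nSome article text.\n{{PDB Gallery|geneid=811}}"
print(subpage_titles(sample))
# ['Template:PBB/811', 'Template:PDB_Gallery/811']
```
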
I am not immediately sure why these are separate rather than
an integral part of the article, which is normal for infoboxes -
perhaps because it dissuades well-meaning but erroneous passing
alterations to the data, or because it simplifies maintenance. As
you've noticed, while it's transparent to the user, it's a little
confusing to work with!
It should be possible for you to pick the geneid number out of your
export and then run an additional export on Template:PBB/$number and
Template:PDB_Gallery/$number. Would that be sufficient?
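
If it helps, the second step might look something like the Python sketch below, which only builds the export URLs for the two subpages once you have the geneid. The helper names export_url and subpage_export_urls are hypothetical, and I am assuming the API's titles parameter is an acceptable substitute for the pageids parameter in your URL; any further parameters you were passing would need to be added back.

```python
from urllib.parse import urlencode

# Same endpoint as in the original export URL.
API = "http://en.wikipedia.org/w/api.php"


def export_url(title):
    """Build an export URL for one page, addressed by title (an assumption;
    the original request used pageids instead)."""
    params = {"action": "query", "titles": title, "export": ""}
    return API + "?" + urlencode(params)


def subpage_export_urls(geneid):
    """Export URLs for the infobox and gallery subpages of one gene ID."""
    return [
        export_url("Template:PBB/%d" % geneid),
        export_url("Template:PDB_Gallery/%d" % geneid),
    ]


for url in subpage_export_urls(811):
    print(url)
```

Each URL can then be fetched in the same way as the article export, and the infobox data parsed out of the returned wikitext.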
-- 
- Andrew Gray
  andrew.gray@dunelm.org.uk
