Afonso Arantes wrote:
I'm currently writing a custom script that will translate the English-language Wikipedia dump into HTML. Everything seems OK, except that the "Notes" sections with external links seem to be missing.
As an example, the article on Autism (id 25) goes straight from "See also" to "References". The http://en.wikipedia.org/wiki/Autism page has an extensive "Notes" section between these two, with detailed commentary and numbered links to external sources. I cannot find this material anywhere in the page text, nor are there template directives that might include it. The article links to the notes throughout, but the notes themselves are not present in the XML.
Am I missing something obvious, or should I download another file with this extra information? I thought this might just be an older dump, but other articles seem to be affected as well.
Thank you for your help.
I haven't seen the dump myself, but the Autism article uses <ref> and <references /> tags [1], which means that Wikipedia generates the Notes section from the many <ref> tags in the article body when the page is rendered. This is handled by the Cite.php extension [2]. My guess is that the dumps contain only the raw wikitext, with the <ref> tags left unprocessed, so you should be able to piece together a Notes section with your script.
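If it helps, here is a minimal sketch of what that could look like in Python, assuming simplified <ref> syntax: each <ref>...</ref> or <ref name="..." /> is replaced by a numbered superscript link, and the collected note texts are emitted as an ordered list where <references /> appears. The function name build_notes and the note-N anchor ids are hypothetical. This is not what Cite.php actually does internally; it ignores reference groups, list-defined references, nested tags, and templates that expand to refs, and it leaves each note's text as wikitext for the rest of your converter to handle.

```python
import re

# Matches <ref ...>body</ref> or a self-closing <ref ... />.
# The \b keeps this from matching <references />.
# Simplified assumption: refs are never nested.
REF_RE = re.compile(
    r'<ref\b(?P<attrs>[^>]*?)(?:/>|>(?P<body>.*?)</ref>)',
    re.DOTALL | re.IGNORECASE,
)
# name="x", name='x', or an unquoted name=x
NAME_RE = re.compile(
    r'''name\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s/>]+))''',
    re.IGNORECASE,
)
REFERENCES_RE = re.compile(r'<references\s*/>', re.IGNORECASE)


def _ref_name(attrs):
    """Return the value of the ref's name attribute, or None."""
    m = NAME_RE.search(attrs)
    if not m:
        return None
    return next(g for g in m.groups() if g is not None).strip()


def build_notes(wikitext):
    """Replace <ref> tags with numbered markers and build a Notes list.

    Returns (body_text, notes_html). If the text contains <references />,
    the list is also substituted there.
    """
    notes = []        # note texts, in order of first appearance
    name_to_num = {}  # ref name -> note number (1-based)

    def replace(match):
        name = _ref_name(match.group('attrs') or '')
        text = (match.group('body') or '').strip()
        if name is not None and name in name_to_num:
            num = name_to_num[name]
            # A self-closing <ref name=... /> may come before the ref that
            # defines it; fill in the text once the definition is seen.
            if text and not notes[num - 1]:
                notes[num - 1] = text
        else:
            notes.append(text)
            num = len(notes)
            if name is not None:
                name_to_num[name] = num
        return f'<sup><a href="#note-{num}">[{num}]</a></sup>'

    body_text = REF_RE.sub(replace, wikitext)
    items = ''.join(
        f'<li id="note-{i}">{text}</li>' for i, text in enumerate(notes, 1)
    )
    notes_html = f'<ol class="notes">{items}</ol>'
    # A lambda avoids backslashes in note text being read as group refs.
    body_text = REFERENCES_RE.sub(lambda m: notes_html, body_text, count=1)
    return body_text, notes_html


sample = ('Autism is a disorder.<ref name="a">First source.</ref> '
          'It is diagnosed early.<ref>Second source.</ref> '
          'See again.<ref name="a" />\n'
          '== Notes ==\n<references />')
body, notes_html = build_notes(sample)
print(body)
```

Named refs reuse their first number, which is the visible behavior on the rendered Autism page (the same note cited from several places). Matching with regular expressions is only reasonable because <ref> bodies rarely nest; a real wikitext parser would be more robust.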
[1] http://en.wikipedia.org/w/index.php?title=Autism&action=edit&section=28
[2] http://meta.wikimedia.org/wiki/Cite/Cite.php