Hello,
my name is Norbert Kurz and I am a student of applied computer science in Germany.
I downloaded the 7.8 GB XML dump of the German Wikipedia and split it into article files.
Now I want to convert the content of the text tag (<text>) into an HTML page. My problem is that it uses a special syntax for tables, lists, links, etc.
My question is: Is there a definition of the syntax used in the XML, so that it is easy to write an XML-to-HTML script?
For example:

Zu den Regisseuren, die das Pseudonym benutzt haben, gehören:
* [[Don Siegel]] und [[Robert Totten]] (für [[Frank Patch – Deine Stunden sind gezählt]]),
* [[David Lynch]] (für die dreistündige Fernsehfassung von [[Der Wüstenplanet (Film)|Der Wüstenplanet]]),
* [[Chris Christensen]] (The Omega Imperative),
* [[Stuart Rosenberg]] (für [[Let’s Get Harry]]),
* [[Richard C. Sarafian]] (für [[Starfire]]),
* [[Dennis Hopper]] (für [[Catchfire]]),
* [[Arthur Hiller]] (für [[An Alan Smithee Film: Burn Hollywood Burn]]),
* [[Rick Rosenthal]] (Birds II) und
* [[Kevin Yagher]] ([[Hellraiser IV – Bloodline]]).
* Der Pilotfilm der Serie [[MacGyver]] führt einen Alan Smithee als Regisseur <ref>http://www.imdb.com/title/tt0165375/ </ref>

An asterisk (*) marks a list item, and double square brackets [[ ]] mark a link. Inside a link, a pipe separates the link target from the displayed text: [[ LINKNAME | SHOWN_NAME ]]
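
To show what I mean, here is a minimal Python sketch that handles only these two rules (list items and links). It is not the official parser. The function names and the href scheme (spaces replaced by underscores) are my own assumptions, and nested lists, tables, <ref> tags and formulas are not handled at all.

```python
import html
import re

# Matches [[TARGET]] or [[TARGET|LABEL]] (no nested brackets).
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def convert_links(text: str) -> str:
    """Replace [[TARGET|LABEL]] with an HTML anchor."""
    def repl(match: re.Match) -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Assumption: the href is just the title with underscores.
        href = target.replace(" ", "_")
        return f'<a href="{href}">{label}</a>'
    return LINK_RE.sub(repl, text)


def wikitext_to_html(text: str) -> str:
    """Convert '*' list lines and [[...]] links; other lines become <p>."""
    out = []
    in_list = False
    for raw in text.splitlines():
        # Escape <, > and & first, so tags such as <ref> show up as text.
        line = convert_links(html.escape(raw.strip(), quote=False))
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{line[1:].strip()}</li>")
        else:
            if in_list:
                out.append("</ul>")
                in_list = False
            if line:
                out.append(f"<p>{line}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


sample = (
    "Intro:\n"
    "* [[Don Siegel]] und [[Robert Totten]]\n"
    "* [[Der Wüstenplanet (Film)|Der Wüstenplanet]]"
)
print(wikitext_to_html(sample))
```

This already breaks down as soon as tables or formulas appear, which is why I am looking for a complete description of the syntax.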
Is there a file that describes all of these special cases, as well as the LaTeX markup that appears in the XML files (e.g. \longrightarrow) and the tables?
I also want to thank you all for your great work. I am happy that you make the effort to export the whole of Wikipedia so that other people can download it and play around with it. Please keep up the good work.
Thanks in advance for your help.
Best regards
Norbert Kurz, Stuttgart, Germany