On 14/03/12 19:02, Tomasz Finc wrote:
2012/3/14 Neil Harris <neil@tonal.clara.co.uk>:
On 14/03/12 14:26, toni hernández wrote:
Hi all,
I have been looking at the Wikipedia database schema and I haven't found any field that suggests that some content is geographically located. Am I wrong?
If possible, I would like to download the geographically located content of Wikipedia to do something similar to what Google Earth does with its Wikipedia layer. Is that possible?
Thanks in advance.
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Hi Toni,
The geographical locations are stored in the article text, using templates, and are not (yet) available in the main database, although I believe there is long-term work planned to remedy this.
However, there is a project that parses and consolidates all this data across all Wikipedia languages. The simplest way to get this data is to get a toolserver account and access the kolossus database on the Wikipedia Toolserver. See http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikiped... for more details.
Regards,
CC'ing Max as this is pretty close to what he's been working on for us. In short, we're looking at adding a parser hook to store coordinates in a separate part of the DB so that we can query them much faster through our API.
--tomas
Hi Tomas,
Parsing the coordinates directly from the page source can be a bit awkward, because many coordinates are generated indirectly via chains of templates, and there are a large number of variations in the syntax used to manage coordinates in templates across article sub-projects and Wikipedia languages.
However, since they all end up generating links to the geohack page in a fairly simple format that is standardized across all Wikipedia editions, you can find these coordinates quite easily either by parsing the rendered HTML for a page or (more efficiently, if you have direct database access or can download Wikipedia dumps) by looking at the links generated in the externallinks table.
You can find the spec for the geohack syntax here: https://wiki.toolserver.org/view/GeoHack
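As a rough illustration of the externallinks approach, here is a minimal Python sketch that pulls a latitude/longitude pair out of a geohack link. It assumes the link carries a `params` query argument in one of two common shapes, `D_M_S_N_D_M_S_E_...` (minutes and seconds optional) or `lat;lon_...`; the function names and the example URL are my own illustrations, and the spec above remains the authority on the syntax. Negative degree values combined with minutes or seconds are not handled.

```python
from urllib.parse import urlparse, parse_qs


def parse_geohack_params(params):
    """Return (lat, lon) in decimal degrees from a geohack 'params' value.

    Returns None if the value does not match the two shapes assumed here.
    """
    tokens = params.split("_")

    # Assumed shape 1: "lat;lon" as signed decimals in the first token.
    if ";" in tokens[0]:
        try:
            lat_text, lon_text = tokens[0].split(";", 1)
            return float(lat_text), float(lon_text)
        except ValueError:
            return None

    # Assumed shape 2: up to three numbers (degrees, minutes, seconds)
    # followed by a hemisphere letter, once for latitude and once for longitude.
    coords = []
    numbers = []
    for token in tokens:
        if token in ("N", "S", "E", "W"):
            if not numbers or len(numbers) > 3:
                return None
            value = sum(n / 60 ** i for i, n in enumerate(numbers))
            if token in ("S", "W"):
                value = -value
            coords.append(value)
            numbers = []
            if len(coords) == 2:
                # Anything after the second hemisphere letter (type:, region:, ...)
                # is ignored in this sketch.
                return coords[0], coords[1]
        else:
            try:
                numbers.append(float(token))
            except ValueError:
                return None
    return None


def coords_from_geohack_url(url):
    """Extract (lat, lon) from a geohack link, or None if it is not one."""
    if "geohack" not in url:
        return None
    query = parse_qs(urlparse(url).query)
    values = query.get("params")
    if not values:
        return None
    return parse_geohack_params(values[0])


if __name__ == "__main__":
    # Illustrative link only; real rows would come from the externallinks table.
    example = ("http://toolserver.org/~geohack/geohack.php"
               "?pagename=London&params=51_30_26_N_0_7_39_W_type:city")
    print(coords_from_geohack_url(example))
```

With database access, the same function could be applied to each URL in externallinks that contains "geohack"; links it cannot parse come back as None, so they can be logged and inspected rather than silently guessed at.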
Alternatively, you might want to take a look at using DBpedia, which does a lot of this for you, although I'm not sure how fresh or accurate their data is currently: see http://dbpedia.org/About for more on this project.
-- Neil