On 14/03/12 19:02, Tomasz Finc wrote:
2012/3/14 Neil Harris <neil(a)tonal.clara.co.uk>:
On 14/03/12 14:26, toni hernández wrote:
Hi all,
I have been looking at the Wikipedia database schema and I haven't found
any field that suggests that some content is geographically located.
Am I wrong?
If it is possible, I would like to download the geographically located
content of Wikipedia to do something similar to what Google Earth does with
the Wikipedia layer.
Is that possible?
Thanks in advance.
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Hi Toni,
The geographical locations are stored in the article text, using templates,
and are not (yet) available in the main database, although I believe there is
long-term work planned to remedy this.
However, there is a project that parses and consolidates all this data
across all Wikipedia languages. The simplest way to get this data is to get
a toolserver account, and to access the kolossus database on the Wikipedia
Toolserver. See
http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikipe…
for more details.
Regards,
CC'ing Max as this is pretty close to what he's been working on for
us. In short, we're looking at adding a parser hook to store
coordinates in a separate part of the DB so that we can query them
much faster through our API.
--tomas
Hi Tomas,
Parsing the coordinates directly from the page source can be a bit
awkward, because many coordinates are generated indirectly via chains of
templates, and there are a large number of variations in the syntax used
to manage coordinates in templates across article sub-projects and
Wikipedia languages.
However, since they all end up generating links to the geohack page in a
fairly simple format that is standardized across all Wikipedia editions,
you can find these coordinates quite easily either by parsing the
rendered HTML for a page, or (more efficiently, if you have direct
database access or can download Wikipedia dumps) by looking at the links
generated in the externallinks table.
You can find the spec for the geohack syntax here:
https://wiki.toolserver.org/view/GeoHack
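
To make the idea concrete, here is a rough Python sketch of pulling a
latitude/longitude pair out of one geohack link, for example a URL taken from
the externallinks table. It assumes the link carries a "params=" query value
in the underscore-separated form D_M_S_N_D_M_S_E (minutes and seconds
optional, decimals allowed), possibly followed by extra fields such as
"type:city". The function names and the example URL are made up for
illustration, and other variants described in the spec are not handled, so
please check it against the spec before relying on it.

```python
from urllib.parse import urlparse, parse_qs


def parse_geohack_params(params):
    """Return (lat, lon) in decimal degrees, or None if not recognised.

    Assumes the form D[_M[_S]]_N|S_D[_M[_S]]_E|W followed by optional
    extra fields (e.g. "type:city"), which are ignored.
    """
    numbers = []      # degree/minute/second parts for the current axis
    coords = []       # finished axes, latitude first
    for token in params.split("_"):
        if token in ("N", "S", "E", "W"):
            expected = ("N", "S") if not coords else ("E", "W")
            if token not in expected or not 1 <= len(numbers) <= 3:
                return None
            # degrees + minutes/60 + seconds/3600
            value = sum(n / 60 ** i for i, n in enumerate(numbers))
            coords.append(-value if token in ("S", "W") else value)
            numbers = []
            if len(coords) == 2:
                break             # anything after the longitude is ignored
        else:
            try:
                numbers.append(float(token))
            except ValueError:
                return None       # unexpected token before the hemisphere
    if len(coords) != 2:
        return None
    return coords[0], coords[1]


def coords_from_geohack_url(url):
    """Extract (lat, lon) from a geohack link, or None if it has no usable params."""
    query = parse_qs(urlparse(url).query)
    if "params" not in query:
        return None
    return parse_geohack_params(query["params"][0])
```

You would call coords_from_geohack_url() on each link whose URL contains
"geohack.php" and skip the ones that return None.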
Alternatively, you might want to take a look at using DBpedia, which
does a lot of this for you, although I'm not sure how fresh or accurate
their data is currently: see http://dbpedia.org/About for more on this
project.
-- Neil