(I sent this to xmldatadumps-l yesterday but just realised that this
might be a more suitable place.)
Hello,
I'm looking at the data dumps for all Wikipedia languages and noticed
that for some larger wikis, the geo_tags.sql.gz dump file does not
include any geotags found in articles. Is it possible to determine why
this is, and for which languages this is the case?
For example, the geotags dump file for Indonesian (a wiki with
400,000 articles) is only 7 KB, and all the geotags in it come from
user pages, file uploads, or file templates rather than from articles:
https://dumps.wikimedia.org/idwiki/20181020/
Yet it doesn't take much effort to find pages that are geotagged, such
as this one (see the infobox):
https://id.wikipedia.org/wiki/London
I realise that there are a number of alternative geotagging
conventions. Could idwiki be using a geotagging scheme that some part
of this data ingestion/export process does not support? Which other
wikis/languages might fall into this category?
I have tried to find the script(s) that populate the geo_tags table
from page content, but have had no luck so far, as I'm not familiar
enough with Wikipedia's software architecture. If someone can point me
in the right direction, I'd be happy to investigate myself.
Many thanks!
m.