Based on the experience gained from WikiProject Geographical coordinates, I have prepared three MediaWiki extensions that you may find useful. The extensions can be enabled individually, but the concept is certainly more powerful when they are all enabled. I will briefly outline the extensions here:
----------------------------------------------------------------------------------------
A. The geo tag extension
The <geo> tag allows entry of geographical coordinates in a style similar to RFC 1876. For example:
<geo>48 46 36 N 121 48 51 W</geo>.
It is designed to be flexible, and easy to use.
Variations of the above allow coordinates to be specified with decimals in various forms, giving more or less precision. Additional metadata can also be specified as attributes for the location, like this:
<geo>48 46 36 N 121 48 51 W type:mountain region:US scale:100000</geo>
In the rendered article, the tag will be replaced with 48°46'36''N 121°48'51''W, which is also a Wikilink to a page of map resources for that point.
The main geo tag advantages are:
1. Consistent markup for coordinates.
2. Consistent rendering of coordinates.
3. Wikipedia articles with coordinates will get a 'geo.position' meta tag, making them compatible with Internet geographic resources, such as geourl.org.
4. Serves as an enabler for the two other extensions.
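To make the tag format concrete, here is a minimal Python sketch of how the body of a <geo> tag could be turned into signed decimal degrees plus an attribute dictionary. It is not the extension's PHP code; the names parse_geo and take_angle are hypothetical, and it assumes one to three numbers (degrees, minutes, seconds) before each hemisphere letter, followed by optional key:value attributes.

```python
# Hypothetical sketch, not the extension's actual parser.
HEMISPHERES = {"N": 1, "S": -1, "E": 1, "W": -1}


def take_angle(tokens, allowed):
    """Consume 'D [M [S]] H' from the front of tokens; return signed degrees."""
    parts = []
    while tokens and tokens[0] not in HEMISPHERES:
        parts.append(float(tokens.pop(0)))
    if not tokens or not 1 <= len(parts) <= 3 or tokens[0] not in allowed:
        raise ValueError("expected 1 to 3 numbers followed by one of " + allowed)
    sign = HEMISPHERES[tokens.pop(0)]
    # degrees + minutes/60 + seconds/3600
    return sign * sum(part / 60 ** i for i, part in enumerate(parts))


def parse_geo(body):
    """Parse the text between <geo> and </geo> into (lat, lon, attributes)."""
    tokens = body.split()
    lat = take_angle(tokens, "NS")
    lon = take_angle(tokens, "EW")
    # Whatever is left is assumed to be key:value attributes.
    attrs = dict(token.split(":", 1) for token in tokens)
    return lat, lon, attrs


lat, lon, attrs = parse_geo(
    "48 46 36 N 121 48 51 W type:mountain region:US scale:100000"
)
```

South and west come out negative, so the same pair of numbers can be reused for a meta tag, a map URL or a database row.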
----------------------------------------------------------------------------------------
B. The map sources extension
The map sources extension is the target of the <geo> tag wikilink, and provides a page of available Internet map resources, in a manner much like the ISBN resource page. The extension provides functionality to 'preload' external URLs with coordinates, so that most maps are essentially one click away.
There are currently 30 different built-in replacement strings, supporting various forms of specification of scaling and coordinates, such as UTM, OSGB36 and CH1903.
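As an illustration of the 'preload' idea, the following Python sketch fills a URL template with values derived from one coordinate. The placeholder names (latdegdec, londegdec, latdegabs, latNS, londegabs, lonEW, scale) and the maps.example.org URL are assumptions made up for this example; the real replacement strings are those defined by the extension.

```python
# Hypothetical placeholder names; the extension defines its own set.
def fill_map_url(template, lat, lon, scale=100000):
    """Substitute coordinate-derived values into a map service URL template."""
    values = {
        "latdegdec": "%.6f" % lat,        # signed decimal latitude
        "londegdec": "%.6f" % lon,        # signed decimal longitude
        "latdegabs": "%.6f" % abs(lat),   # unsigned latitude ...
        "latNS": "N" if lat >= 0 else "S",  # ... with its hemisphere letter
        "londegabs": "%.6f" % abs(lon),
        "lonEW": "E" if lon >= 0 else "W",
        "scale": str(int(scale)),
    }
    return template.format(**values)


url = fill_map_url(
    "https://maps.example.org/?lat={latdegdec}&lon={londegdec}&scale={scale}",
    48.776667,
    -121.814167,
)
```

A template only needs to mention the placeholders its map service understands; unused values are simply ignored.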
There are specialized versions of the map sources page for various regions (like US and GB). For the global version, there are at present preloaded pointers to around 20 different map engines.
In addition to the maps, there is a pointer to GeoURL.org, which lists nearby resources on the Internet.
There is also a direct link for the open source NASA World Wind software, allowing a new, interactive way of experiencing aerial imagery and topographical data. World Wind has a plug-in layer for Wikipedia articles that are tagged with a geographic coordinate.
Assuming the enabling of extension C, there is also a pointer to neighborhood articles in Wikipedia, listing the articles with Wikilinks, and their distance and direction from the present point.
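The distance and direction shown in such a neighborhood list can be computed in several ways. One common choice, sketched here in Python under the assumption of a spherical Earth with a mean radius of 6371 km, is the haversine distance together with the initial great-circle bearing. The function names are hypothetical and this is not necessarily the formula the extension uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in km, initial bearing in degrees) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula for the great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
    # Initial bearing, measured clockwise from north.
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    bearing = math.degrees(math.atan2(y, x)) % 360
    return distance, bearing


def compass_point(bearing):
    """Map a bearing in degrees to one of eight compass points."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing + 22.5) // 45) % 8]
```

For the short distances in a neighborhood list, the spherical approximation should be more than accurate enough.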
----------------------------------------------------------------------------------------
C. The geo database extension
The geo database keeps track of all articles in Wikipedia with geographic coordinates, and provides the data source for the neighborhood information, as well as the data source for other external mechanisms taking advantage of the Wikipedia geographical information, such as the NASA World Wind Wikipedia overlay.
Additionally, the geo database will provide data for the future Wikimap, so that the maps produced by Wikimaps will contain all the relevant information from Wikipedia as clickable points. For this, geo attributes are crucial: Airports really should appear as airports on the map, mountains as mountains, and cities as cities, with the right magnitude.
----------------------------------------------------------------------------------------
For further information, see also http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Geographical_coordinates
----------------------------------------------------------------------------------------
Status
Currently extensions A and B are quite well developed, and have been tested for a time on an external server. Additionally, a few thousand articles in the en: Wikipedia are marked with geographic coordinates using an interim solution with templates, as a proof-of-concept. They will be converted to the geo tag if this extension gets enabled for the English Wikipedia. That will also solve the current problem with coordinates as arguments for infobox templates. Collection of data points has been done with an interim, external solution based on some Perl scripts.
Extension C has also been implemented, but I would want to discuss a few issues of performance and security before committing the code. More about this later.
Magnus Manske has been extremely helpful in the work on integration with Wikimaps, and that work will continue.
----------------------------------------------------------------------------------------
Questions
Starting with extensions A and B, I have some questions:
1. How should translations for extensions be handled? It would of course not be a problem adding to the existing resources in phase3/languages, but do translations for extensions belong there?
2. Should extensions be put in the extensions module or in the phase3/extensions directory?
3. Should these 3 extensions be put in the same place (requiring only one include in LocalSettings.php to enable), or in 3 different directories? I am currently using 3 different extensions, but I think having just one is better. I am also wondering about naming and policy. "Geo" seems to be taken.
----------------------------------------------------------------------------------------
Finally, I would like to give a big thank you to all participants in WikiProject Geographical coordinates who have helped immensely with suggestions and practical work.
Regards, Egil Kvaleberg en:User:Egil
--- Egil Kvaleberg egil@kvaleberg.no wrote:
The geo database keeps track of all articles in Wikipedia with geographic coordinates, and provides the data source for the neighborhood information, as well as the data source for other external mechanisms taking advantage of the Wikipedia geographical information, such as the NASA World Wind Wikipedia overlay.
Additionally, the geo database will provide data for the future Wikimap, so that the maps produced by Wikimaps will contain all the relevant information from Wikipedia as clickable points. For this, geo attributes are crucial: Airports really should appear as airports on the map, mountains as mountains, and cities as cities, with the right magnitude.
I work with mappable data every day (I'm a GIS specialist) and I think what you guys have so far in this set of features is way, way cool.
Which reminds me - I should take a closer look at Magnus' WikiGIS module to see if it is going in the right direction.
-- mav
Egil Kvaleberg wrote:
- How should translations for extensions be handled? It would of course not be a problem adding to the existing resources in phase3/languages, but does translations for extensions belong there?
You can hook in to the main localization system by defining messages like this:
$wgMessageCache->addMessage("searchnumber", "<strong>Results $1-$2 of $3</strong>");
but there afaik isn't a clean standard way of defining different defaults for each language yet.
- Should extensions be put in the extensions module or in the phase3/extensions directory?
Extensions that go into our main CVS should go in the extensions module. The MediaWiki distribution bundle is built directly from the phase3 module and should have a clean, empty extensions subdirectory.
At some point it would be nice to have more accessible releases of individual extensions...
- Should these 3 extensions be put in the same place (requiring only one include in LocalSettings.php to enable), or in 3 different directories? I am currently using 3 different extensions, but I think having just one is better. I am also wondering about naming and policy. "Geo" seems to be taken.
If they go together well and you wouldn't expect to enable/disable them separately (for instance, for different security needs etc) then it might make sense to bundle them together for convenience.
extensions/geo currently holds some experimental stuff Magnus was working on; I'm not sure if this is current.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
extensions/geo currently holds some experimental stuff Magnus was working on; I'm not sure if this is current.
Egil has kindly taken over the whole maps/coordinates department, of which my geo/wikimaps thing will (hopefully) be a part. This gave me the time to overhaul the validation feature.
My "geo" extension is working nicely; Egil has improved it, but you can still see a slightly older version at [1]. There's also a howto about SVG-enabling your Windows-Firefox :-)
Magnus
[1] http://magnusmanske.de/wikimaps/index.php/Main_Page
Extensions that go into our main CVS should go in the extensions module. The MediaWiki distribution bundle is built directly from the phase3 module and should have a clean, empty extensions subdirectory.
At some point it would be nice to have more accessible releases of individual extensions...
I am still planning on evangelizing the mediawiki-agora (http://sourceforge.net/projects/mediawiki-agora), as a place where developers without core cvs access can develop extensions centrally. If anyone wants cvs access there, just let me know.
/Jonah
Brion Vibber wrote:
- Should extensions be put in the extensions module or in the phase3/extensions directory?
Extensions that go into our main CVS should go in the extensions module.
OK, I did that. One caveat is that I could not get an include into "../extensions/gis/geo.php" to work (some security concern?), so I wrote in the instructions that extensions/gis needs to be moved into phase3/extensions/gis before use. Seems like a hack; there is probably something I didn't understand.
I have now included support for the geo database, and included support for finding neighboring articles as well as support for generating points for Wikimaps. The database table is defined as follows:
CREATE TABLE wikipedia_gis (
  gis_id int(8) unsigned NOT NULL,
  gis_latitude_min real NOT NULL,
  gis_latitude_max real NOT NULL,
  gis_longitude_min real NOT NULL,
  gis_longitude_max real NOT NULL,
  gis_region char(2) binary default '',
  gis_type char(12) binary default '',
  gis_type_arg char(12) binary default '',
  KEY gis_id (gis_id),
  INDEX gis_latitude_min (gis_latitude_min),
  INDEX gis_latitude_max (gis_latitude_max),
  INDEX gis_longitude_min (gis_longitude_min),
  INDEX gis_longitude_max (gis_longitude_max)
);
Comments invited, I am totally stupid wrt databases. I added the table manually, but I believe there must be a fancy way in the maintenance module to do this without shell access. If anyone can tell me how, that would be appreciated.
Database entry is implemented as ArticleSaveComplete and ArticleDelete hooks. The hook will first delete all entries belonging to the article ID, and then, based on Parser::extractTags( 'geo',...), add new entries as required.
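The delete-then-reinsert pattern described above can be sketched outside MediaWiki as follows. This Python example uses an in-memory SQLite table with only the coordinate columns, stores each point with min equal to max, and uses a regular expression as a stand-in for Parser::extractTags. The function names sync_article, to_degrees and make_db are hypothetical; the real hooks are PHP.

```python
import re
import sqlite3

GEO_TAG = re.compile(r"<geo>(.*?)</geo>", re.S)  # stand-in for Parser::extractTags
HEMISPHERES = {"N": 1, "S": -1, "E": 1, "W": -1}


def to_degrees(tokens):
    """Consume 'D [M [S]] H' from tokens and return signed decimal degrees."""
    parts = []
    while tokens and tokens[0] not in HEMISPHERES:
        parts.append(float(tokens.pop(0)))
    sign = HEMISPHERES[tokens.pop(0)]
    return sign * sum(part / 60 ** i for i, part in enumerate(parts))


def make_db():
    """Create a reduced version of the table (coordinate columns only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wikipedia_gis (gis_id INTEGER NOT NULL,"
        " gis_latitude_min REAL, gis_latitude_max REAL,"
        " gis_longitude_min REAL, gis_longitude_max REAL)"
    )
    return conn


def sync_article(conn, page_id, wikitext):
    """Replace all rows for page_id with the <geo> tags found in wikitext."""
    points = []
    for body in GEO_TAG.findall(wikitext):
        tokens = body.split()
        points.append((to_degrees(tokens), to_degrees(tokens)))  # lat, then lon
    with conn:  # delete and insert in one transaction
        conn.execute("DELETE FROM wikipedia_gis WHERE gis_id = ?", (page_id,))
        conn.executemany(
            "INSERT INTO wikipedia_gis VALUES (?, ?, ?, ?, ?)",
            [(page_id, lat, lat, lon, lon) for lat, lon in points],
        )
    return len(points)
```

Deleting an article then amounts to calling sync_article with empty text, which removes every row for that ID.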
I have used article ID instead of title as key, because I believe that is more efficient. The downside is that I have to look up the title from the ID upon retrieval. This is currently done locally by:
$name_dbkey = $this->db->selectField( 'page', 'page_title', array( 'page_id' => $id), $fname );
return str_replace( '_', ' ', $name_dbkey );
This creates an unnecessary version dependency, I believe. So any suggestions for a better approach are highly welcome.
I am running a test on a private server with a couple of thousand points, which seems to work fine, but if a more public test is possible, that would be highly appreciated.
I will update the documentation, but here are a couple of sample requests. To get a list of all Wikipedia articles within a radius of 5 km from the White House in Washington:
http:/phase3/extensions/gis/index.php?near=38.8954_N_-77.0366_E&dist=5
My sample implementation gives a list of 7 locations for the above request. (For the time being, I'm embedding this link as the first in "Map sources". It should probably appear in the 'navigation' box instead/also)
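A radius request of this kind is typically answered in two steps: a cheap bounding-box query against the table, followed by an exact distance check on the candidates. The Python sketch below shows only the first step, under the assumptions that the near parameter has the form lat_N|S_lon_E|W, that one degree of latitude is about 111.195 km (spherical Earth, radius 6371 km), and that the point is not close to a pole or the 180° meridian. The function names are hypothetical.

```python
import math

KM_PER_DEGREE_LAT = 111.195  # about 6371 km * pi / 180 (spherical assumption)


def parse_near(value):
    """Parse e.g. '38.8954_N_-77.0366_E' into signed (lat, lon)."""
    lat, ns, lon, ew = value.split("_")
    lat = float(lat) * (1 if ns == "N" else -1)
    lon = float(lon) * (1 if ew == "E" else -1)
    return lat, lon


def bounding_box(lat, lon, dist_km):
    """Return (latmin, latmax, lonmin, lonmax) enclosing a circle of dist_km."""
    dlat = dist_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude).
    dlon = dist_km / (KM_PER_DEGREE_LAT * max(math.cos(math.radians(lat)), 1e-9))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


lat, lon = parse_near("38.8954_N_-77.0366_E")
latmin, latmax, lonmin, lonmax = bounding_box(lat, lon, 5)
```

The four values can then be used as the limits of a range query, and the rows that come back filtered by true distance.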
For a Wikimap style list of all airports in the world:
http:/phase3/extensions/gis/index.php?maparea=90_N_180_E_to_90_S_180_W_type:airport
The Wikimap lists are of course meant to be used indirectly in Wikimaps, but I believe the format should also be suitable as a basis for export, e.g. to World Wind.
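For readers who want to experiment with the maparea syntax, here is a small Python sketch that splits such a value into a normalized bounding box and its trailing attributes. It assumes the form corner_to_corner followed by optional key:value parts, as in the airport example; parse_corner and parse_maparea are hypothetical names, and the extension's real parser may accept more variants.

```python
def parse_corner(text):
    """Parse 'lat_N|S_lon_E|W[_key:value...]' into (lat, lon, attributes)."""
    tokens = text.split("_")
    lat = float(tokens[0]) * (1 if tokens[1] == "N" else -1)
    lon = float(tokens[2]) * (1 if tokens[3] == "E" else -1)
    attrs = dict(token.split(":", 1) for token in tokens[4:])
    return lat, lon, attrs


def parse_maparea(value):
    """Parse e.g. '90_N_180_E_to_90_S_180_W_type:airport'."""
    first, second = value.split("_to_", 1)
    lat1, lon1, _ = parse_corner(first)
    lat2, lon2, attrs = parse_corner(second)
    # Normalize so that callers need not care which corner came first.
    return {
        "lat_min": min(lat1, lat2),
        "lat_max": max(lat1, lat2),
        "lon_min": min(lon1, lon2),
        "lon_max": max(lon1, lon2),
        "attrs": attrs,
    }


area = parse_maparea("90_N_180_E_to_90_S_180_W_type:airport")
```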
en:User:Egil
Egil Kvaleberg wrote:
I have now included support for the geo database, and included support for finding neighboring articles as well as support for generating points for Wikimaps.
Following up my previous mail, I've removed the gis_region column, and instead added a gis_globe column, so that it is easy to make this work in other worlds too, e.g. "Moon", "Mars" and I guess, "Sky" for the star maps.
The definition then becomes:
CREATE TABLE wikipedia_gis (
  gis_id int(8) unsigned NOT NULL,
  gis_latitude_min real NOT NULL,
  gis_latitude_max real NOT NULL,
  gis_longitude_min real NOT NULL,
  gis_longitude_max real NOT NULL,
  gis_globe char(12) binary,
  gis_type char(12) binary,
  gis_type_arg char(12) binary,
  KEY gis_id (gis_id),
  INDEX gis_latitude_min (gis_latitude_min),
  INDEX gis_latitude_max (gis_latitude_max),
  INDEX gis_longitude_min (gis_longitude_min),
  INDEX gis_longitude_max (gis_longitude_max)
);
where gis_globe is NULL for Earth.
Again, comments very much invited.
Also, if anyone could take time to look at my other questions, that would be really, really appreciated. To summarize:
How to make addition of this table possible without shell access via the maintenance module?
Should the table use article ID or title as key?
How to retrieve article title based on ID in a portable manner?
Any chance to test this extension on a broader scale?
Regards, en:User:Egil
Egil Kvaleberg wrote:
Following up my previous mail, I've removed the gis_region column, and instead added a gis_globe column, so that it is easy to make this work in other worlds too, e.g. "Moon", "Mars" and I guess, "Sky" for the star maps.
Neat! :)
CREATE TABLE wikipedia_gis (
  gis_id int(8) unsigned NOT NULL,
  gis_latitude_min real NOT NULL,
  gis_latitude_max real NOT NULL,
  gis_longitude_min real NOT NULL,
  gis_longitude_max real NOT NULL,
  gis_globe char(12) binary,
  gis_type char(12) binary,
  gis_type_arg char(12) binary,
  KEY gis_id (gis_id),
  INDEX gis_latitude_min (gis_latitude_min),
  INDEX gis_latitude_max (gis_latitude_max),
  INDEX gis_longitude_min (gis_longitude_min),
  INDEX gis_longitude_max (gis_longitude_max)
);
How do you expect this table to be queried? Individual indexes on the lat/long fields seem like they'd be kind of awkward, since it can only really use one in a given query.
How to make addition of this table possible without shell access via the maintenance module?
Provide an .sql file with the creation commands. Those poor souls without shell access generally have phpMyAdmin.
Should the table use article ID or title as key?
Assuming the referenced article is the one containing the item, use the page_id number; this is preserved when pages are renamed.
Don't call it gis_id, however; under our naming conventions that would be the name of a unique record ID within the gis table. gis_page would be better.
How to retrieve article title based on ID in a portable manner?
Assuming you're pulling a bunch of records, something like:
select page_namespace,page_title, gis_blah,etc from page,gis where page_id=gis_page and whatever_conditions;
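A join of this shape can be tried out without a MediaWiki installation. The Python sketch below builds two reduced tables in an in-memory SQLite database and fetches titles and coordinates in one query, so no per-row title lookup is needed. The table layout is cut down to the columns involved, the rows are made-up sample data (the second title is a placeholder), and titles_in_lat_range is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE gis (gis_page INTEGER, gis_latitude_min REAL, gis_longitude_min REAL);
    -- Sample rows for illustration only.
    INSERT INTO page VALUES (1, 0, 'White_House');
    INSERT INTO page VALUES (2, 0, 'Example_Mountain');
    INSERT INTO gis VALUES (1, 38.8954, -77.0366);
    INSERT INTO gis VALUES (2, 48.7767, -121.8142);
    """
)


def titles_in_lat_range(conn, lat_min, lat_max):
    """Return (namespace, display title, lat, lon) for points in a latitude band."""
    rows = conn.execute(
        "SELECT page_namespace, page_title, gis_latitude_min, gis_longitude_min"
        " FROM page JOIN gis ON page_id = gis_page"
        " WHERE gis_latitude_min BETWEEN ? AND ?"
        " ORDER BY page_title",
        (lat_min, lat_max),
    )
    # Database keys use underscores; display titles use spaces.
    return [(ns, title.replace("_", " "), lat, lon) for ns, title, lat, lon in rows]
```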
Any chance to test this extension on a broader scale?
When I get a few spare hours I'm going to set up a 1.5 test wiki on my site; I'll set it up there.
-- brion vibber (brion @ pobox.com)
Just as a hint for implementation:
1) MySQL allows REGEX searches - perhaps this helps!
2) You can also HAVE REGEXes AS table entries, which is a very, very, very interesting feature not everybody knows of.
Both could help for storing or finding coordinates.
Tom
Brion Vibber wrote:
Egil Kvaleberg wrote:
Following up my previous mail, I've removed the gis_region column, and instead added a gis_globe column, so that it is easy to make this work in other worlds too, e.g. "Moon", "Mars" and I guess, "Sky" for the star maps.
Neat! :)
CREATE TABLE wikipedia_gis ( gis_id int(8) unsigned NOT NULL, gis_latitude_min real NOT NULL,
I don't think it's good practice to use NULL to mean anything more than "unknown or missing value". Perhaps gis_globe should be "Earth" for Earth? Then there's no ambiguity.
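There is also a practical side to this: in SQL, a comparison with NULL using = is never true, so a query that filters on gis_globe with an equality test will not find rows where the globe was left as NULL. The Python sketch below demonstrates this with an in-memory SQLite database; the table names gis_null and gis_named and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Variant 1: NULL stands for Earth.
conn.execute("CREATE TABLE gis_null (name TEXT, gis_globe TEXT)")
conn.executemany(
    "INSERT INTO gis_null VALUES (?, ?)", [("Oslo", None), ("Tycho", "Moon")]
)
# Variant 2: the globe is always named, with 'Earth' as the default.
conn.execute(
    "CREATE TABLE gis_named (name TEXT, gis_globe TEXT NOT NULL DEFAULT 'Earth')"
)
conn.execute("INSERT INTO gis_named (name) VALUES ('Oslo')")
conn.execute("INSERT INTO gis_named VALUES ('Tycho', 'Moon')")


def count(sql, params=()):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql, params).fetchone()[0]


# '= NULL' evaluates to NULL rather than true, so no row qualifies.
eq_null = count("SELECT COUNT(*) FROM gis_null WHERE gis_globe = ?", (None,))
# Variant 1 needs a special case in every query.
is_null = count("SELECT COUNT(*) FROM gis_null WHERE gis_globe IS NULL")
# Variant 2 works with a plain equality test.
named = count("SELECT COUNT(*) FROM gis_named WHERE gis_globe = ?", ("Earth",))
```

With an explicit 'Earth' value, the same WHERE clause serves every globe.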
Alan
On Fri, 25 Mar 2005 19:14:00 +0100, Egil Kvaleberg egil@kvaleberg.no wrote:
Egil Kvaleberg wrote:
[...]
where gis_globe is NULL for Earth.
[...]
Brion Vibber wrote:
CREATE TABLE wikipedia_gis (
  gis_id int(8) unsigned NOT NULL,
  gis_latitude_min real NOT NULL,
  gis_latitude_max real NOT NULL,
  gis_longitude_min real NOT NULL,
  gis_longitude_max real NOT NULL,
  gis_globe char(12) binary,
  gis_type char(12) binary,
  gis_type_arg char(12) binary,
  KEY gis_id (gis_id),
  INDEX gis_latitude_min (gis_latitude_min),
  INDEX gis_latitude_max (gis_latitude_max),
  INDEX gis_longitude_min (gis_longitude_min),
  INDEX gis_longitude_max (gis_longitude_max)
);
How do you expect this table to be queried? Individual indexes on the lat/long fields seem like they'd be kind of awkward, since it can only really use one in a given query.
The main use is a SELECT where
$condition = "gis_latitude_max >= " . $latmin .
    " AND gis_latitude_min <= " . $latmax .
    " AND gis_longitude_max >= " . $lonmin .
    " AND gis_longitude_min <= " . $lonmax .
    " AND gis_globe = '" . $globe . "'";
So perhaps the indexes are of no use in this case? (As mentioned, I'm totally stupid wrt. databases).
Sometimes there is also a condition for gis_type added.
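The condition is the standard test for two rectangles overlapping: the stored box must end after the search box begins and begin before it ends, in both dimensions. The Python sketch below runs the same predicate against an in-memory SQLite table, passing the limits as bound parameters instead of concatenating them into the SQL string, which sidesteps quoting problems with user-supplied values. The demo rows, the explicit 'Earth' globe value and the function names are assumptions for the example.

```python
import sqlite3

OVERLAP_SQL = """
SELECT gis_id FROM wikipedia_gis
WHERE gis_latitude_max >= ? AND gis_latitude_min <= ?
  AND gis_longitude_max >= ? AND gis_longitude_min <= ?
  AND gis_globe = ?
ORDER BY gis_id
"""


def make_demo_db():
    """Build a reduced table with three made-up points."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wikipedia_gis (gis_id INTEGER,"
        " gis_latitude_min REAL, gis_latitude_max REAL,"
        " gis_longitude_min REAL, gis_longitude_max REAL, gis_globe TEXT)"
    )
    conn.executemany(
        "INSERT INTO wikipedia_gis VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 38.9, 38.9, -77.0, -77.0, "Earth"),
            (2, 48.8, 48.8, -121.8, -121.8, "Earth"),
            (3, 38.9, 38.9, -77.0, -77.0, "Moon"),
        ],
    )
    return conn


def pages_in_box(conn, latmin, latmax, lonmin, lonmax, globe):
    """Return the IDs of all entries whose box overlaps the search box."""
    params = (latmin, latmax, lonmin, lonmax, globe)
    return [row[0] for row in conn.execute(OVERLAP_SQL, params)]
```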
Don't call it gis_id, however; under our naming conventions that would be the name of a unique record ID within the gis table. gis_page would be better.
Will do.
Egil
On Fri, Mar 25, 2005 at 11:45:15PM +0100, Egil Kvaleberg wrote:
Brion Vibber wrote:
How do you expect this table to be queried? Individual indexes on the lat/long fields seem like they'd be kind of awkward, since it can only really use one in a given query.
The main use is a SELECT where
$condition = "gis_latitude_max >= " . $latmin .
    " AND gis_latitude_min <= " . $latmax .
    " AND gis_longitude_max >= " . $lonmin .
    " AND gis_longitude_min <= " . $lonmax .
    " AND gis_globe = '" . $globe . "'";
So perhaps the indexes are of no use in this case? (As mentioned, I'm totally stupid wrt. databases).
Probably only the first dimension would use the index. It might or might not speed things enough.
Indexes used for multidimensional data (in gis and graphics software) are very different from unidimensional indexes found in typical rdbmses.
Some dbs (= Postgres) have 2d and multi-d indexes exactly for stuff like that. It should be much faster to use them.
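To give a feel for why a two-dimensional index helps, here is a deliberately simple Python sketch of a fixed grid index: points are bucketed by cell, and a box query only inspects the cells the box touches. This is a swapped-in illustration, not the tree-based structure a database's spatial index would normally use, and the class name GridIndex is hypothetical.

```python
import math
from collections import defaultdict


class GridIndex:
    """Bucket points into cells of cell_deg x cell_deg degrees."""

    def __init__(self, cell_deg=1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add(self, page_id, lat, lon):
        self.cells[self._cell(lat, lon)].append((page_id, lat, lon))

    def query(self, latmin, latmax, lonmin, lonmax):
        """Return sorted page IDs inside the box, visiting only touched cells."""
        row0, col0 = self._cell(latmin, lonmin)
        row1, col1 = self._cell(latmax, lonmax)
        hits = []
        for row in range(row0, row1 + 1):
            for col in range(col0, col1 + 1):
                for page_id, lat, lon in self.cells.get((row, col), ()):
                    # Cells on the edge may hold points outside the box.
                    if latmin <= lat <= latmax and lonmin <= lon <= lonmax:
                        hits.append(page_id)
        return sorted(hits)
```

Both dimensions narrow the search at once, which is what a set of separate one-dimensional indexes cannot do.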
Tomasz Wegrzanowski wrote:
Indexes used for multidimensional data (in gis and graphics software) are very different from unidimensional indexes found in typical rdbmses.
Point taken.
Since you mention this, I suddenly discovered that MySQL in fact has geospatial extensions with spatial indexes! Wow! I will investigate further, but it certainly seems to be just the thing for this. (Although it seems that compatibility with Postgres is also something to consider).
The other select I forgot to mention is by page ID, but that is an exact match on an indexed item, so it should be unproblematic.
Egil
Egil Kvaleberg wrote:
Egil Kvaleberg wrote:
I have now included support for the geo database, and included support for finding neighboring articles as well as support for generating points for Wikimaps.
Following up my previous mail, I've removed the gis_region column, and instead added a gis_globe column, so that it is easy to make this work in other worlds too, e.g. "Moon", "Mars" and I guess, "Sky" for the star maps.
The definition then becomes:
CREATE TABLE wikipedia_gis (
  gis_id int(8) unsigned NOT NULL,
  gis_latitude_min real NOT NULL,
  gis_latitude_max real NOT NULL,
  gis_longitude_min real NOT NULL,
  gis_longitude_max real NOT NULL,
  gis_globe char(12) binary,
  gis_type char(12) binary,
  gis_type_arg char(12) binary,
  KEY gis_id (gis_id),
  INDEX gis_latitude_min (gis_latitude_min),
  INDEX gis_latitude_max (gis_latitude_max),
  INDEX gis_longitude_min (gis_longitude_min),
  INDEX gis_longitude_max (gis_longitude_max)
);
where gis_globe is NULL for Earth.
Again, comments very much invited.
You've missed adding a field for storing the geodetic reference (for example, WGS 84), which is particularly important for high-precision measurements. Some kind of height field would be good too (again, the nature of the height field would be defined by the geodetic reference).
-- Neil
Neil Harris wrote:
Egil Kvaleberg wrote:
You've missed adding a field for storing the geodetic reference (for example, WGS 84), which is particularly important for high-precision measurements.
No, there is no such field on purpose. I've set the requirement that the reference system should be WGS84, and that coordinates in other reference systems be converted to WGS84. I believe the latest US and EU reference systems are close enough (or equivalent to) WGS84.
Allowing different reference frames into the same database would not allow searches for, for instance, bounding boxes in a manageable manner. It would also severely complicate the neighborhood calculations.
Obviously, with the precision required for many (but not all) of the articles in Wikipedia, choice of reference system is really an academic exercise.
For the map resources that refer to other references, the strategy is to calculate from the WGS84 reference as needed.
Some kind of height field would be good too (again, the nature of the height field would be defined by the geodetic reference).
I have on purpose tried to limit the information as much as possible. Presumably, given the WGS84 location, topographical data can be found by other means if required.
Egil
The gis extension is now compatible with both MediaWiki version 1.4 and version 1.5. I spent some time testing with 1.4 this weekend, and saw no issues.
I have updated the documentation at meta:Gis
To recap, one can enable support for the geo tag, for the list of map sources and for the neighborhood article list in steps. Only the neighborhood list requires enabling of the database.
For the database mode, I looked at the MySQL spatial index support, but that requires version 4.1. So I've stuck with the vanilla type so far. Enabling the database will of course also enable support for articles being represented in the Wikimaps, but although the database is in place, the maps and map engine still require a good deal of work.
Egil
wikitech-l@lists.wikimedia.org