Based on the experience gained from WikiProject Geographical
coordinates, I have prepared three MediaWiki extensions that you may
find useful. The extensions can be enabled individually, but the concept
is most powerful when all three are enabled. I will briefly
outline the extensions here:
----------------------------------------------------------------------------------------
A. The geo tag extension
The <geo> tag allows entry of geographical coordinates in a style
similar to RFC 1876. For example:
<geo>48 46 36 N 121 48 51 W</geo>.
It is designed to be flexible and easy to use.
Variations of the above allow specification with decimals in various
forms, for more or less precision.
Additional meta-data can also be specified as attributes for the
location, like this:
<geo>48 46 36 N 121 48 51 W type:mountain region:US
scale:100000</geo>
In the rendered article, the tag will be replaced with 48°46'36''N
121°48'51''W, which is also a wikilink to a page of map resources for
that point.
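For the technically curious: the tag plugs in via the usual MediaWiki
tag-extension mechanism. A stripped-down sketch of the wiring (the
function names here are illustrative only, not the actual code):

    $wgExtensionFunctions[] = 'wfGeoTagSetup';

    function wfGeoTagSetup() {
        global $wgParser;
        # Register <geo>...</geo>; the callback receives the tag contents.
        $wgParser->setHook( 'geo', 'wfRenderGeoTag' );
    }

    function wfRenderGeoTag( $content ) {
        # Parse "deg min sec N deg min sec E/W [key:value ...]" from
        # $content, then return the formatted coordinates as a link to
        # the map sources page (parsing and formatting omitted here).
        return htmlspecialchars( $content );
    }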
The main geo tag advantages are:
1. Consistent markup for coordinates.
2. Consistent rendering of coordinates.
3. Wikipedia articles with coordinates will get a 'geo.position' meta
tag, making them discoverable by Internet geographic services such as
geourl.org.
4. It serves as an enabler for the two other extensions.
----------------------------------------------------------------------------------------
B. The map sources extension
The map sources extension is the target of the <geo> tag wikilink, and
provides a page of available Internet map resources, in a manner much
like the ISBN resource page. The extension provides functionality to
'preload' external URLs with coordinates, so that most maps are
essentially one click away.
There are currently 30 different built-in replacement strings,
supporting various forms of scale and coordinate specification,
such as UTM, OSGB36 and CH1903.
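For example, a map sources entry can preload a URL such as the one
below, where the coordinates and scale from the <geo> tag are
substituted for the placeholders (the placeholder names shown are
illustrative; the built-in set defines its own):

    http://www.multimap.com/map/browse.cgi?lat={latdegdec}&lon={londegdec}&scale={scale}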
There are specialized versions of the map sources page for various
regions (like the US and GB). For the global version, there are at
present preloaded pointers to around 20 different map engines.
In addition to the maps, there is a pointer to GeoURL.org, which lists
nearby resources on the Internet.
There is also a direct link for the open source NASA World Wind
software, allowing a new, interactive way of exploring aerial
imagery and topographic data. World Wind has a plug-in layer for
Wikipedia articles that are tagged with a geographic coordinate.
Assuming extension C is enabled, there is also a pointer to nearby
articles in Wikipedia, listing the articles as wikilinks, together with
their distance and direction from the present point.
----------------------------------------------------------------------------------------
C. The geo database extension
The geo database keeps track of all articles in Wikipedia with
geographic coordinates, and provides the data source for the
neighborhood information, as well as for other external mechanisms
taking advantage of Wikipedia's geographical information, such as the
NASA World Wind Wikipedia overlay.
Additionally, the geo database will provide data for the future
Wikimaps, so that the maps it produces will contain all the relevant
information from Wikipedia as clickable points. For this, geo attributes
are crucial: airports really should appear as airports on the map,
mountains as mountains, and cities as cities, with the right magnitude.
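Internally, a neighborhood lookup can then be done as a simple
bounding-box query against the coordinate table, with exact distance
and direction computed in code afterwards. A rough sketch (table and
column names are hypothetical):

    SELECT gt_page, gt_lat, gt_lon, gt_type
      FROM geo_tags
     WHERE gt_lat BETWEEN :lat - :d  AND :lat + :d
       AND gt_lon BETWEEN :lon - :dl AND :lon + :dl;

(The longitude window :dl has to be widened by a factor of
1/cos(latitude) to keep the box roughly square on the ground.)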
----------------------------------------------------------------------------------------
For further information, see also
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Geographical_coordinates
----------------------------------------------------------------------------------------
Status
Currently extensions A and B are quite well developed, and have been
tested for a time on an external server. Additionally, a few thousand
articles in the en: Wikipedia are marked with geographic coordinates
using an interim solution with templates, as a proof-of-concept. They
will be converted to the geo tag if this extension gets enabled for the
English Wikipedia. That will also solve the current problem with
coordinates as arguments for infobox templates. Collection of data
points has so far been done with an interim, external solution based on
some Perl scripts.
Extension C has also been implemented, but I would want to discuss a few
issues of performance and security before committing the code. More
about this later.
Magnus Manske has been extremely helpful in the work on integration with
Wikimaps, and that work will continue.
----------------------------------------------------------------------------------------
Questions
Starting with extensions A and B, I have some questions:
1. How should translations for extensions be handled? It would of
course not be a problem to add to the existing resources in
phase3/languages, but do translations for extensions belong there?
2. Should extensions be put in the extensions module or in the
phase3/extensions directory?
3. Should these three extensions be put in the same place (requiring
only one include in LocalSettings.php to enable, as sketched below), or
in three different directories? I am currently using three different
directories, but I think having just one is better. I am also wondering
about naming and policy. "Geo" seems to be taken.
----------------------------------------------------------------------------------------
Finally, I would like to give a big thank you to all participants in
WikiProject Geographical coordinates, who have helped immensely with
suggestions and practical work.
Regards,
Egil Kvaleberg
en:User:Egil
Has somebody written a script to convert MediaWiki markup to HTML? I'd
need something that works with embedded LaTeX. Looking at the
MediaWiki source, it doesn't seem like it should be *that* hard, but I
haven't tried yet.
Frederik
--
http://ofb.net/~frederik/
>
> As the change from 1.4 to 1.5 db will be a big step anyway, two
> additions come to mind:
>
> * Language links
> Should we finally put these into a real table? At least in addition
> to keeping them in the text? With an interwiki link table up and
> current, we could then switch to "real" interwiki management at a
> later stage.
>
Has there been any consideration of adding support
for the following ideas:
http://meta.wikimedia.org/wiki/Reviewed_article_version
Schewek
In a similar vein, what reasons are there for not hosting each project
on its own server? Especially if we have extra small boxes lying
around. It would be most gratifying if only one project went down at
a time, and if smaller projects were never slowed down by database
locking issues on the larger ones.
SJ
On Mon, 28 Mar 2005 19:14:33 -0800, Michael Snow
<wikipedia(a)earthlink.net> wrote:
> I'm close to being out of my depth on some of this discussion, but
> perhaps somebody would be able to explain for me. A few people have
> raised the possibility that what is technically desirable on one project
> may not be so on another.
--
+sj+
I have some experience with this sort of thing, so thought I would add
my 2p to the information pool being shared here.
1) In general, there is no such thing as a universal format.
Having a data mediation format that spans versions is often an
intractable problem to solve. Essentially, if we can find a format that
is agnostic to any version of the application, then we would just use
that format as the data schema and not worry about data migrations for
any version change because every version uses the same format. Finding
such a format nearly always subsumes the possibility of future
application innovation.
2) An existing standard can be settled upon that meets core needs.
In this case, the stakeholders identify a standard format that has some
level of widespread use and agree to always have the capability to
export and import in that format. This is how we individually overcome
limits in the applications we use daily. Specifically, we often search
for a Save-As format from a source application that we know is
accommodated by a destination application. The problem with this is
that, although the conversion can be lossy, it is more likely to be
'gainful' - meaning that the importing application has to make
assumptions in order to fill in missing data that it might need.
This solution is not ideal, primarily because there may be a
data requirement of the importing application that cannot be
algorithmically determined. As a result, human intervention might be
required for each unit of data imported. This is certainly not a
reasonable solution for even moderately sized datasets of just a few
hundred elements.
3) Look-ahead designs are used before features are implemented.
In this case, a very heavy-weight design effort attempts to
prognosticate the data design well ahead of code implementation. This
actually can be done if innovation is buffered and features are queued
and agreed upon well in advance. This is about as un-agile as software
development gets, however; and, as most software engineers know, it is
brutally difficult to design something to this level of detail so far
ahead of implementation (and indeed it almost always fails in my
experience).
4) Create a migration mechanism for each release.
This is typically what is done. The reasons are simple: the source
application's data formats are well known, and so are the destination's.
The only thing needed is an intelligent mapping from one to the other.
As Lee has pointed out, the problem with this is that it places a burden
on the user community to stay abreast of development whenever a
migration is required.
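To illustrate, the per-release mechanism usually ends up as a chain of
small one-step updaters, so that a site can skip several versions in
one run. A schematic sketch (all names invented):

    # Each step lifts the data exactly one version and reports the
    # version it produced; chaining covers multi-version jumps.
    $steps = array(
        '1.3' => 'wfUpgrade13to14',  # returns array( $newData, '1.4' )
        '1.4' => 'wfUpgrade14to15',  # returns array( $newData, '1.5' )
    );

    function wfMigrate( $data, $from, $to, $steps ) {
        while ( $from != $to ) {
            list( $data, $from ) = call_user_func( $steps[$from], $data );
        }
        return $data;
    }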
I am sure there are other analyses in the solution domain, but the above
is off the top of my head. Although certainly not empirical, I
conjecture that an industry best practice is to provide 4) as a minimum,
and support a collection of widespread formats for 2).
Sorry for rambling on about this, but this has been a problem that has
been around for a long time in software engineering circles. Comments
and criticisms welcome.
Thanks,
George
-----Original Message-----
From: mediawiki-l-bounces(a)Wikimedia.org
[mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Lee Daniel
Crocker
Sent: Monday, March 28, 2005 11:26 AM
To: Wikimedia developers
Cc: Mediawiki List
Subject: [Mediawiki-l] Re: [Wikitech-l] Long-term: Wiki import/export
format
On Mon, 2005-03-28 at 17:51 +0200, Lars Aronsson wrote:
> It sounds so easy. But would you accept this procedure if it requires
> that Wikipedia is unavailable or read-only for one hour? for one day?
> for one week? The conversion time should be a design requirement.
> ...
> Not converting the database is the fastest way to cut conversion time.
> Perhaps you can live with the legacy format? Consider it.
A properly written export shouldn't need to have exclusive access to the
database at all. The only thing that would need that is a complete
reinstall and import, which is only one application of the format and
should be needed very rarely (switching to a wholly new hardware or
software base, for example). In those few cases (maybe once every few
years or so), Wikipedia being uneditable for a few days would not be
such a terrible thing--better than it being down completely because the
servers are overwhelmed.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>
Problems: The frequently-changing database schema in which the wiki
information is stored makes it difficult to maintain data across
upgrades (requiring conversion scripts), offers no easy backup
functionality, makes it difficult to access the data with other tools,
and is generally fragile.
Proposed solution: Let's create a standardized file format (probably
something XML-ish) for storing the information contained in a wiki.
All the text, revisions, meta-data, and so on would be stored in a
well-defined format, so that, for example, upgrading the wiki software
(from any version to any other--no need to do one at a time!) could
be done by exporting the wiki into this format and then importing it
into the new installation. The export format would be publishable,
would be easier for other applications to use, and would consist of
simple files for which commonly-available backup tools could be used. A
periodic export/import would serve to clean the database of any
reference errors and fragmentation. Tools could be created to work
with the new format to create subsets, mirrors, and so on.
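To make the shape concrete, an export in such a format might look
something like this (purely a sketch to focus discussion; every element
name is up for debate):

    <wiki version="0.1">
      <page title="Example article">
        <revision id="1" timestamp="2005-03-28T17:51:00Z">
          <contributor>SomeUser</contributor>
          <comment>initial version</comment>
          <text>The wiki-text of this revision...</text>
        </revision>
      </page>
    </wiki>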
I already have some idea of what is needed, but I solicit input.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>
Hi
I added a comment to Bug 1012 and I thought I'd explain it here a
little bit more. The problem:
When requesting a category, one gets the raw wikitext as a response,
which looks like this in the case of
http://en.wikipedia.org/wiki/Category:Geography_by_country?action=raw
(or edit):
[[Category:Geography|*Country, Grouped by]]
[[Category:Categories by country]]
{{catAZ}}
... some few lines more...
But what an API or a crawler would expect in this case is the fully
evaluated wikitext (or XML). This would have the same content as the
HTML result page, but without styling, and as a - potentially large -
contiguous list of countries (again in wikitext or XML).
I have now looked again at what has been proposed to date, and I found
"Bug 208: API for external access" as well as some requests on
Wikitech-l, like a "Minimalistic Web-API for use by Tools and Bots".
All alternative solutions proposed so far, like the Python framework
(an HTML screen scraper) or the Perl one (reading the SQL dump), do not
solve our problem. So I really want to begin to play around with the
MediaWiki PHP code.
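As a first experiment, I imagine an entry point roughly along these
lines (just a sketch of the idea, not tested against the current code
base; it reuses the parser the way a normal page view does):

    $title    = Title::newFromText( $_REQUEST['title'] );
    $article  = new Article( $title );
    $wikitext = $article->getContent();  # the raw wikitext

    # Expand templates etc. by running the text through the parser.
    $options = ParserOptions::newFromUser( $wgUser );
    $output  = $wgParser->parse( $wikitext, $title, $options );
    print $output->getText();  # evaluated content, without the skin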
Any comments or help?
Stefan
I'm working on implementing (in 1.5) the feature requested in BUG
#1289 (http://bugzilla.wikipedia.org/show_bug.cgi?id=1289), which I
believe can be (at least partially) addressed by allowing for
descending category lists.
The thought I had was to implement a new magic word "__SORT_DESC__"
(of course, the actual wording is open to change). If this directive
is included in the category page text, then the category listings
would be in descending order.
I've added a bit of code to Parser.php that detects this magic word
and makes a call to the CategoryPage object to tell it to use a DESC
sort. Seems simple enough.
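For reference, the detection looks roughly like this (simplified;
__SORT_DESC__ is my proposal, and the CategoryPage accessor is
hypothetical, not existing MediaWiki API):

    # In Parser.php, while handling magic words in the text:
    if ( false !== strpos( $text, '__SORT_DESC__' ) ) {
        $text = str_replace( '__SORT_DESC__', '', $text );
        # Tell the CategoryPage object to list members in DESC order.
        # Note this call happens only when the text is actually parsed.
        $this->mCategoryPage->setSortDescending( true );
    }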
I've got things working, except that when the cached copy of the
parsed page is used, this call to the CategoryPage object is never
executed, and the category list is not DESC. When the cached copy of
the page is not used (or does not exist), then things happen just
right.
The actual category lists are not cached -- only the introductory
portion that people can edit manually. So when I first visit such a
page, things look great. But if I do a reload (e.g., Ctrl-F5 in
Firefox), the directive is lost. I run into the same problem on
subsequent pages of a multi-page category list.
Short of turning off the parserCache for all category pages, does
anyone know how to avoid this problem?
-- Rich Holton
en.wikipedia:User:Rholton
As the change from 1.4 to 1.5 db will be a big step anyway, two
additions come to mind:
* Language links
Should we finally put these into a real table? At least in addition to
keeping them in the text? With an interwiki link table up and current,
we could then switch to "real" interwiki management at a later stage.
(A possible table layout is sketched below.)
* Templates
At the German Wikipedia, we have a "Personendaten" (person data) template.
It might be of use to be able to access the data put into it. I don't
have an actual application for that, though.
None of these items is essential, but IMHO there's an opportunity to at
least prepare for improved metadata management.
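For the language links, something as simple as this might do (table and
column names are just a suggestion):

    CREATE TABLE langlinks (
      ll_from  INT UNSIGNED NOT NULL,  -- id of the page holding the link
      ll_lang  VARCHAR(10)  NOT NULL,  -- language code of the target wiki
      ll_title VARCHAR(255) NOT NULL,  -- title of the target page
      PRIMARY KEY (ll_from, ll_lang),
      KEY (ll_lang, ll_title)
    );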
Magnus
Hello, I need some help please.
How can I make the search result page look like the original Wikipedia
search result page
(http://en.wikipedia.org/wiki/Special:Search?search=search&go=Go),
with a Google search and, especially, a "create an article with this
title" link?
Thank you,
adam