Based on the experience gained from WikiProject Geographical
coordinates, I have prepared three MediaWiki extensions that you may
find useful. The extensions can be enabled individually, but the concept
is most powerful when all three are enabled. I will briefly
outline the extensions here:
----------------------------------------------------------------------------------------
A. The geo tag extension
The <geo> tag allows entry of geographical coordinates in a style
similar to RFC 1876. For example:
<geo>48 46 36 N 121 48 51 W</geo>.
It is designed to be flexible and easy to use.
Variations of the above allow specification with decimals in various
forms, for more or less precision.
Additional meta-data can also be specified as attributes for the
location, like this:
<geo>48 46 36 N 121 48 51 W type:mountain region:US
scale:100000</geo>
In the rendered article, the tag will be replaced with 48°46'36''N
121°48'51''W, which is also a wikilink to a page of map resources for
that point.
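For the technically curious: the tag plugs in via the usual MediaWiki
tag-extension mechanism. A stripped-down sketch of the wiring (the
function names here are illustrative only, not the actual code):

    $wgExtensionFunctions[] = 'wfGeoTagSetup';

    function wfGeoTagSetup() {
        global $wgParser;
        # Register <geo>...</geo>; the callback receives the tag contents.
        $wgParser->setHook( 'geo', 'wfRenderGeoTag' );
    }

    function wfRenderGeoTag( $content ) {
        # Parse "deg min sec N deg min sec E/W [key:value ...]" from
        # $content, then return the formatted coordinates as a link to
        # the map sources page (parsing and formatting omitted here).
        return htmlspecialchars( $content );
    }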
The main geo tag advantages are:
1. Consistent markup for coordinates.
2. Consistent rendering of coordinates.
3. Wikipedia articles with coordinates will get a 'geo.position' meta
tag, making them discoverable by Internet geographic services such as
geourl.org.
4. It serves as an enabler for the two other extensions.
----------------------------------------------------------------------------------------
B. The map sources extension
The map sources extension is the target of the <geo> tag wikilink, and
provides a page of available Internet map resources, in a manner much
like the ISBN resource page. The extension provides functionality to
'preload' external URLs with coordinates, so that most maps are
essentially one click away.
There are currently 30 different built-in replacement strings,
supporting various forms of scale and coordinate specification,
such as UTM, OSGB36 and CH1903.
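For example, a map sources entry can preload a URL such as the one
below, where the coordinates and scale from the <geo> tag are
substituted for the placeholders (the placeholder names shown are
illustrative; the built-in set defines its own):

    http://www.multimap.com/map/browse.cgi?lat={latdegdec}&lon={londegdec}&scale={scale}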
There are specialized versions of the map sources page for various
regions (like the US and GB). For the global version, there are at
present preloaded pointers to around 20 different map engines.
In addition to the maps, there is a pointer to GeoURL.org, which lists
nearby resources on the Internet.
There is also a direct link for the open source NASA World Wind
software, allowing a new, interactive way of exploring aerial
imagery and topographic data. World Wind has a plug-in layer for
Wikipedia articles that are tagged with a geographic coordinate.
Assuming extension C is enabled, there is also a pointer to nearby
articles in Wikipedia, listing the articles as wikilinks, together with
their distance and direction from the present point.
----------------------------------------------------------------------------------------
C. The geo database extension
The geo database keeps track of all articles in Wikipedia with
geographic coordinates, and provides the data source for the
neighborhood information, as well as for other external mechanisms
taking advantage of Wikipedia's geographical information, such as the
NASA World Wind Wikipedia overlay.
Additionally, the geo database will provide data for the future
Wikimaps, so that the maps it produces will contain all the relevant
information from Wikipedia as clickable points. For this, geo attributes
are crucial: airports really should appear as airports on the map,
mountains as mountains, and cities as cities, with the right magnitude.
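Internally, a neighborhood lookup can then be done as a simple
bounding-box query against the coordinate table, with exact distance
and direction computed in code afterwards. A rough sketch (table and
column names are hypothetical):

    SELECT gt_page, gt_lat, gt_lon, gt_type
      FROM geo_tags
     WHERE gt_lat BETWEEN :lat - :d  AND :lat + :d
       AND gt_lon BETWEEN :lon - :dl AND :lon + :dl;

(The longitude window :dl has to be widened by a factor of
1/cos(latitude) to keep the box roughly square on the ground.)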
----------------------------------------------------------------------------------------
For further information, see also
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Geographical_coordinates
----------------------------------------------------------------------------------------
Status
Currently extensions A and B are quite well developed, and have been
tested for a time on an external server. Additionally, a few thousand
articles in the en: Wikipedia are marked with geographic coordinates
using an interim solution with templates, as a proof-of-concept. They
will be converted to the geo tag if this extension gets enabled for the
English Wikipedia. That will also solve the current problem with
coordinates as arguments for infobox templates. Collection of data
points has so far been done with an interim, external solution based on
some Perl scripts.
Extension C has also been implemented, but I would want to discuss a few
issues of performance and security before committing the code. More
about this later.
Magnus Manske has been extremely helpful in the work on integration with
Wikimaps, and that work will continue.
----------------------------------------------------------------------------------------
Questions
Starting with extensions A and B, I have some questions:
1. How should translations for extensions be handled? It would of
course not be a problem to add to the existing resources in
phase3/languages, but do translations for extensions belong there?
2. Should extensions be put in the extensions module or in the
phase3/extensions directory?
3. Should these three extensions be put in the same place (requiring
only one include in LocalSettings.php to enable, as sketched below), or
in three different directories? I am currently using three different
directories, but I think having just one is better. I am also wondering
about naming and policy. "Geo" seems to be taken.
----------------------------------------------------------------------------------------
Finally, I would like to give a big thank you to all participants in
WikiProject Geographical coordinates, who have helped immensely with
suggestions and practical work.
Regards,
Egil Kvaleberg
en:User:Egil
Has somebody written a script to convert MediaWiki markup to HTML? I'd
need something that works with embedded LaTeX. Looking at the
MediaWiki source, it doesn't seem like it should be *that* hard, but I
haven't tried yet.
Frederik
--
http://ofb.net/~frederik/
>
> As the change from 1.4 to 1.5 db will be a big step anyway, two
> additions come to mind:
>
> * Language links
> Should we finally put these into a real table? At least in addition
> to keeping them in the text? With an interwiki link table up and
> current, we could then switch to "real" interwiki management at a
> later stage.
>
Has there been any consideration of adding support
for the following ideas:
http://meta.wikimedia.org/wiki/Reviewed_article_version
Schewek
In a similar vein, what reasons are there for not hosting each project
on its own server? Especially if we have extra small boxes lying
around. It would be most gratifying if only one project went down at
a time, and if smaller projects were never slowed down by database
locking issues on the larger ones.
SJ
On Mon, 28 Mar 2005 19:14:33 -0800, Michael Snow
<wikipedia(a)earthlink.net> wrote:
> I'm close to being out of my depth on some of this discussion, but
> perhaps somebody would be able to explain for me. A few people have
> raised the possibility that what is technically desirable on one project
> may not be so on another.
--
+sj+
I have some experience with this sort of thing, so thought I would add
my 2p to the information pool being shared here.
1) In general, there is no such thing as a universal format.
Having a data mediation format that spans versions is often an
intractable problem to solve. Essentially, if we can find a format that
is agnostic to any version of the application, then we would just use
that format as the data schema and not worry about data migrations for
any version change because every version uses the same format. Finding
such a format nearly always subsumes the possibility of future
application innovation.
2) An existing standard can be settled upon that meets core needs.
In this case, the stakeholders identify a standard format that has some
level of widespread use and agree to always have the capability to
export and import in that format. This is how we individually overcome
limits in the applications we use daily. Specifically, we often search
for a Save-As format from a source application that we know is
accommodated by a destination application. The problem with this is
that, although the conversion can be lossy, it is more likely to be
'gainful' - meaning that the importing application has to make
assumptions in order to fill in missing data that it might need.
This solution is not ideal, primarily because there may be a
data requirement of the importing application that cannot be
algorithmically determined. As a result, human intervention might be
required for each unit of data imported. This is certainly not a
reasonable solution for even moderately sized datasets of just a few
hundred elements.
3) Look-ahead designs are used before features are implemented.
In this case, a very heavy-weight design effort attempts to
prognosticate the data design well ahead of code implementation. This
actually can be done if innovation is buffered and features are queued
and agreed upon well in advance. This is about as un-agile as software
development gets, however; and, as most software engineers know, it is
brutally difficult to design something to this level of detail so far
ahead of implementation (and indeed it almost always fails in my
experience).
4) Create a migration mechanism for each release.
This is typically what is done. The reasons are simple: the source
application's data formats are well known, and so are the destination's.
The only thing needed is an intelligent mapping from one to the other.
As Lee has pointed out, the problem with this is that it places a burden
on the user community to stay abreast of development whenever a
migration is required.
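To illustrate, the per-release mechanism usually ends up as a chain of
small one-step updaters, so that a site can skip several versions in
one run. A schematic sketch (all names invented):

    # Each step lifts the data exactly one version and reports the
    # version it produced; chaining covers multi-version jumps.
    $steps = array(
        '1.3' => 'wfUpgrade13to14',  # returns array( $newData, '1.4' )
        '1.4' => 'wfUpgrade14to15',  # returns array( $newData, '1.5' )
    );

    function wfMigrate( $data, $from, $to, $steps ) {
        while ( $from != $to ) {
            list( $data, $from ) = call_user_func( $steps[$from], $data );
        }
        return $data;
    }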
I am sure there are other analyses in the solution domain, but the above
is off the top of my head. Although certainly not empirical, I
conjecture that an industry best practice is to provide 4) as a minimum,
and support a collection of widespread formats for 2).
Sorry for rambling on about this, but this has been a problem that has
been around for a long time in software engineering circles. Comments
and criticisms welcome.
Thanks,
George
-----Original Message-----
From: mediawiki-l-bounces(a)Wikimedia.org
[mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Lee Daniel
Crocker
Sent: Monday, March 28, 2005 11:26 AM
To: Wikimedia developers
Cc: Mediawiki List
Subject: [Mediawiki-l] Re: [Wikitech-l] Long-term: Wiki import/export
format
On Mon, 2005-03-28 at 17:51 +0200, Lars Aronsson wrote:
> It sounds so easy. But would you accept this procedure if it requires
> that Wikipedia is unavailable or read-only for one hour? for one day?
> for one week? The conversion time should be a design requirement.
> ...
> Not converting the database is the fastest way to cut conversion time.
> Perhaps you can live with the legacy format? Consider it.
A properly written export shouldn't need to have exclusive access to the
database at all. The only thing that would need that is a complete
reinstall and import, which is only one application of the format and
should be needed very rarely (switching to a wholly new hardware or
software base, for example). In those few cases (maybe once every few
years or so), Wikipedia being uneditable for a few days would not be
such a terrible thing--better than it being down completely because the
servers are overwhelmed.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>
Problems: The frequently-changing database schema in which the wiki
information is stored makes it difficult to maintain data across
upgrades (requiring conversion scripts), offers no easy backup
functionality, makes it difficult to access the data with other tools,
and is generally fragile.
Proposed solution: Let's create a standardized file format (probably
something XML-ish) for storing the information contained in a wiki.
All the text, revisions, meta-data, and so on would be stored in a
well-defined format, so that, for example, upgrading the wiki software
(from any version to any other--no need to do one at a time!) could
be done by exporting the wiki into this format and then importing it
into the new installation. The export format would be publishable,
would be easier for other applications to use, and would consist of
simple files for which commonly-available backup tools could be used. A
periodic export/import would serve to clean the database of any
reference errors and fragmentation. Tools could be created to work
with the new format to create subsets, mirrors, and so on.
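To make the shape concrete, an export in such a format might look
something like this (purely a sketch to focus discussion; every element
name is up for debate):

    <wiki version="0.1">
      <page title="Example article">
        <revision id="1" timestamp="2005-03-28T17:51:00Z">
          <contributor>SomeUser</contributor>
          <comment>initial version</comment>
          <text>The wiki-text of this revision...</text>
        </revision>
      </page>
    </wiki>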
I already have some idea of what is needed, but I solicit input.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>
Hi
I added a comment to Bug 1012 and I thought I'd explain it here a
little bit more. The problem:
When requesting a category, one gets the raw wikitext as a response,
which looks like this in the case of
http://en.wikipedia.org/wiki/Category:Geography_by_country?action=raw
(or edit):
[[Category:Geography|*Country, Grouped by]]
[[Category:Categories by country]]
{{catAZ}}
... some few lines more...
But what an API or a crawler would expect in this case is the fully
evaluated wikitext (or XML). This would have the same content as the
HTML result page, but without styling, and as a - potentially large -
contiguous list of countries (again in wikitext or XML).
I have now looked again at what has been proposed to date, and I found
"Bug 208: API for external access" as well as some requests on
Wikitech-l, like a "Minimalistic Web-API for use by Tools and Bots".
All alternative solutions proposed so far, like the Python framework
(an HTML screen scraper) or the Perl one (reading the SQL dump), do not
solve our problem. So I really want to begin to play around with the
MediaWiki PHP code.
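As a first experiment, I imagine an entry point roughly along these
lines (just a sketch of the idea, not tested against the current code
base; it reuses the parser the way a normal page view does):

    $title    = Title::newFromText( $_REQUEST['title'] );
    $article  = new Article( $title );
    $wikitext = $article->getContent();  # the raw wikitext

    # Expand templates etc. by running the text through the parser.
    $options = ParserOptions::newFromUser( $wgUser );
    $output  = $wgParser->parse( $wikitext, $title, $options );
    print $output->getText();  # evaluated content, without the skin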
Any comments or help?
Stefan
I'm working on implementing (in 1.5) the feature requested in BUG
#1289 (http://bugzilla.wikipedia.org/show_bug.cgi?id=1289), which I
believe can be (at least partially) addressed by allowing for
descending category lists.
The thought I had was to implement a new magic word "__SORT_DESC__"
(of course, the actual wording is open to change). If this directive
is included in the category page text, then the category listings
would be in descending order.
I've added a bit of code to Parser.php that detects this magic word
and makes a call to the CategoryPage object to tell it to use a DESC
sort. Seems simple enough.
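For reference, the detection looks roughly like this (simplified;
__SORT_DESC__ is my proposal, and the CategoryPage accessor is
hypothetical, not existing MediaWiki API):

    # In Parser.php, while handling magic words in the text:
    if ( false !== strpos( $text, '__SORT_DESC__' ) ) {
        $text = str_replace( '__SORT_DESC__', '', $text );
        # Tell the CategoryPage object to list members in DESC order.
        # Note this call happens only when the text is actually parsed.
        $this->mCategoryPage->setSortDescending( true );
    }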
I've got things working, except that when the cached copy of the
parsed page is used, this call to the CategoryPage object is never
executed, and the category list is not DESC. When the cached copy of
the page is not used (or does not exist), then things happen just
right.
The actual category lists are not cached -- only the introductory
portion that people can edit manually. So when I first visit such a
page, things look great. But if I do a reload (e.g., Ctrl-F5 in
Firefox), the directive is lost. I run into the same problem on
subsequent pages of a multi-page category list.
Short of turning off the parserCache for all category pages, does
anyone know how to avoid this problem?
-- Rich Holton
en.wikipedia:User:Rholton
As the change from 1.4 to 1.5 db will be a big step anyway, two
additions come to mind:
* Language links
Should we finally put these into a real table? At least in addition to
keeping them in the text? With an interwiki link table up and current,
we could then switch to "real" interwiki management at a later stage.
(A possible table layout is sketched below.)
* Templates
At the German Wikipedia, we have a "Personendaten" (person data) template.
It might be of use to be able to access the data put into it. I don't
have an actual application for that, though.
None of these items is essential, but IMHO there's an opportunity to at
least prepare for improved metadata management.
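For the language links, something as simple as this might do (table and
column names are just a suggestion):

    CREATE TABLE langlinks (
      ll_from  INT UNSIGNED NOT NULL,  -- id of the page holding the link
      ll_lang  VARCHAR(10)  NOT NULL,  -- language code of the target wiki
      ll_title VARCHAR(255) NOT NULL,  -- title of the target page
      PRIMARY KEY (ll_from, ll_lang),
      KEY (ll_lang, ll_title)
    );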
Magnus
Hello, I need some help please.
How can I make the search result page look like the original Wikipedia
search result page
(http://en.wikipedia.org/wiki/Special:Search?search=search&go=Go),
with a Google search and, especially, a "create an article with this
title" link?
Thank you,
adam