Just a short note for developers: I've delayed the initialisation of
$wgIP, so you can no longer use that variable, except for debugging.
Instead, use wfGetIP(). It's cached, so feel free to call it regularly.
See the commit notice for more details:
http://mail.wikipedia.org/pipermail/mediawiki-cvs/2005-September/011072.html
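For anyone converting existing code, the change looks roughly like this
(a minimal sketch -- the logging function below is made up for
illustration; only wfGetIP() and wfDebug() are real MediaWiki calls):

  // Old:  global $wgIP;  $ip = $wgIP;
  // New:
  function logRequestIP() {
      // wfGetIP() caches its result, so repeated calls are cheap
      $ip = wfGetIP();
      wfDebug( "Request came from $ip\n" );
  }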
-- Tim Starling
A while ago, Gerard posted this on Meta:
http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons
It was a short explanation of how UW could be used to internationalize
categories on the Wikimedia Commons. I've now hacked together a small
mock-up that demonstrates (hopefully) more clearly how this could work
in practice:
http://epov.org/uwd/index.php?title=Tag:Dog&action=edit
(Further demos will be posted on http://epov.org/uwd/ in the coming
weeks and months.)
It should work in Firefox and IE. The only active components are the
radio buttons you can click.
Essentially, what this shows is:
1) A new tag for images of dogs is created. (In this demo, I call
categories "tags", because I hope this will be what they are eventually
called.)
2) The user can choose from the languages they speak to clarify which
language this tag name is written in.
3) Based on the tag name and language, a lookup on UW is performed,
which fetches all the associated meanings for this word.
4) The user selects one of these meanings.
5) Automagically, another lookup is performed to determine the available
translations, if any. After saving the tag, it is then instantly
available under these names in the other languages.
In the demo, the first two meanings have translations available, while
the other two do not.
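To make steps 3-5 concrete, here is a rough sketch of the lookup logic
in PHP. Everything in it is hypothetical -- UW has no public interface
yet, so uw_lookup() and uw_translations() are placeholders for whatever
it eventually exposes:

  $tagName  = 'Dog';   // step 1: the tag being created
  $language = 'en';    // step 2: language chosen by the user

  // Step 3: fetch all meanings UW knows for this word in this language.
  $meanings = uw_lookup( $tagName, $language );

  // Step 4: the user picks one meaning, e.g. "domesticated canine".
  $meaningId = $meanings[0]['id'];

  // Step 5: fetch the translations for that meaning, if any,
  // e.g. array( 'it' => 'cane', 'mi' => 'kurii', 'de' => 'Hund' )
  $translations = uw_translations( $meaningId );

  // Saving the tag stores the meaning id; the visible tag name is then
  // resolved per language from the translations.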
Why is this so powerful? Because, if UW itself is successful and
contains many words, it almost instantly makes the entire media
repository on Commons available to speakers of all languages. (Now,
hopefully, you can see why we've been excited about getting millions of
translations for free from the Logos project.) No need to create many
different tags - just select the right meaning. Furthermore, it builds
bridges from other projects to UW. The language work we are constantly
doing will no longer be redundant, but focused on one place.
A 14-year-old Italian kid can then use the tag "cane" to look for photos
of dogs, while a Maori girl from New Zealand can use "kurii". Moreover,
the same category hierarchy can be used to browse in different languages
(based on user preferences, a fallback hierarchy would be queried to
determine which language should be used when no translation is
available).
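In code, that fallback could look something like this (again purely
illustrative; uw_translations() is the same placeholder as above, and
the preference format is invented):

  // Return the first available label for a meaning, trying the user's
  // preferred languages in order, then English, then the raw tag name.
  function uwResolveTagLabel( $meaningId, $userLanguages, $defaultName ) {
      $labels = uw_translations( $meaningId );
      foreach ( array_merge( $userLanguages, array( 'en' ) ) as $code ) {
          if ( isset( $labels[$code] ) ) {
              return $labels[$code];
          }
      }
      return $defaultName;   // no translation available
  }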
We could also automatically make use of synonyms, plurals and
inflections (though this requires further changes to the category code
beyond internationalization). Given that we are mapping one of multiple
meanings to a single tag, there will be tag collisions -- those will
have to be dealt with through disambiguation. But this is not important:
Try to see the tag name merely as a key to a meaning. What this key is
called is secondary.
The key principle of selecting a meaning and then performing automatic
translations can be used in many different contexts. For example, in
Wikidata, one could use the same principle to internationalize field
names such as "Country", "Flag" and "Population".
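In code terms, a field definition would point at a meaning rather than
at an English string, and the label would be resolved in the viewer's
language with the same fallback as above (hypothetical, as before):

  // A Wikidata field keyed by a UW meaning id instead of a name.
  $field = array(
      'meaning' => 12345,      // hypothetical UW meaning id for "population"
      'type'    => 'integer',
  );
  $label = uwResolveTagLabel( $field['meaning'], $userLanguages,
                              'Population' );   // $userLanguages as above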
This application also shows that UW must contain everything from words
to names to phrases. There is no limit to the scope of it. This makes it
a potentially massively useful tool for both human and machine translation.
The category internationalization functionality will not be part of the
first release of Ultimate Wiktionary, but we believe we can get funding
to work on this later. I believe that UW, in combination with better
tagging features in general, could make our tagging system the most
advanced one available. Flickr, for example, has no localization, is
unlikely to ever get semi-automatic localization, and apparently
supports no synonyms either.
See the demo footnotes for further explanations. Feedback is welcome.
(I'll be away until Wednesday.)
Best,
Erik
User Assarbe (Ain_xaitan(a)yahoo.es) says:
Two weeks ago I left a proposal for a new Wikipedia in Murcian (a language spoken by more than 300,000 people in the south-east of Spain, between Castilian and Catalan). I have already got the support of more than six people, and I am sure more will be interested.
Now I would like to know what the next step is. Is there anything more I can do at the moment?
Please, help me :)
Thanks
Paweł Dembowski wrote:
>>by the way, the frame issue was discussed several times on IRC.
>>It seems most browsers do a total redirect. Only a couple of editors
>>reported the framing issue.
>>I do not know if *we* can do something about this.
>>Ant
>
>I get a frame both in IE and in Firefox.
Could this be because the frame-breaking code in
http://fr.wikipedia.org/skins-1.5/common/wikibits.js is executed in a
<head> context, rather than a <body> context?
http://www.thesitewizard.com/archive/framebreak.shtml specifically
states that the frame-breaker code must be executed in the <body>, not
the <head>.
-- Neil
I did some benchmarking of PHP compiled with gcc 4.0.1. Results and
methodology are described at:
http://wp.wikidev.net/GCC_benchmarking
I eventually settled on -O3 with profile-guided optimisation. This gave
a 5-10% improvement over the old PHP 4.3.11 build on all the benchmarks
except preg_replace. The preg_replace problem is probably because the
benchmark was atypical compared to the profiling data, which was
gathered using refreshLinks.php.
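For reference, the general shape of a profile-guided build is the usual
GCC two-pass dance (this is just the outline, not the exact commands or
configure options used; those are described on the page above):

  # 1. build PHP with profiling instrumentation
  CFLAGS="-O3 -fprofile-generate" ./configure ... && make
  # 2. run a representative workload to collect profile data
  php maintenance/refreshLinks.php
  # 3. rebuild using the collected profiles
  make clean
  CFLAGS="-O3 -fprofile-use" ./configure ... && make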
-- Tim Starling
> I'm afraid you'll have to work off the code (which isn't always 'right')
Never underestimate the convolutedness of parser.php. If I were you, I'd
start simple (like grabbing all wikilinks off a page) and experiment.
Handling wikitext is like black magic at times. I've written some
elementary classes for parsing pages, if you're interested.
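For instance, a first cut at grabbing all wikilinks off a page can be a
single regular expression (a naive sketch -- it ignores nesting, nowiki
sections and all the other edge cases the real parser worries about,
which is exactly the black-magic part):

  // Naive wikilink extraction -- a starting point, not a parser.
  function getWikiLinks( $text ) {
      $links = array();
      if ( preg_match_all( '/\[\[([^|\]]+)(?:\|[^\]]*)?\]\]/',
                           $text, $matches ) ) {
          foreach ( $matches[1] as $target ) {
              $links[] = trim( $target );
          }
      }
      return $links;
  }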
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
There have been some questions recently about public backup dumps (or the
lack thereof). I've been working for the last few days on getting the
dump generation infrastructure up and running in a more consistent,
effective fashion.
Here's what's currently on my plate and the status thereof:
* Title corrections: some of the databases contain invalid page titles
left over from old bugs. This can sometimes break the import or export
process, so I'm writing a fixup script to find and rename them.
STATUS: Finding done, fixing to come.
Should be done with this later today.
* Dump filtering/processing: currently the dump has to run twice to
produce the current-only and all-revisions dump files. I'm working on a
postprocessing tool which will be able to split these two from a single
runthrough, as well as produce a filtered dump with the talk and user
pages removed.
Producing the split versions from one run should also mean that the dump
can run without having replication stopped the whole time.
It can also produce SQL for importing a dump directly into a database in
either 1.4 or 1.5 schema, for those using software based on the old
database layout. (We probably won't be hosting such files on our server
but you can run the program locally to filter XML-to-MySQL.)
STATUS: Mostly done. Some more testing and actually hooking up multiple
simultaneous outputs remains.
Should be done tonight or tomorrow.
* Progress and error reporting: The old backup script was a hacky shell
script with no error detection or recovery, requiring replication to be
stopped manually on a database server and the wiki cluster to be
reconfigured for the duration. If something went awry, maybe nobody
noticed... the hackiness of this is a large part of why we've never just
let it run automatically on a cronjob.
I want to rework this for better automation and to provide useful
indications of what it's doing, where it's up to, and if something went
wrong.
STATUS: Not yet started. Hope to have this done tomorrow or Friday.
* Clean up download.wikimedia.org further, make use of status files left
by the updated backup runner script.
STATUS: Not yet started. (Doesn't have to be up before the backup starts.)
-- brion vibber (brion @ pobox.com)
Howdy,
I'd like to add a spell checker to MediaWiki using the pspell library.
(Pspell is part of PHP and it uses libaspell.) It doesn't help that I'm new to the
MediaWiki code base and PHP isn't exactly my favorite language. (I wouldn't
even call it my _third_ favorite.) Anyway, I'd like to get a little feedback
and advice on where to go from here.
I know a few people have proposed working on spell check before:
http://mail.wikimedia.org/pipermail/wikitech-l/2004-March/021358.html
But, as best I can tell, none of them went anywhere. Does anyone know
what happened to User:Archivist's spellchecker?
Right now I have a proof-of-concept running on my computer. You can see
it at
http://66.205.125.240/spell/index.php/Special:Spellcheck/Main_Page
It is a SpecialPage that reads the article from the database, spell
checks it, lets the user choose the words from the drop-down box, and
then sends a FauxRequest to EditPage. Eventually I'd like to add it to
EditPage, but I started out with a special page so that I did not have
to deal with the complexity of EditPage.
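The pspell part itself is small; the core looks roughly like this (a
simplified sketch rather than the actual special page code -- only
pspell_new(), pspell_check() and pspell_suggest() are real library
calls, and the tokenisation is deliberately crude and ASCII-only, which
ties into the multi-byte question below):

  // Flag misspelled words and collect suggestions for each of them.
  function spellCheckWords( $wikitext, $langCode = 'en' ) {
      $dict    = pspell_new( $langCode );
      $results = array();
      // Crude tokenisation; the real thing has to skip wiki markup.
      $words = preg_split( '/[^A-Za-z\']+/', $wikitext, -1,
                           PREG_SPLIT_NO_EMPTY );
      foreach ( $words as $word ) {
          if ( !pspell_check( $dict, $word ) ) {
              $results[$word] = pspell_suggest( $dict, $word );
          }
      }
      return $results;   // word => array of suggested replacements
  }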
Here's how I'd like the final version to work:
# There's a button at the bottom of an EditPage beside 'Show Preview'
and 'Show Changes' labeled 'Spell Check'.
# When the user clicks 'Spell Check', they get a preview of their edit where
misspelled words are replaced with drop-down boxes.
# The user changes the words they think are misspelled to one of the
suggestions or leaves them as they are. When they click 'Show Preview',
they go back to the preview page.
A few questions:
Do I not need to deal with multi-byte character functions like
mb_substr, since all the languages use UTF-8?
Should the user spell check a preview or the wikitext?
If a word is misspelled in several places, should the user be asked once
for the word, or should the user be asked every time the word appears?
Thanks,
Jeff McGee
Ævar Arnfjörð Bjarmason wrote:
> Modified Files:
> Image.php
> Log Message:
> * Reverting back to 1.115, not inserting {{ and }} automatically means that the
> license selector can be used to insert arbitrary text, not just templates,
> this doesn't break it either since you just have to change the entries in
> MediaWiki:Licenses from e.g:
> * GFDL|GNU Free Documentation License
> to:
> * {{GFDL}}|GNU Free Documentation License
> to get the same functionality as before
No, you can't, because brace replacement happens at wfMsg() time, which
is before the text is picked apart and added to a list.
-- brion vibber (brion @ pobox.com)
Hi, excuse my ignorance... But after searching for information on this for
the past 2 hours, I can't seem to find any FAQ or simple instruction set
on what to do with the new XML dumps provided at
http://download.wikimedia.org/
The sole link provided on the page mentions nothing of the new XML format,
or what to do with it. Searching through this mailing list hasn't shed much
more light. Any help would be very much appreciated (even a pointer to a
page that explains how to import these files into a 1.3.x or 1.4.x
system...)!
Thanks,
John