I just committed some changes so each namespace can get its own background
color. The colors are stored in wikiTextEn.php as an array.
For the sake of our eyesight, please help me change the colors to something
nice but still distinguishable!
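The posting doesn't show the array itself, so here is a hypothetical sketch of what the per-namespace color map in wikiTextEn.php might look like (keys and hex values are illustrative, not the committed ones):

```php
<?php
// Hypothetical per-namespace background colors, keyed by namespace
// prefix; the empty key stands for the main article namespace.
$wikiNamespaceColors = array(
    ""          => "#FFFFFF",
    "Talk"      => "#EEEEFF",
    "User"      => "#FFF0E0",
    "Wikipedia" => "#E8FFE8",
);
```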
Magnus
Dear sirs :)
I just committed a change enabling sysops to block IPs from editing. This was
a request from Larry.
Sysops (and only those!) will have a "Block this IP" link next to each IP on
the history page of an article. That seemed like the logical place to put
it, since IP blocking will result from some form of editing.
NOTE: Currently, there is no way to block a logged-in user this way; not a
pressing issue at the moment, though.
Each blocked IP will be added to "Wikipedia:Blocked IPs".
NOTE: That page should be protected, otherwise a troll could just go there
and remove his IP from the list!
NOTE: Currently, the only way to set an IP free again is to manually delete
the line on "Wikipedia:Blocked IPs". There's a timestamp so we can implement
time-based removal later.
Even blocked IPs can go to the edit page, but on pressing "Save", blocked
IPs will get a message ("Your IP has been blocked..."). This will even work
if the troll signs up with a user name after getting the block as an IP.
Sneaky, eh? ;)
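A minimal sketch of the save-time check described above, assuming the block list is fetched as the raw text of "Wikipedia:Blocked IPs" with one "IP timestamp" pair per line (the function and variable names here are made up for illustration; this is not the committed code):

```php
<?php
// Returns true if $ip appears as the first field of any line in the
// block list. Called on "Save" with the requesting IP, even for
// logged-in users, so signing up doesn't evade an IP block.
function isBlockedIP( $ip, $blockListText ) {
    foreach ( explode( "\n", $blockListText ) as $line ) {
        $fields = preg_split( '/\s+/', trim( $line ) );
        if ( $fields[0] === $ip ) {
            return true;
        }
    }
    return false;
}
```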
Let's get Larry his sysop rights back ASAP!
Magnus
(I set the reply-to to wikitech-l(a)nupedia.com, since that's the appropriate
place for technical discussions.)
Tomasz Wegrzanowski wrote:
> Easiest distributed editing architecture:
> There is a main server and other servers.
>
> Every server handles read requests itself.
> On all servers, 'edit this page' links point to the main server.
> The main server sends all changes to all subscribed servers,
> so they are always up to date.
>
> This won't require many design changes, while still
> allowing a reasonable distribution of load.
But, there's no problem with load right now, and I stand ready to
supply whatever hardware we need here. In this way, we don't have to
deal with complex distribution schemes.
--Jimbo
On Wed, 2002-02-20 at 11:33, lcrocker(a)nupedia.com wrote:
> No! No! The text stored in the database is _always_ single-byte
> ISO-8859-1, no exceptions, even for the foreign wikis. Some of
> those ISO-8859-1 characters may spell out HTML entity references
> to Unicode characters outside the set, but the database should not
> know or care about that.
I'm sorry you feel that way, but that is in fact NOT TRUE. Please take a
look at the non-English non-ISO-8859-1 wikipedias sometime.
Hundreds of pages, with correct charset headers:
ISO-8859-2:
http://pl.wikipedia.com/
UTF-8 with a custom conversion function for certain character
sequences:
http://eo.wikipedia.com/
Stubs:
CP-1251
http://ru.wikipedia.com/
Shift-JIS
http://ja.wikipedia.com/
GB-2312 with a few character references thrown in:
http://zh.wikipedia.com/
Not sure which encodings, but certainly not ISO-8859-1:
http://ar.wikipedia.com/
http://he.wikipedia.com/
Now, if you honestly think that people are going to edit text that
consists *entirely* of HTML character entity references, you're
obviously not concerned about anything like "ease of use".
On top of which, the consensus seems to be to not allow &s (and thus
character entities) into page titles, which would effectively require
all page titles to be in ASCIIized roman characters. Can you imagine
this being acceptable on, say, the Chinese wiki if anyone actually used
it?
Gee, maybe someone *would* use it if they could use an appropriate
character set for their language!
> This policy might have to be changed for the Asian wikis if something
> like shift-JIS is universal enough and dealing with HTML entities
> problematic enough to make working with it difficult,
The mind boggles that you might imagine the situation to be otherwise.
> but in that
> case we'll still standardize on one and only one internal character
> representation for that particular wiki. For all others, that
> internal representation (and also the encoding which is served via
> HTTP) is ISO-8859-1.
Bullshit. Ask the Poles if they'd like to convert their wikipedia to
ISO-8859-1 with HTML character entities.
> If you need to "uppercase" words in titles (as our consensus on
> canonization of titles specifies), go ahead and hard-code the
> function to deal with ISO-8859-1.
Gee, that would be great if such a function would do anything at all for
anything other than ISO-8859-1 characters. But, somehow I can't quite
see a function hardcoded to deal with ISO-8859-1 being the slightest bit
useful for anything else.
-- brion vibber (brion @ pobox.com)
I've noticed that the traditional locale-based case conversion functions
(ucfirst(), strtolower(), etc) aren't too reliable for anything but
English. Even when they do work, it's very dependent on the system
configuration, and thus isn't really transparently portable.
So, I've added new case conversion functions ucfirstIntl(),
strtoupperIntl(), and strtolowerIntl() which can more or less properly
convert cases in a system-independent manner. For single-byte character
encodings this is very simple, based on the PHP strtr() function; just
define strings $wikiUpperChars containing all the uppercase characters
and $wikiLowerChars containing all the lowercase chars. (See example for
iso-8859-1 in wikiTextEn.php)
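For reference, a minimal sketch of the single-byte scheme (ASCII only here; the real $wikiUpperChars/$wikiLowerChars strings in wikiTextEn.php would additionally list the ISO-8859-1 accented letters):

```php
<?php
// Parallel strings: the i-th character of one is the case partner of
// the i-th character of the other. strtr() then maps byte for byte.
$wikiUpperChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$wikiLowerChars = "abcdefghijklmnopqrstuvwxyz";

function strtolowerIntl( $str ) {
    global $wikiUpperChars, $wikiLowerChars;
    return strtr( $str, $wikiUpperChars, $wikiLowerChars );
}

function strtoupperIntl( $str ) {
    global $wikiUpperChars, $wikiLowerChars;
    return strtr( $str, $wikiLowerChars, $wikiUpperChars );
}
```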
For multibyte character sets it's a little more complex, using the same
function in an array mode that associates byte sequences. Most multibyte
character sets are for Asian languages which don't have a case
distinction, so it's not likely to come up often except for those using
UTF-8. I've included conversion arrays for UTF-8 in utf8Case.php which
should cover just about everything, so any future 'pedias that may use
UTF-8 need just include that (as does wikiTextEo.php).
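A sketch of that array mode for multibyte text; the two entries below are only a sample of the kind of mapping utf8Case.php provides in full:

```php
<?php
// In array mode strtr() replaces whole byte sequences, so the same
// call handles UTF-8. Sample entries (utf8Case.php covers the full
// range of cased characters):
$utf8ToUpper = array(
    "\xc3\xa9" => "\xc3\x89", // é -> É
    "\xc4\x89" => "\xc4\x88", // ĉ -> Ĉ, as used on the Esperanto wiki
);
echo strtr( "\xc4\x89apelo", $utf8ToUpper ), "\n"; // prints "Ĉapelo"
```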
Also, it should be possible to extend ucfirstIntl() a bit to allow for
multiple-character first letter sequences (for instance treating ij->IJ
as one letter, which I believe is the officially correct behavior for
Dutch).
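That extension might be sketched like this (the ij handling and the ASCII-only tables are illustrative; only ucfirstIntl itself is in CVS):

```php
<?php
// Single-byte ucfirst with a hypothetical multi-character first-letter
// rule: Dutch "ij" capitalizes as a unit to "IJ".
$wikiUpperChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$wikiLowerChars = "abcdefghijklmnopqrstuvwxyz";

function ucfirstIntl( $str ) {
    global $wikiUpperChars, $wikiLowerChars;
    if ( $str === "" ) {
        return $str;
    }
    if ( substr( $str, 0, 2 ) === "ij" ) { // illustrative Dutch rule
        return "IJ" . substr( $str, 2 );
    }
    return strtr( $str[0], $wikiLowerChars, $wikiUpperChars )
        . substr( $str, 1 );
}
```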
-- brion vibber (brion @ pobox.com)
Gentlemen,
I've just committed to CVS the new implementation of the MostWanted page
based on the new tables 'linked' and 'unlinked'. I've left caching out, but
it still takes quite a while to compute; if that turns out to be a problem I
will put the caching back in.
-- Jan Hidders
On Tue, 2002-02-19 at 16:26, Jimmy Wales wrote:
> Brion Vibber wrote:
> > Jimbo, is there any chance we can move the Esperanto wikipedia over to
> > the PHP script soon? I've been promising people we'd be upgrading to the
> > new software (which will fix a number of annoying bugs in the old) for a
> > while, and the natives are getting restless. :)
>
> Yes!
>
> > I'm going to check in a couple more character set and case-conversion
> > fixes tonight, after which we should be ready anytime. At this point any
> > additional problems are only going to be discovered by having real users
> > bang at the real site with real non-English non-ISO-8859-1 text...
>
> How about this -- tomorrow morning, I will install this. Will you be around
> (in email) tomorrow for questions?
Sounds great! If I remember correctly, we are both on Pacific time
(UTC-8), yes?
> Best thing to do -- send me simple step by step instructions,
> including instructions about the conversion script. I'll back
> everything up, run the conversion, install the new software, and
> it'll work perfectly the first try! (ha ha!)
It's relatively simple (famous last words). For everybody's reference:
1. Edit convertWiki2SQL.php to set some options there. Right now it's a
little rough; eventually it may or may not get smoother. Basically,
uncomment the special settings for the target language, and set the
$rootDir variable to point to the "page" subdirectory of the usemod db
that the data is being sucked from.
2. Run "php convertWiki2SQL.php" as a user with write permission to the
wiki directory. This should spit out a bunch of article titles and
create a big file called "newiki.sql" which contains the SQL commands to
insert everything into the database. Note that it's normal to see a few
errors about not being able to open a directory -- that just means
there's a letter of the alphabet that no article titles start with.
3. Create the database. I've been calling my test database "wikieo", but
whatever sounds appropriate should work just as well. Something like:
mysql -e "create database wikieo;"
should do it if I recall correctly. You might also have to specify the
proper username and password; I don't know how you guys have mysql set
up.
4. Initialize the tables and enter the data. Something like:
mysql wikieo < wikipedia.sql
mysql wikieo < newiki.sql
(After this step "newiki.sql" shouldn't be necessary anymore.) I'm also
going to put together a file with SQL commands to fix some articles with
extra uppercase letters in the titles which you can run here:
mysql wikieo < titlefix.sql
5. Edit wikiLocalSettings.php to set the language and database name.
Roughly:
$wikiLanguage = "eo" ;
$wikiSQLServer = "wikieo" ;
and if the hostname isn't being automatically picked up:
$wikiCurrentServer = "http://eo.wikipedia.com" ;
In theory everything should automagically work after that... (Assuming
of course that the apache rewrite rules are set up properly, etc.)
> If it isn't working perfectly right out of the box, then I'll back out
> the change, revert to the Usemod script, and we'll do a dry run in a
> "safer" way with test.wikipedia.com or whatever.
Great, I'll warn the others. :)
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> Jimbo, is there any chance we can move the Esperanto wikipedia over to
> the PHP script soon? I've been promising people we'd be upgrading to the
> new software (which will fix a number of annoying bugs in the old) for a
> while, and the natives are getting restless. :)
Yes!
> I'm going to check in a couple more character set and case-conversion
> fixes tonight, after which we should be ready anytime. At this point any
> additional problems are only going to be discovered by having real users
> bang at the real site with real non-English non-ISO-8859-1 text...
How about this -- tomorrow morning, I will install this. Will you be around
(in email) tomorrow for questions?
Best thing to do -- send me simple step by step instructions,
including instructions about the conversion script. I'll back
everything up, run the conversion, install the new software, and
it'll work perfectly the first try! (ha ha!)
If it isn't working perfectly right out of the box, then I'll back out
the change, revert to the Usemod script, and we'll do a dry run in a
"safer" way with test.wikipedia.com or whatever.
> There are a few switches at the top of convertWiki2SQL.php for selecting
> language-specific processing options, and wikiLocalSettings.php needs to
> select the proper $wikiLanguage and $wikiSQLServer, but that's about it.
> Other than those two, the PHP source files can be shared 100% between
> various language wikipedias. (I've already included the Esperanto
> message-localization file in the CVS repository, and I assume others
> will be added as they are converted & translated.)
Cool! I don't know anything about the conversion script, though. Jason did it
for the main site, and he's out of town. Oh, wait, he'll be back tomorrow. But
still, if you have information, let me know. :-)
Jimbo, is there any chance we can move the Esperanto wikipedia over to
the PHP script soon? I've been promising people we'd be upgrading to the
new software (which will fix a number of annoying bugs in the old) for a
while, and the natives are getting restless. :)
I'm going to check in a couple more character set and case-conversion
fixes tonight, after which we should be ready anytime. At this point any
additional problems are only going to be discovered by having real users
bang at the real site with real non-English non-ISO-8859-1 text...
There are a few switches at the top of convertWiki2SQL.php for selecting
language-specific processing options, and wikiLocalSettings.php needs to
select the proper $wikiLanguage and $wikiSQLServer, but that's about it.
Other than those two, the PHP source files can be shared 100% between
various language wikipedias. (I've already included the Esperanto
message-localization file in the CVS repository, and I assume others
will be added as they are converted & translated.)
-- brion vibber (brion @ pobox.com)
Dear fellow programmers,
I have extended the database schema with two new tables: 'linked' and
'unlinked'. As usual the SQL for this addition can be found in
updSchema.sql. However, in this case the contents of the tables cannot be
generated by SQL alone, so there is an extra upLinks.php script in PHP that
contains the PHP code to do so. Read this file for further instructions.
The intention of these tables is to replace the 'cur_linked_links' and
'cur_unlinked_links' columns in the cur table. This will make it possible to
give the special pages that query linking information reasonable response
times, so they won't have to be cached. Right now I've only added the code
to keep these tables up-to-date and they are not used yet. From this moment
on, the usage of the 'cur_linked_links' and 'cur_unlinked_links' columns is
deprecated, and I will remove them as soon as all the code that uses them
has been replaced by code that uses the new tables.
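The posting doesn't give the schema, but from the description the two tables presumably pair a linking page with the titles it points at. A hypothetical sketch (column names and types are guesses, not the contents of updSchema.sql):

```sql
-- Hypothetical shape of the new link tables (illustrative only;
-- see updSchema.sql for the real definitions).
CREATE TABLE linked (
  lnk_from VARCHAR(255) NOT NULL,  -- linking article's title
  lnk_to   VARCHAR(255) NOT NULL   -- existing article it links to
);
CREATE TABLE unlinked (
  unl_from VARCHAR(255) NOT NULL,  -- linking article's title
  unl_to   VARCHAR(255) NOT NULL   -- missing article it links to
);

-- A page like MostWanted then becomes a simple aggregate:
SELECT unl_to, COUNT(*) AS refs
  FROM unlinked
 GROUP BY unl_to
 ORDER BY refs DESC
 LIMIT 50;
```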
-- Jan Hidders