Wikitech-l May 2005

wikitech-l@lists.wikimedia.org

136 participants
194 discussions

by Jeremy Dunck

I'm creating the DB for the first time, and I saw a note here: http://cvs.sourceforge.net/viewcvs.py/wikipedia/phpwiki/newcodebase/docs/sc… "The script "buildtables.sql" always contains the latest database schema for wikipedia software." Where can I find buildtables.sql?

19 years

Invalid XML in the codebase

by Ævar Arnfjörð Bjarmason

I often come across invalid XML in the codebase, such as unquoted attributes or unclosed tags, this is a request my fellow developers to help ensure that our output meets the XHTML spec at all times, one way to do help ensure this is to add: $wgMimeType = 'application/xhtml+xml'; To LocalSettings.php, if you're using a Gecko browser for testing this will activate its strict mode causing it to trow parsing errors if finds invalid XML.

19 years

1.5 upgrade test

by Brion Vibber

I set up a copy of nl.wikipedia.org on my test PC from the public dumps, and ran the updater to upgrade it to 1.5. This is a medium-sized wiki, in Latin-1 encoding. The good news: * It worked -- the updater ran through to completion without exploding. * After setting $wgLegacyEncoding = 'windows-1252', it seems to properly convert article text encoding to UTF-8 on page load. The bad news: * The UTF-8 converstion necessary for the other database fields (titles, usernames, comments etc) hasn't been quite finished yet, so wasn't auto-run. * The updater ran for a few minutes shy of 10 hours. Most of that time was spent shuffling cur entries into the old table, where they eventually become plain old text entries. The pulling of revision data out of old (by now renamed to text) seemed to take a smaller portion of the time, but I foolishly didn't time the individual steps. Most CPU time was spent in I/O wait state in the MySQL server. This machine has IDE disks purchased for size & cost, not speed, has relatively little memory (512M), I haven't attempted to optimize the MySQL configuration for memory usage, and I kept doing things like installing Debian in VMWare in the foreground... ;) It probably ought to go faster on the big Wikimedia servers, but I can't say just how much. There may be ways to further optimize the conversion process; dropping some of the indexes first, for instance, might be an overall win if it makes the importing faster. Even in the ideal case it'll be kinda slow to run these, but it really is necessary... at least the schema change should make future changes less painful. For the final live updates we'll probably want to do them one at a time, keeping all other wikis open for editing, and the in-conversion one open for read-only on a backup. With the way we've got shared document roots this might require some odd configuration shuffling to load up either 1.4 or 1.5 code depending on update state, but I think it should be possible. -- brion vibber (brion @ pobox.com)

19 years

Collation problems

by Jason Lane

Hi list, I'm trying to install MediaWiki -1.4.3 and during installation I'm getting the following collation errors :( Logging table has correct title encoding. Initialising "MediaWiki" namespace... A database error has occurred Query: SELECT cur_title,cur_is_new,cur_user_text FROM `cur` WHERE cur_namespace=8 AND cur_title IN('1movedto2','1movedto2_redir'...'Yourvariant','Zhconversiontable') Function: Error: 1271 Illegal mix of collations for operation ' IN ' (localhost) Backtrace: GlobalFunctions.php line 507 calls wfBacktrace() Database.php line 383 calls wfDebugDieBacktrace() Database.php line 333 calls DatabaseMysql::reportQueryError() InitialiseMessages.inc line 150 calls DatabaseMysql::query() InitialiseMessages.inc line 78 calls initialiseMessagesReal() updaters.inc line 205 calls initialiseMessages() index.php line 539 calls do_all_updates() I've even gone to the measure of creating the `wikidb` manually: mysql> CREATE DATABASE wikidb DEFAULT CHARACTER SET Latin1 COLLATE Latin1_bin; and then running the install script. We are using PHP 5.0.1 and MySQL 4.0.18, any ideas would be greatly appreciated. TIA Jason __________ Jason Lane (Development) ONSPEED "Faster Internet" Direct: +44 (0)20 7952 4035 General: +44 (0)8707 585 859 Fax: +44 (0)870 705 1393 http://www.onspeed.com PLEASE NOTE ONSPEED IS NOT LIABLE FOR ANY DAMAGES, MALFUNCTION, OR LOSS OF DATA, CAUSED AS A RESULT OF FOLLOWING ADVICE ENCLOSED IN THIS EMAIL. ANY CHANGES SHOULD BE CARRIED OUT AT YOUR OWN RISK. This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. The opinions expressed in this mail are those of the author and do not necessarily represent the views of the company. If you have received this email in error please notify service(a)onspeed.com.

19 years

Re: VfD needs to be internally wired

by David Gerard

Tim Starling wrote: : VFD puts great strain on the server as it is, because the server is : forced to regularly render the whole page, consisting of a few megabytes : of HTML. If you can break it down into small sections, I think that will : be a win for performance, even if there is less opportunity for caching. Tim, this is being bitterly resisted by VFD habitues on VFD talk. Please stop by and give numbers if needed to convince: http://en.wikipedia.org/wiki/Wikipedia_talk:Votes_for_deletion#Can_we_pleas… Jamesday may also care to weigh in - VFD is composed of metatemplates, which are a known noxious entity. Please let them know that insisting their 1.5 megabyte page be the *default* deletion portal is actually a bad idea. - d.

19 years

RE: Longterm hosting strategy [warning tome]

by Eden Akhavi

> Belnet/Belgium -- 1 rack of space, unlimited bandwidth, they are ready > to go Monday, they can do full hands-on, etc., including replacing > borken hard drives and so on like that. They are excited to move > forward quickly. In this case, we must supply the hardware. We can > either buy hardware (with the German money?) or I can ask someone to buy > it for us (see Big Company X, below). > Amsterdam - a large NGO wants to do a big press announcement when I'm > there in Holland at the end of this month. They are providing a set of > servers which have already been ordered. I do not know the exact > specifications, perhaps someone else can tell me? If we are talking Europe, I think the key here is to consider where the traffic comes from and who has good connectivity to that audience. Belnet is an educational network; so it will be able to provide the best connectivity to the Belgian educational users (approx 500,000 users); and connectivity to other networks via BNIX which is generally used by networks in the Benelux countries. Outside of this it is going to be slower. And I would expect they would want a cap on their external connectivity - they are using Cable & Wireless and Cogent amongst others. The dutch NGO, I cannot comment on but again this seems very Benelux centric; I think it is important to consider where the primary hotspots of traffic are: - UK - Germany - Sweden - Nederlands If you study each NAP across Europe; you will see that the largest in terms of traffic is LINX (London). If you then look at the participants on that NAP you will then see that its not UK centric but a number of US, UK, German, Dutch and Scandinavian networks are connected there. If you equate this back in terms of number of users then this one NAP alone gives you many millions of users; and the majority of tier 1 networks across Europe. Then looking at the German market; most of the traffic/users are on Deutsche Telecom's network, they also have multiple interconnects with LINX. The tier-1 German networks are also well connected internationally. In Sweden you have a different situation; the majority of traffic (general not Wikipedia specific) I have measured seems to remain within the region (SE/NO/DK/FI). This also seems to be how Bredbandsbolaget (main broadband provider) seems to have dimensioned their network. This could also be down to language - the Swedish/Norwegian/Danish language is similar enough for their neighbors content to be of interest to them as well. Statistics ---------- Population Users Germany 82726188 46312662 UK 59889407 35179141 Italy 58608565 28610000 France 60293927 24848009 Spain 43435136 14590180 Netherlands 16316019 10806328 Poland 38133891 10600000 Sweden 9043990 6656716 Belgium 10443012 5100000 Austria 8163782 4630000 Greece 11212468 3800000 Denmark 5411596 3720000 Portugal 10463170 3600000 Czech 10230271 3530000 Finland 5246920 3260000 Hungary 10083477 3050000 Ireland 4027303 2060000 Slovakia 5379455 1820000 Latvia 2306489 936000 Slovenia 1956916 800000 Lithuania 3430836 695000 Estonia 1344840 621000 Cyprus 950947 250000 Luxembourg 455581 170000 Malta 384594 120000 In terms of Penetration: Population Users Sweden 9043990 6656716 Denmark 5411596 3720000 Netherlands 16316019 10806328 Finland 5246920 3260000 UK 59889407 35179141 Austria 8163782 4630000 Germany 82726188 46312662 Ireland 4027303 2060000 Italy 58608565 28610000 Belgium 10443012 5100000 Estonia 1344840 621000 France 60293927 24848009 Slovenia 1956916 800000 Latvia 2306489 936000 Luxembourg 455581 170000 Czech 10230271 3530000 Portugal 10463170 3600000 Greece 11212468 3800000 Slovakia 5379455 1820000 Spain 43435136 14590180 Malta 384594 120000 Hungary 10083477 3050000 Poland 38133891 10600000 Cyprus 950947 250000 Lithuania 3430836 695000 Assuming that the demand of Wikipedia content is the similar in terms of percentage of population (its not but I do not have any figures to comment on that yet), I would strongly consider the following first: UK: LINX and XchangePoint. This will give access to all the tier 1 networks in Europe http://green.linx.net/cgi-bin/peering_matrix2.cgi Sweden: DGIX. This will give access to all the tier 1/2 networks in Scandinavia. http://www.netnod.se/connected.htm If anyone has any details traffic analysis for Europe or would like to set one up, please let me know. //Eden

19 years

Lucene in C#

by Brion Vibber

While the Java-based Lucene search server is giving pretty good performance on search results compiled with GCJ, the index builder performs pretty slowly: 3-10x slower than the same code running under Sun's JDK. Since I've gotten relatively good performance out of the C# version running under Mono, I've gone ahead and imported that into CVS and started bringing it up to date, and plan to use it at least for the indexing. It's in the 'mwsearch' module in CVS, should anyone feel like taking a look. -- brion vibber (brion @ pobox.com)

19 years

Article validation feature in 1.5?

by David Gerard

I thought [[m:Article validation feature]] would be showing up, at least for data gathering, in 1.5. Is it on test.leuksman.com, is there no interface to it unless you know where it is? - d.

19 years

Test wiki

by Brion Vibber

I've got a 1.5 test wiki now up for you kind folks to use and abuse: http://test.leuksman.com/ -- brion vibber (brion @ pobox.com)

19 years

Bot not logging in on test wiki

by Andre Engels

When I try to login my bot to the test wiki at http://test.leuksman.com/, it fails. The bot gets a "400 Bad Request" answer back. Does anybody have an idea what is the problem? Andre Engels

19 years

Jump to page:

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Wikitech-l May 2005