I wrote some code for DB load balancing back in January, but we haven't
had a slave server to test it on until now. I'm happy to announce that
it's now running.
Ariel is being used as a slow query server. Currently, it is handling
watchlist queries and miser mode queries (although only with the magic
parameter). Selecting which queries to send to Ariel is currently
ad-hoc, but something more permanent should find its way into CVS in the
next week or two.
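For the curious, the routing decision itself is conceptually trivial; something like this Python sketch (not the actual code, and the group names are only illustrative):

    # Not the actual MediaWiki code -- just a sketch of the idea: a few
    # expensive, non-urgent query groups are sent to the slow-query slave,
    # everything else (and all writes) stays on the master.
    SLOW_QUERY_GROUPS = {'watchlist', 'misermode'}   # illustrative names

    def pick_server(query_group, is_write, master, slow_slave):
        if is_write:
            return master        # writes must always go to the master
        if query_group in SLOW_QUERY_GROUPS:
            return slow_slave    # ariel, in the setup described above
        return master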
The obvious application for this is enabling full-text search. However,
we still need to rebuild the searchindex table.
-- Tim Starling
>>
>> We'd love to, but we need to either a) take it offline for a few days
>> or b) invent a way to convert the database without data loss or damage
>> while keeping it online.
>>
>> -- brion vibber (brion @ pobox.com)
>>
>
>I suppose most of the time will be taken up converting old.
>Shouldn't it be possible to convert only cur, and either leave old
>unconverted, or mark each entry in old as unconverted/still in
>ISO-8859-1 and convert those entries when they are needed, or with a
>very low-priority job? (Of course the software would need to handle
>the conversion flag when viewing an old version of an article, doing
>a diff, ...)
>
>Is this doable or still too complex?
>
It's possible. We just need to change the software a bit :) Just add a UTF-8 flag, the same way the software already adds a gzip flag, and have the software read the text according to that flag.
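Something along these lines, say (a Python sketch only; the flag name and the compression details are my assumptions, modeled on how the gzip flag is handled):

    import zlib

    def load_old_text(old_text, old_flags):
        # Interpret an old_text blob according to its flags, the way a
        # hypothetical 'utf-8' flag could sit alongside the existing gzip flag.
        flags = old_flags.split(',') if old_flags else []
        text = old_text
        if 'gzip' in flags:
            text = zlib.decompress(text)   # assumption: zlib-style compression
        if 'utf-8' in flags:
            return text.decode('utf-8')    # row already converted
        return text.decode('latin-1')      # legacy, still ISO-8859-1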
Shaihulud
I finally got my linux box's big drive cleared off and a backup dump of
en imported so I can get ready to run some conversion tests. First,
quick statistics from checking for the presence of high characters in
the 2004-06-16 dump:
10.4% of cur entries need their page content fixed.
1.9% of cur entries need their titles fixed.
Smaller portions are affected by their comment fields or usernames.
[Exact proportion of old entry text can't be checked easily due to
compression.]
1.7% of old revisions need their titles fixed.
Smaller portions are affected by their comment fields or usernames.
1.8% of watchlist entries need their titles fixed.
0.4% of registered usernames need to be fixed.
0.7% of images need to be renamed.
1.4% of images need their upload comments fixed.
(This is not an exhaustive list of fields needing conversion.)
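The check behind these numbers is deliberately simple; roughly this (a sketch, not the exact script I ran):

    def needs_conversion(value):
        # 'value' is the raw bytes of a Latin-1 field. It only needs
        # recoding if it contains bytes >= 0x80; pure-ASCII rows are
        # already valid UTF-8 and can be skipped entirely.
        return any(b >= 0x80 for b in value)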
This makes it pretty clear that a 'sparse' conversion that only updates
that which needs to be updated should speed things up tremendously over
the basic 'dump everything, convert, and load it back in' approach we
used on fr.
Less than 2% of titles & usernames need to be fixed; this step can be
done relatively quickly on all affected tables (cur, old, brokenlinks,
categorylinks, watchlist, user, image, oldimage) to provide consistency
for queries which must key on *_title or *_user_text and thus can't
allow for different places containing different forms of the data.
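In sketch form (illustrative table and column names; a real run would batch the updates rather than walk every row like this), the per-table title pass amounts to:

    def fix_titles(conn, table, key_col, title_col):
        # Sparse conversion sketch: rewrite only rows whose title actually
        # contains high bytes; the vast majority of rows are left untouched.
        cur = conn.cursor()
        cur.execute("SELECT %s, %s FROM %s" % (key_col, title_col, table))
        for row_id, title in cur.fetchall():
            if any(b >= 0x80 for b in title):            # pure ASCII: skip
                fixed = title.decode('latin-1').encode('utf-8')
                cur.execute("UPDATE %s SET %s = %%s WHERE %s = %%s"
                            % (table, title_col, key_col), (fixed, row_id))
        conn.commit()

e.g. fix_titles(conn, 'cur', 'cur_id', 'cur_title'), and likewise for the other affected tables.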
It should be possible, as some have suggested, to use either heuristics or
explicit marking to do run-time conversion of cur_text and old_text, and
perhaps cur_comment, old_comment, and similar bits. In this case we'd
want to do the conversion at data load time since we need the real
encoding for parsing to match up to titles. This would avoid downtime
for the conversion of the 10.4% of cur_text material that needs it
(45,862 rows), but requires changes to MediaWiki itself that need to be
coded and tested.
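The heuristic variant can be as small as this (a sketch; the caveat is that some Latin-1 strings also happen to be valid UTF-8, which is where explicit marking wins):

    def decode_stored_text(raw):
        # Run-time conversion guess: anything that decodes cleanly as UTF-8
        # is treated as UTF-8, everything else as Latin-1.
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return raw.decode('latin-1')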
The remaining latin-1 wikis will have rather larger incidences of high
chars than English does, but should still benefit from this approach by
skipping the bulk text recoding.
I'd hoped to have some conversion test results by now but had some false
starts with the database setup that used up the weekend. :( I'll try to
get the code ready and running in the next few days.
-- brion vibber (brion @ pobox.com)
Hello,
For a couple of days now I have been trying to
concatenate the Wikipedia image and English dump
files but without any success. I am using Windows XP
as an operating system, but I also have Knoppix when I want a Linux
environment.
For some reason that I do not understand, when I do
knoppix@ttyp0[hdg1]$ cat 20040609_upload.tar.aa 20040609_upload.tar.ab > test.tar
the new file created does not contain any part of the second file
(.ab), yet its size is the sum of both files.
The same is true with the English dump files (xaa, xab, xac, xad,
xae).
It seems only the first file of each split set is valid.
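For what it's worth, the byte-for-byte join I expect cat to produce looks like this Python sketch (same file names as above):

    # Join the pieces byte for byte, reading in 1 MB chunks.
    parts = ['20040609_upload.tar.aa', '20040609_upload.tar.ab']
    out = open('test.tar', 'wb')
    for name in parts:
        part = open(name, 'rb')
        while True:
            chunk = part.read(1024 * 1024)
            if not chunk:
                break
            out.write(chunk)
        part.close()
    out.close()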
Does anybody have any suggestion?
Thanks,
Claudio
Hi all,
It seems the Breton Wikipedia was lost during the big crash (a few weeks ago).
If I haven't misunderstood, a copy may be on a hard disk backup (80 GB), but
it would be long and hard to try to retrieve it.
My questions are:
1) Does someone have a local backup of the Breton Wikipedia?
2) Does someone know how many pages there were on this Wikipedia?
If there were only a few pages, it is perhaps faster to set up a new Wikipedia
rather than try to retrieve the old one.
Thanks for your help.
Aoineko
LS,
I have been trying for two days to change one record in nl:wiktionary.
The update ends with:
******
Sorry- we have a problem...
The wikimedia web server didn't return any response to your request.
To get information on what's going on you can visit #wikipedia.
An "offsite" status page is hosted on OpenFacts.
Generated Tue, 15 Jun 2004 06:52:10 GMT by wikipedia.org (squid/2.5.STABLE4-20040219)
*******
the new content should be
====''[[WikiWoordenboek:Zelfstandig naamwoord|Zelfstandig naamwoord]]''====
the article is [[Sjabloon:-noun-]]
Thanks,
GerardM
Anthere asked how much money we need to run the system
for the rest of the year, so we took a look at the
servers we have and at the growth we have seen over the
last few years.
The results can be seen at
http://meta.wikipedia.org/wiki/Hardware_provisional_budget
and I invite everyone to have a look at it and provide comments.
Regards,
JeLuF
Hi,
I'm trying to get math input working in my own wiki. Unfortunately,
something is messed up with my texvc installation. I know that latex,
dvips, gs and convert are in my Apache path; rather, I think texvc
itself isn't really working here. When I do:
# mkdir tmp out
# echo | ./texvc tmp out 'x \not\in \Sigma' iso-8859-1
+5e6c5975facb1fd4f0895accca92d451-
I found the file tmp/8435_5e6c5975facb1fd4f0895accca92d451.tex in my tmp
directory. Poking around with strace -f, it appears that latex cannot find
the temporary .tex file and consequently texvc gives up. Here are the
relevant bits:
[pid 8446] access("./tmp/8444_bc30a6e8e7394384dc79e75a32f251f3.tex", R_OK) = -1 ENOENT (No such file or directory)
in the thread that executes latex. I double-checked: the file does
exist. [1]
The error message I get is: Parser-Fehler (PNG conversion failed; check
for correct installation of latex, dvips, gs, and convert).
I'm really stuck here.
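For reference, my guess at the steps texvc runs after writing the .tex file is roughly the following (not texvc's actual code; running the stages by hand inside tmp/ might at least show which one fails):

    import os

    base = '8435_5e6c5975facb1fd4f0895accca92d451'
    # Run each stage by hand inside tmp/ so the intermediate files land
    # next to the .tex file, which seems to be what texvc expects.
    os.chdir('tmp')
    os.system('latex -interaction=nonstopmode %s.tex' % base)
    os.system('dvips -E %s.dvi -o %s.ps' % (base, base))
    os.system('convert -density 120 %s.ps %s.png' % (base, base))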
TIA,
Viktor
[1] Although the PID is different, never mind that.
--
GnuPG-Fingerprint: E292 4D89 A5F1 16EC 2795 35AC 9162 34E8 2331 4340
Hi,
During the last few days, MySQL database replication has
been activated between suda and ariel. suda is master,
ariel is slave. To do this, a downtime of suda was needed.
If there are writes to a MySQL slave, the replication will
break. In that case the replication has to be set up again
from scratch.
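For anyone curious, checking that a slave is still healthy is simple; a sketch using a generic Python DB-API connection (the column names are those reported by SHOW SLAVE STATUS):

    def replication_ok(slave_conn):
        # The slave is healthy only if both replication threads are running.
        cur = slave_conn.cursor()
        cur.execute("SHOW SLAVE STATUS")
        row = cur.fetchone()
        if row is None:
            return False                       # not configured as a slave
        status = dict(zip([d[0] for d in cur.description], row))
        return (status.get('Slave_IO_Running') == 'Yes'
                and status.get('Slave_SQL_Running') == 'Yes')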
To avoid downtime in that case, we're currently setting up
a second slave. This slave is not going to be used for
any queries. If the first slave fails, a copy of the
second slave can be made without taking down the master.
The latest database dump has been generated from ariel,
without any problems. During the copying of the data files
from suda to ariel, ariel served an old DB copy, without
any problems.
The second slave will be set up on zwinger and will most
likely only replicate once per day in off-peak hours.
To have enough disk space available, log files are
currently being moved to yongle and yongle will be
used to compute access statistics from the squid logs.
As soon as the second slave is working, ariel will be
used for service.
Regards,
JeLuF