I wrote some code for DB load balancing back in January, but we haven't
had a slave server to test it on until now. I'm happy to announce that
it's now running.
Ariel is being used as a slow query server. Currently, it is handling
watchlist queries and miser mode queries (although only with the magic
parameter). Selecting which queries to send to Ariel is currently
ad-hoc, but something more permanent should find its way into CVS in the
next week or two.
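For the curious, the routing decision itself is conceptually trivial; something like this Python sketch (not the actual code, and the group names are only illustrative):

    # Not the actual MediaWiki code -- just a sketch of the idea: a few
    # expensive, non-urgent query groups are sent to the slow-query slave,
    # everything else (and all writes) stays on the master.
    SLOW_QUERY_GROUPS = {'watchlist', 'misermode'}   # illustrative names

    def pick_server(query_group, is_write, master, slow_slave):
        if is_write:
            return master        # writes must always go to the master
        if query_group in SLOW_QUERY_GROUPS:
            return slow_slave    # ariel, in the setup described above
        return master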
The obvious application for this is enabling full-text search. However,
we still need to rebuild the searchindex table.
-- Tim Starling
>>
>> We'd love to, but we need to either a) take it offline for a few days
>> or b) invent a way to convert the database without data loss or damage
>> while keeping it online.
>>
>> -- brion vibber (brion @ pobox.com)
>>
>
>I suppose most of the time will be taken up converting old.
>Shouldn't it be possible to convert only cur, and either leave old
>unconverted, or mark each entry in old as unconverted/still in
>ISO-8859-1 and convert those entries when they are needed, or with a
>very low-priority job? (Of course the software would need to handle
>the conversion flag when viewing an old version of an article, doing
>a diff, ...)
>
>Is this doable or still too complex?
>
It's possible. We just need to change the software a bit :) Just add a UTF-8 flag, the same way the software already adds a gzip flag, and have the software read the text according to that flag.
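Something along these lines, say (a Python sketch only; the flag name and the compression details are my assumptions, modeled on how the gzip flag is handled):

    import zlib

    def load_old_text(old_text, old_flags):
        # Interpret an old_text blob according to its flags, the way a
        # hypothetical 'utf-8' flag could sit alongside the existing gzip flag.
        flags = old_flags.split(',') if old_flags else []
        text = old_text
        if 'gzip' in flags:
            text = zlib.decompress(text)   # assumption: zlib-style compression
        if 'utf-8' in flags:
            return text.decode('utf-8')    # row already converted
        return text.decode('latin-1')      # legacy, still ISO-8859-1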
Shaihulud
I finally got my linux box's big drive cleared off and a backup dump of
en imported so I can get ready to run some conversion tests. First,
quick statistics from checking for the presence of high characters in
the 2004-06-16 dump:
10.4% of cur entries need their page content fixed.
1.9% of cur entries need their titles fixed.
Smaller portions are affected by their comment fields or usernames.
[Exact proportion of old entry text can't be checked easily due to
compression.]
1.7% of old revisions need their titles fixed.
Smaller portions are affected by their comment fields or usernames.
1.8% of watchlist entries need their titles fixed.
0.4% of registered usernames need to be fixed.
0.7% of images need to be renamed.
1.4% of images need their upload comments fixed.
(This is not an exhaustive list of fields needing conversion.)
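The check behind these numbers is deliberately simple; roughly this (a sketch, not the exact script I ran):

    def needs_conversion(value):
        # 'value' is the raw bytes of a Latin-1 field. It only needs
        # recoding if it contains bytes >= 0x80; pure-ASCII rows are
        # already valid UTF-8 and can be skipped entirely.
        return any(b >= 0x80 for b in value)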
This makes it pretty clear that a 'sparse' conversion that only updates
that which needs to be updated should speed things up tremendously over
the basic 'dump everything, convert, and load it back in' approach we
used on fr.
Less than 2% of titles & usernames need to be fixed; this step can be
done relatively quickly on all affected tables (cur, old, brokenlinks,
categorylinks, watchlist, user, image, oldimage) to provide consistency
for queries which must key on *_title or *_user_text and thus can't
allow for different places containing different forms of the data.
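In sketch form (illustrative table and column names; a real run would batch the updates rather than walk every row like this), the per-table title pass amounts to:

    def fix_titles(conn, table, key_col, title_col):
        # Sparse conversion sketch: rewrite only rows whose title actually
        # contains high bytes; the vast majority of rows are left untouched.
        cur = conn.cursor()
        cur.execute("SELECT %s, %s FROM %s" % (key_col, title_col, table))
        for row_id, title in cur.fetchall():
            if any(b >= 0x80 for b in title):            # pure ASCII: skip
                fixed = title.decode('latin-1').encode('utf-8')
                cur.execute("UPDATE %s SET %s = %%s WHERE %s = %%s"
                            % (table, title_col, key_col), (fixed, row_id))
        conn.commit()

e.g. fix_titles(conn, 'cur', 'cur_id', 'cur_title'), and likewise for the other affected tables.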
It should be possible, as some have suggested, to use either heuristics or
explicit marking to do run-time conversion of cur_text and old_text, and
perhaps cur_comment, old_comment, and similar bits. In this case we'd
want to do the conversion at data load time since we need the real
encoding for parsing to match up to titles. This would avoid downtime
for the conversion of the 10.4% of cur_text material that needs it
(45,862 rows), but requires changes to MediaWiki itself that need to be
coded and tested.
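The heuristic variant can be as small as this (a sketch; the caveat is that some Latin-1 strings also happen to be valid UTF-8, which is where explicit marking wins):

    def decode_stored_text(raw):
        # Run-time conversion guess: anything that decodes cleanly as UTF-8
        # is treated as UTF-8, everything else as Latin-1.
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return raw.decode('latin-1')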
The remaining latin-1 wikis will have rather larger incidences of high
chars than English does, but should still benefit from this approach by
skipping the bulk text recoding.
I'd hoped to have some conversion test results by now but had some false
starts with the database setup that used up the weekend. :( I'll try to
get the code ready and running in the next few days.
-- brion vibber (brion @ pobox.com)
Hello,
For a couple of days now I have been trying to
concatenate the Wikipedia image and English dump
files but without any success. I am using Windows XP
as an operating system, but I also have Knoppix when I want a Linux
environment.
For some reason that I do not understand, when I do
knoppix@ttyp0[hdg1]$ cat 20040609_upload.tar.aa 20040609_upload.tar.ab > test.tar
the new file created does not contain any part of the second file
(.ab), yet its size is the sum of both files.
The same is true with the English dump files (xaa, xab, xac, xad,
xae).
It seems only the first file of each split set is valid.
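For what it's worth, the byte-for-byte join I expect cat to produce looks like this Python sketch (same file names as above):

    # Join the pieces byte for byte, reading in 1 MB chunks.
    parts = ['20040609_upload.tar.aa', '20040609_upload.tar.ab']
    out = open('test.tar', 'wb')
    for name in parts:
        part = open(name, 'rb')
        while True:
            chunk = part.read(1024 * 1024)
            if not chunk:
                break
            out.write(chunk)
        part.close()
    out.close()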
Does anybody have any suggestion?
Thanks,
Claudio
Hi all,
It seems the Breton Wikipedia was lost during the big crash (a few weeks ago).
If I haven't misunderstood, a copy may be on a hard disk backup (80 GB), but
it would be long and hard to try to retrieve it.
My questions are:
1) Does someone have a local backup of the Breton Wikipedia?
2) Does someone know how many pages there were on this Wikipedia?
If there were only a few pages, it is perhaps faster to set up a new Wikipedia
rather than try to retrieve the old one.
Thanks for your help.
Aoineko
LS,
I have been trying for two days to change one record in nl:wiktionary.
The update ends with:
******
Sorry- we have a problem...
The wikimedia web server didn't return any response to your request.
To get information on what's going on you can visit #wikipedia.
An "offsite" status page is hosted on OpenFacts.
Generated Tue, 15 Jun 2004 06:52:10 GMT by wikipedia.org (squid/2.5.STABLE4-20040219)
*******
the new content should be
====''[[WikiWoordenboek:Zelfstandig naamwoord|Zelfstandig naamwoord]]''====
the article is [[Sjabloon:-noun-]]
Thanks,
GerardM
Anthere asked how much money we need to run the system
for the rest of the year, so we took a look at the
servers we have and at the growth we have seen over the
last few years.
The results can be seen at
http://meta.wikipedia.org/wiki/Hardware_provisional_budget
and I invite everyone to have a look at it and provide comments.
Regards,
JeLuF
Hi,
I'm trying to get math input working in my own wiki. Unfortunately,
something is messed up with my texvc installation. I know that latex,
dvips, gs and convert are in my Apache path; rather, I think texvc
itself isn't really working here. When I do:
# mkdir tmp out
# echo | ./texvc tmp out 'x \not\in \Sigma' iso-8859-1
+5e6c5975facb1fd4f0895accca92d451-
I found the file tmp/8435_5e6c5975facb1fd4f0895accca92d451.tex in my tmp
directory. Poking around with strace -f, it appears that latex cannot find
the temporary .tex file and consequently texvc gives up. Here are the
relevant bits:
[pid 8446] access("./tmp/8444_bc30a6e8e7394384dc79e75a32f251f3.tex", R_OK) = -1 ENOENT (No such file or directory)
in the thread that executes latex. I double-checked: the file does
exist. [1]
The error message I get is: Parser-Fehler (PNG conversion failed; check
for correct installation of latex, dvips, gs, and convert).
I'm really stuck here.
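For reference, my guess at the steps texvc runs after writing the .tex file is roughly the following (not texvc's actual code; running the stages by hand inside tmp/ might at least show which one fails):

    import os

    base = '8435_5e6c5975facb1fd4f0895accca92d451'
    # Run each stage by hand inside tmp/ so the intermediate files land
    # next to the .tex file, which seems to be what texvc expects.
    os.chdir('tmp')
    os.system('latex -interaction=nonstopmode %s.tex' % base)
    os.system('dvips -E %s.dvi -o %s.ps' % (base, base))
    os.system('convert -density 120 %s.ps %s.png' % (base, base))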
TIA,
Viktor
[1] Although the PID is different, never mind that.
--
GnuPG-Fingerprint: E292 4D89 A5F1 16EC 2795 35AC 9162 34E8 2331 4340
Hi,
During the last few days, MySQL database replication has
been activated between suda and ariel. suda is master,
ariel is slave. To do this, a downtime of suda was needed.
If there are writes to a MySQL slave, the replication will
break. In that case the replication has to be set up again
from scratch.
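For anyone curious, checking that a slave is still healthy is simple; a sketch using a generic Python DB-API connection (the column names are those reported by SHOW SLAVE STATUS):

    def replication_ok(slave_conn):
        # The slave is healthy only if both replication threads are running.
        cur = slave_conn.cursor()
        cur.execute("SHOW SLAVE STATUS")
        row = cur.fetchone()
        if row is None:
            return False                       # not configured as a slave
        status = dict(zip([d[0] for d in cur.description], row))
        return (status.get('Slave_IO_Running') == 'Yes'
                and status.get('Slave_SQL_Running') == 'Yes')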
To avoid downtime in that case, we're currently setting up
a second slave. This slave is not going to be used for
any queries. If the first slave fails, a copy of the
second slave can be made without taking down the master.
The latest database dump has been generated from ariel,
without any problems. During the copying of the data files
from suda to ariel, ariel served an old DB copy, without
any problems.
The second slave will be set up on zwinger and will most
likely only replicate once per day in off-peak hours.
To have enough disk space available, log files are
currently being moved to yongle and yongle will be
used to compute access statistics from the squid logs.
As soon as the second slave is working, ariel will be
used for service.
Regards,
JeLuF