Hi,
Is there a way to run fast queries (perhaps using
indexes) using the "old" table in my replication of
the English Wikipedia database (MySQL 4.0.20)? When
I've tried queries that include the field "old_title",
the simplest queries run for up to an hour on my AMD
Athlon XP 2400+ and 512 MB RAM on Windows XP
Professional. The more complex ones I'm trying to run
take dozens of hours--one query ran for five days,
before I finally quit it.
I've configured MySQL using the configuration options
that I list at the bottom of this e-mail, but things
are still unbearably slow. An example of the type of
query I've been trying to run would be:
select distinct old_user, old_user_text
from old
where old_user > 0
and old_title like 'History_of_%';
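(For what it's worth, a query like this can only avoid a full table scan if there is an index covering old_title; an imported dump may well lack one. A sketch of what might help — the index name and 255-byte prefix are illustrative guesses, not from the actual schema, and building the index on a table this size will itself take a long time and a lot of disk:)

```sql
-- Illustrative: add an index so the LIKE prefix can be range-scanned.
ALTER TABLE old ADD INDEX old_title_idx (old_title(255));

-- In LIKE patterns, "_" matches any single character; escaping it matches
-- a literal underscore and keeps the scanned range as narrow as possible:
EXPLAIN SELECT DISTINCT old_user, old_user_text
FROM old
WHERE old_user > 0
  AND old_title LIKE 'History\_of\_%';
```

If the EXPLAIN output shows a "range" access type on the new index instead of ALL, the index is being used.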
I'm surprised that this sort of query runs so slowly,
considering that on the live Wikipedia you can click
on "User contributions" and get a list of results
with lightning speed. Is there a faster way to search
on all the historic changes to Wikipedia that I'm
overlooking? I'd really appreciate any help anyone
could give me.
Thanks,
Claudio
[mysqld]
basedir=D:/www/mysql
#bind-address=192.168.2.22
datadir=D:/www/mysql/data
#language=D:/www/mysql/share/your language directory
#slow query log#=
#tmpdir#=
#port=3306
set-variable = key_buffer=128M
set-variable = max_allowed_packet=1M
set-variable = table_cache=256
set-variable = sort_buffer_size=4M
set-variable = read_buffer_size=4M
set-variable = myisam_sort_buffer_size=64M
set-variable = thread_cache=8
set-variable = thread_concurrency=2
set-variable = tmp_table_size=64M
set-variable = innodb_buffer_pool_size=140M
set-variable = innodb_additional_mem_pool_size=10M
[mysqldump]
quick
set-variable = max_allowed_packet=16M
[isamchk]
set-variable = key_buffer=128M
set-variable = sort_buffer=128M
set-variable = read_buffer=2M
set-variable = write_buffer=2M
[myisamchk]
set-variable = key_buffer=128M
set-variable = sort_buffer=128M
set-variable = read_buffer=2M
set-variable = write_buffer=2M
I added a generic message called 'sitenotice' to Language.php that
replaces fundraising_notice and shows up in Special:Allmessages. It's
parsed as normal wiki syntax now. The code that does this is a live
tweak in Setup.php on Zwinger currently, imo it would make sense to
replace $wgSiteNotice with it, with a default of '-' = switched off.
Opinions?
--
Gabriel Wicke
Hi list!
Here is some stuff about the new servers.
I installed the three 1U Celeron 600 servers with Debian testing, after a
short talk on #mediawiki about what to install. The configuration is the
following:
- 20 GB HD divided into 3 parts: 100 MB for /boot (ext2), 2 GB for swap and the
rest for / (xfs)
- minimum install. I just added ntp to keep the time and ssh to allow
developers to set up the squid remotely
- vi and emacs both installed to avoid trolls ;)
- net access through eth1 using dhcp and my home computer connection. I
reserve eth0 for the colo
- timezone is GMT+0 and locale is US
The first thing that needs to be done is to give the developers access to the
machines so they can set up the squid. So I need to know who needs access, and
who needs root access. I'll create an account for everyone. These people will
have to send me their public ssh key, which I will put in
~/.ssh/authorized_keys. For the people with root access, I'll send the
password of each server (preferably different passwords, or the same
password? Who said paranoid? ;) ) through a secure channel (does anyone know
a "GPG for dummies" for me?). Access to the servers is made by DNAT
through my computer. If more ports need to be opened to test the squids,
I can do that in 10 seconds upon request. My connection is 5.5 Mbps/384 kbps;
I hope it is sufficient. I also can't sleep with the noise of 4 computers
near my bed, so I turn them off during the night (from 2h00 to 10h00, Paris
time, very approximately), but during the rest of the day all
computers can be turned on with no problem.
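For the record, installing a submitted key just means appending it to the account's authorized_keys with strict permissions; a minimal sketch (the key and paths are made up for illustration, and a scratch directory stands in for a real home directory so the commands are safe to try):

```shell
# Sketch: install a developer's public key (all names illustrative).
home=/tmp/demo-home
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"          # sshd refuses group/world-writable dirs
echo "ssh-rsa AAAAB3... dev@example.org" >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
```

With a real account you would use the user's actual home directory and chown the .ssh directory to that user.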
If things go well, I think it would be a good idea to buy some more RAM
fairly soon, before I bring the computers to the colo. Right now, they only
have 128 MB each.
About the 6-Xeon: it is lying in my room now, waiting for someone to pick
it up. I was told it runs Mandrake 10 (installed by Mandrakesoft to run
the tests), and installing a different OS can apparently only be done
via PXE.
If I am not clear in what I said, just ask; I'm often on IRC (especially in
the evening, from 22h00) in #fr.wikipedia and #mediawiki. I'm also ready to
do things differently if necessary.
Med
This patch (made with cvs -Q diff -u LanguageIs.php) fixes a bug in
LanguageIs.php which causes the Icelandic Wikipedia not to be
displayed in MonoBook by default unless it is set as a user option;
that is, is.wikipedia.org currently defaults to the old skin. Not
pretty.
> Date: Wed, 28 Jul 2004 01:04:27 +0200
> From: Mark Bergsma <mark(a)nedworks.org>
> Subject: [Wikitech-l] Geographic DNS for wikipedia
> To: wikitech-l(a)wikimedia.org
> Message-ID: <4106DF7B.6030404(a)nedworks.org>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Hi all,
>
> Following up on the geographic DNS discussion on this list, triggered by
> the upcoming Squid servers in France, I proposed a solution to the
> developers/sysadmins on #mediawiki.
>
> About six months ago, I implemented such a system for our IRC network
> Blitzed[1]. We have several servers distributed over the globe, and we
> thought it would be nice to send our users to a physically close server.
> That's not unlike the situation with wikipedia visitors and the squids
> in the US and France.
I remember reading about a similar system on the website of the 2002 FIFA
World Cup in Korea/Japan. The problem there was that a system like this
doesn't take advantage of the time difference between different areas of
the world, so during the day in Asia the servers there were packed while
the ones in Europe were idling because everybody there was asleep.
I don't know how it works for you, but I see this as a potential problem.
Tobias Hesse
Upon reading other people's comments, I think that we
might have a dual solution for developer rewards,
which could be at the same time
* just a thank-you note
* a means to help them get a job, or just more traffic
on a site they like
* potentially, some money
As suggested on meta, first we could set up a page on the
wikimediafoundation website.
This page would be a list of all developers involved
in Wikimedia technical matters (if they want to be
there *of course*).
Each developer's name could appear along with a short bio
(or a link to a personal website) and a description of
their most relevant activity in Wikimedia as
developers (everyday hardware maintenance, bug
fixing, mailing list handling, performance issues,
software development, liaison with the board, etc.).
Each developer could put there a prominent link for
donations (PayPal system).
=> This would help with recognition of developers' work
(as many development activities are just
invisible to most editors), and give them an opportunity
to advertise it to outsiders. They might also get
some donations from editors who appreciate their work.
Second, once a year, we could have a couple of special
awards for developers. I was thinking of, for example:
* an award for hardware maintenance or improvement
* an award for software development
* a special award to thank special dedication to some
features recommended by the board.
These awards could be heavily advertised and, of
course, prominently announced on the
wikimediafoundation website. They might also come with
a little gift (such as money or hardware, for example).
Hardware gifts might be donated to the
foundation by hardware corporations.
It might be nice for this to be an opportunity to thank
people who put a lot of time into activities
that are not very visible.
What do you think?
Hi all,
Following up on the geographic DNS discussion on this list, triggered by
the upcoming Squid servers in France, I proposed a solution to the
developers/sysadmins on #mediawiki.
About six months ago, I implemented such a system for our IRC network
Blitzed[1]. We have several servers distributed over the globe, and we
thought it would be nice to send our users to a physically close server.
That's not unlike the situation with wikipedia visitors and the squids
in the US and France.
As I am more or less a PowerDNS[2] developer, I implemented a powerdns
backend for it, called Geobackend. It entered the powerdns source with
the latest release, and therefore is present in most major
distributions, including Debian.
Geobackend parses a rbldnsd format zonefile from countries.nerd.dk (a
well known and reliable DNSBL that maps IPs to their corresponding
countries) into an efficient memory data structure, and uses that to
direct incoming DNS requests to specific A records through CNAMEs, which
can be configured by a so-called "director map file" for each "geo
record". In short, each country can be directed to a configured A record.
As a demonstration, I have set up a geo record for wikipedia on our
servers, using our infrastructure. (This cost me about 4 minutes
to set up, and is therefore probably a faster way to run initial tests
than setting it up on multiple wikipedia servers.)
To test, do a
$ dig wikipedia.geo.blitzed.org CNAME
It will answer with either squids.fl.us.wikipedia.org or
squids.fr.eu.wikipedia.org, depending on your DNS resolver's location.
These DNS targets are fictitious at the moment (the .fr squids are not
at their final colo location yet); this is just for initial testing.
Also interesting is
$ dig localhost.geo.blitzed.org A
It should respond with a 127.* IP encoded as a dotted quad, corresponding
to your country's ISO code number.
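The answer is easy to decode by hand: assuming countries.nerd.dk packs the ISO 3166 numeric code into the last two octets (x * 256 + y of 127.0.x.y), a quick check looks like this (the IP below is an illustrative answer, not real dig output):

```shell
# Decode a countries.nerd.dk style answer into an ISO 3166 numeric code.
# 127.0.3.72 is used as an example: 3 * 256 + 72 = 840, the numeric
# code for the United States.
ip="127.0.3.72"
IFS=. read -r a b c d <<EOF
$ip
EOF
echo $((c * 256 + d))
```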
We've been using this on Blitzed for the past half year without any
problems, and I think it would be useful for wikipedia as well.
More information can be found in the README[3] for this backend, and
perhaps in the brainstorming wiki pages[4] we used when writing it.
Proper documentation for it, to be included in the powerdns
documentation, is in the works.
[1] http://www.blitzed.org
[2] http://www.powerdns.com
[3]
http://cvs.blitzed.org/geo-dns/README?rev=1.8&content-type=text/vnd.viewcvs…
[4] http://wiki.blitzed.org/DNS_balancing
--
Mark
mark(a)nedworks.org
Hello,
I just wanted to announce that the German Wikipedia is going to be
converted to UTF-8 on the night of July 29 to 30.
The conversion will start at 6 PM UTC; the German Wikipedia's
database will be set read-only for ~7 hours.
Sincerely yours,
Fire
Timwi wrote:
> Don't think too far. The payment system for developers is only an
> experiment, anyway. Nobody knows if it will work out. If it *does* work
> out, there is certainly no reason not to use the same system to reward
> work on actual articles -- but let's concentrate on one side of it first.
So, I think we need to unask the question about how to pay developers to do work
for the foundation, and think about how to make MediaWiki development faster and
easier.
One thing to note is that MediaWiki _is_ a very active, vibrant project. It's
gone through a lot of changes this year, and will probably continue to.
That said, here are my suggestions for making MediaWiki a more agile piece of
software:
1. Go Open Source. I know, the software is GPL-licensed right now, but we're not
using Open Source methods to get contributions from outside Wikimedia. I think
I'm the only developer whose primary wiki project isn't a Wikimedia one. There
are a lot of wikis using MediaWiki, and we need to be drawing talent and
contributions from developers outside the Foundation framework.
2. Re-architect for comprehension. The software, as it stands, is really, really
complicated. This comes somewhat from the inherent complexity of having a wiki,
but it also comes from an accumulative development process. It's hard to figure
out how to implement a new feature or fix a bug, because it's really hard to
figure out what modules are responsible for what functionality.
3. Re-architect for extensibility. In a lot of ways, MediaWiki is more like a
first-generation "wiki script" than more modern wiki software. MediaWiki
compares unfavorably to other wiki software such as MoinMoin or TWiki in the
ability of third-party developers to create extension modules for the software.
Hell, it's hard just to change the _skin_ on MediaWiki. Extensibility means that
outside developers can do cutting-edge experiments, and we can incorporate
(or ignore) those modules at a later time.
~ESP