Jason, could you take a look at the DNS setup as currently on zwinger?
Right now I've got it limited to just wikipedia.org; we can add back
the other domains once we're satisfied we know what we're doing. :)
If it all seems well, can we get this duplicated on joey and set the
domain record to point at us? The important thing is getting the double
A record so both Squid caches get used; it seems Squid currently favors
RAM over disk, and there's only so much RAM available.
Sample queries:
$ dig @zwinger.wikimedia.org en.wikipedia.org
; <<>> DiG 9.2.2 <<>> @zwinger.wikimedia.org en.wikipedia.org
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55075
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 1
;; QUESTION SECTION:
;en.wikipedia.org. IN A
;; ANSWER SECTION:
en.wikipedia.org. 3600 IN A 207.142.131.235
en.wikipedia.org. 3600 IN A 207.142.131.236
;; AUTHORITY SECTION:
wikipedia.org. 3600 IN NS zwinger.wikimedia.org.
wikipedia.org. 3600 IN NS joey.bomis.com.
;; ADDITIONAL SECTION:
joey.bomis.com. 142745 IN A 130.94.122.196
;; Query time: 109 msec
;; SERVER: 207.142.131.234#53(zwinger.wikimedia.org)
;; WHEN: Thu Feb 19 04:01:03 2004
;; MSG SIZE rcvd: 142
$ dig @zwinger.wikimedia.org wikipedia.org any
; <<>> DiG 9.2.2 <<>> @zwinger.wikimedia.org wikipedia.org any
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2993
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;wikipedia.org. IN ANY
;; ANSWER SECTION:
wikipedia.org. 3600 IN SOA zwinger.wikimedia.org. brion.wikimedia.org. 2004021901 10800 3600 604800 7200
wikipedia.org. 3600 IN NS joey.bomis.com.
wikipedia.org. 3600 IN NS zwinger.wikimedia.org.
wikipedia.org. 3600 IN A 207.142.131.235
wikipedia.org. 3600 IN MX 10 mail.wikimedia.org.
wikipedia.org. 3600 IN MX 50 mormo.org.
;; ADDITIONAL SECTION:
joey.bomis.com. 142897 IN A 130.94.122.196
;; Query time: 251 msec
;; SERVER: 207.142.131.234#53(zwinger.wikimedia.org)
;; WHEN: Thu Feb 19 03:58:31 2004
;; MSG SIZE rcvd: 208
-- brion vibber (brion @ pobox.com)
Um, there is, I, (pause to remove foot from mouth)
Reducing time from 0.99 to 0.97 doesn't sound very important, but I still
think that efforts to reduce page delivery time are important. Any idea,
no matter how far-fetched, should be considered.
Naturally, the proof is in the pudding. So after we try a given
optimization, we ought to measure the results and see if it really gets
the pages to the users any faster.
Ed "Sheepish" Poor
There are two projects that I am exploring that might make MediaWiki more
robust in handling the immense load being placed on it.
The first is a one- or two-pass wiki parser. The current parser, which
performs dozens of passes, probably degrades with the square of the file
size. I have added some thoughts to http://meta.wikipedia.org/wiki/One-pass_parser
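To give a feel for the idea, here is a tiny illustrative sketch of my own
(Python, not MediaWiki code; it handles only '''bold''' and [[links]]):
scan the text once, left to right, emitting output as each token is
recognized, so the cost stays linear in the input size.

import re

# One alternation covering the token types this toy parser recognizes.
TOKEN = re.compile(r"\[\[(.*?)\]\]|'''(.*?)'''", re.S)

def parse(wikitext):
    out, pos = [], 0
    for m in TOKEN.finditer(wikitext):       # single left-to-right scan
        out.append(wikitext[pos:m.start()])  # literal text before the token
        link, bold = m.group(1), m.group(2)
        if link is not None:
            out.append('<a href="/wiki/%s">%s</a>' % (link, link))
        else:
            out.append('<b>%s</b>' % bold)
        pos = m.end()
    out.append(wikitext[pos:])               # trailing literal text
    return "".join(out)

print(parse("'''Dune II''' is described at [[Dune II]]."))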
The second is storing diffs in the 'old' table. This would not affect
performance, except when loading or comparing old revisions, but it could
drastically reduce the size of the database, which can only make it more
manageable. There is already a differencing engine in the source, though
I'm not sure how reliable it is; it may also degrade with the square of
the file difference. Here too, a sequence of diffs can be merged in one pass.
Having written such code in the past, I plan to create a write-up
exploring this idea; a rough sketch follows. Has this idea been discussed
amongst the developers? What are the gotchas?
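By way of illustration, reconstruction might look roughly like this
(hypothetical Python with a made-up edit-script format, not the engine
that is in the source):

# Rebuild an old revision by replaying a chain of stored diffs.
# Each diff is an edit script: ('=', n) copies n characters from the
# base text, ('-', n) skips n characters, ('+', s) inserts new text.
def apply_diff(base, script):
    out, pos = [], 0
    for op, arg in script:
        if op == '=':
            out.append(base[pos:pos + arg])
            pos += arg
        elif op == '-':
            pos += arg
        elif op == '+':
            out.append(arg)
    return "".join(out)

def reconstruct(newest, diff_chain):
    # Naive replay is O(chain length x text size); merging the scripts
    # first would touch each character only once.
    text = newest
    for script in diff_chain:
        text = apply_diff(text, script)
    return text

print(reconstruct("hello there", [[('=', 5), ('-', 6), ('+', ' world')]]))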
Nick Pisarro
As a programmer with well over a quarter century of experience, I
applaud this effort by Nick to identify and slay the "dragon" of
N-squared performance.
When a process on a file takes time proportional to the file size, we
call that N. When time is proportional to N log N, we call that N log N
performance (quicksort, which uses a partition scheme and then sorts the
partitions, has N log N performance on average).
A process whose elapsed time grows in proportion to the SQUARE of the
file size is out of control! It's called an N-squared or N^2 process,
and is to be avoided like the plague. Insertion sort and Bubble sort are
examples, and have long been shunned for all but small numbers of items.
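To make the growth concrete, here is a tiny illustrative timing (plain
Python, nothing to do with the parser itself): doubling the input roughly
quadruples the quadratic time but only doubles the linear time.

import timeit

def quadratic(n):
    s = ""
    for _ in range(n):
        s = "x" + s          # prepending copies the whole string: O(n^2) total
    return s

def linear(n):
    return "".join("x" for _ in range(n))   # one pass: O(n) total

for n in (10000, 20000, 40000):
    tq = timeit.timeit(lambda: quadratic(n), number=3)
    tl = timeit.timeit(lambda: linear(n), number=3)
    print(n, "quadratic: %.3fs" % tq, "linear: %.3fs" % tl)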
Nick, if you and Magnus can pull this off, to Brion's satisfaction, you
will gain the everlasting gratitude of the entire Wikipedia community.
More power to you!
Ed Poor
Professional Software Developer
The arrow files Arr_r.png, Arr_d.png, and Arr_.png, used in the new
'recent changes' feature, do not seem to be in phase3/images.
Nick Pisarro
P.S. Is this the right place to post a notice like this?--Nick
Hi,
the thumbnail creation is a bit curious:
The thumbnail of a 14 KB PNG file is more than twice as large as the
original image. Just have a look at it:
http://de.wikipedia.org/wiki/Dune_II
36 KB for a 180 x 112 PNG is more than strange...
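One guess at the cause, with an illustrative Python/Pillow sketch (the
filename is made up): if the original is an 8-bit palette PNG, resampling
promotes the thumbnail to 24-bit truecolor, so the smaller image can
easily be larger on disk. Re-quantizing the thumbnail back to a palette
should shrink it again.

from PIL import Image

img = Image.open("dune2.png")            # hypothetical filename
print(img.mode)                          # 'P' means 8-bit palette
thumb = img.convert("RGB").resize((180, 112), Image.LANCZOS)
print(thumb.mode)                        # 'RGB' = 24-bit, bigger on disk
thumb.convert("P", palette=Image.ADAPTIVE).save("dune2_thumb.png")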
Regards,
Nils.
--
Created by 100 monkeys with 100 typewriters.
All,
I wanted to share a heads-up. Last weekend I had an hour of spare time,
so I wrote wps-e, a semi-automated and reasonably[1] intelligent spell
checker for Wikipedia. After compiling an extensive dictionary from
several known-good sources, wps-e currently operates with a 668 thousand
word base and some 2.2 thousand "automatic correction" entries:
misspellings that almost certainly should be autocorrected (e.g.
leaderhip -> leadership, prominant -> prominent).
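The two-tier check works roughly like this (an illustrative sketch in
Python with made-up sample data, not the real code):

import difflib

dictionary = {"leadership", "prominent", "article"}   # stand-in word base
autocorrect = {"leaderhip": "leadership", "prominant": "prominent"}

def check(word):
    w = word.lower()
    if w in dictionary:
        return word                  # spelled correctly, leave it alone
    if w in autocorrect:
        return autocorrect[w]        # near-certain fix, applied automatically
    # Otherwise only *propose*; a human approves before anything is committed.
    print(word, "-> suggest", difflib.get_close_matches(w, dictionary, n=3))
    return word

print(check("leaderhip"))   # autocorrected to "leadership"
check("artcle")             # proposals printed, nothing changed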
It's written in Perl, and you can take a look at some of the results of
its run by looking at contributions by User:Ike on en. It's
semi-automated in the sense that - other than the autocorrect list - it
doesn't do more than *propose* correct spelling, and even then it never
autocommits back to Wikipedia without a user manually approving the
changes. Because the Princeton WordNet database is free, the software
also integrates meanings of its proposed spellings into the (colored
but console-based) interface.
I am working some final kinks out of the software, and will be releasing
it soon. Sometime in the near future, I expect to release a Windows-based
GUI app that might make this type of copyediting easily accessible to a
much larger audience, which would mean cleaning up WP pretty quickly (I
alone was able to go through about 2000 articles with wps-e in under 3
hours).
If there are a few interested fellow Perl-masochists here (Erik, Timwi
maybe?), I'd be more than happy to send them the beta to try and break
or improve before it's released. Just drop me an e-mail. If someone with
proper permissions wanted to copy-edit directly on one of the WP
servers, they would have very low per-correction time (though this
effect can be achieved by normal users by pre-fetching articles, which
I'm looking to add to wps-e anyway).
Cheers,
Ivan
[1] - it doesn't cook dinner for you, but I've had - other than last
names - very few false positives to deal with.
Jimmy, we seem to have a problem: we can't access coronelli, the second
Squid cache. No ping, no SSH, nothing :)
[shaihulud@suda data]$ ping coronelli.wikimedia.org
PING coronelli (207.142.131.230) 56(84) bytes of data.
From suda (207.142.131.226) icmp_seq=1 Destination Host Unreachable
From suda (207.142.131.226) icmp_seq=2 Destination Host Unreachable
Could you have a look or try to restart it?
Thanks
Shaihulud
Hi.
I've registered the domain "wikipedia.no" (Norway). In Norway, such
domains must be owned by Norwegian-registered companies or organizations,
so it's not possible for me to transfer it completely to the Wikimedia
Foundation.
Jimbo suggested my company remain the owner until Wikimedia opens shop in
Norway, or the rules here change. In the meantime, if someone felt it was
necessary, we could set up a lease contract.
Anyway, I've set up an HTTP redirect for the moment, but I'd like to set
the domain up on Wikipedia's own DNS servers, and have the no.wikipedia.org
site answer directly at www.wikipedia.no.
How do I go about arranging this?
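For illustration, I imagine the zone would look roughly like this (the
serial, TTLs, and hostmaster address are placeholders, and the CNAME
assumes the web servers are set up to answer for www.wikipedia.no):

; hypothetical zone fragment for wikipedia.no -- values are placeholders
$TTL 3600
@    IN  SOA  zwinger.wikimedia.org. hostmaster.wikimedia.org. (
              2004030101 ; serial
              10800      ; refresh
              3600       ; retry
              604800     ; expire
              7200 )     ; minimum
     IN  NS   zwinger.wikimedia.org.
     IN  NS   joey.bomis.com.
www  IN  CNAME no.wikipedia.org.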
-- Daniel
In addition to the Safari fiasco a couple days ago, I've recently
received a complaint that Wikipedia blocks Netscape 3.0. The last entry
in the block list was for "^Mozilla/3.0", which blocks Netscape 3.0 and
some WebTV and similar browsers.
Now, I seem to remember blocking "Mozilla/3.01 (compatible;)" back in
the day, but never just plain "Mozilla/3.0". I've changed the entry for
now to match only that more specific string...
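To illustrate the difference between the two patterns (the test strings
below are made up):

import re

agents = [
    "Mozilla/3.0 (Win95; I)",              # real Netscape 3.0
    "Mozilla/3.01 (compatible;)",          # the agent we meant to block
    "Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)",
]
broad = re.compile(r"^Mozilla/3\.0")       # matches all three: too greedy
narrow = re.compile(r"^Mozilla/3\.01 \(compatible;\)")   # matches only the second
for ua in agents:
    print(ua, "| broad:", bool(broad.match(ua)), "| narrow:", bool(narrow.match(ua)))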
Incidentally, I managed to get Netscape Gold 3.04 up and running... It
does sort of work on Wikipedia. Sort of. :)
It gives a JavaScript error on every page view: a syntax error in a
regexp used by the toolbar code, I believe. It doesn't support PNG
images, and the sidebar shows at the bottom of the page; it also
doesn't grok UTF-8. It seems to have caching problems on default
settings and doesn't reload pages automatically after an edit. But pages
(in English at least) are legible, and it understands virtual hosts.
-- brion vibber (brion @ pobox.com)