Hello,
after many weeks of not developing, I have updated my wikipedia->static
HTML code again. The code basically works; it may need some tuning, but it's
acceptable as a starting point.
The main issue, as I see it, is that it is written as a completely
separate Perl program; that is, it uses none of the PHP code used on the
web site. This means that every change on the PHP side must be replicated
in the Perl program. For example, the new TOC feature is not present yet
and would have to be coded anew.
I don't know if this will be a major problem or not. On one hand, the
Wikipedia structure seems mature enough not to expect big changes. On the
other hand, who knows what the "Phase 4" software will bring along... :-))
I haven't read the mailing list, but I saw a meta page about automatic
map generation from Blue Marble data. Veeeery interesting.
Other issues:
- TeX rendering is absent
- Size: the English Wikipedia is now too big to fit on a CD-ROM without
using compression techniques that would make it much less portable than
simple HTML files. The problem is not total file size but the huge number of
small files. A simple zip or gzip of the whole archive brings it down to
about 150 MB, but then you need some sort of installation or browser
program. If images and other media are to be included, a CD-ROM is surely
not enough; a DVD would be OK. Size issues should not hamper the script's
use as a mirror-generation tool.
- JavaScript search (for single words) works quite well. However, I can't
seem to get string.replace() to work, so multi-word results are wrong.
Any expert on the matter?
- Time: the complete run takes about 3 hours on a 1.3 GHz Athlon, and
this will increase as the main database grows. Rewriting it in C or C++
would help, but I don't feel like it :-)
- Non-English wikipedias should be included too. Modifying the script for
this purpose should be easy.
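On the string.replace() question above: a common pitfall is that with a plain string pattern, JavaScript's replace() only substitutes the first occurrence; a global regex is needed to replace them all. A minimal sketch (the helper name and strings are illustrative, not from the actual script):

```javascript
// Hypothetical helper for the search page; not taken from the script itself.
// The pitfall: "foo bar foo".replace("foo", "baz") changes only the FIRST "foo".
function replaceAll(text, word, replacement) {
  // Escape regex metacharacters in the search word so it is matched literally,
  // then use the /g flag so every occurrence is replaced, not just the first.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return text.replace(new RegExp(escaped, "g"), replacement);
}

// Both occurrences of "foo" are replaced:
replaceAll("foo bar foo", "foo", "baz"); // "baz bar baz"
```

If the multi-word search builds its result strings this way, the missing /g flag would explain results being wrong only for multi-word queries.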
The code is licensed under the GPL and is available to anyone who
requests it. I can put it somewhere on the meta Wikipedia, or in CVS,
if you think that's better. My time to work further on it in the coming
months will be quite limited, so if anyone is willing to contribute, just
say so.
Cheers,
Alfio
> How come there is only 22 languages in the stats from the db-dumps?
For other languages there is no dump file at
http://download.wikipedia.org/
I believe they do not have the Phase III software yet.
Erik Zachte
Could SQL access for administrators be disabled to improve speed? Is it really needed?
> -----Original Message-----
> From: Brion Vibber [mailto:brion@pobox.com]
> Sent: 20 August 2003 13:46
> To: wikitech-l(a)Wikipedia.org; intlwiki-l(a)Wikipedia.org
> Subject: Re: [Wikitech-l] Slow, very slow, very very slow, much too slow
>
>
> Constans, Camille (C.C.) wrote:
> > I agree with Anthere and Luc: sometimes I get a page in
> > half a second, sometimes in 2 minutes!
>
> Well, as over time the various wikis get larger and bring in bigger
> audiences, their database needs grow. The English wiki, big and slow, is
> about the size of the others combined; but the others have been running
> with all kinds of features still in place that have been disabled on the
> English wiki because they slow things down too much.
>
> As a temporary measure I've turned on the annoying limiter functions
> that we've got on the English wiki (no counter updates, several of the
> slower special pages disabled) for French, German, Japanese, Swedish,
> Dutch, and Polish. These are the biggest ones, and the ones I've been
> seeing a lot of slow/stuck database queries running on. (Not user-run
> queries, but special pages and searches.)
>
> I also put the French wiki in read-only mode to run checks on the
> tables, just in case there was something specifically amiss. It looks
> okay (thankfully!) and should be up and running again.
>
> -- brion vibber (brion @ pobox.com)
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)Wikipedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l
>
Hi, folks!
In SkinStandard, when users log in or log out, they are presented with a
link to the last viewed page so they can quickly jump back to where they
were. This function was missing in SkinCologneBlue, so I just added it.
Bye!
Matthias
Message: 9
Date: Wed, 20 Aug 2003 00:08:41 +0200
From: Luc Van Oostenryck <luc.vanoostenryck(a)easynet.be>
Subject: Re: [Wikitech-l] fr.wikipedia.org almost dead
To: A mailing list for discussing the technical organization of the
Wikipedias <wikitech-l(a)Wikipedia.org>
Message-ID: <3F429FE9.7070502(a)easynet.be>
Content-Type: text/plain; charset=us-ascii; format=flowed
Luc Van Oostenryck wrote:
> Hi,
> since a few days the French Wikipedia has been noticeably slower than
> normal, but now it is almost dead: it takes minutes to view a page, or
> the request times out.
> Is there any particular reason for this?
> Looxix

Of course, just a few instants after I sent that mail, the French pedia
started behaving normally again.
Sorry for the disturbance.
Looxix
------
I agree with Luc.
In the past three days, the French wiki has quite frequently had
moments when it grinds to a halt.
Yesterday it was nearly dead for half an hour.
Right now it has been stuck for half an hour.
I don't know what the other international wikis are seeing, but I do
know it is bad, very bad.
There is no way we can do promotion in this situation. That would be
suicidal. Where is the problem? Is someone launching many queries these
days? Do we need to disable more features? Some of the special pages?
Or what?
This is very serious.
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
Quick fixes for a couple of small things, checked in and installed.
Tomos' reported problem:
* broken caching with Internet Explorer (OutputPage::sendCacheControl())
vary: accept-encoding replaced with vary: user-agent to keep IE from
refusing to cache pages
Related problem I noticed while fixing that:
* compressed pages from the file cache were being resent every time
because IE tacked some size info on its response that broke the
timestamp comparison (OutputPage::checkLastModified())
Menchi's reported problem:
* missing language links on edit previews and special pages with
Nostalgia skin (SkinNostalgia::doBeforeContent())
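The IE caching fix above comes down to which request header the response is declared to vary on; IE refuses to cache responses marked as varying on encoding. A sketch of the header change (values illustrative, not the actual sendCacheControl() output):

```
# Before: IE would not cache pages carrying this response header
Vary: Accept-Encoding

# After: vary on the user agent instead, which IE caches normally
Vary: User-Agent
```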
-- brion vibber (brion @ pobox.com)
> Message: 1
> Date: Sat, 16 Aug 2003 12:03:53 +0000 (UTC)
> From: Walter Vermeir <walter(a)wikipedia.be>
> Subject: [Wikitech-l] New Mailman :-(
> To: wikitech-l(a)wikipedia.org
> Message-ID: <bhl6j9$f8m$1(a)sea.gmane.org>
>
> I have now noticed, when I approved a posting, that a new
> version of Mailman is in use.
>
>
> It is good that there is an update of Mailman, but why were the
> list admins not informed about this? Or at least a warning on
> Wikitech-l?
>
> I had put a lot of work into modifying the look of the list pages
> so they were not so ugly. Now this is all gone.
Seconded.
And of course, like a big *stupid* one, I did not save my changes
(which I could have done if I had known beforehand that an update was
planned), so all is REALLY gone.
And I won't redo it, so it will be ugly and in English again.
Walter, that is how the English language will eat us: replacing our
language over and over and over, till we are too tired.
For those who don't believe it, look at
http://mail.wikipedia.org/mailman/listinfo/wikifr-l
especially the second line at the top right,
and also look below the line where you confirm your password;
it says:
"Which language do you prefer to display your messages?"
Choice?
Plainly disgusted
Anther (with no e, why bother)
Hi,
since a few days the French Wikipedia has been noticeably slower than normal,
but now it is almost dead: it takes minutes to view a page, or the request times out.
Is there any particular reason for this?
Looxix
Very nice of them to let us know!
----- Forwarded message from Warren Brown <wbrown(a)inktomi.com> -----
From: "Warren Brown" <wbrown(a)inktomi.com>
Date: Mon, 18 Aug 2003 17:25:08 -0700
To: root(a)wikipedia.org
Subject: Inktomi web crawler
The wikipedia.org server is blocking Inktomi's "Slurp" web crawler by
returning 403 errors for all access attempts. Presumably, this block
was set up because we were crawling the site too aggressively at some
time in the past. We would like to include wikipedia.org content in our
search database, and would be happy to work with you to match whatever
crawling limits you need to set.
Slurp observes /robots.txt rules for user-agent "Slurp". The crawler
access rate is normally limited to 4 pages per minute per web server;
we can set that rate lower if you require. The Slurp access rate can
also be controlled by a "Crawl-delay" instruction in /robots.txt.
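The /robots.txt controls described above would look roughly like this (the delay value is illustrative; Slurp reads "Crawl-delay" as a number of seconds to wait between fetches):

```
# Rules applying only to Inktomi's crawler
User-agent: Slurp
# Ask Slurp to wait 30 seconds between page fetches (value illustrative)
Crawl-delay: 30
```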
The Inktomi search service is used by MSN Search and a number of other web
portals and business sites worldwide. We are now a subsidiary of
Yahoo!
Regards,
Warren Brown
Partner Service and Support
Inktomi, a Yahoo! Company
----- End forwarded message -----
Brion-
Yes, now the counter behavior is as you described. I tried both Win98
Second Edition and XP.
Thanks,
Tomos
_________________________________________________________________
MSN 8: Get 6 months for $9.95/month.
http://join.msn.com/?page=dept/dialup