80.2.170.93 public2-bror2-5-cust93.bror.broadband.ntl.com
Using "WebStripper/2.19". I've blocked both the UA string "WebStripper"
(permanently) and the IP (will clear it after a few days).
Is there some kind of rate-of-connections-per-IP throttling we could do
with Apache?
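One possible approach (a sketch only, assuming the third-party mod_evasive module were installed; the directive values below are illustrative, not a tested configuration for wikipedia.org):

```apache
# Hypothetical per-IP request throttling via the third-party
# mod_evasive module (values illustrative, not tuned).
LoadModule evasive20_module modules/mod_evasive20.so

<IfModule mod_evasive20.c>
    DOSHashTableSize    3097
    DOSPageCount        10     # max requests for the same URI per interval
    DOSPageInterval     1      # per-page interval, in seconds
    DOSSiteCount        100    # max requests for the whole site per interval
    DOSSiteInterval     1      # site-wide interval, in seconds
    DOSBlockingPeriod   60     # seconds an offending IP receives 403s
</IfModule>
```

Offending clients get 403 responses for the blocking period instead of tying up backend processes.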
-- brion vibber (brion @ pobox.com)
I have (despite some protests :-) begun to write an offline wikipedia
reader. It is C++, using wxWindows, and the free Dev-C++ development
environment.
I was pleasantly surprised to find that wxWindows not only contains a
built-in HTML display component, but also a zipfile input stream, so it
can read a file directly from a packed archive, which makes it perfect
for CD-ROM use (no need to unpack the thing).
An offline reader for pre-made HTML pages could be up in a matter of
days. But I'd like to aim higher.
1. The power of C++ would make it perfect for parsing the wiki code.
2. The parser will be an object (of course), which could be reused in a
Phase IV C++ application.
3. It could be used as a fast client-side reader, instead of a web
browser, for wikipedia pages. It could load the wiki code from the live
site without the server having to render it.
4. It could serve as a special wikipedia offline (or client-side) editor.
I'll try to implement some of #1 before going public, though.
Magnus
Forwarded from wikipedia-l... this seems like a good example to me of
something that could be cached. If someone requests "User
Contributions" and it has been calculated within the last 24 hours,
say, they should be given the cached version and told that a new one
will be generated after midnight, or similar.
That is, unless people use their User Contributions for some realtime
work.
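The time-bounded cache described above could be sketched roughly like this (a hypothetical illustration in plain C++, not the actual wiki code; the class and names are invented):

```cpp
#include <chrono>
#include <map>
#include <string>

// A tiny TTL cache: an entry is reused until it is older than max_age,
// at which point it is recomputed.
class TtlCache {
public:
    explicit TtlCache(std::chrono::seconds max_age) : max_age_(max_age) {}

    // Return the cached value for `key`, or recompute it via `compute`
    // when the entry is missing or stale.
    template <class F>
    std::string get(const std::string& key, F compute) {
        auto now = std::chrono::steady_clock::now();
        auto it = entries_.find(key);
        if (it != entries_.end() && now - it->second.stored < max_age_) {
            return it->second.value;  // still fresh: serve the cached copy
        }
        std::string value = compute();  // expensive query runs only here
        entries_[key] = Entry{value, now};
        return value;
    }

private:
    struct Entry {
        std::string value;
        std::chrono::steady_clock::time_point stored;
    };
    std::map<std::string, Entry> entries_;
    std::chrono::seconds max_age_;
};
```

With a 24-hour `max_age`, repeated requests for the same user's contributions within a day would hit the cache instead of re-running the slow query.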
----- Forwarded message from Brion Vibber <brion(a)pobox.com> -----
From: Brion Vibber <brion(a)pobox.com>
Date: 01 Feb 2003 17:19:30 -0800
To: wikipedia-l <wikipedia-l(a)wikipedia.org>
Subject: Re: [Wikipedia-l] Problem with "User Contributions"
On Sat, 2003-02-01 at 14:33, Zoe wrote:
> For several days now, when I've clicked on "User Contributions" on my
> user page, Wikipedia has churned for several minutes, and then
> displayed the browser's "The page cannot be displayed". I can
> sometimes Refresh and eventually it comes up, but today, that's been
> completely unsuccessful.
That's cause you edit too much, Zoe. ;)
Seriously, though, yes. Certain operations that involve checking large
numbers of old page revisions are sometimes excruciatingly slow of late.
This is hitting:
* User contributions
* History of oft-edited pages with hundreds of revisions (Village pump,
mav's talk page, current events, talk:main page, vandalism alerts)
* Diff to last edit on the same. (This involves sorting to find the most
recent edit, and that seems to be holding us up.)
Until this is resolved, I'd like to ask that you _don't_ hit refresh
when one of these is churning away like mad, but rather just let it go.
The first query keeps running for a while, and it's just not going to go
anywhere until it's done (or I or Magnus or someone logs in and kills
the query), and a second request might just make it worse. :(
-- brion vibber (brion @ pobox.com)
----- End forwarded message -----
Part of my offline reader is the functionality to split a database dump
into a zillion files, one for each article. (That function will only be
used on the *encoding* side, e.g., on the wikipedia server when generating
a CD-ROM version).
So I downloaded the zipped German database, which unzips to ~33MB.
Imagine my surprise when the article files (article namespace only!)
came to less than 14MB (~16,000 files).
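The splitting step could be sketched as follows (a hypothetical illustration: it assumes a simplified line-oriented format where a line beginning with "TITLE:" opens a new article; the real dump format differs, and the function name is invented):

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Split a dump stream into one entry per article, keyed by title.
// Assumes a simplified format: "TITLE:<name>" starts an article,
// and every following line belongs to its body.
std::map<std::string, std::string> SplitDump(std::istream& dump) {
    std::map<std::string, std::string> articles;
    std::string line, title;
    while (std::getline(dump, line)) {
        if (line.rfind("TITLE:", 0) == 0) {
            title = line.substr(6);          // start a new article
        } else if (!title.empty()) {
            articles[title] += line + '\n';  // append body text
        }
    }
    return articles;
}
```

Each map entry would then be written out as its own file (or zip member) on the server side.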
It seems that in each database dump, we have the "search indexed"
article text as well, which contains the same text as the article, but
without special chars.
Can we not dump that field next time? It would reduce the file size (and
download time) by about 50%!
Magnus
Oh, lovely. Perhaps I should hold off on that upgrade just a bit...
-- brion vibber (brion @ pobox.com)
-----Forwarded Message-----
From: Heikki Tuuri <Heikki.Tuuri(a)innodb.com>
To: mysql(a)lists.mysql.com
Subject: BUG: InnoDB ORDER BY DESC may hang in 4.0.10
Date: 07 Feb 2003 02:40:40 +0200
Hi!
A rather serious bug was introduced in 4.0.10 in connection with another
bug fix.
If you have a composite key (col1, col2) in an InnoDB table, then a query of
type
SELECT ...
FROM ...
WHERE col1 = x ORDER BY col2 DESC;
may hang in an infinite loop.
The fix is in 4.0.11.
Best regards,
Heikki Tuuri
Innobase Oy
---
InnoDB - transactions, hot backup, and foreign key support for MySQL
See http://www.innodb.com, download MySQL-Max from http://www.mysql.com
Jason and I are taking stock of our hardware, and I'm going to find a
secondary machine to devote exclusively to doing apache for wikipedia,
i.e. with no other websites on it or anything. I'll loan the machine
to the Wikipedia Foundation until the Foundation has money to buy a
new machine later on this year.
We'll keep the MySQL where it is, on the powerful machine. The new
machine will be no slouch, either.
Today is Friday, and I think we'll have to wait for Jason to take a
trip to San Diego next week sometime (or the week following) to get
this all set up. (The machine I have in mind is actually in need of
minor repair right now.)
By having this new machine be exclusively wikipedia, I can give the
developers access to it, which is a good thing.
This will *not* involve a "failover to read-only" mechanism, I guess,
but then, it's still going to be a major improvement -- such a
mechanism is really a band-aid on a fundamental problem, anyway.
------
Lots of people think it's a good thing to set up mirror servers all
over the Internet. It's really not that simple. There are issues of
organizational trust with user data, issues with network latency, etc.
Some things should be decentralized, some things should be
centralized.
Does the current software have a quick mechanism to make the site
read-only?
Also, is there anything that would make it easy to have "articles and
searching only"? That is, no user logins or anything fancy, just the
ability to read the articles and search through them.
--
"Jason C. Richey" <jasonr(a)bomis.com>
Jimbo wrote:
>Ian Gilfillan, author of Mastering MySQL 4, has also volunteered to
>help, and if he wants access, we can give it to him too, even though I
>don't know him. He wrote a book, so he's legit.
There's no better source of legitimacy, in my eyes.
--
--------------------------------
| Sheldon Rampton
| Editor, PR Watch (www.prwatch.org)
| Author of books including:
| Friends In Deed: The Story of US-Nicaragua Sister Cities
| Toxic Sludge Is Good For You
| Mad Cow USA
| Trust Us, We're Experts
--------------------------------