Jimbo forwarded to Wikien-l what looks like an "official" takedown
notice for the article about Easter Bradford -- a man who has been, from
time to time, a contributor to Wikipedia.
See:
http://mail.wikipedia.org/pipermail/wikien-l/2004-January/009529.html
I'm not sure what we should do. If I still had ssh software on my PC, I
would probably delete the pages from the database (just to be safe).
On the other hand, it might be a good idea to back them up
somewhere, accessible only to the head developer, in case the notice is
either (a) faked; or (b) not something we have to comply with, even if
it's real.
Cautiously,
Ed Poor
This is something we should discuss. Right now, by refusing access to
clients that don't send a user agent, we may be locking out people who
come from behind a caching proxy. That's what I understand him to
be saying.
Ideally, when we get our capacity up, we'll let the robots run free.
----- Forwarded message from Phil Howard <phil(a)ipal.net> -----
From: Phil Howard <phil(a)ipal.net>
Date: Wed, 14 Jan 2004 12:41:16 -0600
To: Jimmy Wales <jwales(a)bomis.com>
Subject: Re: Who do I contact when webmaster(a)wikipedia.org does not work?
On Wed, Jan 14, 2004 at 06:56:02AM -0800, Jimmy Wales wrote:
| I am told that this is because you don't send a user-agent string.
| We block those requests because they are usually from badly-written
| spiders that clobber our site.
Ah. That would be the case.
RFCs require that if a web cacher sends a user-agent string, each received
document must be cached separately on the basis of that entire string so
that a new request for the very same document from a different browser does
not get the one previously requested. Thus if 20 different users go to a
given web site from 20 different browsers, the cache has to access the origin
server all 20 times, and no cache benefit is obtained, even for the image
files. Given that browsers not only put major version numbers, but also
minor version numbers, and in many cases even build numbers/dates, in the
user agent string, that makes it appear to be a lot of different browsers
for the purpose of caching.
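As a rough illustration of the effect described above, here is a toy model in Python. It is only a sketch of a cache keyed on the full user-agent string, not how Squid or any real cache is implemented; the function names and the browser strings are made up for the example.

```python
def cache_key(url, user_agent):
    """Key under which a per-User-Agent cache would store a response."""
    return (url, user_agent)


def count_origin_fetches(requests, strip_user_agent=False):
    """Count the requests that miss the cache and go to the origin server."""
    cache = set()
    fetches = 0
    for url, user_agent in requests:
        if strip_user_agent:
            user_agent = None  # the proxy drops the header before the lookup
        key = cache_key(url, user_agent)
        if key not in cache:
            cache.add(key)
            fetches += 1
    return fetches


# 20 users, each with a slightly different (hypothetical) browser build string.
requests = [
    ("http://en.wikipedia.org/", "ExampleBrowser/1.%d" % n) for n in range(20)
]
```

In this model, `count_origin_fetches(requests)` fetches from the origin for every user, while passing `strip_user_agent=True` lets all of them share one cached copy.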
Here's what my browser does send:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030728 Mozilla Firebird/0.6.1
I think that's excessive, but that is what is going on.
I think you need to reconsider the issue with user-agent. There are a few
commercial web caches that do this to enhance performance for businesses
that are on slower network links. And I don't see any way to make it make
an exception for certain sites.
Maybe I can hack the code in Squid (which is what we use here) to make it add
a generic user-agent string in place of the deleted one. If you decide you
cannot change your action for lack of user-agent, then clearly the step to
be taken is to fake one that can't be interpreted as any specific browser.
User-Agent: generic cache
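A minimal sketch of that substitution, written in Python rather than as an actual Squid patch. The function name and the dict-of-headers representation are assumptions for illustration only.

```python
GENERIC_USER_AGENT = "generic cache"  # the placeholder string proposed above


def outgoing_headers(client_headers):
    """Return the headers a cache would forward upstream.

    Any User-Agent sent by the client is dropped and replaced with one
    generic value, so the origin server always sees the same string.
    """
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "user-agent"
    }
    headers["User-Agent"] = GENERIC_USER_AGENT
    return headers
```

Because every forwarded request carries the same string, the origin still sees a user agent, and the cache can keep a single copy of each document.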
| But I notice that the screenshot is of mozilla firebird. May I
| share that link with the developers?
Sure. If by developers you mean the mozilla developers, be sure to tell them
it's not a mozilla issue; it's a caching issue.
BTW, I had heard that some spiders specifically omitted user-agent so that
they could be sure to cache something generic for every browser. Maybe they
will be going to the fake user-agent string, too.
So why not at least put up a page that says why the site cannot be accessed?
Just don't put any links in that page for any spiders to continue following
so they cache/index only that one, and leave. Surely they are not accessing
the same page over and over and over because it didn't have an error status.
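The suggestion above could look roughly like this on the server side. This is a hypothetical Python sketch, not the site's actual Apache configuration; the function name, the page text, and the placeholder 200 response are all made up for the example.

```python
# Explanation page with no links, so a spider has nothing further to follow.
EXPLANATION_PAGE = (
    "<html><head><title>403 Forbidden</title></head><body>"
    "<h1>Forbidden</h1>"
    "<p>Requests without a User-Agent header are refused because they "
    "usually come from badly written spiders.</p>"
    "</body></html>"
)


def respond(request_headers):
    """Return (status, body) for a request, given its headers as a dict."""
    has_user_agent = any(
        name.lower() == "user-agent" and value.strip()
        for name, value in request_headers.items()
    )
    if has_user_agent:
        return 200, "<html><body>normal page</body></html>"
    # Keep the error status, but say why the request was refused.
    return 403, EXPLANATION_PAGE
```

The status stays 403, so a well-behaved client still sees an error, but a human reading the page learns what to change.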
|
|
| Jimmy Wales wrote:
|
| > wow that must have been a very temporary thing, let me know if you see
| > it again.
| >
| > Phil Howard wrote:
| >
| > > On Tue, Jan 13, 2004 at 04:26:55AM -0800, Jimmy Wales wrote:
| > >
| > > | Phil Howard wrote:
| > > | > Who do I contact when webmaster(a)wikipedia.org does not work?
| > > |
| > > | Me. Hi, what's up?
| > >
| > > webmaster(a)wikipedia.org did not work, which is what I first tried when I
| > > could not access the web site.
| > >
| > > I get a 403 permission denied. Here is a screenshot:
| > > http://phil.ipal.org/wikipedia.png
| > >
| > > Here's the bounce from the original message I sent:
| > >
| > > =============================================================================
| > > [-- Attachment #1: Notification --]
| > > [-- Type: text/plain, Encoding: 7bit, Size: 0.5K --]
| > >
| > > This is the Postfix program at host vega.ipal.net.
| > >
| > > I'm sorry to have to inform you that the message returned
| > > below could not be delivered to one or more destinations.
| > >
| > > For further assistance, please send mail to <postmaster>
| > >
| > > If you do so, please include this problem report. You can
| > > delete your own text from the message returned below.
| > >
| > > The Postfix program
| > >
| > > <webmaster(a)wikipedia.org>: host mail.wikipedia.org[130.94.122.197] said: 550
| > > <webmaster(a)wikipedia.org>: User unknown in local recipient table (in reply
| > > to RCPT TO command)
| > >
| > > [-- Attachment #2: Delivery error report --]
| > > [-- Type: message/delivery-status, Encoding: 7bit, Size: 0.3K --]
| > >
| > > Reporting-MTA: dns; vega.ipal.net
| > > Arrival-Date: Mon, 12 Jan 2004 23:31:44 -0600 (CST)
| > >
| > > Final-Recipient: rfc822; webmaster(a)wikipedia.org
| > > Action: failed
| > > Status: 5.0.0
| > > Diagnostic-Code: X-Postfix; host mail.wikipedia.org[130.94.122.197] said: 550
| > > <webmaster(a)wikipedia.org>: User unknown in local recipient table (in reply
| > > to RCPT TO command)
| > >
| > > [-- Attachment #3: Undelivered Message --]
| > > [-- Type: message/rfc822, Encoding: 7bit, Size: 0.8K --]
| > >
| > > Date: Mon, 12 Jan 2004 23:31:44 -0600
| > > From: Phil Howard <phil(a)ipal.net>
| > > To: webmaster(a)wikipedia.org
| > > Subject: still not working
| > > User-Agent: Mutt/1.4.1i
| > >
| > > Here's the screenshot: http://phil.ipal.org/wikipedia.png
| > >
| > > --
| > > -----------------------------------------------------------------------------
| > > | Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| > > | (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
| > > -----------------------------------------------------------------------------
| > > =============================================================================
| > >
| > > --
| > > -----------------------------------------------------------------------------
| > > | Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| > > | (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
| > > -----------------------------------------------------------------------------
| > >
--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------
----- End forwarded message -----
> > Dual 64 bit Opteron mainboards are available with 8 DIMM sockets capable
> > of taking 16GB, e.g.:
> > http://www.tyan.com/products/html/thunderk8w_spec.html
>
> Right, we have one of those (though broken at the moment). Right now,
> our entire db should fit comfortably in 4 gig of Ram, which we have.
>
> But I totally agree with you about the importance of avoiding the
> hard drive as much as possible.
>
> --Jimbo
Just to throw my 2 cents-worth into the debate - any particular reason
for running on Intel? An Xserve dual G5 gives better price/performance
and is (possibly) much easier to administer. It's probably also likely
to be more reliable. I'm not an Apple apologist, but I think these
machines should be at least given fair consideration. If not, why not?
--Graham
This guy is having trouble connecting. I can use the website fine
with a number of browsers, but I am able to replicate this exact
sequence using telnet. What's the issue here?
=============================================================================
phil@altair:/home/phil 96> telnet en.wikipedia.org 80
Trying 130.94.122.199...
Connected to en.wikipedia.org.
Escape character is '^]'.
GET / HTTP/1.1
Host: en.wikipedia.org
HTTP/1.1 403 Forbidden
Date: Tue, 13 Jan 2004 21:03:19 GMT
Server: Apache/1.3.28 (Unix) mod_throttle/3.1.2 PHP/4.3.2
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
10f
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>403 Forbidden</TITLE>
</HEAD><BODY>
<H1>Forbidden</H1>
You don't have permission to access /
on this server.<P>
<HR>
<ADDRESS>Apache/1.3.28 Server at en.wikipedia.org Port 80</ADDRESS>
</BODY></HTML>
0
Connection closed by foreign host.
phil@altair:/home/phil 97>
=============================================================================
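The session above sends only a request line and a Host header, with no User-Agent, and the 403 comes back with chunked transfer encoding: "10f" is a hexadecimal chunk size (271 bytes) and the final "0" marks the end of the body. A small Python sketch of both pieces follows. The helper names are made up, and the decoder ignores chunk extensions and trailers.

```python
def build_request(host, path="/", user_agent=None):
    """Build a minimal HTTP/1.1 GET request like the one typed above."""
    lines = ["GET %s HTTP/1.1" % path, "Host: %s" % host]
    if user_agent is not None:
        lines.append("User-Agent: %s" % user_agent)
    return "\r\n".join(lines) + "\r\n\r\n"


def decode_chunked(body):
    """Decode a chunked HTTP/1.1 body given as bytes."""
    decoded = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return decoded
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

Calling `build_request("en.wikipedia.org")` reproduces the two lines typed into telnet; passing a `user_agent` adds the header whose absence triggers the block.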
On Tue, Jan 13, 2004 at 09:13:50AM -0800, Jimmy Wales wrote:
| wow that must have been a very temporary thing, let me know if you see
| it again.
--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------
----- End forwarded message -----
We really, really need to move the database back to a machine with a
decent hard drive. The wikis are very sluggish, and a fair chunk of
it's from waiting on the database.
Ursula's sitting around with a 90% idle CPU, but everything's blocked
on disk I/O to the point it's got a load average of about 16. At any
given time, 8 to 20 processes are blocked and waiting. Operations that
hit a lot of rows like history and watchlist are particularly badly hit
since they don't play as well with caching.
If Geoffrin's not going to be up soon, and Pliny's still emitting
spurious disk errors, what are our options out of the available
machines?
-- brion vibber (brion @ pobox.com)
8 SM-1151SATA, P4 2.6, 2x1GB RAM, 80GB SATA RD, Redhat 9 preload,
1.44MB floppy, rail kits.
The 2x1GB was more expensive than 4x512MB, but gives us more
flexibility for growth and also for shuffling the memory around to
specialize the machines. For machines that aren't doing anything too
intensive (mail server), we can buy cheap 2x256MB = 512MB and donate the
extra 2GB to the squid(s), if that seems helpful. And there will be
slots available to just buy more memory outright if that seems like a
good idea.
8 machines gives us some oomph, no matter what.
1 SM-2280S, Dual AMD Opteron 246, 4GB memory, SCSI RAID 5, 3x36GB,
logical capacity 72GB, redundant power supply, RH9 preload, 1.44MB
floppy, rail kits.
This machine is very similar to our big dog geoffrin.
The total cost is $19,308, plus tax and shipping.
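As a quick check on the figures above: RAID 5 gives up one disk's worth of space to parity, so three 36GB disks yield 72GB, and eight machines with 2x1GB each come to 16GB of RAM in total. A small Python sketch, with function names made up for illustration:

```python
def raid5_usable_gb(disks, disk_gb):
    """Usable capacity of a RAID 5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


def total_ram_gb(machines, dimms_per_machine, dimm_gb):
    """Total RAM across a set of identically configured machines."""
    return machines * dimms_per_machine * dimm_gb
```

Here `raid5_usable_gb(3, 36)` matches the 72GB logical capacity quoted for the Opteron box, and `total_ram_gb(8, 2, 1)` gives the memory across the eight P4 machines.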
--Jimbo
Build in approximately 5 business days, and then shipping will take
approximately 5 more business days.
So that's two weeks in dog years, give or take. Let's say January
27th.
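The estimate can be sanity-checked with a short Python sketch. The message does not say which day the order was placed, so the start date of Tuesday 13 January 2004 is purely an assumption for illustration, and holidays are ignored.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date that falls `days` weekdays (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# Assumed order date (not stated in the message): Tuesday 13 January 2004.
assumed_order_date = date(2004, 1, 13)
# 5 business days to build plus 5 business days to ship.
estimated_arrival = add_business_days(assumed_order_date, 5 + 5)
```

Under that assumed start date, the ten business days land on 27 January 2004; a different order date shifts the result accordingly.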
Hello,
I have modified Alfio Puglisi's wiki2static script and added a few of my own
to generate a static compressed version of wikipedia which should be
viewable on any UNIX-PDA featuring a webserver and a sh-compatible shell.
Compression works quite nicely, I currently have the German and the English
content (including media and TeX images) taking up ca. 600MB on a
microdrive. Since TomeRaider is not available for Linux-based PDAs this
should fill the last non-Wikipedia hole. See
http://www.retsiemuab.de/wiki2zaurus .
Two questions:
1) Can some people (preferably with UNIX PDAs and desktops) check if the
descriptions on the above site are understandable and work for them also?
2) Should I put a link on a wikipedia page and if yes on which one?
"Wikipedia:Database download" has the links to the wiki2static script but I
would rather not be listed on a page about how to get database dumps.
Without necessary bandwidth for offering dumps I'd prefer to distribute only
scripts.
Markus