There is a broken image entry with an empty filename in the SQL table of
nl.wikipedia.org:
(5554,6,'','','',19,'Jcwf','20021121050911','',0,0,0,0,0.281004031450491,'79978878949088','20031005104213'),(5555,6,'Tabeh.png','tabeh\r\n
een poging tot hieroglyphen (zelf getypt)
(from 2003-10-07 SQL dump).
Please remove it somehow because it interferes with a tool I'm writing.
Daniel
You daemon! The prospect of getting better hardware has me smacking my
lips!!
But Anthere's proposition makes sense, if we can bring Bayesian sampling
techniques to full flower.
I got the idea from the Cunctator, who is actually not a bad fellow once
you see through the bluster.
Ed Poor
> Explain to me why wikien-l and wikitech-l have no spam
> perhaps :-)
Wikien-l is configured to "pass" only messages from subscribers. All
other messages are put on "hold" until an administrator deals with them.
I don't know who was dealing with this before my vacation this year,
but recently I've been looking once a week and deleting anything that
didn't look like a legitimate message.
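
The pass/hold rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the list software's actual code; the function name and the example addresses are made up.

```python
from email.utils import parseaddr


def moderate(from_header, subscribers):
    """Return "pass" for mail from a subscriber and "hold" for anything else."""
    # parseaddr splits 'Name <addr>' into (name, addr); addr is '' if unparsable.
    _, address = parseaddr(from_header)
    if address.lower() in subscribers:
        return "pass"
    return "hold"  # waits until an administrator looks at it


# Hypothetical subscriber list, stored in lower case.
subscribers = {"alice@example.org", "bob@example.org"}
print(moderate("Alice <Alice@example.org>", subscribers))
print(moderate("spammer@example.net", subscribers))
```

Held messages are never answered or bounced in this sketch; they simply sit in a queue, which matches the "delete, don't reply" approach.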
I **!! NEVER !!** communicate with spammers. Telling them not to send
more mail only alerts them to the fact that they've found a real live
human being, someone who might read their, um, merde. Until the US
Congress passes an anti-spam law, it's best just to delete unwanted
e-mail.
If you need help configuring Wiki-fr you could e-mail the password
privately and I could take a look.
Ed Poor
--- wikitech-l-request(a)Wikipedia.org wrote:
> Attached is a zip file containing the output from SHOW PROCESSLIST.
> There is one file showing "typical" output, between the episodes, and
> three showing output in the middle of an episode. One thing stands out
> like a sore thumb:
>
> SELECT 1 FROM user_newtalk WHERE user_ip='167.206.112.85'
> SELECT 1 FROM user_newtalk WHERE user_ip='217.244.15.107'
> SELECT 1 FROM user_newtalk WHERE user_ip='80.55.166.58'
> SELECT 1 FROM user_newtalk WHERE user_ip='66.196.90.11'
> SELECT 1 FROM user_newtalk WHERE user_ip='172.176.254.188'
> SELECT 1 FROM user_newtalk WHERE user_ip='220.54.212.24'
> SELECT 1 FROM user_newtalk WHERE user_id=7580
> SELECT 1 FROM user_newtalk WHERE user_ip='194.78.48.226'
> SELECT 1 FROM user_newtalk WHERE user_ip='80.225.14.29'
> SELECT 1 FROM user_newtalk WHERE user_ip='144.32.128.73'
> SELECT 1 FROM user_newtalk WHERE user_ip='62.216.15.127'
> SELECT 1 FROM user_newtalk WHERE user_ip='134.151.225.179'
> SELECT 1 FROM user_newtalk WHERE user_ip='213.81.145.123'
> SELECT 1 FROM user_newtalk WHERE user_ip='129.137.208.159'
> SELECT 1 FROM user_newtalk WHERE user_ip='195.93.72.17'
> SELECT 1 FROM user_newtalk WHERE user_ip='218.19.141.2'
> SELECT 1 FROM user_newtalk WHERE user_ip='68.11.187.242'
> SELECT 1 FROM user_newtalk WHERE user_ip='63.34.208.93'
> SELECT 1 FROM user_newtalk WHERE user_ip='81.196.21.16'
> SELECT 1 FROM user_newtalk WHERE user_ip='202.156.2.82'
>
> Millions of user_newtalk requests, conspicuously absent from the typical
> dump. Many of them are doing "statistics", whatever that means, and the
> rest are locked. One side of me is curious as to what "statistics" means
> and why it produces this behaviour. The other side of me says
>
> KILL KILL KILL
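
One way to make a pile-up like the one quoted above stand out is to strip the literal values from each query and count the resulting shapes. The following is a minimal sketch, assuming the query text has already been pulled out of the SHOW PROCESSLIST output into a list of strings; `normalise` and `tally` are hypothetical helper names.

```python
import re
from collections import Counter


def normalise(query):
    """Replace quoted strings and bare numbers with '?' so similar queries group together."""
    query = re.sub(r"'[^']*'", "?", query)   # quoted literals such as IP addresses
    query = re.sub(r"\b\d+\b", "?", query)   # bare numbers such as user ids
    return " ".join(query.split())           # collapse runs of whitespace


def tally(queries):
    """Count how many times each query shape appears."""
    return Counter(normalise(q) for q in queries)


# Three of the queries from the quoted dump.
sample = [
    "SELECT 1 FROM user_newtalk WHERE user_ip='167.206.112.85'",
    "SELECT 1 FROM user_newtalk WHERE user_ip='217.244.15.107'",
    "SELECT 1 FROM user_newtalk WHERE user_id=7580",
]
for shape, count in tally(sample).most_common():
    print(count, shape)
```

Comparing the tallies from a "typical" file and a mid-episode file would show which query shapes only appear during an episode.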
Needless to say, I have no idea what you are talking
about.
But I get the KILL KILL KILL feeling each time the
mailing list is attacked by spammers, which in turn
fills my mailbox with bounced junk.
I would be curious to know how much spam other mailing
lists receive. Now that I know every money problem
African businessmen have, how to
expand my penis to incredible size, and how to get
called "lips on fire", what next?
Is it possible to add some words to an anti-spam list,
and to have any message containing "sex", "enlarge",
"lips", "money", "daemon", "free business
proposition", "free flower" (if only, sigh) and such
automatically sent back (or just plainly dumped)?
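
A word list of this kind could be checked along the following lines. This is a rough sketch, not a feature of any particular list software; the phrases are the ones suggested above, and `is_blocked` is a hypothetical name.

```python
import re

# Phrases taken from the suggestion above.
BLOCKED_PHRASES = [
    "sex", "enlarge", "lips", "money", "daemon",
    "free business proposition", "free flower",
]


def is_blocked(text, phrases=BLOCKED_PHRASES):
    """Return True if any blocked phrase appears in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        for phrase in phrases
    )


print(is_blocked("You daemon! The prospect has me smacking my lips!!"))
print(is_blocked("Attached is the output from SHOW PROCESSLIST."))
```

A plain word list also catches legitimate mail that happens to use one of the words, so holding matches for review is probably safer than bouncing or dumping them outright.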
Anthere,
I am not getting any of the mailing list spam you are talking about --
at least, not in noticeable amounts.
I use Microsoft Outlook, and I'm behind a corporate firewall, so my
experience may be different from yours. But I have configured my mail
client to move e-mail containing tags like:
[Wikitech-l]
into a folder marked Wikipedia Developers.
My anti-spam strategy is a poor man's version of filtering.
I leave everything in "Inbox", except e-mail from known sources like:
* Business colleagues (address contains the same domain as mine)
* Friends and relatives (added to a list, one by one)
* Wikipedia (subject line contains "Wiki")
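
The three rules above amount to a small whitelist. Here is a sketch of the same logic outside a mail client; the folder names other than "Wikipedia Developers" and all addresses are invented for illustration.

```python
def sort_message(sender, subject, my_domain, friends):
    """Pick a folder for a message; anything from an unknown source stays in Inbox."""
    sender = sender.lower()
    if sender.endswith("@" + my_domain.lower()):
        return "Colleagues"              # same domain as mine
    if sender in friends:                # friends: set of lower-case addresses
        return "Friends"
    if "wiki" in subject.lower():
        return "Wikipedia Developers"    # subject line contains "Wiki"
    return "Inbox"


friends = {"mum@example.net"}  # hypothetical, added one by one
print(sort_message("stranger@example.org", "[Wikitech-l] slow queries", "example.com", friends))
print(sort_message("stranger@example.org", "Make money fast", "example.com", friends))
```

Nothing is ever deleted automatically here; spam just ends up being the only thing left in "Inbox".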
I've heard of some sophisticated techniques using "Bayesian statistics"
but I haven't tried them yet.
Is there anything I can do, as mailing list administrator, to help get
rid of spam?
Ed Poor
Wiki-tech admin
Wikien admin
Tim,
I wouldn't discount the possibility of a denial-of-service (DoS) attack,
but I also wouldn't panic and take an axe to the hard drive.
(I do have a poster on the wall depicting an angry duck who says
"Compute this!" as he initiates a violent act against his recalcitrant
machine, but this is more for venting my own frustration than as a model
for my own behavior ;-)
I'm intrigued by the periodic shutdown phenomenon, and I really
appreciate your efforts at using perl to open the oyster, so to speak.
There's precious jewelry to be found somewhere in Wikipedia's innards,
and you're just the man to find it.
Ed Poor
Cheerleader, Coach and Vendor of Refreshments
P.S. May I fax you a fresh, hot cup of coffee?
Tim,
Thanks for paying attention to the profiling issue -- I don't mean
[[racial profiling]], of course.
Identifying the parts of the code which are taking a long time to run
is one of the best steps toward speeding up a system.
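
As a generic illustration of that idea (not the profiling actually being done on the wiki code), one can wrap sections of a program in a timer and see which label accumulates the most time. The labels and the workloads below are placeholders.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent under each label.
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


# Placeholder workloads standing in for real sections of code.
with timed("render"):
    sum(i * i for i in range(100_000))
with timed("lookup"):
    sorted(range(1000))

slowest = max(timings, key=timings.get)
print("slowest section:", slowest)
```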
(By the way, if you really want my home phone number to contact me in an
emergency, just write me privately. I might not be any use with
technical details, but I have a long track record as a sympathetic
listener and CATALYST whose presence on a team seems to facilitate the
production of brilliant solutions by the other team members.)
Ed Poor
To continue the saga of grand new schemes-that-solve-it-all, to be
implemented by someone else:
One server (the bigger one), let's call it edit-en, operates exactly
like our current system, except that it's closed to all spiders and runs
apache and mysql on one machine.
The other server, let's call it en, only runs apache and serves a static
tree to the world. The static tree is updated once a night from edit-en.
It's spidered by Google and takes the brunt of the browsers. Its
interface is radically stripped: no logging in, no cookies, no
RecentChanges, no special pages. Just a search box, What-links-here,
Edit-this-page, Discuss-this-page and page-history. The latter three
direct to edit-en; edit-en decides whether its version is newer than
en's version, and if so, displays a warning and lets people edit away.
The unwashed masses that come in through Google will thus never touch
edit-en.
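
The check edit-en would make when someone arrives from en could look roughly like this. It is a sketch that assumes edit-en knows both the page's last-modified time and the time the static tree was generated; the function name and the warning text are invented.

```python
from datetime import datetime


def edit_notice(live_modified, snapshot_taken):
    """Return a warning if the live page changed after the static copy was made, else None."""
    if live_modified > snapshot_taken:
        return ("This page has been edited since the static copy "
                "you were reading was generated.")
    return None


snapshot = datetime(2003, 10, 7, 3, 0)   # example: last nightly update
print(edit_notice(datetime(2003, 10, 7, 14, 30), snapshot))
print(edit_notice(datetime(2003, 10, 6, 22, 0), snapshot))
```

Either way the visitor is allowed to edit; the warning only tells them that what they read on en may be out of date.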
The updating of en's versions could be done by dumping the whole
database and letting en format it into a static HTML tree (probably too
slow), or by having edit-en always store its cached files in the file
system, to be periodically exported to en via rsync (need to make sure
that *all* files are cached, of course).
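
For the rsync variant, the nightly job might first confirm that every page has a cached file and only then push the tree. The sketch below assumes one `<title>.html` file per page in a flat cache directory, which is an invented layout; the host and paths are placeholders, and the command is only built, not run.

```python
from pathlib import Path


def missing_cache_files(titles, cache_dir):
    """List the page titles that have no cached HTML file yet."""
    cache_dir = Path(cache_dir)
    return [t for t in titles if not (cache_dir / (t + ".html")).is_file()]


def rsync_command(cache_dir, destination):
    """Build an rsync call that mirrors the cache directory's contents to the destination."""
    source = str(cache_dir).rstrip("/") + "/"   # trailing slash: copy contents, not the directory itself
    # -a preserves the tree; --delete removes files on en that are gone on edit-en.
    return ["rsync", "-a", "--delete", source, destination]


missing = missing_cache_files(["Main_Page"], "/no/such/cache")
if missing:
    print("not exporting, uncached pages:", missing)
else:
    print(" ".join(rsync_command("/no/such/cache", "en:/var/www/static/")))
```

The same rsync source could later be opened to the public for the mirrors mentioned under Benefits.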
Benefits:
* hopefully a speedup (but can edit-en alone do a decent job for us
editors?)
* can be done with current hardware
* simpler interface for drive-by browsers
* Future extension: have several static ens, use DNS
round-robin on them.
* En can have a nice fast htdig search whose index is statically
computed once (or is that too slow?).
* The servers are almost independent; if either one goes
down, Wikipedia is at least still readable.
* If the rsync variant is chosen, we could offer public rsync
and static Wikipedia mirrors would spring up all over the world.
Drawbacks:
* some programming to distribute the static HTML to en.
* Edit wars or vandalism could increase shortly before the
time a static version is stored on en (maybe randomize the
time of doing that?)
* Copyright violations and vandalism could stay visible
for up to a whole day.
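
Randomizing the snapshot time could be as simple as picking a uniformly random moment inside each update window. This is a sketch only; the 24-hour default and the function name are assumptions.

```python
import random
from datetime import datetime, timedelta


def next_snapshot_time(window_start, window_hours=24, rng=random):
    """Pick a uniformly random moment within the window beginning at window_start."""
    offset_seconds = rng.uniform(0, window_hours * 3600)
    return window_start + timedelta(seconds=offset_seconds)


when = next_snapshot_time(datetime(2003, 10, 8))
print("take the next static snapshot at", when)
```

With an unpredictable snapshot time nobody can aim an edit at the moment just before the export, at the cost of uneven intervals between updates.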
Axel