Many vandals hide behind shared HTTP proxies, which makes them difficult
to ban.
Here's an idea:
Add a new attribute to user accounts:
* "authenticated" users are users **who have supplied a non-throwaway
E-mail address**: authentication to be done by sending them an E-mail
which they have to reply to, in the same way as mailing list authentication.
We can then "greylist" IP addresses or ranges, so that only
''authenticated'' logged-in users can post from behind these addresses.
We can point out to new users from these ISPs that the reason why they
are being asked to authenticate is that other users from the same ISP
have acted as vandals.
The good bit:
* At the same time, non-greylisted IP addresses can still allow
anonymous or non-authenticated user account edits, so we stay "open" to
>99.99% of all users.
We should greylist just the IP address for a proxy, or the whole /19
range for a user IP address: this is the minimum routable block on the
Internet, and will generally catch all users from a particular region.
This significantly increases the costs to vandals, and provides
traceability back to providers, or even real identities if necessary.
Vandals can go on making new accounts as many times as they like, but
they have to incur the costs of setting up new provider accounts every
time we ban their user account. (I believe that ISPs share phone
numbers and credit card numbers of persistent abusers, so these people
will either end up without access, or using rogue providers, whom we can
then blacklist.)
Then, we can reserve "blacklisting" only for IP addresses that are
beyond hope, such as individual users who are non-cooperative, or
providers without a workable anti-abuse policy. "Blacklisting" should
then ban all editing.
We can also refuse to accept authentication E-mails from E-mail
providers who do not have a good abuse policy.
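To make the greylist/blacklist distinction concrete, here is a minimal
sketch of the edit check in Python. Everything in it is an assumption
for illustration only: the function name may_edit, the list contents
(documentation-range placeholder addresses), and the idea that the
lists live in memory rather than in the database.

```python
import ipaddress

# Hypothetical lists; all addresses are documentation-range placeholders.
GREYLIST = [
    ipaddress.ip_network("192.0.2.10/32"),   # a single shared proxy
    ipaddress.ip_network("198.51.96.0/19"),  # a whole /19 for a user address
]
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),  # a provider that is "beyond hope"
]


def may_edit(ip, logged_in=False, authenticated=False):
    """Return True if an edit from this address should be accepted."""
    addr = ipaddress.ip_address(ip)
    # Blacklisted addresses cannot edit at all.
    if any(addr in net for net in BLACKLIST):
        return False
    # Greylisted addresses need a logged-in, e-mail-authenticated account.
    if any(addr in net for net in GREYLIST):
        return logged_in and authenticated
    # Everyone else stays "open": anonymous edits are fine.
    return True
```

With these lists, may_edit("198.51.100.7") is refused, while the same
address with logged_in=True and authenticated=True is accepted.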
Neil
Simply viewing an article currently involves performing a lot of
queries. Since it's our most common event, we should perhaps be
directing our optimization effort there.
(if logged in) check user info:
User::loadFromDatabase:SQL: SELECT user_name,user_password,
user_newpassword,user_email,user_options,user_rights,user_newtalk
FROM user WHERE user_id=1
Check if page exists:
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=0 AND cur_title='Test:stuff')
Load the content:
Article::loadContent:SQL: SELECT cur_text,cur_timestamp,cur_user,
cur_counter FROM cur WHERE cur_id=19
[These last two can be combined into one query.]
Now, separately for each and every linked page:
Check if it exists:
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=0 AND cur_title='A_link')
(if stub threshold set) check its size and redirect status:
SQL: SELECT length(cur_text) AS x, cur_namespace, cur_is_redirect
FROM cur WHERE cur_id='96'
[Checking the size can be done with checking existence, and it may be
better to chalk up a list of all linked pages during initial parsing,
then check them all in one big query.]
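A rough sketch of the "one big query" idea, written in Python against
an in-memory SQLite table rather than the real PHP/MySQL code. The
column names follow the queries above; the function name check_links
and the returned structure are assumptions, and a real version would
need to respect the database's limit on the number of bound parameters.

```python
import sqlite3


def check_links(conn, titles, namespace=0):
    """Look up all linked titles in one query.

    Returns {title: (cur_id, text_length, is_redirect)} for pages that
    exist; titles missing from the result do not exist.
    """
    titles = list(dict.fromkeys(titles))  # drop duplicates, keep order
    if not titles:
        return {}
    placeholders = ",".join("?" * len(titles))
    rows = conn.execute(
        "SELECT cur_title, cur_id, length(cur_text), cur_is_redirect "
        "FROM cur WHERE cur_namespace = ? "
        f"AND cur_title IN ({placeholders})",
        [namespace, *titles],
    ).fetchall()
    return {title: (cur_id, size, bool(redirect))
            for title, cur_id, size, redirect in rows}


# Small demonstration with a hypothetical two-row "cur" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, "
    "cur_title TEXT, cur_text TEXT, cur_is_redirect INTEGER)"
)
conn.executemany(
    "INSERT INTO cur VALUES (?, ?, ?, ?, ?)",
    [(96, 0, "A_link", "some text", 0),
     (97, 0, "Other", "#REDIRECT [[A_link]]", 1)],
)
found = check_links(conn, ["A_link", "Missing", "Other", "A_link"])
```

The existence check and the stub-threshold check then cost one round
trip per page view instead of two per link.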
Check if current page is restricted for editing:
wfGetSQL:SQL: SELECT cur_restrictions FROM cur WHERE (cur_id=19)
[We should get cur_restrictions along with the rest in the initial
query.]
(if logged in) check if on watchlist:
wfGetSQL:SQL: SELECT user_watch FROM user WHERE (user_id=1)
[For users with big watchlists like me, it may be inefficient to load
and parse the entire list on every single page view. It should perhaps
either be grabbed along with the rest of the user info, or stored in a
separate table where the database can parse the gobs of strings for us.]
Check for existence of talk page (twice - sidebar and bottom bar):
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=1 AND cur_title='Test:stuff')
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=1 AND cur_title='Test:stuff')
[Should only need to do this once and store the result...]
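Storing the result could look roughly like the following Python sketch.
The class name LinkCacheSketch and the lookup callback are hypothetical
stand-ins, not the actual LinkCache code; the point is only that
negative results ("page does not exist") are remembered too.

```python
class LinkCacheSketch:
    """Remember page-id lookups for the duration of one page view."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable (namespace, title) -> id, 0 if missing
        self._ids = {}

    def get_id(self, namespace, title):
        key = (namespace, title)
        # "in" rather than a truthiness test, so a cached 0 is not re-queried.
        if key not in self._ids:
            self._ids[key] = self._lookup(namespace, title)
        return self._ids[key]


# Demonstration: a fake lookup that records how often it is called.
calls = []


def fake_lookup(namespace, title):
    calls.append((namespace, title))
    return 0  # pretend the talk page does not exist


cache = LinkCacheSketch(fake_lookup)
sidebar = cache.get_id(1, "Test:stuff")
bottom_bar = cache.get_id(1, "Test:stuff")
```

Here the sidebar and the bottom bar both ask about the talk page, but
only one lookup reaches the database.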
...output the page...
Update counters:
ViewCountUpdate::doUpdate:SQL: UPDATE cur
SET cur_counter=(1+cur_counter),cur_timestamp=cur_timestamp
WHERE cur_id=19
SiteStatsUpdate::doUpdate:SQL: UPDATE site_stats
SET ss_total_views=(ss_total_views+1),
ss_total_edits=(ss_total_edits),
ss_good_articles=(ss_good_articles) WHERE ss_row_id=1
[Note that that single row of site_stats is updated on every single view
of any page in the database.]
-- brion vibber (brion @ pobox.com)
I just came upon an article [[Connected]] containing a definition of a
connected graph. I added a definition of a connected set and got an edit
conflict, containing lots of text about graphs. The web page title was "Edit
conflict: Connected". I added my text to the bottom and hit the save button.
I then found that someone had turned [[Connected]] into a redirect behind my
back and my definition of a connected set is now at the bottom of the [[Graph
theory]] page!
AFAICT this is the same as bug 602823. The bug is 5/2 months old and has not
been fixed.
phma
From the Postgres FAQ:
4.6) What is the maximum size for a row, table, database?
These are the limits:
Maximum size for a database?            unlimited (60GB databases exist)
Maximum size for a table?               16 TB
Maximum size for a row?                 unlimited in 7.1 and later
Maximum size for a field?               1GB in 7.1 and later
Maximum number of rows in a table?      unlimited
Maximum number of columns in a table?   250-1600 depending on column types
Maximum number of indexes on a table?   unlimited
Of course, these are not actually unlimited, but limited to available disk
space and memory/swap space. Performance may suffer when these values get
unusually large.
The maximum table size of 16TB does not require large file support from
the operating system. Large tables are stored as multiple 1GB files.
The maximum table size and maximum number of columns can be increased if
the default block size is increased to 32k.
Jonathan
Could somebody change the displayed name for links to subject pages from "View
Article" to "View Subject Page" (or something similar)? Per our own
definition, my user page is not an article and neither are any
Wikipedia:namespaced pages.
Is SourceForge a better place for this? Better question: is anybody reviewing
SourceForge bug reports and feature requests anymore?
-- Daniel Mayer (aka mav)
elian wrote:
>Toby Bartels wrote:
>>Or not have moderation.
>>Other than approving posts by nonmembers, why do we need it?
>>When was this decided?
>Never. The question was about administration, not moderation. Choosing
>taglines, defining welcome greetings and all the technical stuff.
Well, I certainly trust Ed to do *that*.
-- Toby
Hello,
I've never worked on an open-source project, so if there are any
conventions to follow, please tell me.
I did some redesign of the user-prefs form (tested on my MacOS X browser
zoo and Linux, seems to work still ;-). Shall I commit such stuff directly
to CVS or post it here so that everyone can test it first?
Note to Magnus:
Could you mail me your test.php for the portal page?
Question to all: I got Wikipedia running (Mac OS X 10.1.5, MySQL 3.23, Apache
1.3.22), but one problem remains. I get the following error when trying to
upload a file:
UPDATE cur SET cur_timestamp='20021116204723', cur_user=3, cur_user_text='Elian', cur_text='
* 20:47, 16. Nov 2002 [[Homepage:Elian|Elian]] \"[[:Bild:Mjwinde.png|Mjwinde.png]]\" hochgeladen (test)
', cur_comment='\"Mjwinde.png\" hochgeladen: test' WHERE cur_id=
from the function "wfRecordUpload". MySQL reported the error "1064: You have an error in your SQL syntax near '' at line 1".
Any ideas? The files are uploaded, though.
greetings,
elian
Hi,
Is there a way to download the image directory of the en.wikipedia?
There are the database tarballs, but those do not include the images.
Regards,
Erik
Maybe this has been asked before, but anyway: couldn't a script be written
for administrators to roll back all changes made by a certain user or IP
address after a certain date/time, provided no one else has updated
the article since?
Links to articles that have been updated by someone other than the vandal
could be stored in a special Vandalism/Userxxxx/Versionyyy article for
easy reference.
Erik Zachte
Regarding the latest vandalism:
Two things that could be achieved with minimum effort are:
1. On the user contributions list, mark those changes that are in the
"cur" table (call it "top" changes, as they are on top of the article
stack). For a vandal user, that means they have not been fixed yet.
2. On the user list, for sysops, have a "delete this user" link after
each user.
The latter could be combined with a "delete top edits", automatically
reverting all "top" changes by this user prior to deletion. That way, we
wouldn't have to clean up everything manually.
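As a rough illustration of "top" changes and "delete top edits", here
is a small Python sketch over an in-memory page history. The Revision
class, the function names and the "sysop" attribution are all
assumptions for the example; the real thing would work on the cur and
old tables.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    user: str
    text: str


def top_edits(histories, user):
    """Titles whose newest revision (the "top" change) is by `user`."""
    return [title for title, revs in histories.items()
            if revs and revs[-1].user == user]


def revert_top_edits(histories, user, reverter="sysop"):
    """Restore the last revision not by `user` on every page they top.

    Pages that contain only this user's revisions are skipped and left
    for manual review. Returns the list of reverted titles.
    """
    reverted = []
    for title in top_edits(histories, user):
        revs = histories[title]
        keep = len(revs)
        # Walk back past the user's trailing run of revisions.
        while keep > 0 and revs[keep - 1].user == user:
            keep -= 1
        if keep == 0:
            continue
        revs.append(Revision(reverter, revs[keep - 1].text))
        reverted.append(title)
    return reverted


# Hypothetical histories, oldest revision first.
histories = {
    "A": [Revision("alice", "good"), Revision("vandal", "junk")],
    "B": [Revision("vandal", "x"), Revision("bob", "fixed")],
    "C": [Revision("vandal", "only")],
}
unfixed = top_edits(histories, "vandal")
reverted = revert_top_edits(histories, "vandal")
```

In this example "B" is already fixed by someone else, so it is not a
top change; "A" is reverted, and "C" is left alone because there is no
earlier revision to go back to.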
I might do that later, if no one beats me to it...
Magnus