It strikes me as violating the principle of least astonishment that
"Delete this page" and the little "(del)" link next to the current image
revision do different things on an image description page, namely:
* "Delete this page" deletes only the image description page, leaving
the image file and its revisions intact, and the image remains in the
images list
* "(del)" deletes the image file, any old revisions, the entry from the
images list, *and* the description page.
User expectation seems to be that "delete this page" should perform the
second function.
-- brion vibber (brion @ pobox.com)
I think that a subject classification of articles would vastly improve
"soft security" and would save regulars a lot of time, since not
everyone would have to check every edit as currently seems to be the
case.
>I'd still like to see if we couldn't build those subjects
>automatically in some way based on links in the database.
How about this: the possible topics coincide with the major pages
listed on [[Main Page]] (from "Astronomy" to "Visual Arts"). The
shortest link path from such a topic page to an article defines that
article's topic. If there is no such path, then the article is
classified as a topic orphan.
To compute these topics quickly, the cur table gets two new columns:
topic and distance, where distance stands for the link distance from the
Main Page topic page. When a new article is created, it can be
classified immediately by looking at the distance entries of all
articles that link to it and taking the minimum (plus one). If an existing
article is saved, the topic and distance entries of all articles it
links to (and their children) may need to be updated; these changes
can be propagated in a recursive manner.
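A sketch of the breadth-first propagation in Python (purely illustrative,
using an in-memory link graph; the real thing would of course run against
the cur and links tables):

```python
from collections import deque

def classify(links, topics):
    """Compute (topic, distance) for every reachable article.

    links:  dict mapping an article to the articles it links to
    topics: the Main Page topic pages, which get distance 0
    Articles never reached are topic orphans (absent from the result).
    """
    result = {t: (t, 0) for t in topics}
    queue = deque(topics)
    while queue:
        page = queue.popleft()
        topic, dist = result[page]
        for target in links.get(page, []):
            # Breadth-first order guarantees the first assignment is
            # the shortest link path; ties go to whichever topic is
            # reached first.
            if target not in result or result[target][1] > dist + 1:
                result[target] = (topic, dist + 1)
                queue.append(target)
    return result
```

Saving an existing article would rerun the same relaxation starting from
the saved article's linkers instead of the topic pages.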
Would that work?
Axel
Hi,
as promised, I have looked a bit into the RC code for the purpose of
possibly implementing a filter. Since Ram-Man seems to be finished with
his script, this is no longer an urgent issue, so we might want to take
a step back and look at the design of SpecialRecentchanges.php etc.
before any further changes.
I have taken the liberty of importing the Wikipedia database locally to
find out which operations are fast and which ones are slow even locally.
Long RC queries are quite slow on my machine, in spite of cur_timestamp
being indexed. Even with an index, searching a 350 MB+ table with fairly
random row order may be heavy. (Is there any way to determine whether the
index is actually used, BTW? MySQL has some weird conditions under which
it ignores indexes.)
Since this is one of the most commonly accessed functions, we should
really do some performance tuning here, especially as we will sooner or
later have to do stuff like the aforementioned bot-filtering, making
queries even more complex. I noticed that there is already a
recentchanges table that new rows are inserted into upon edits. Is a
move away from SELECTs on the CUR table already in the works? If so,
what is its current status and who's doing it?
Regards,
Erik Moeller
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Can you set up an alias for wikidown@bomis.com? This alias should
point to everyone with login access on the wikipedia server. Once it
is set up, and tested, then someone can change the error message on the
wikipedia server to tell people to report errors to that address.
Anyone who wants to opt out of this, or to have those emails routed to
a pager or cell phone, please let Jason know...
--Jimbo
tarquin wrote:
> Brion -- bad news.
>
> I have already found a bug:
No -- good news! You found it, means we can fix it. :)
> on Wikipedia-FAQ, there is an interlink to [[en:Wikipedia:FAQ]].
> However, the accent tweak leads to Wikipédia:FAQ on the *English* pedia
> -- which doesn't exist.
Hmm, it probably shouldn't be trying to parse the namespace of a link
whose interwiki prefix isn't the current wiki's.
-- brion vibber (brion @ pobox.com)
One key problem with a wiki encyclopaedia is that there's no quality
control whatsoever. An article may have been vandalized 5 seconds ago,
or be grossly non-NPOV etc. As we get more and more articles, this
problem becomes more urgent.
Fortunately, the solution is rather simple. Articles can be certified by
contributors to be high quality. But who is allowed to certify articles?
The system works by allowing groups of people to form certification
teams. Anyone can submit a new team to be created, and anyone can apply
to join an existing team and certify articles in its name. Users can
then decide to view only article revisions certified by members of
selected teams.
So I could decide in my user preferences:
Certification: Approved Teams
Team Nupedia
Team Wiki-Fiction
Team Wiki-Maths
Then there would have to be a way to display certified article
revisions. This could be accomplished by having a "Certified Mode",
showing *only* articles that have received certs, with the most recently
certified revision shown. A somewhat weaker option: where an article has
been certified, a link "There is a version of this article certified by
Team X" could be placed above the article, showing the certified revision
when clicked (or a text "This article has been certified by .." if the
current revision is the certified one). This could be the default view,
making users aware of the cert system.
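A sketch of the selection logic in Python (toy in-memory structures; in
practice the data would come from the TCert table described below):

```python
def revision_to_show(revisions, certs, approved_teams):
    """Pick the revision to display in "Certified Mode".

    revisions:      revision IDs of the article, oldest first
    certs:          list of (team, revision_id, timestamp) tuples
    approved_teams: the teams the user has enabled in preferences
    Returns the most recently certified revision, or None if no
    approved team has certified the article (hidden in this mode).
    """
    valid = [(ts, rev) for team, rev, ts in certs
             if team in approved_teams and rev in revisions]
    if not valid:
        return None
    # max() on (timestamp, revision) pairs picks the newest cert.
    return max(valid)[1]
```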
Each team could have its own quality standards, policies, and subject
preferences. I suggest that the creation of new teams would have to be
approved by the Wikipedia cabal to avoid "Team Trolls". New team members
would either be voted on or approved by team members that have a certain
status flag ("can_approve_newcomers"). Teams could get their own
namespace as well.
A decision would have to be made as to which teams to include in the
default view, i.e. the one that anonymous and newly registered users
get. In the short term such decisions may be made by the cabal, in the
long term I would prefer voting.
Implementation:
There needs to be a teams table that at least has a TID and a TDesc, and
a team-member table that links TIDs and UIDs and grants individual users
certain permissions within the team, as well as a "pending" status flag
for newly applied members. A TCert table would link UIDs, TIDs, certs, and
article IDs (and must include a timestamp).
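In outline, the row types could look like this (sketched as Python named
tuples; any column beyond the TID/TDesc/UID fields named above is a
placeholder, not a final schema):

```python
from collections import namedtuple

# A certification team: numeric ID plus description.
Team = namedtuple("Team", "tid tdesc")

# Links a user to a team, with per-member permissions and the
# "pending" flag for newly applied members.
TeamMember = namedtuple(
    "TeamMember", "tid uid pending can_approve_newcomers")

# One row per certification act; the timestamp lets the most
# recently certified revision be found.
TCert = namedtuple("TCert", "uid tid article_id timestamp")
```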
I am currently assuming that certifications would be simple "Good
article" binary flags. My reasoning is that a rating of "good, but not
good enough" may not be very helpful. Still, practice may prove that
wrong, and the table should therefore be designed to accommodate possibly
more flexible ratings.
Giving each individual member the power to rule an article certified in
the name of the whole team may be undesirable. Thus, teams should be
configurable to set their own min_number_of_certs, where certifications
would only be valid if that many team members agree that the article is
complete and high quality.
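The min_number_of_certs rule could be checked along these lines (toy data
structures again; the counting would really be a query against TCert):

```python
def is_certified(tcerts, team, article_id, min_certs):
    """A team cert is valid only once min_certs *distinct* team
    members have certified the article in the team's name.

    tcerts: iterable of (uid, tid, article_id, timestamp) rows
    """
    voters = {uid for uid, tid, aid, ts in tcerts
              if tid == team and aid == article_id}
    return len(voters) >= min_certs
```

Counting distinct UIDs (rather than rows) keeps one member from reaching
the threshold by certifying the same article repeatedly.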
Finally, the PHP scripts of course have to be updated to reflect this
functionality. There needs to be some way to apply for team membership,
to approve team membership, and to submit a new team for cabal approval.
A module listing all articles approved by a certain team would be nice
and could substitute for the "Brilliant Prose" page.
Results:
--------
If this works as intended, it should solve the quality problem and allow
users to browse Wikipedia as a high quality content only encyclopaedia.
The more teams you would admit to your personal filter, the more content
you would see, but quality standards of individual teams might not be up
to par. By distributing the job of quality approval among several teams,
we can get competition among quality standards and social methods, which
is probably a good thing and reduces social problems.
Potential problems:
-------------------
If too many people use highly customized views, caching will get harder.
I don't see this as too big a problem as a) most people typically don't
customize views, b) article retrieval is already very fast with or
without caching.
Too many teams may have undesired effects, such as teams deliberately
inserting POV articles to certify them. This is not a problem with the
team principle per se but with the way teams are approved and moderated.
Generally, teams should have a clear NPOV commitment and respect
Wikipedia policy, otherwise they should be deleted.
Comments on this would be appreciated. This is something I probably
won't have time to implement fully, but I will gladly help with any/all
efforts. I consider it very necessary for Wikipedia in the long term.
Regards,
Erik Moeller
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Magnus Manske wrote:
> Can I now redirect [[en:xyz]] to [[de:xyz]] and back, keeping the system
> busy?
Um, yes. Please don't do that. ;)
Your browser should be smart enough to give up after the first few.
> Also, I don't see any reason why one would redirect to another language.
> If one reads the en wikipedia, I doubt s/he wants to be redirected to,
> say, a chinese article. IMHO that should be restricted to wikipedia,
> meta and the 9-11 site.
Well, what's the benefit in not being able to? (Once the kinks are
smoothed out, which you're more than welcome to do.)
Which reminds me -- not for redirecting, but for interwiki linking in
general. Do we want to allow convenient [[Site:Title]] linking to other
non-Wikipedia wikis?
Eg, [[MeatBall:InterWiki]] would get you to
http://www.usemod.com/cgi-bin/mb.pl?InterWiki
I imagine this would be of use mainly in discussion pages, though
perhaps there's some good reference material out there.
-- brion vibber (brion @ pobox.com)
Mav wrote in part:
>We should only be making magical inter-language links to and from pages that
>directly relate to our main goal (hint: creating a huge multilanguage, NPOV
>encyclopedia).
Then you believe that we shouldn't make magical links
to September 11 memorials either?
-- Toby
> I think we should adopt a principle similar to KeptPages for
> RC logs, that is, to maintain a consistent log for 14 days or
> so, no matter how many entries happen in those days, instead
> of just saying "the last 5000 changes". So I guess we should
> just DELETE FROM the RecentChanges table all entries that are
> older than n days regularly.
Either should be fine, but keeping it at a constant 5000 should
make it faster. I created the table originally in an attempt to
speed up the RC page, and also to generate the new formats, which
gathered multiple changes to the same page. Alas, no matter how
many methods I tried for that, none of them had acceptable
performance, so I stepped back to think about it for a while, and
never got around to actually using the table even for the present
RC format. It should be simple to do that; I'll test Magnus's
fix and install it if all is well.
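For the n-day variant quoted above, the pruning would mostly be a matter
of computing a cutoff in the 14-digit timestamp format and deleting older
rows; a sketch (the DELETE shown only as a comment, since the
recentchanges column names are assumed here):

```python
from datetime import datetime, timedelta

def prune_cutoff(now, days=14):
    """Return the timestamp cutoff for keeping `days` days of
    recentchanges, in the 14-digit yyyymmddhhmmss format the
    cur table already uses for cur_timestamp."""
    return (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")

# The periodic pruning query would then be roughly:
#   DELETE FROM recentchanges WHERE rc_timestamp < '<cutoff>'
```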