I have started to take a look at the database structure. I think
eventually I'll be able to come up with some ideas for speeding things
up. So far I've just been playing with the data.
The following SQL statements search the Recent Changes tables.
Unfortunately, without subqueries it gets cumbersome.
-- Changes by multiple users
select rc_user_text, rc_title, rc_comment from recentchanges where
rc_user in (4369, 188, 2376) order by rc_timestamp desc limit 999
-- Changes by sysops
select rc_user_text, rc_title, rc_comment from recentchanges where
rc_user in (100, 1078, 1104, 1123, 1131, 1132, 1300, 150, 166, 17, 188,
2, 24, 29, 30, 300, 31, 3113, 3362, 34, 369, 38, 39, 4, 43, 488, 51,
513, 584, 59, 62, 66, 68, 733, 83, 86, 90, 94, 95, 97, 99) order by
rc_timestamp desc limit 999
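One way to avoid the hard-coded list of sysop IDs is a join against the user table. The following is only a sketch run against an in-memory SQLite database: the user_rights column and its 'sysop' value are assumptions not shown in the queries above, and make_demo_db and sysop_changes are hypothetical helper names.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with the columns used above.

    user_rights is an ASSUMED column; the real schema may differ.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (
            user_id INTEGER PRIMARY KEY, user_name TEXT, user_rights TEXT);
        CREATE TABLE recentchanges (
            rc_user INTEGER, rc_user_text TEXT, rc_title TEXT,
            rc_comment TEXT, rc_timestamp TEXT);
        INSERT INTO user VALUES (1, 'Alice', 'sysop'), (2, 'Bob', '');
        INSERT INTO recentchanges VALUES
            (1, 'Alice', 'Astronomy', 'fix typo', '20021112100000'),
            (2, 'Bob',   'Biology',   'new stub', '20021112110000'),
            (1, 'Alice', 'Chemistry', 'revert',   '20021112120000');
    """)
    return conn


def sysop_changes(conn, limit=999):
    """Return recent changes by sysops, newest first, via a join."""
    sql = """
        SELECT rc_user_text, rc_title, rc_comment
        FROM recentchanges
        JOIN user ON rc_user = user_id
        WHERE user_rights LIKE '%sysop%'   -- assumed column and format
        ORDER BY rc_timestamp DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


if __name__ == "__main__":
    for row in sysop_changes(make_demo_db()):
        print(row)
```

The join keeps the query valid when sysops are added or removed, at the cost of depending on how rights are actually stored.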
Other SQL statements:
-- Blocked IPs sorted by blocker
select user_name, ipb_address, ipb_reason, ipb_timestamp from ipblocks,
user where ipb_by = user_id order by user_name, ipb_timestamp
-- Anonymous edits
select cur_user_text, cur_title, cur_comment from cur where cur_user=0
order by cur_timestamp desc limit 99
-- Vandal track
select rc_user_text, rc_title, rc_comment from recentchanges where
rc_user_text = '213.229.14.90' order by rc_timestamp desc
Ed Poor
I think that a subject classification of articles would vastly improve
"soft security" and would save regulars a lot of time, since not
everyone would have to check every edit as currently seems to be the
case.
>I'd still like to see if we couldn't build those subjects
>automatically in some way based on links in the database.
How about this: the possible topics coincide with the major pages
listed on [[Main Page]] (from "Astronomy" to "Visual Arts"). The
shortest link path from such a topic page to an article defines that
article's topic. If there is no such path, then the article is
classified as a topic orphan.
To compute these topics quickly, the cur table gets two new columns:
topic and distance, where distance is the link distance from the
nearest Main Page topic page. When a new article is created, looking
at the distance entries of all articles that link to it and taking the
minimum immediately classifies it: the new article inherits that
neighbour's topic, at that distance plus one. When an existing
article is saved, the topic and distance entries of all articles it
links to (and their children) may need to be updated; these changes
can be propagated recursively.
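The classification described above amounts to a breadth-first search started from all topic pages at once. Here is a minimal sketch, assuming the link graph fits in memory as a dict; classify_topics and classify_new_article are hypothetical names, and ties between equally close topics are broken by whichever topic is reached first.

```python
from collections import deque


def classify_topics(links, topic_pages):
    """Assign (topic, distance) to every page reachable from a topic page.

    links maps a page title to the titles it links to.
    Pages missing from the result are "topic orphans".
    """
    result = {page: (page, 0) for page in topic_pages}
    queue = deque(topic_pages)
    while queue:
        page = queue.popleft()
        topic, dist = result[page]
        for target in links.get(page, ()):
            if target not in result:  # first visit is the shortest path
                result[target] = (topic, dist + 1)
                queue.append(target)
    return result


def classify_new_article(result, linking_pages):
    """Classify a new article from the pages that link to it.

    Takes the closest classified neighbour and adds one link.
    Returns None if no linking page has a topic (topic orphan).
    """
    known = [result[p] for p in linking_pages if p in result]
    if not known:
        return None
    topic, dist = min(known, key=lambda entry: entry[1])
    return (topic, dist + 1)


if __name__ == "__main__":
    links = {
        "Astronomy": ["Star"],
        "Star": ["Sun"],
        "Visual Arts": ["Painting"],
    }
    topics = classify_topics(links, ["Astronomy", "Visual Arts"])
    print(topics["Sun"])
    print(classify_new_article(topics, ["Sun", "Painting"]))
```

The full recomputation is linear in the number of links; the incremental update on save would only need to revisit pages whose distance can shrink or whose only shortest path was removed.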
Would that work?
Axel
> Not that I have objections against this change, but changes to the
> software/user interface of the Wikipedias should be announced
> formally and well in advance.
> Something like this:
> Give 3 weeks notice.
>This is way too long. I don't have time to track each individual
>patch over three weeks, especially since patches become obsolete
>quite quickly as other changes are committed.
Please do consider that it is probably not worth
telling us anything with less than a week's notice. That is
not enough time to understand, then to translate, then for
people to react, then to translate again. If a patch
is so necessary that it can't wait a week, please
proceed without asking. Otherwise, please consider that we
are maybe not in such a hurry that the other
wikipedians can't be given time to give their point
of view or just to be informed.
>Besides, intwiki-l is not for feature discussion (see the list
>description). wikitech-l is, and if feature discussion concerns
>wikipedia policies, it should be copied/moved to wikipedia-l.
I am officially asking that the international
wikipedia list be discarded. It is useless, and it
gives international people the false feeling
that it is "their" list, and that they don't need to
subscribe to the *main* *english* one. This is very
misleading. It would be better not to pretend things,
but to make them clear.
Please also make it official that wikipedia-l and wikitech-l
are the only lists worth considering as far
as general policies are concerned, for similar reasons.
However, also realise that just doing so will prevent
international wikipedias from having any real involvement
in general policy matters. Today, in 5 hours, my mail
box received 85 messages. I do not feel fluent enough in
english to read all of them. I'll discard some
of them, hoping no essential issue for us was raised,
for I know nobody else will tell them.
>Notice that the above policy is not used by anyone else either,
>so I don't even see why you bring it up.
So, if I follow you, other people not following
policies is reason enough for you not to follow them
either? If so, why are these policies still there?
Let's discard them too.
>Members of the international Wikipedias that want to take part
>in feature discussions should subscribe to wikitech-l and/or
>wikipedia-l, if we spread this stuff over three lists, we'll
>never get anything done.
Probably true. But that is not what is widely
understood among "foreigners". Hence, I think I
am gonna remove international-l from our mailing list
page; let's be blunt and realistic. I already
understood many years ago that not understanding
english would just get me nowhere.
>What I can agree on is to send strings that need to be
>translated to intwiki-l before committing a change, and to
>wait a few days for translations to come in, then to commit
>the change together with the translations. But if not all
>translations come in within a reasonable amount of time, they
>will have to be added later, meaning that the user interface
>will have some English in it until the translations are
>submitted. If you check some of the international Wikipedias,
>you will notice that this is the case for quite a few of them,
>for features with which I had nothing to do.
We can manage with that; we have no choice anyway but
to understand english if we want to participate a bit, so
a couple of features in english won't be
much trouble (though some will protest; their problem).
However, to tell the truth, I won't be able to
translate your feature for the very good reason that,
though I tried to concentrate very hard on it, I have
not been able to understand what that checkbox in the
watch list was all about. I just could not figure out what
you were talking about.
So, could you spend a tiny little bit of your time
just telling us what is going to appear? I am sure it
is a good choice, since everybody seems to agree with it,
but I am just curious.
Also, I have read your proposal about the counting of
articles (which, if I understood well, you decided to
implement today if nobody complained about it...maybe
some answers are in the 85 messages...).
Please could you tell me whether it will apply to
international wikipedias or just to the english one?
If it does, I am not sure I understood well when the
article will stop being considered a bot or a stub:
- when at least 2 edits have been made after the
creation, whoever the authors of the edits are?
- 2 edits by authors, creators excluded?
- 2 edits by 2 different authors, creators excluded?
Though I understand well the interest of this (and
definitely support changing the count to exclude some
small or automatic entries), I would be happy if
you could provide a system to list articles that are
*above* a certain number of characters (no stub) and
not automatically generated.
I think your system is gonna exclude some specialized
articles that I think deserve the status
of articles. I would be glad - in my own field of
expertise - to go and edit them by hand enough for
them to be considered real articles.
I hope I did not misunderstand entirely what you were
planning to do. When I don't understand things, I
usually wait for further discussion to enlighten me,
but here, I did not see much discussion. So...I am not
sure I understood well.
Thank you in advance for your answers
Forward of a message by Ec that was submitted to Wikipedia-L (IMO a great set
of ideas)
Daniel Mayer wrote:
>Somebody else had the idea of creating different types of /useful/
> categories such as Wikipedians who have knowledge of certain subjects. I
> think this a great idea that should be integrated with the Wikipedia:Help
> Desk page along with the Wikipedians page. These types of categories are
> useful because they further goals of the project.
In conjunction with this, I believe it was Larry who wanted some sort of
article subject classification. These two ideas could be tied together
with the recent changes page. Each article could have a few boxes where the
writer could enter category codes. The Wikipedians who are interested
in that category (i.e. potential editors) could enter themselves on the
category list for that code, or perhaps enter a list of these codes in
their preferences. By default, recent changes could function as it
does now, but could optionally provide a categorized list based on one
or more of the codes in the person's preferences.
Thus if we were to use the Library of Congress letter classifications
(which have the benefit of being usable without reinventing the wheel)
all articles concerned with mathematics would receive a "QA" code. A
person interested in mathematics could add the "QA" code to his
preferences, and on request would be able to receive a list of the
recent changes in articles about mathematics. Having several boxes
associated with an article allows for the fact that some articles may
be relevant to more than one category. An article on mathematical
applications in psychology would have both a "QA" and a "BF" code.
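As a rough illustration of the filtering step, here is a minimal sketch in Python. The data layout (a dict from article title to its category codes, and changes as dicts with a "title" key) and the name categorized_changes are assumptions made for the example only.

```python
def categorized_changes(changes, article_codes, preferred_codes):
    """Filter recent changes down to the reader's preferred categories.

    changes         -- list of dicts, each with at least a "title" key
    article_codes   -- dict: article title -> iterable of category codes
    preferred_codes -- codes from the reader's preferences

    With no preferred codes, the full list is returned, so recent
    changes behaves as it does now by default.
    """
    preferred = set(preferred_codes)
    if not preferred:
        return list(changes)
    return [
        change for change in changes
        if preferred & set(article_codes.get(change["title"], ()))
    ]


if __name__ == "__main__":
    article_codes = {
        "Prime number": ["QA"],
        "Psychometrics": ["QA", "BF"],
        "Sigmund Freud": ["BF"],
    }
    changes = [
        {"title": "Prime number"},
        {"title": "Sigmund Freud"},
        {"title": "Psychometrics"},
        {"title": "Uncoded page"},
    ]
    for change in categorized_changes(changes, article_codes, ["QA"]):
        print(change["title"])
```

An article carrying both "QA" and "BF" shows up for readers interested in either code, which is the point of allowing several boxes per article.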
Is this technically feasible?
Eclecticology
Message: 8
Subject: Re: [Wikitech-l] Check boxes and other articles counts
From: Erik Moeller <e.moeller(a)fokus.gmd.de>
To: wikitech-l(a)wikipedia.org
Date: 12 Nov 2002 11:06:05 +0100
Reply-To: wikitech-l(a)wikipedia.org
Hi Anthere,
>> Please do consider that it is probably not worth
>> telling us anything under at least a week notice. Not
>> enough time to understand, then to translate
>Um, how much work is it to translate two text strings?
I think there was a misunderstanding there. The time
delay was to discuss something new, not for the
translation. A translation can be done very quickly of
course. It can be changed later on if people don't
like it - by the people who don't like it ;-) (which
is why some parts of the fr.wiki are still in english:
some don't like it, but they don't do the job
themselves...)
>That it may be hard to understand in some cases is true, we
>probably need a demo wiki
Yes, this would be nice.
>> I am officially asking that the international
>> wikipedia list be discarded.
>I'm not the right person to talk to here, but I tend to agree.
There is no *right* person. Everybody should be part
of the right person. But the majority don't care, so
it won't be done.
>> However, also realise that just doing so will prevent
>> international wikipedias to have any real involvement
>> in general policy matters. Today, in 5 hours, my mail
>> box received 85 messages. I do not feel fluent in
>> english enough to read all of them.
>Your English is more than good enough to be no excuse ;-).
>Traffic can get a bit demanding, I agree.
Nope. Some people are easier to understand than
others. And 85 messages is more than demanding :-)
I got totally lost in Larry's very long and
convoluted messages and multiple answers. For sure,
now I have little idea left of who proposed what.
And I did not understand tmc at all ;-)
>> However, to tell the truth, I won't be able to
>> translate your feature for the very good reason that
>> though I tried to concentrate very hard on it, I have
>> not being able to understand what that checkbox in the
>> watch list was all about. I just couldnot figure what
>> you were talking about.
>OK, here's how it will look. You edit an article:
>[Save page] [Show preview]
>Any clearer?
Absolutely. Great idea ! I like it.
>That's a misunderstanding, I'll change the navigation bar of
>the English Wikipedia only as agreed upon on wikipedia-l. The
>article count is on my to do list, but if anyone else wants to
>implement this, be my guest. If implemented, it will likely be
>done in such a way that it will still be possible to use the
>previous count.
It sometimes is very hard to distinguish what is gonna
be only on the en. and what is gonna be everywhere.
I thank you very much for your answers Erik.
> On Tuesday 12 November 2002 00:20, Bridget [name omitted for privacy reasons] wrote:
> > Code a means by which all edits by a certain user can be undone.
>
> What happens if someone else later edits the same article?
It should only be applied to articles that have not received later edits.
Those that have are likely un-vandalized already.
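To make the rule concrete, here is a minimal sketch in Python, not an actual patch. It assumes each article's history is an in-memory list of (user, text) revisions, oldest first; undo_user_edits and the reverter argument are hypothetical names. Pages where the user is the only author are left alone, since restoring a prior version is not possible there.

```python
def undo_user_edits(histories, user, reverter):
    """Roll back a user's edits on articles nobody has edited since.

    histories -- dict: title -> list of (user, text), oldest first
    Appends a new revision restoring the last text not written by
    `user`, and returns the titles that were reverted.
    """
    reverted = []
    for title, revisions in histories.items():
        if not revisions or revisions[-1][0] != user:
            continue  # someone edited later: leave the article alone
        keep = len(revisions)
        while keep > 0 and revisions[keep - 1][0] == user:
            keep -= 1  # skip the trailing run of the user's edits
        if keep == 0:
            continue  # page created by the user: nothing to restore
        restored_text = revisions[keep - 1][1]
        revisions.append((reverter, restored_text))
        reverted.append(title)
    return reverted


if __name__ == "__main__":
    histories = {
        "A": [("Ann", "good"), ("Vandal", "bad"), ("Vandal", "worse")],
        "B": [("Vandal", "bad"), ("Bob", "fixed")],
        "C": [("Vandal", "spam")],
    }
    print(undo_user_edits(histories, "Vandal", "Sysop"))
    print(histories["A"][-1])
```

Appending a restoring revision rather than deleting the bad ones keeps the history intact, so the revert itself can be reviewed or undone.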
This belongs on wikitech-l, discussion should continue there. I won't code
this (too much in my pipe already).
Regards,
Erik
Hi,
after some browsing around in the code, I've found a major bottleneck:
the current internal link resolution system.
If you have a local Wikipedia install, try it for yourself: visit a few
very link-heavy pages, like [[List of reference tables]]. You will
notice that the rendering speed depends directly on the number of
internal links (not on the actual text size or formatting tags). The
link cache seems to work, as page rendering gets faster when you load a
page repeatedly, but it remains rather slow for link-heavy pages.
For further proof, edit OutputPage.php and, in the function
replaceInternalLinks, just add a "return $s;" at the top. As a result,
internal links will no longer be resolved. Now browse a page (ideally,
open Main Page before the change and follow links from there after it)
and notice that the rendering is lightning fast.
The internal link resolution process is quite complex. I notice that
lots of Title objects are created, and we have lots of ID lookups. I
won't speculate too much about the cause until I have had time to
examine it more closely. But that this is a bottleneck is certain.
I believe we need to make use of the links and brokenlinks tables for the
actual page rendering. Right now they seem to be used only for the
special pages.
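To illustrate why many per-link ID lookups hurt, here is a sketch in Python with SQLite of resolving all links on a page in one query. This is not the links/brokenlinks approach suggested above and not the actual PHP code, only a simpler stand-in for the same idea of replacing one lookup per link with a single batched one. The cur_id column and the name resolve_links_batch are assumptions.

```python
import sqlite3


def resolve_links_batch(conn, titles):
    """Look up many link targets with a single query.

    Returns a dict: title -> cur_id, or None for a broken link.
    Assumes a cur table with cur_id and cur_title columns.
    """
    titles = list(dict.fromkeys(titles))  # drop duplicates, keep order
    if not titles:
        return {}
    placeholders = ",".join("?" * len(titles))
    rows = conn.execute(
        "SELECT cur_title, cur_id FROM cur "
        f"WHERE cur_title IN ({placeholders})",
        titles,
    ).fetchall()
    found = dict(rows)
    return {title: found.get(title) for title in titles}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT)")
    conn.executemany(
        "INSERT INTO cur VALUES (?, ?)",
        [(1, "Main Page"), (2, "Astronomy")],
    )
    print(resolve_links_batch(conn, ["Main Page", "No such page"]))
```

A page with several hundred links then costs one round trip instead of several hundred; very long link lists would need to be split into chunks to stay under the database's parameter limit.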
Regards,
Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Can somebody unblock the database at least for logged-in users?
An uneditable wiki is a dead wiki and I've already mentioned that people can
read our articles via Google cache so it is stupid to disallow edits just so
people can read.
The reverse would make more sense in these situations; not serving any pages
to non-logged-in browsers (which I don't think is possible but would be a
nice alternative to the current db block).
-- Daniel Mayer (aka mav)