Re: [Wikipedia-l] Re:WikiIndex (idea)


Marc O'Morain

17 Mar 2005

4:50 p.m.

After thinking some more about the merits of augmenting the category system versus creating a new system, I thought of the following:

The Category tag can only ever represent an "is a" relationship.

By adding the Category:Human tag to Liam Neeson, we are really trying to convey that Liam Neeson IS A human. The category system is perfect for defining this kind of information. However, the category system cannot be used to define any other kind of relationship. For example, in the Anne Frank article, it is not currently possible in Wikipedia to express an "author of" relationship.

It would therefore be a good idea to add an AuthorOf tag to Wikipedia. There are many other relationships like this that could be defined. Each such relationship should have a well-defined domain and co-domain.

(For those who are unfamiliar with these terms, I will explain them with an example.) Consider the "Author of" relationship:

(Lewis Carroll) AuthorOf (Alice's Adventures in Wonderland)

AuthorOf is a relation that maps a Person to a Book. We say that the domain of AuthorOf is People, and the co-domain of AuthorOf is Books.

Each new relation defined in Wikipedia would have to have a domain and a co-domain associated with it. For the sake of example, I will create the following syntax to define a new relationship:

Relation: AuthorOf
Domain: Person
CoDomain: Book

Because we have defined the domains of the AuthorOf relation, we can now do some very powerful stuff with Wikipedia:

Imagine Dave is a user of Wikipedia who does not know about defining categories. He is doing some research on the web on his favourite author, Lewis Carroll, and notices that there is no page on Wikipedia about him. He goes ahead and creates the page, and he also creates two pages for Carroll's most famous books: Alice's Adventures in Wonderland and Through the Looking Glass. Dave does not add his new pages to any category.

At the same time Mike is looking at the recent changes, sees the new additions on Lewis Carroll, and decides to have a look at the pages. He notices that Dave has not defined the AuthorOf relationship, so on the Lewis Carroll page Mike adds:

[[AuthorOf:Alice's Adventures in Wonderland]]
[[AuthorOf:Through the Looking Glass]]

Remember that Wikipedia knows that the domain of the AuthorOf relation is Person, and that the co-domain is Book, so it can automatically add Category:Person to Lewis Carroll, and Category:Book to Alice's Adventures in Wonderland and Through the Looking Glass.
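A rough sketch of that inference in Python, purely for illustration. The Relation and Wiki classes and the define/add_tag methods are made-up names for this example and do not correspond to anything in MediaWiki:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A named relation with a domain and a co-domain (hypothetical type)."""
    name: str
    domain: str      # category implied for the page that carries the tag
    codomain: str    # category implied for the page the tag points at


class Wiki:
    """Toy stand-in for the wiki: it only stores categories and links."""

    def __init__(self):
        self.relations = {}
        self.categories = defaultdict(set)   # page title -> category names
        self.links = []                      # (page, relation name, target)

    def define(self, relation):
        self.relations[relation.name] = relation

    def add_tag(self, page, relation_name, target):
        # Record the link, then infer a category for both ends of it.
        relation = self.relations[relation_name]
        self.links.append((page, relation_name, target))
        self.categories[page].add(relation.domain)
        self.categories[target].add(relation.codomain)


wiki = Wiki()
wiki.define(Relation("AuthorOf", domain="Person", codomain="Book"))
wiki.add_tag("Lewis Carroll", "AuthorOf", "Alice's Adventures in Wonderland")
wiki.add_tag("Lewis Carroll", "AuthorOf", "Through the Looking Glass")

print(sorted(wiki.categories["Lewis Carroll"]))              # ['Person']
print(sorted(wiki.categories["Through the Looking Glass"]))  # ['Book']
```

Neither page was categorised by hand; both categories follow from the one relation definition.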

We can also define an inverse of AuthorOf: consider the revised syntax:

Relation: AuthorOf
Inverse: WrittenBy
Domain: Person
CoDomain: Book

Now when someone defines that Lewis Carroll was the AuthorOf Through the Looking Glass, Wikipedia will automatically know that Through the Looking Glass was written by Lewis Carroll.

Mike adds the tag [[AuthorOf:Through the Looking Glass]] to Lewis Carroll, and Wikipedia will automatically add [[WrittenBy:Lewis Carroll]] to the Through the Looking Glass page.
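The inverse bookkeeping could look something like the following Python sketch. The apply_tag function and the dictionaries it uses are assumptions made up for this example, not an existing API:

```python
def apply_tag(pages, inverses, page, relation, target):
    """Record `relation` on `page` and, if one is declared, its inverse on `target`."""
    pages.setdefault(page, set()).add((relation, target))
    inverse = inverses.get(relation)
    if inverse is not None:
        pages.setdefault(target, set()).add((inverse, page))


# Declaring the pair in both directions lets a tag be added from either side.
inverses = {"AuthorOf": "WrittenBy", "WrittenBy": "AuthorOf"}
pages = {}

apply_tag(pages, inverses, "Lewis Carroll", "AuthorOf", "Through the Looking Glass")

print(pages["Through the Looking Glass"])  # {('WrittenBy', 'Lewis Carroll')}
```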

What does anyone else think?

-- Marc O'Morain

Brion Vibber

17 Mar

5:03 p.m.

New subject: WikiIndex (idea)

Marc O'Morain wrote:

...

After thinking some more about the merits of augmenting the category system versus creating a new system, I thought of the following:

The Category tag can only ever represent an "is a" relationship.

Actually, the [[category:x]] system doesn't represent any fixed type of relationship at all.

I hate to use the demon term "folksonomy", but... ;)

-- brion vibber (brion @ pobox.com)

Marc O'Morain

5:16 p.m.

New subject: WikiIndex (idea)

On Thu, 17 Mar 2005 17:03:34 -0800, Brion Vibber brion@pobox.com wrote:

...

Marc O'Morain wrote:

...
After thinking some more about the merits of augmenting the category system versus creating a new system, I thought of the following:

The Category tag can only ever represent an "is a" relationship.

Actually, the [[category:x]] system doesn't represent any fixed type of relationship at all.

I see your point. I am confident that my definition holds for pages that are tangible things - pages about people, places etc. My definition does not hold for other types of thing:

Usability_testing is part of Category:Software_engineering
Usability_testing is not a Software_engineering :)

Neil Harris

18 Mar

3:51 a.m.

New subject: WikiIndex (idea)

Marc O'Morain wrote:

...

[some good stuff]

Marc, have you taken a look at RDF, and, more to the point, n3, which is a simplified notation that is equivalent to RDF? You are quite right, our current categories approximate is-a relationships (although there are some subtleties that they completely skate over).

At the risk of repeating myself, it would be really neat if we could write something like

[[attribute = value]]

in articles, since we could then use it to generate all sorts of good stuff, including automatic rendering of navigation tables in the UI, structured search, and autogeneration of RDF for semantic search engines.

-- Neil

Delirium

3:02 p.m.

New subject: WikiIndex (idea)

Neil Harris wrote:

...

At the risk of repeating myself, it would be really neat if we could write something like

[[attribute = value]]

in articles, since we could then use it to generate all sorts of good stuff, including automatic rendering of navigation tables in the UI, structured search, and autogeneration of RDF for semantic search engines.

Not to mention that this data can be reused. For example, there are all sorts of attributes on the Wikiproject Mountains pages. If I could specify "height=1377 meters" somewhere, then it could *both* be used to generate the infobox, *and* used to sort mountains by height and whatnot. It'd also make the markup easier, because the current tables, even with the simplified table notation, are pretty hard to read for a layperson...
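As a rough illustration of that reuse, here is a Python sketch that reads [[attribute = value]] tags once and uses them both for an infobox and for sorting. The article titles, the "range" attribute, and the function names are all invented for the example, and the regular expression only handles simple one-word attribute names:

```python
import re

# Matches tags of the form [[name = value]]; a deliberately simple pattern.
ATTRIBUTE = re.compile(r"\[\[\s*(\w+)\s*=\s*([^\]]+?)\s*\]\]")


def attributes(wikitext):
    """Return every [[name = value]] pair found in the text as a dict."""
    return dict(ATTRIBUTE.findall(wikitext))


def height_in_metres(attrs):
    """Take the leading number of a value such as '1377 meters'."""
    return float(attrs["height"].split()[0])


def infobox(title, attrs):
    """Render the attributes as a plain-text infobox, one row per attribute."""
    rows = [f"{name}: {value}" for name, value in attrs.items()]
    return "\n".join([title] + rows)


# Made-up articles, used only to exercise the functions above.
articles = {
    "Mountain A": "[[height = 1377 meters]] [[range = Example Range]]",
    "Mountain B": "[[height = 2040 meters]]",
}

parsed = {title: attributes(text) for title, text in articles.items()}
by_height = sorted(parsed, key=lambda t: height_in_metres(parsed[t]), reverse=True)

print(infobox("Mountain A", parsed["Mountain A"]))
print(by_height)  # ['Mountain B', 'Mountain A']
```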

-Mark

Alfio Puglisi

19 Mar

2:46 a.m.

New subject: WikiIndex (idea)

On Fri, 18 Mar 2005, Delirium wrote:

...

Neil Harris wrote:

...
At the risk of repeating myself, it would be really neat if we could write something like

[[attribute = value]]

in articles, since we could then use it to generate all sorts of good stuff, including automatic rendering of navigation tables in the UI, structured search, and autogeneration of RDF for semantic search engines.

Not to mention that this data can be reused. For example, there are all sorts of attributes on the Wikiproject Mountains pages. If I could specify "height=1377 meters" somewhere, then it could *both* be used to generate the infobox, *and* used to sort mountains by height and whatnot.

Maybe we can leverage the existing templates for this? Say we have a [[Template:Mountain]] with parameters such as height and geographical position and others. The wikicode would look like:

I would assume such a template already exists. Then each page that uses the template has automatic attributes of the form "Mountain_height=1377", "Mountain_name=Foo", etc. When one searches for "mountain", look up whether a suitable template exists. Now you can offer a list of mountains by height, position and so on.
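A minimal Python sketch of deriving such attributes, assuming the page calls the template with named parameters in the usual {{Name|param=value}} form. The sample call and the template_attributes function are assumptions for illustration only, and nested templates are not handled:

```python
import re

# Matches a simple, non-nested call such as {{Mountain|height=1377|name=Foo}}.
TEMPLATE_CALL = re.compile(r"\{\{\s*([^|{}]+?)\s*\|([^{}]*)\}\}")


def template_attributes(wikitext):
    """Turn named template parameters into 'Template_param' attributes."""
    attrs = {}
    for name, body in TEMPLATE_CALL.findall(wikitext):
        for part in body.split("|"):
            if "=" in part:  # skip positional (unnamed) parameters
                key, value = part.split("=", 1)
                attrs[f"{name}_{key.strip()}"] = value.strip()
    return attrs


print(template_attributes("{{Mountain|height=1377|name=Foo}}"))
# {'Mountain_height': '1377', 'Mountain_name': 'Foo'}
```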

Alfio

Marc O'Morain

2:52 a.m.

New subject: WikiIndex (idea)

On Fri, 18 Mar 2005 11:51:45 +0000, Neil Harris neil@tonal.clara.co.uk wrote:

...

At the risk of repeating myself, it would be really neat if we could write something like

[[attribute = value]]

A minor niggle:

The = operator should be used for equality, not for assignment :)

[[Attribute := Value]]

would be far nicer!

Does anyone know of any existing RDF schemas for describing the sets of the information that may be in an encyclopedia? History, People, Places etc.

-- Marc O'Morain

Phil Sandifer

10:10 a.m.

New subject: Edit speed choking

I know en is suffering from a seeming increase in the number of people who engage in high speed vandal attacks. My guess is other Wikipedias, if they don't already get this, will. Certainly, there are some technical measures in place and being put in place. Rollbacks on page moves are nice, plugging the interwiki redirect hole is very nice. But new problems will come up, and some of the existing problems (Page creation vandalism) won't go away with an easy technical solution addressing that specific problem.

What would be the result/problem/whatever of an edit speed throttle on new accounts? I'm thinking an edit a minute for the first 100 edits. I know edit count is a resource-intensive query, so presumably some sort of technical wizardry would be needed to check whether or not the throttle should be set. I'd assume the easiest way to do this would be to only check whether the throttle should be lifted. That is, have the edit count only be employed if the editor is currently throttled; once 100 edits are reached, the check wouldn't need to be performed anymore. Perhaps a new function could also be written that would be less database intensive but only count edits up to a certain number. (That is, instead of doing horrifyingly huge checks on users like Rambot who have millions of edits, it stops once it notices that the number exceeds 100.)

I don't know the tech answers here as I'm not a dev. My main two questions are:

1) Is an edit throttle feasible?

2) What good reasons are there to not throttle contributors for their first 100 edits so that they cannot launch widespread changes? (That is, does anything that people tend to do in their first 100 edits actually require editing more than once a minute?)

-Snowspinner

Sj

20 Mar

6:48 a.m.

New subject: Edit speed choking

On Sat, 19 Mar 2005 12:10:39 -0600, Phil Sandifer sandifer@sbcglobal.net wrote:

...

What would be the result/problem/whatever of an edit speed throttle on new accounts? I'm thinking an edit a minute for the first 100 edits. I know edit count is a resource intensive query,

This sounds like a great idea. It's important precisely because it increases the leverage serious contributors have over repeat vandals.

As to the technical aspect: if a separate table like "user_data" has a field "edit_count" that is updated every time the user makes an edit, it will not be so expensive to record (on-edit actions are comparatively reasonable), and will be cheap to query. That would not even be a particularly difficult database-change to make, since it could be a new table, rather than a new column in an old one.
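A minimal Python sketch of that approach, assuming a per-user record with a stored counter. The UserData class, the try_edit function, and the two constants are hypothetical names chosen for the example; the 100-edit and one-minute figures come from the proposal quoted above:

```python
from dataclasses import dataclass
from typing import Optional

THROTTLE_EDITS = 100   # edits after which the throttle is lifted
MIN_INTERVAL = 60.0    # seconds required between edits while throttled


@dataclass
class UserData:
    """Stand-in for a row of the suggested "user_data" table."""
    edit_count: int = 0
    last_edit: Optional[float] = None   # time of the last saved edit, in seconds


def try_edit(user, now):
    """Save an edit at time `now` unless the user is throttled; return success."""
    throttled = user.edit_count < THROTTLE_EDITS
    if throttled and user.last_edit is not None and now - user.last_edit < MIN_INTERVAL:
        return False
    # The counter is bumped on every edit, so it never has to be recomputed.
    user.edit_count += 1
    user.last_edit = now
    return True


newcomer = UserData()
print(try_edit(newcomer, 0.0))    # True: first edit
print(try_edit(newcomer, 30.0))   # False: only 30 seconds later
print(try_edit(newcomer, 61.0))   # True: more than a minute after the last edit

veteran = UserData(edit_count=100, last_edit=0.0)
print(try_edit(veteran, 1.0))     # True: past the first 100 edits
```

Reading the stored counter is a single lookup, so even an account with a very large number of edits costs no more to check than a new one.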

...

I don't know the tech answers here as I'm not a dev. My main two questions are:

1) Is an edit throttle feasible?

2) What good reasons are there to not throttle contributors for their first 100 edits so that they cannot launch widespread changes? (That is, does anything that people tend to do in their first 100 edits actually require editing more than once a minute?)

None that I can think of. While we're at it, we could disable the "move page" tab for the same period of time. As you(?) mentioned elsewhere, a site like Slashdot has a permanent edit throttle, without noticeably dampening user enthusiasm.

-- +sj+

Richard Holton

9:40 a.m.

New subject: Edit speed choking

On Sun, 20 Mar 2005 09:48:20 -0500, Sj 2.718281828@gmail.com wrote:

...

On Sat, 19 Mar 2005 12:10:39 -0600, Phil Sandifer

...

...

What good reasons are there to not throttle contributors for their first 100 edits so that they cannot launch widespread changes? (That is, does anything that people tend to do in their first 100 edits actually require editing more than once a minute?)

None that I can think of. While we're at it, we could disable the "move page" tab for the same period of time. As you(?) mentioned elsewhere, a site like Slashdot has a permanent edit throttle, without noticeably dampening user enthusiasm.

As a new user, you might want/need to edit the same page several times in succession to get things right. Any edit count based throttling should take that into account.

-- Rich Holton

en.wikipedia:User:Rholton

David Gerard

10:33 a.m.

New subject: Edit speed choking

Richard Holton wrote:

...

As a new user, you might want/need to edit the same page several times in succession to get things right. Any edit count based throttling should take that into account.

We do have a 'preview' button. I expect it would merely train people to preview their work more.

(Of course, then their browser will crash and it'll be our fault they didn't save two hours' intense research and writing ...)

- d.

Andre Engels

10:47 a.m.

New subject: Edit speed choking

On Sun, 20 Mar 2005 18:33:53 +0000, David Gerard fun@thingy.apana.org.au wrote:

...

Richard Holton wrote:

...
As a new user, you might want/need to edit the same page several times in succession to get things right. Any edit count based throttling should take that into account.

We do have a 'preview' button. I expect it would merely train people to preview their work more.

Perhaps. It could also easily have the effect of irritating them into leaving. "Let's make this wonderful article.... Ready, submit." "Oops, I made a typo... Let's correct it." "You have already made an edit this minute." "Okay, wait a minute... Let's load this other website in the meantime." "Ah, finally... Oooh... Another typo." "You have already made an edit this minute." "Ah well, if they don't want me, that's their problem. Let's go to that other cool website."

Andre Engels

Till Westermayer

21 Mar

1:24 p.m.

New subject: Edit speed choking

Hi Andre,

...

Perhaps. It could also easily have the effect of irritating them into leaving. "Let's make this wonderful article.... Ready, submit." "Oops, I made a typo... Let's correct it." "You have already made an edit this minute." "Okay, wait a minute... Let's load this other website in the meantime." "Ah, finally... Oooh... Another typo." "You have already made an edit this minute." "Ah well, if they don't want me, that's their problem. Let's go to that other cool website."

So the error message should make it clear that this doesn't happen when you are logged in and that there is a preview function.

-- Till Westermayer (till@tillwe.de, www.westermayer.de/till/)

David Gerard

20 Mar

10:29 a.m.

New subject: Edit speed choking

Sj wrote:

...

On Sat, 19 Mar 2005 12:10:39 -0600, Phil Sandifer sandifer@sbcglobal.net wrote:

...

...
What would be the result/problem/whatever of an edit speed throttle on new accounts? I'm thinking an edit a minute for the first 100 edits. I know edit count is a resource intensive query,

...

This sounds like a great idea. It's important precisely because it increases the leverage serious contributors have over repeat vandals. As to the technical aspect: if a separate table like "user_data" has a field "edit_count" that is updated every time the user makes an edit, it will not be so expensive to record (on-edit actions are comparatively reasonable), and will be cheap to query. That would not even be a particularly difficult database-change to make, since it could be a new table, rather than a new column in an old one.

Or the isNewbie() function (I think that's the name) checked before someone can do a page move. "Is newbie or anon, last edit under 60 seconds ago? Sorry, please wait a few seconds!"

Probably need some provision for authorised bots.

[cc: to wikitech-l]

- d.

Evan Prodromou

18 Mar

1:40 p.m.

New subject: WikiIndex (idea)

On Fri, 2005-18-03 at 00:50 +0000, Marc O'Morain wrote:

...

After thinking some more about the merits of augmenting the category system versus creating a new system, I thought of the following:

...

What does anyone else think?

I think that the best thing to do is use RDF for metadata, and use Turtle to save in-page RDF.

http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/

RDF is _the_ Internet metadata format. Turtle is the Wikitext of RDF: a plain, easy way to do metadata markup that anyone can learn in a few minutes.

I plan on incorporating Turtle markup into MediaWiki 1.5.x, probably as part of a larger overhaul of the RDF generation modules to use RAP (the RDF API for PHP).

~Evan

-- Evan Prodromou evan@bad.dynu.ca

wikipedia-l@lists.wikimedia.org

14 comments

participants (12)

Alfio Puglisi
Andre Engels
Brion Vibber
David Gerard
Delirium
Evan Prodromou
Marc O'Morain
Neil Harris
Phil Sandifer
Richard Holton
Sj
Till Westermayer