In a message dated 4/3/2008 11:24:51 A.M. Pacific Daylight Time, wikipedia.kawaii.neko@gmail.com writes:
I have seen my fair share of article butchering through mass redirectification.
--------------------------
Could you put this in language my third-grade intellect can understand? Do you mean something like gutting an article with a simple redirect to another article? Wouldn't the article history still contain the original article, which could at least be saved to user space?
Will
On 2008.04.03 16:26:45 -0400, WJhonson@aol.com scribbled 0.6K characters:
In a message dated 4/3/2008 11:24:51 A.M. Pacific Daylight Time, wikipedia.kawaii.neko@gmail.com writes:
I have seen my fair share of article butchering through mass redirectification.
Could you put this in language my third-grade intellect can understand? Do you mean something like gutting an article with a simple redirect to another article? Wouldn't the article history still contain the original article, which could at least be saved to user space?
Will
Yes, in theory that could be done. And in theory undeletion is a useless admin power as we could just copy stuff from the database dumps, and in theory we could add all sorts of barriers to registration and reduce vandalism that way, and in theory we don't need any editing features in MediaWiki - just let people find buffer overflows in PHP and do their editing of the database through shellcode.
In practice, of course, redirection is as good as deletion, except in the rare case where someone finds a discussion about it or has a reason to wonder where the heck her article went. It only has to happen once, and there are plenty of ways it can. A bot might edit it, or perhaps you get distracted and don't visit Wikipedia for, say, three days. It's hard to tell the difference between a page you're not seeing any edits to because nobody is editing it, and a page you're not seeing any edits to because it's now a redirect...
(I'd note that one good way of fixing this problem would be to improve the raw watchlist - filter out or otherwise mark redirects. You could then combat the above by keeping your watchlist clean of redirects and occasionally scanning for untoward new ones.)
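For the scanning part, something like this untested sketch would do. It just asks the API's prop=info for the "redirect" flag on each watchlisted title; the requests library, the batching, and the function name are my own illustration, not an existing tool.

import requests

API = "http://en.wikipedia.org/w/api.php"

def find_redirects(watchlist_titles):
    """Return the subset of titles that are currently redirects."""
    titles = list(watchlist_titles)
    redirects = []
    for i in range(0, len(titles), 50):  # the query API accepts up to 50 titles per request
        params = {
            "action": "query",
            "prop": "info",
            "titles": "|".join(titles[i:i + 50]),
            "format": "json",
        }
        pages = requests.get(API, params=params).json()["query"]["pages"]
        # prop=info includes a "redirect" key only for pages that are redirects
        redirects.extend(p["title"] for p in pages.values() if "redirect" in p)
    return redirects

# e.g. find_redirects(open("watchlist.txt").read().splitlines())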
-- gwern FIPS140 the Enforcers Defcon supercomputer GSGI SAW DEVGRP Texas Yakima
On Thu, Apr 3, 2008 at 11:26 PM, WJhonson@aol.com wrote:
In a message dated 4/3/2008 11:24:51 A.M. Pacific Daylight Time, wikipedia.kawaii.neko@gmail.com writes:
I have seen my fair share of article butchering through mass redirectification.
Could you put this in language my third-grade intellect can understand? Do you mean something like gutting an article with a simple redirect to another article? Wouldn't the article history still contain the original article, which could at least be saved to user space?
Will
Let me explain:
- There exists an article "Alpha"
- There is a problem with article "Alpha"
- Article "Alpha" does not follow all guidelines and/or policies
- A user comes in, removes all content from article "Alpha", and redirects it to article "Beta", which has related coverage
- No content is added to article "Beta" in this process
- The user calls this action a merge, which is an editorial decision independent of AFD
- Even pages of failed AFDs can be redirectified
This is what is happening, in a nutshell. The topics where this is happening range from fiction-related articles (episodes/characters) to highways (real-world subjects) and townships, among others.
- White Cat
Yes, and the only solution is a frank admission that a redirect is a form of deletion, that a change to a redirect, if challenged, requires an AfD, and a change in deletion policy to say this. Ditto for a destructive merge, but it would be much harder to quantify what amounts to "destructive" in such cases.
On Fri, Apr 4, 2008 at 2:37 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
On Thu, Apr 3, 2008 at 11:26 PM, WJhonson@aol.com wrote:
In a message dated 4/3/2008 11:24:51 A.M. Pacific Daylight Time, wikipedia.kawaii.neko@gmail.com writes:
I have seen my fair share of article butchering through mass redirectification.
Could you put this in language my third-grade intellect can understand? Do you mean something like gutting an article with a simple redirect to another article? Wouldn't the article history still contain the original article, which could at least be saved to user space?
Will
Let me explain:
- There exists an article "Alpha"
- There is a problem with article "Alpha"
- Article "Alpha" does not follow all guidelines and/or policies
- A user comes in, removes all content from article "Alpha", and redirects it to article "Beta", which has related coverage
- No content is added to article "Beta" in this process
- The user calls this action a merge, which is an editorial decision independent of AFD
- Even pages of failed AFDs can be redirectified
This is what is happening, in a nutshell. The topics where this is happening range from fiction-related articles (episodes/characters) to highways (real-world subjects) and townships, among others.
- White Cat
On Fri, Apr 4, 2008 at 3:56 PM, David Goodman dgoodmanny@gmail.com wrote:
Yes, and the only solution is a frank admission that a redirect is a form of deletion, that a change to a redirect, if challenged, requires an AfD, and a change in deletion policy to say this. Ditto for a destructive merge, but it would be much harder to quantify what amounts to "destructive" in such cases.
This is what I hate more than anything else as a reader of Wikipedia looking for some fact: I go to an article, it's a redirect, and the target article is not a synonym for the word and doesn't even contain the word... argh. I can't think of an example off the top of my head, but I'm sure everyone has had this happen a few times.
Seems like something a bot could weed out pretty easily: redirects to pages that don't include the redirect's text anywhere in the article. Sometimes they are synonyms, misspellings, etc., so a human might have to review the results.
I feel like if the redirect can't be to a particular section of the target article, or the word isn't a simple synonym or alternate spelling, a redirect is inappropriate. Call me crazy. :)
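To illustrate, the per-redirect check might look roughly like this. It's only a sketch - the function name and the use of the requests library are illustrative, not an existing bot - fetching the target's wikitext via the query API with &redirects and looking for the redirect's title in it.

import requests

API = "http://en.wikipedia.org/w/api.php"

def target_mentions_redirect(redirect_title):
    """True if the redirect's title occurs in the wikitext of its target page."""
    params = {
        "action": "query",
        "titles": redirect_title,
        "redirects": "",          # follow the redirect to its target
        "prop": "revisions",
        "rvprop": "content",
        "rvlimit": 1,
        "format": "json",
    }
    page = next(iter(requests.get(API, params=params).json()["query"]["pages"].values()))
    revisions = page.get("revisions")
    if not revisions:
        return False              # broken redirect or missing target
    wikitext = revisions[0]["*"]  # "*" holds the wikitext in the default JSON format
    return redirect_title.lower() in wikitext.lower()

# Anything returning False goes on a list for human review, since some of
# those will still be legitimate misspellings or synonyms.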
Judson Dunn wrote:
Seems like something a bot could weed out pretty easily: redirects to pages that don't include the redirect's text anywhere in the article. Sometimes they are synonyms, misspellings, etc., so a human might have to review the results.
As long as the method of fixing the problem isn't to delete the redirect, mind you. Otherwise the stealthy deletion of material via conversion to redirect will become even more prevalent and permanent.
What I'd like is some kind of report that lists redirects that have long edit histories; that seems like a good way to find "hidden" and "lost" material for eventual rescue.
That may be very difficult. Such a query would be very expensive both CPU-wise and BW-wise.
On Sat, Apr 5, 2008 at 1:12 AM, Bryan Derksen bryan.derksen@shaw.ca wrote:
Judson Dunn wrote:
Seems like something a bot could weed out pretty easily: redirects to pages that don't include the redirect's text anywhere in the article. Sometimes they are synonyms, misspellings, etc., so a human might have to review the results.
As long as the method of fixing the problem isn't to delete the redirect, mind you. Otherwise the stealthy deletion of material via conversion to redirect will become even more prevalent and permanent.
What I'd like is some kind of report that lists redirects that have long edit histories; that seems like a good way to find "hidden" and "lost" material for eventual rescue.
On Fri, Apr 4, 2008 at 7:33 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
That may be very difficult. Such a query would be very expensive both CPU-wise and BW-wise.
It could be run over several days, giving the server some time between requests to avoid DoSing it. The list of redirects could be obtained with a simple script using http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfilterredir=redirects&aplimit=500 as a base, setting the apfrom parameter as necessary, and then http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=...&rvlimit=50 for looking at revisions. It would not be a great system, but give it a week or so and you'd have a good chunk of data to look at.
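Roughly like this, say, for the first step. This is a sketch rather than the exact script; the requests library, the output filename, and the query-continue/apfrom continuation handling are assumptions on my part.

import time
import requests

API = "http://en.wikipedia.org/w/api.php"

def dump_redirect_titles(outfile="redirects.txt"):
    """Write every redirect title to a flat file, 500 per request."""
    params = {
        "action": "query",
        "list": "allpages",
        "apfilterredir": "redirects",
        "aplimit": 500,
        "format": "json",
    }
    with open(outfile, "w", encoding="utf-8") as out:
        while True:
            data = requests.get(API, params=params).json()
            for page in data["query"]["allpages"]:
                out.write(page["title"] + "\n")
            cont = data.get("query-continue", {}).get("allpages")
            if not cont:
                break
            params["apfrom"] = cont["apfrom"]
            time.sleep(2)  # pause between batches so we don't hammer the servers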
Optionally, someone with toolserver access could cook up a nice SQL query to kill the DB server with. :)
Page content from histories is not stored on the toolserver. The toolserver does NOT get the full data dump. That's the main problem here.
Someone with a fast connection should run this query. I haven't been able to convince any such person yet.
- White Cat
On Sat, Apr 5, 2008 at 2:56 AM, Chris Howie cdhowie@gmail.com wrote:
On Fri, Apr 4, 2008 at 7:33 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
That may be very difficult. Such a query would be very expensive both CPU-wise and BW-wise.
It could be run over several days, giving the server some time between requests to avoid DoSing it. The list of redirects could be obtained with a simple script using http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfilterredir=redirects&aplimit=500 as a base, setting the apfrom parameter as necessary, and then http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=...&rvlimit=50 for looking at revisions. It would not be a great system, but give it a week or so and you'd have a good chunk of data to look at.
Optionally, someone with toolserver access could cook up a nice SQL query to kill the DB server with. :)
-- Chris Howie http://www.chrishowie.com http://en.wikipedia.org/wiki/User:Crazycomputers
On Sun, Apr 6, 2008 at 12:02 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
Page content from histories is not stored on the toolserver. The toolserver does NOT get the full data dump. That's the main problem here.
Someone with a fast connection should run this query. I haven't been able to convince any such person yet.
- White Cat
I have a 45 Mbps connection at my university; I'd be willing to run it if the servers are willing to put up with me.
As I understand it, the goal being discussed is to make a list of redirects with long edit histories. This does not require downloading the full text of each revision. It only requires the contents of the 'revision' and 'page' database tables, both of which are available in the database dumps:
http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-stub-meta-hist... (6.7 GB)
http://download.wikimedia.org/enwiki/20080312/enwiki-20080312-page.sql.gz (385 MB)
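For example, here is a rough, untested sketch that streams the stub-meta-history dump and counts revisions for pages known to be redirects. The filenames and the flat file of redirect titles are illustrative assumptions, not something that already exists.

import gzip
import xml.etree.ElementTree as ET

# one redirect title per line, e.g. a flat file produced via the API
with open("redirects.txt", encoding="utf-8") as f:
    redirect_titles = {line.strip() for line in f}

counts = {}
# filename is illustrative; use whichever stub-meta-history dump you downloaded
with gzip.open("enwiki-20080312-stub-meta-history.xml.gz", "rb") as dump:
    title, revisions = None, 0
    for _, elem in ET.iterparse(dump):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            revisions += 1
        elif tag == "page":
            if title in redirect_titles:
                counts[title] = revisions
            title, revisions = None, 0
            elem.clear()  # free the finished <page> element

# redirects with the longest edit histories first
for t, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:100]:
    print(n, t)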
- Carl
On Sun, Apr 6, 2008 at 7:04 PM, Chris Howie cdhowie@gmail.com wrote:
On Sun, Apr 6, 2008 at 12:02 PM, White Cat wikipedia.kawaii.neko@gmail.com wrote:
Page content from histories is not stored on the toolserver. The toolserver does NOT get the full data dump. That's the main problem here.
Someone with a fast connection should run this query. I haven't been able to convince any such person yet.
- White Cat
I have a 45 Mbps connection at my university; I'd be willing to run it if the servers are willing to put up with me.
I'm opting for queries against the existing API instead of downloading dumps.
I have a full list of redirects in a flat file; right now it's going through and fetching the number of revisions (up to 100) for each redirect. I'll sort and post the top N here when it's done.
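The counting step looks roughly like this. Again, this is a sketch rather than the actual script (the flat-file name and the requests library are illustrative), using prop=revisions with rvlimit to cap each title at 100 revisions.

import requests

API = "http://en.wikipedia.org/w/api.php"

def revision_count(title, cap=100):
    """Number of revisions recorded for `title`, capped at `cap`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": cap,
        "rvprop": "ids",   # only revision ids, no content
        "format": "json",
    }
    page = next(iter(requests.get(API, params=params).json()["query"]["pages"].values()))
    return len(page.get("revisions", []))

# redirects.txt stands in for the flat file of redirect titles
with open("redirects.txt", encoding="utf-8") as f:
    counts = {line.strip(): revision_count(line.strip()) for line in f}

for title, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:100]:
    print(n, title)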
Chris Howie wrote:
I have a full list of redirects in a flat file; right now it's going through and fetching the number of revisions (up to 100) for each redirect. I'll sort and post the top N here when it's done.
You do realise that there are about 10 million redirects on WP?
Eugene
On Mon, Apr 7, 2008 at 1:32 PM, Eugene van der Pijll eugene@vanderpijll.nl wrote:
Chris Howie wrote:
I have a full list of redirects in a flat file; right now it's going through and fetching the number of revisions (up to 100) for each redirect. I'll sort and post the top N here when it's done.
You do realise that there are about 10 million redirects on WP?
Where are you getting that statistic?
chris@ravens:~/wikiredir$ bzcat redirects.bz2 | wc -l
2753639
Chris Howie wrote:
On Mon, Apr 7, 2008 at 1:32 PM, Eugene van der Pijll eugene@vanderpijll.nl wrote:
Chris Howie wrote:
I have a full list of redirects in a flat file; right now it's going through and fetching the number of revisions (up to 100) for each redirect. I'll sort and post the top N here when it's done.
You do realise that there are about 10 million redirects on WP?
Where are you getting that statistic?
chris@ravens:~/wikiredir$ bzcat redirects.bz2 | wc -l
2753639
I remembered this number from a previous discussion. Actually, according to [[Special:Statistics]], there are 12 million *pages* on enwiki, so I misremembered.
Let me ask another question then:
You do realise that there are about 3 million redirects on WP? And that downloading their history at 1 page per second will take about a month?
Eugene
On Mon, Apr 7, 2008 at 1:44 PM, Eugene van der Pijll eugene@vanderpijll.nl wrote:
Let me ask another question then:
You do realise that there are about 3 million redirects on WP? And that downloading their history at 1 page per second will take about a month?
Closer to 5-10 per second, which works out to somewhere under a week for roughly 2.75 million redirects, rather than a month.
On 04/04/2008, White Cat wikipedia.kawaii.neko@gmail.com wrote:
Let me explain:
- There exists an article "Alpha"
- There is a problem with article "Alpha"
- Article "Alpha" does not follow all guidelines and/or policies
- A user comes in, removes all content from article "Alpha", and redirects it to article "Beta", which has related coverage
- No content is added to article "Beta" in this process
- The user calls this action a merge, which is an editorial decision independent of AFD
- Even pages of failed AFDs can be redirectified
This is what is happening, in a nutshell. The topics where this is happening range from fiction-related articles (episodes/characters) to highways (real-world subjects) and townships, among others.
- White Cat
This is a very serious and depressing issue. Happens very regularly indeed. Mind you, at least this can be simply reversed (and often is), unless the redirecting party is intransigent and persistent.
On 03/04/2008, Philip Sandifer snowspinner@gmail.com wrote:
If you wander over to WP:V you'll see where I reposted the discussion there. However, I tend to find the mailing list also a useful place to discuss policy issues, as it has a higher number of grizzled old-timers who generally miss the policy page discussions.
As for the dead bodies, our article on [[Train wreck]] has a rather nice section on the use of the term as a metaphor. Complete with an ill-advised {{fact}} tag.
-Phil
I consider myself a grizzled old-timer (contributor for over 4 years, admin for 3.5 years) and I've long given up hope. There's an inherent problem with the wiki model: the show ends up being dictated by the most persistent editors, regardless of whether they are reasonable or problematic. Wiki is great for collaborative editing, but is pretty much useless for decision-making and policy creation.
Consensus on Wikipedia is a fiction and a lie. The word is usually misappropriated to mean whatever the Wikipedia editor wants it to mean. It was almost preferable to rely on votes everywhere as in the past, although mind you, that still happens where it suits people (magically the votes of a handful on an obscure page are also "consensus"). Certainly preferable to a discussion and then one "side" getting their way, either through being the majority or most persistent. It's not consensus if you just keep going until the other side quits - it does not imply you've finally got something they can put up with, just that you have exhausted their patience and/or sanity.
I would certainly consider WP:V, among other policy pages, to be a train wreck, and of course it is not actually applied in practice all the time, because that would be unworkable. This lets people apply WP:V and other policies strictly where it suits them, as a way to get rid of content they disagree with.
Zoney