Hello,
I created lists of cur table entries which have the same namespace and title. They were probably created by clicking "Save page" more than once on a newly created article.
These duplicate entries lead to bogus entries in maintenance lists, e.g. orphaned articles or short pages.
I uploaded three lists:
* http://en.wikipedia.org/wiki/User:SirJective/Double_entries
* http://de.wikipedia.org/wiki/Benutzer:SirJective/Doppeleintr%C3%A4ge
* http://nl.wikipedia.org/wiki/Gebruiker:SirJective/Double_entries
Every language dump I have downloaded so far (hu, fr, ja) has double entries.
I used a query like this on a local dump:
SELECT DISTINCT c1.cur_namespace, c1.cur_title, c1.cur_id, c1.cur_timestamp
FROM cur c1, cur c2 USE INDEX (name_title_timestamp)
WHERE c1.cur_namespace = c2.cur_namespace
  AND c1.cur_title = c2.cur_title
  AND c1.cur_id <> c2.cur_id
ORDER BY c1.cur_namespace, c1.cur_title, c1.cur_timestamp
LIMIT 1000;
At de-WP, Echoray and Peterlustig found that these entries can be removed by deleting and undeleting them. Are there any objections to handling these entries this way? If not: is some sysop willing to go through the lists and delete/undelete the entries?
Christian aka SirJective
Christian Semrau wrote:
I created lists of cur table entries which have the same namespace and title. [...]
You mentioned that they can be fixed by sysops. As such, I don't think the tech list is the right place to ask if there is any objection to them doing so. I'd rather ask on wikipedia-l. However, I don't see any real harm in it. All it does is add a revision to the history. We always want that! We want it to look like lots of people are working on lots of articles :-))))
Timwi
Timwi writes:
Christian Semrau wrote:
I created lists of cur table entries which have the same namespace and title. [...]
You mentioned that they can be fixed by sysops. As such, I don't think the tech list is the right place to ask if there is any objection to them doing so. I'd rather ask on wikipedia-l. However, I don't see any real harm in it. All it does is add a revision to the history. [...]
As I don't know the mechanisms of deletion and undeletion, I wanted to ask if there are technical objections to this procedure. Since you have none, I will promote my lists :-)
Christian de:Benutzer:SirJective
Timwi wrote:
I created lists of cur table entries which have the same namespace and title. [...]
You mentioned that they can be fixed by sysops. As such, I don't think the tech list is the right place to ask if there is any objection to them doing so.
You know, deleting more than 200 articles by hand can be a painful job... Maybe some developer could come up with a SQL query that gets rid of all those duplicates at once? And maybe the software could be improved to make sure it updates articles correctly and does *not* create a duplicate in some weird corner cases.
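For illustration, such a one-shot cleanup could look roughly like the multi-table DELETE below. This is a minimal sketch, not a query anyone in this thread actually ran: it keeps only the newest row per title and silently discards the older duplicates, rather than preserving them in the page history the way delete/undelete does.

-- Sketch: for each (cur_namespace, cur_title) pair, delete every row
-- for which a newer duplicate exists (ties broken by cur_id), so only
-- the most recent entry survives.
DELETE c1
FROM cur c1
JOIN cur c2
  ON  c1.cur_namespace = c2.cur_namespace
  AND c1.cur_title     = c2.cur_title
  AND (c1.cur_timestamp < c2.cur_timestamp
       OR (c1.cur_timestamp = c2.cur_timestamp AND c1.cur_id < c2.cur_id));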
Alwin Meschede wrote:
You know, deleting more than 200 articles by hand can be a painful job... Maybe some developer could come up with a SQL query that gets rid of all those duplicates at once? And maybe the software could be improved to make sure it updates articles correctly and does *not* create a duplicate in some weird corner cases.
There's a "INSERT ... ON DUPLICATE KEY UPDATE" function in MySQL which should avoid double creation. Alternatively, we could make the key pair cur_title,cur_namespace unique.
Magnus
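In MySQL that suggestion would look roughly like the sketch below. The column list is abbreviated and the literal values are made up; note that ON DUPLICATE KEY UPDATE only fires when a UNIQUE key covers (cur_namespace, cur_title), so the first suggestion in fact depends on the second.

-- Hedged sketch; a real cur row has many more columns. With a UNIQUE
-- key on (cur_namespace, cur_title), a second "Save page" updates the
-- existing row instead of inserting a duplicate.
INSERT INTO cur (cur_namespace, cur_title, cur_text, cur_timestamp)
VALUES (0, 'Example_page', 'Some page text', '20040101120000')
ON DUPLICATE KEY UPDATE
  cur_text = VALUES(cur_text),
  cur_timestamp = VALUES(cur_timestamp);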
Magnus Manske wrote:
Alwin Meschede wrote:
You know, deleting more than 200 articles by hand can be a painful job... Maybe some developer could come up with a SQL query that gets rid of all those duplicates at once? And maybe the software could be improved to make sure it updates articles correctly and does *not* create a duplicate in some weird corner cases.
There's a "INSERT ... ON DUPLICATE KEY UPDATE" function in MySQL which should avoid double creation.
Ah, but that would overwrite the older 'cur' row, no?
Alternatively, we could make the key pair (cur_title, cur_namespace) unique.
That sounds much better. :-)
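Schema-wise that would be something like the following (the index name is illustrative). Note that the ALTER fails as long as duplicate rows are still in the table, so the existing entries have to be cleaned up first.

-- Sketch: enforce uniqueness of (cur_namespace, cur_title) at the
-- database level; existing duplicates must be removed beforehand.
ALTER TABLE cur
  ADD UNIQUE KEY cur_name_title (cur_namespace, cur_title);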
Timwi wrote:
Magnus Manske wrote:
There's a "INSERT ... ON DUPLICATE KEY UPDATE" function in MySQL which should avoid double creation.
Ah, but that would overwrite the older 'cur' row, no?
Yes, but AFAIK, this "double creation" only happens if a new article is submitted and saved twice (thrice etc.) in rapid-fire mode. So, it would basically overwrite the information with the same information again.
Magnus
Magnus Manske wrote:
Timwi wrote:
Magnus Manske wrote:
There's a "INSERT ... ON DUPLICATE KEY UPDATE" function in MySQL which should avoid double creation.
Ah, but that would overwrite the older 'cur' row, no?
Yes, but AFAIK, this "double creation" only happens if a new article is submitted and saved twice (thrice etc.) in rapid-fire mode. So, it would basically overwrite the information with the same information again.
Good point.
Timwi
Alwin Meschede writes:
You know, deleting more than 200 articles by hand can be a painful job... Maybe some developer could come up with a SQL query that gets rid of all those duplicates at once? And maybe the software could be improved to make sure it updates articles correctly and does *not* create a duplicate in some weird corner cases.
At de I saw a bot (controlled by de:Benutzer:Head) deleting redirects with nonexistent targets, so it may be possible to use a bot to delete and undelete these items.
Apart from that, it would be good to improve the software to stop the creation of duplicates.
Christian de:Benutzer:SirJective
At the German WP, user Fristu took care of all ~100 listed entries within a few hours. I know the ~250 en: entries will not be cleaned up within a day, but they can surely be cleaned up manually within a week, provided the entries are linked properly (be careful with redirects).
Christian SirJective
PS: What's up with the wikipedia-l list archives (pipermail and gmane)? I can't believe Jimbo is the only one writing to this list today. I posted a message via gmane (~12:00 UTC), but it hasn't shown up yet.
Christian Semrau wrote:
PS: What's up with the wikipedia-l list archives (pipermail and gmane)? I can't believe Jimbo is the only one writing to this list today. I posted a message via gmane (~12:00 UTC), but it hasn't shown up yet.
There was also a spammer advertising Cialis, but the list didn't let that go through. :)
-- brion vibber (brion @ pobox.com)