Hello,
I played around a bit with the (German) old_table (from 20040305) and have some questions about it.
1. As I understand it, the first entry of each row is an id that is incremented for each new row. Since some of the numbers are missing, I assume these are rows that were deleted. Is this right?
2. There is a timestamp column and an inverse timestamp column. Why do we need the inverse timestamp?
There also seems to be an inconsistency. The entry with the id 494209 (article about "Optik" in namespace 0) has timestamp "20031231041409" but inverse timestamp "80008783828360". Maybe this is a new-year bug? Should this be corrected manually?
3. Which namespace is represented by which of the numbers 0 through 9?
4. The "user-comment" column often contains '*' in the beginning but nothing later. Was it just the behaviour of the old software to represent no comment by '*'?
5. The "user_id" column contains just 0 in the beginning, even for well-known users. Later (after conversion_script appears) the correct id is recorded. The "user_name" column shows the same strange behaviour. It contains DNS names instead of IP addresses. Was this the behaviour of the old software?
6. The "old_flags" column seems to contain nothing. (It is almost always empty, but sometimes contains '0'.) What is this column used for? What flags does it contain?
--Ivo Köthnig
On Mar 14, 2004, at 17:50, Ivo Köthnig wrote:
> I played around a bit with the (German) old_table (from 20040305) and have some questions about it.
> - As I understand it, the first entry of each row is an id that is incremented for each new row. Since some of the numbers are missing, I assume these are rows that were deleted. Is this right?
Correct.
> - There is a timestamp column and an inverse timestamp column. Why do we need the inverse timestamp?
MySQL 3.x is unable to optimize a descending sort using a column index. Adding an inverted column to index on dramatically sped up various features such as page history and user contributions lists as the wikis continued to grow in size.
On MySQL 4.x this is unnecessary, and we may remove the column from the next major revision.
> There also seems to be an inconsistency. The entry with the id 494209 (article about "Optik" in namespace 0) has timestamp "20031231041409" but inverse timestamp "80008783828360". Maybe this is a new-year bug? Should this be corrected manually?
There was an old bug in page move that didn't properly update inverse_timestamp. If you see any others incorrect, please make a list and we'll fix them all...
Additionally, that inverse_timestamp is for the year 1999... there was a very brief problem at one point where one of the web servers came up with a dramatically incorrect clock setting and corrupted a number of articles' dates. That may be an artifact of that incident.
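For anyone who wants to check their own dump, here is a minimal Python sketch of how the two columns relate. It assumes the inverse timestamp is formed by replacing every digit d with 9 - d; the function name `invert_timestamp` is made up for this illustration. Under that assumption the value quoted for row 494209 decodes to a date in December 1999, and sorting ascending on the inverted value gives the same order as sorting descending on the timestamp.

```python
# Assumption: inverse_timestamp is the digit-wise nines' complement of timestamp.
_INVERT = str.maketrans("0123456789", "9876543210")


def invert_timestamp(ts: str) -> str:
    """Return the digit-wise inverse of a 14-digit YYYYMMDDHHMMSS string."""
    return ts.translate(_INVERT)


timestamp = "20031231041409"        # values quoted for row 494209
stored_inverse = "80008783828360"

expected_inverse = invert_timestamp(timestamp)
print(expected_inverse)                     # 79968768958590
print(stored_inverse == expected_inverse)   # False: the row is inconsistent
print(invert_timestamp(stored_inverse))     # 19991216171639, a 1999 date

# Ascending order on the inverted value is newest-first order on the timestamp.
stamps = ["20010501000000", "20031231041409", "20040305120000"]
newest_first = sorted(stamps, key=invert_timestamp)
print(newest_first == sorted(stamps, reverse=True))   # True
```

Applying the function twice returns the original string, so the same helper can be used to check every row of a dump in both directions.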
> - Which namespace is represented by which of the numbers 0 through 9?
0: (article)
2: User:
4: Wikipedia:
6: Image:
8: MediaWiki:
The odd numbers are for the associated talk/discussion namespaces.
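The numbering rule can be written down as a small lookup. This is only a sketch of the list above: the dictionary and the function name `describe_namespace` are invented for illustration, and the talk namespaces are described rather than named, since their exact names are not given here.

```python
# Subject namespaces exactly as listed above.
SUBJECT_NAMESPACES = {
    0: "(article)",
    2: "User:",
    4: "Wikipedia:",
    6: "Image:",
    8: "MediaWiki:",
}


def describe_namespace(number: int) -> str:
    """Describe a namespace number in the range 0..9."""
    if number not in range(10):
        raise ValueError(f"namespace {number} is outside 0..9")
    subject = number - (number % 2)   # the even number at or just below `number`
    name = SUBJECT_NAMESPACES[subject]
    if number % 2:
        return f"talk namespace for {name}"
    return name


for n in range(10):
    print(n, describe_namespace(n))
```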
> - The "user-comment" column often contains '*' in the beginning but nothing later. Was it just the behaviour of the old software to represent no comment by '*'?
Yes, UseModWiki puts a "*" as the default comment in the edit form.
> - The "user_id" column contains just 0 in the beginning, even for well-known users. Later (after conversion_script appears) the correct id is recorded.
The conversion from UseModWiki is unable to set up user accounts, since the mechanism for this is wildly different on UseMod than on MediaWiki. Pre-conversion edits are thus marked without a user_id.
(UseModWiki records _sets of preferences_, not really user accounts. Multiple "user id"s may have the same username, and there's no authentication for the use of names.)
> The "user_name" column shows the same strange behaviour. It contains DNS names instead of IP addresses. Was this the behaviour of the old software?
Yes.
> - The "old_flags" column seems to contain nothing. (It is almost always empty, but sometimes contains '0'.) What is this column used for? What flags does it contain?
For a long time it was totally unused. Now it may contain "gzip" if that revision's text is stored compressed; currently this is used only on the English Wikipedia on a semi-experimental basis.
-- brion vibber (brion @ pobox.com)
> On MySQL 4.x this is unnecessary, and we may remove the column from the next major revision.
>> There also seems to be an inconsistency. The entry with the id 494209 (article about "Optik" in namespace 0) has timestamp "20031231041409" but inverse timestamp "80008783828360". Maybe this is a new-year bug? Should this be corrected manually?
> There was an old bug in page move that didn't properly update inverse_timestamp. If you see any others incorrect, please make a list and we'll fix them all...
> Additionally, that inverse_timestamp is for the year 1999... there was a very brief problem at one point where one of the web servers came up with a dramatically incorrect clock setting and corrupted a number of articles' dates. That may be an artifact of that incident.
That was the only place in that cur_old where a timestamp does not match the inverse timestamp. I additionally checked whether there is a timestamp older than the first one (200105xxxx), but there was none.
There is another thing I do not understand. I assumed that the timestamps should be sorted. That is nearly true, but often there are some timestamps which are much smaller (by more than a month!) than the others around them. What's the reason for that?
> 0: (article)
> 2: User:
> 4: Wikipedia:
> 6: Image:
> 8: MediaWiki:
> The odd numbers are for the associated talk/discussion namespaces.
Thanks!
> Yes, UseModWiki puts a "*" as the default comment in the edit form.
Couldn't we change that? It should be easy to change all "*" appearing before the change to the new software. Or is there a good reason for not doing that?
>> - The "user_id" column contains just 0 in the beginning, even for well-known users. Later (after conversion_script appears) the correct id is recorded.
> The conversion from UseModWiki is unable to set up user accounts, since the mechanism for this is wildly different on UseMod than on MediaWiki. Pre-conversion edits are thus marked without a user_id.
> (UseModWiki records _sets of preferences_, not really user accounts. Multiple "user id"s may have the same username, and there's no authentication for the use of names.)
Since the username should almost always be the same before and after the conversion, we could change each old id (set to zero now) to the new id assigned after the change to the new software. Since we should have a list of all user ids and the corresponding user_names, we could change every 0 id from before the conversion to the new id if the user_name matches?!
>> - The "old_flags" column seems to contain nothing. (It is almost always empty, but sometimes contains '0'.) What is this column used for? What flags does it contain?
> For a long time it was totally unused. Now it may contain "gzip" if that revision's text is stored compressed; currently this is used only on the English Wikipedia on a semi-experimental basis.
OK, but it even contains "0" 68 times in the German old_cur and is empty in all others. Could those zeros be changed to ""?
--Ivo Köthnig
On Mar 14, 2004, at 23:20, Ivo Köthnig wrote:
> That was the only place in that cur_old where a timestamp does not match the inverse timestamp. I additionally checked whether there is a timestamp older than the first one (200105xxxx), but there was none.
Okay, I've fixed it. Thanks for the notice!
> There is another thing I do not understand. I assumed that the timestamps should be sorted. That is nearly true, but often there are some timestamps which are much smaller (by more than a month!) than the others around them. What's the reason for that?
Rows aren't necessarily inserted in chronological order, mainly for two reasons: conversion from UseModWiki was alphabetical by page name (so a page written later may be recorded before a page written earlier), and data for articles which are deleted and subsequently undeleted are re-inserted with new row id numbers.
>> Yes, UseModWiki puts a "*" as the default comment in the edit form.
> Couldn't we change that? It should be easy to change all "*" appearing before the change to the new software. Or is there a good reason for not doing that?
I can't think of any reason to change it. What's wrong with leaving them as they are?
> Since the username should almost always be the same before and after the conversion, we could change each old id (set to zero now) to the new id assigned after the change to the new software. Since we should have a list of all user ids and the corresponding user_names, we could change every 0 id from before the conversion to the new id if the user_name matches?!
That could be done, yes.
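A rough Python sketch of that backfill, working on plain dictionaries rather than the real tables. The keys "user_id" and "user_name" follow the column labels used in this thread; the function name `backfill_user_ids` and the sample rows are invented for illustration.

```python
def backfill_user_ids(revisions, users):
    """Return copies of `revisions` with user_id filled in wherever it is 0
    and the user_name matches exactly one known account name."""
    ids_by_name = {u["user_name"]: u["user_id"] for u in users}
    updated = []
    for rev in revisions:
        rev = dict(rev)   # copy, so the input rows are left untouched
        if rev["user_id"] == 0 and rev["user_name"] in ids_by_name:
            rev["user_id"] = ids_by_name[rev["user_name"]]
        updated.append(rev)
    return updated


# Hypothetical sample data.
users = [{"user_id": 17, "user_name": "ExampleUser"}]
revisions = [
    {"old_id": 1, "user_id": 0, "user_name": "ExampleUser"},          # pre-conversion edit
    {"old_id": 2, "user_id": 0, "user_name": "dialup.example.org"},   # anonymous, DNS name
    {"old_id": 3, "user_id": 17, "user_name": "ExampleUser"},         # already correct
]

for row in backfill_user_ids(revisions, users):
    print(row)
```

Rows whose name is a DNS name simply stay at 0. Note the caveat from earlier in the thread: UseModWiki did not authenticate names, so a matching name is a strong hint but not proof that the edit belongs to the later account.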
> [old_flags] OK, but it even contains "0" 68 times in the German old_cur and is empty in all others. Could those zeros be changed to ""?
There's no harm in the "0", it doesn't interfere with anything.
-- brion vibber (brion @ pobox.com)
What do you think about forcing users to choose a licence (in a combo box) when uploading media (pictures, sound, etc.) to Wikipedia? I see 2 benefits:
* It forces the user to care about the licence,
* It automatically adds {{msg:<NameOfLicence>}} to the bottom of the media description page.
Aoineko
Example:
----------------------
| Choose a licence |v|
----------------------
| GFDL |
| Public Domain |
| Fair Use |
| ... |
--------------------
"GB" == Guillaume Blanchard gblanchard@arcsy.co.jp writes:
GB> * Automatically add {{msg:<NameOfLicence>}} to the bottom of
GB> media description page.
Actually, it'd make things a lot easier for everyone if some code for the license was in a database field. Easier to separate out stuff with incompatible licenses, for instance.
~ESP
Evan Prodromou wrote:
> "GB" == Guillaume Blanchard gblanchard@arcsy.co.jp writes:
> GB> * Automatically add {{msg:<NameOfLicence>}} to the bottom of
> GB> media description page.
> Actually, it'd make things a lot easier for everyone if some code for the license was in a database field. Easier to separate out stuff with incompatible licenses, for instance.
> ~ESP
Thanks, it's exactly what we need.
Alas, now uploading doesn't work anymore on fr:. Can someone look at this, please?
-- Looxix
Luc Van Oostenryck wrote:
> Evan Prodromou wrote:
>> "GB" == Guillaume Blanchard gblanchard@arcsy.co.jp writes:
>> GB> * Automatically add {{msg:<NameOfLicence>}} to the bottom of
>> GB> media description page.
>> Actually, it'd make things a lot easier for everyone if some code for the license was in a database field. Easier to separate out stuff with incompatible licenses, for instance.
>> ~ESP
> Thanks, it's exactly what we need.
> Alas, now uploading doesn't work anymore on fr:. Can someone look at this, please?
> -- Looxix
The problem seems to come from the change that was made to [[MediaWiki:Affirmation]] (I have changed it now).
The HTML source of the upload page then looks like this:
<form id="upload" method="post" enctype="multipart/form-data" action="/wiki/Special:Upload">
<table border=0><tr>
<td align=right>Nom:</td><td align=left>
<input tabindex=1 type=file name="wpUploadFile" value="" size=40>
</td></tr><tr>
<td align=right>Description:</td><td align=left>
<input tabindex=2 type=text name="wpUploadDescription" value="" size=40>
</td></tr><tr>
<td align=right>
<input tabindex=3 type=checkbox name="wpUploadAffirm" value="1" id="wpUploadAffirm">
</td><td align=left><label for="wpUploadAffirm">
<form>
^^^^^^
<select name="licence">
<option value="Choix 1">Choisir une licence
<option value="Choix 2">GFDL
<option value="Choix 3">GPL
<option value="Choix 4">LGPL
<option value="Choix 5">Domaine public
<option value="Choix 6">Fair Use
<option value="Choix 7">Art libre
<option value="Choix 8">CC-AS
<option value="Choix 9">CC-BY-AS
</select>
</form>
<br><br>
Je déclare avoir pris connaissance des <a href="http://fr.wikipedia.org/wiki/Wikipédia:Règles_d'utilisation_des_images">règles d'utilisation des fichiers</a> et que le détenteur de son copyright accepte de le diffuser selon les termes de la <a href="/wiki/Wikip%E9dia:Copyright" class='internal' title ="Wikipédia:Copyright">licence Wikipédia</a>.</label></td>
</tr>
<tr><td> </td><td align=left>
<input tabindex=5 type=submit name="wpUpload" value="Copier un fichier">
</td></tr></table></form>
An extra <form> was inserted in the form.
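A check like this is easy to automate. The Python sketch below uses the standard library's html.parser to count <form> tags opened while another form is still open; the class and function names are made up for this example, and the two test strings are shortened stand-ins for the page source above, not the real page.

```python
from html.parser import HTMLParser


class NestedFormFinder(HTMLParser):
    """Counts <form> start tags that appear while another form is still open."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.nested = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.depth > 0:
                self.nested += 1
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self.depth > 0:
            self.depth -= 1


def count_nested_forms(html: str) -> int:
    finder = NestedFormFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


# Shortened stand-ins for the upload page with and without the extra <form>.
broken = ('<form id="upload"><label for="wpUploadAffirm"><form>'
          '<select name="licence"></select></form></label></form>')
fixed = ('<form id="upload"><label for="wpUploadAffirm">'
         '<select name="licence"></select></label></form>')

print(count_nested_forms(broken))   # 1
print(count_nested_forms(fixed))    # 0
```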
Hope it helps. -- Looxix
I'm not sure what exactly this all means, but I am hopeful that it will mean something to someone else...
I've just finished doing some "repair" work on [[wikipedia:Orphaned Articles]]. I had to remove multiple repeated segments of the page. I say "segments" because they did not directly correspond with sections. Multiple sections were repeated as a group, though it did seem to be on section boundaries.
Either someone was extraordinarily careless (or deliberately disruptive), or somehow the data got mashed together incorrectly. (probably in the database?)
I did not compare the repeated segments to verify that they matched, so some changes could have been lost (not a huge issue with this page -- just some extra work). However, it did not seem to over-write data, just sort of append them together in strange ways.
My main concern is that this could be happening to other pages where loss of changes could be much more significant.
Is this a known bug? Something to do with edit conflicts that are not handled correctly?
-Rich Holton ([[User:Rholton]])
Rich Holton wrote:
> I'm not sure what exactly this all means, but I am hopeful that it will mean something to someone else...
Hiya. First of all, please could you not reply to a message that has nothing to do with your message? That way your posting shows up in a thread which has nothing to do with it. Thanks.
> I've just finished doing some "repair" work on [[wikipedia:Orphaned Articles]].
Whoa. That page seems like a mess. Well, not anymore, because I've fixed it just now, but this is what it looked like: http://en.wikipedia.org/w/wiki.phtml?title=Wikipedia:Orphaned_articles&o...
Are you sure you meant this page?... Nobody appears to have made edits to it since January.
> I had to remove multiple repeated segments of the page. I say "segments" because they did not directly correspond with sections. Multiple sections were repeated as a group, though it did seem to be on section boundaries.
This description is pretty vague and unclear, but it does seem to resemble something similar that I experienced on [[Wikipedia:Duplicate articles]]. The sections U, W, X, Y, Z, were suddenly duplicated at the bottom of the page. (Oh, the irony.)
> My main concern is that this could be happening to other pages where loss of changes could be much more significant.
Assuming you didn't actually mean [[Wikipedia:Orphaned articles]], but some other page: Did you check the history of the page in question? It seems unlikely that the error would affect the entire revision history. If significant information gets lost, you should always be able to retrieve it from an older revision.
Timwi
--- Timwi timwi@gmx.net wrote:
> Rich Holton wrote:
>> I'm not sure what exactly this all means, but I am hopeful that it will mean something to someone else...
> Hiya. First of all, please could you not reply to a message that has nothing to do with your message? That way your posting shows up in a thread which has nothing to do with it. Thanks.
Ahhh! Mea culpa! If this reply isn't correct, I'll have to sign up for remedial training.
>> I've just finished doing some "repair" work on [[wikipedia:Orphaned Articles]].
> Whoa. That page seems like a mess. Well, not anymore, because I've fixed it just now, but this is what it looked like:
> http://en.wikipedia.org/w/wiki.phtml?title=Wikipedia:Orphaned_articles&o...
> Are you sure you meant this page?... Nobody appears to have made edits to it since January.
Ummm... [[wikipedia:Orphaned Articles]] is distinct from [[wikipedia:Orphaned articles]] (look closely). I understand the need for, but still hate, case sensitivity.
>> I had to remove multiple repeated segments of the page. I say "segments" because they did not directly correspond with sections. Multiple sections were repeated as a group, though it did seem to be on section boundaries.
> This description is pretty vague and unclear, but it does seem to resemble something similar that I experienced on [[Wikipedia:Duplicate articles]]. The sections U, W, X, Y, Z, were suddenly duplicated at the bottom of the page. (Oh, the irony.)
Yeah, a valid criticism of my description (but it was perfectly clear to ME what I meant! ;) Now that my mind is clearer (and I've been properly chastised for my vagueness), I can report the following:
It does sound similar to what you describe, though more muddled. Here we had:
- Heading plus sections A & B
- Heading
- Heading plus sections A thru Z (i.e. whole correct page)
- Sections D thru Z
Fortunately, I came along only three changes past where the corruption occurred, and all the changes were made to the part that corresponded to the whole correct page, which is what I retained. So nothing was lost, as it turns out. By the way, kudos to whoever conceived of and implemented the comparison-between-arbitrary-versions feature.
Looking at the page history, it's clear that the corruption was introduced when someone attempted to submit the same change three times in a row -- probably because of an apparent lack of response.
>> My main concern is that this could be happening to other pages where loss of changes could be much more significant.
Still my main concern.
> If significant information gets lost, you should always be able to retrieve it from an older revision.
Thanks for your attention to this, Timwi. I do tend to forget the power of the revision history. Obviously, any loss could be recovered, but the work required to do so could be truly significant, depending upon the page in question, the nature of the corruption, and how many additional revisions happened between the corruption and its detection.
Again, thanks for your attention. I hope I've made a clearer report. I no longer see this to be particularly urgent, but I still consider it to be important.
-Rich Holton
Rich Holton wrote:
> Ummm... [[wikipedia:Orphaned Articles]] is distinct from [[wikipedia:Orphaned articles]] (look closely).
Yeah. Someone pointed it out to me on my User talk page. I suppose we can really delete [[Wikipedia:Orphaned articles]] and move [[Wikipedia:Orphaned Articles]] over to the intuitive capitalisation that matches our naming convention in the article namespace.
> It does sound similar to what you describe, though more muddled. Here we had:
> - Heading plus sections A & B
> - Heading
> - Heading plus sections A thru Z (i.e. whole correct page)
> - Sections D thru Z
** PLING! **
Whee, I know what's going on. And I'm really surprised this didn't come up before, but anyway. This is just a theory, but it seems so 100% plausible to me that it just gotta be right:
I think what is happening is this:
- You click the little "edit" link for section editing. In your example, you edited the section "C".
- You get an edit conflict. The top window on the edit conflict screen then displays the entire text of the article as it was submitted by the person that was quicker than you.
- However, a hidden form element still contains the section number. Thus, when you submit, the software thinks you're still section-editing, although you're actually sending the entire article!
- Thus, it replaces section "C" with the entire article text. This is why all sections except for "C" are duplicated.
This definitely needs to be fixed :-) I suppose the easiest fix would be to just remove that hidden form element and have users fiddle with the entire article text. A more sensible fix would be to actually display only that section. However, that latter solution will be extremely complicated in cases where the person that was quicker than you added or removed a section above the one you were editing, so the sections are renumbered...
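The edit-conflict theory above can be tried out in a few lines of Python. This is a toy model, not the wiki's actual code: a page is just a list of section labels, and `replace_section` is a hypothetical stand-in for whatever the software does when a section edit is saved.

```python
def replace_section(sections, index, submitted):
    """Replace the section at `index` with the submitted text,
    given here as a list of section labels."""
    return sections[:index] + submitted + sections[index + 1:]


page = ["Heading", "A", "B", "C", "D", "E"]
section_c = page.index("C")

# Normal section edit: only section C is submitted.
ok = replace_section(page, section_c, ["C (edited)"])
print(ok)

# After the conflict screen the whole article is submitted,
# but the hidden form element still says "section C".
corrupted = replace_section(page, section_c, page)
print(corrupted)
# ['Heading', 'A', 'B', 'Heading', 'A', 'B', 'C', 'D', 'E', 'D', 'E']
```

The corrupted result has the shape Rich reported: heading plus A and B, then the whole correct page, then everything after the edited section once more.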
Greetings, Timwi
wikitech-l@lists.wikimedia.org