Meta question: is there a place in the wikipedia for these issues? (Yes, http://www.wikipedia.com/wiki.phtml?title=wikipedia:PHP+script+bug+reports ugh, what a horrible URL. )
Meta meta question: is a wiki appropriate for defect tracking? (not answered)
The history of pages has been amputated when changing to the new software, in other words, changes made before the new PHP software are not accessible. I see two problems with this:
1) Lost authorship.
2) Lost changes.
1) is the problem that the revision history used to contain the author information for the documents. I understand that it is a requirement of the GNU FDL that the author information be retained so that it can be reproduced as needed under the terms of the license.
2) is more obvious, but perhaps less problematic. In many cases you may say "so what", but the fact is that there may have been useful content in older revisions (I have browsed the older revisions in order to understand the creation process of a document or gain insight into some debate). This content has now been thrown away. What is the policy on keeping older versions? Is it "keep all of them" or "only keep revisions when it is convenient, in particular if changing the software destroys all older revisions then that's fine"? Don't forget that the software will change again and again over the lifetime of the wikipedia. I would say it behoves wikipedia to maintain a storage format that is largely independent of the software (even if that storage format is not the one used by the software, it could be an export format for example).
Sorry to go on about 2) so much, but it strikes me that the current attitude is a little immature. The current attitude as I see it is one of "changing software is simple; maintaining a database is not our concern; user requirements are only superficially important".
I think 1) (the authorship problem) needs addressing. What is intended?
Cheers, drj
I'm going to have to insist that people stop making up accusations.
David Jones wrote:
Sorry to go on about 2) so much, but it strikes me that the current attitude is a little immature. The current attitude as I see it is one of "changing software is simple; maintaining a database is not our concern; user requirements are only superficially important".
This is all completely false, and is in fact the exact opposite of my attitude and the attitude of everyone doing the coding.
Please, try to be helpful -- and that means *not* just making stuff up out of thin air every time you have a problem with something.
--Jimbo
OK, let's try this without a flame war ;)
The history of pages has been amputated when changing to the new software, in other words, changes made before the new PHP software are not accessible. I see two problems with this:
- Lost authorship.
- Lost changes.
* As Wikipedia articles have no author, there's no authorship.
* AFAIK, the UseModWiki only stored the changes of the last two weeks (or something like that).
* AFAIK, there will be "old.wikipedia.com", with a patched UseModWiki that doesn't allow to edit/save articles, only view them, with the contents from before the switch.
* The reason why I didn't convert the article history as well as the "current" versions is simply that I didn't figure out how to read the UseModWiki history format. I asked several times, even on the 'pedia, for help, but...
I hope that clarifies things a little.
Magnus
Magnus Manske wrote:
- As Wikipedia articles have no author, there's no authorship.
Well, there was a good bit of useful data there. There is authorship in some sense, and we do like to keep the history.
- AFAIK, the UseModWiki only stored the changes of the last two weeks (or
something like that).
It was previously two weeks, but I extended it, I forget how long, but our intention is to keep the history more or less permanently, subject to disk space constraints. (But disk space is cheap.)
- AFAIK, there will be "old.wikipedia.com", with a patched UseModWiki that
doesn't allow to edit/save articles, only view them, with the contents from before the switch.
That's right. But we should still see about getting the old history into the new database if it isn't impossible.
- The reason why I didn't convert the article history as well as the
"current" versions is simply that I didn't figure out how to read the UseModWiki history format. I asked several times, even on the 'pedia, for help, but...
*nod* I don't know the answer to this. I've tried it myself and did not succeed.
Maybe Clifford Adams could help us?
In message 20020129094250.M23243@joey.bomis.com, Jimmy Wales writes:
Magnus Manske wrote:
- The reason why I didn't convert the article history as well as the
"current" versions is simply that I didn't figure out how to read the UseModWiki history format. I asked several times, even on the 'pedia, for help, but...
*nod* I don't know the answer to this. I've tried it myself and did not succeed.
Maybe Clifford Adams could help us?
Maybe.
Does the following help? Sorry it is so scrappy, but I have other things to do and beggars can't be choosers.
2002-01-30 drj@pobox.com
The UseModWiki stored format
Introduction
This document is my attempt to describe the store file formats used by UseModWiki. UseModWiki is a cgi perl script that implements a wiki.
Directory Structure
wiki/usemod/zwisdb/...
    diff_log
    keep
    page
    rclog
    temp
    user
keep stores the older versions (possibly all kept versions).
keep/A/...
    /B/...
A file in a subdirectory of keep is called Spong.kp where Spong is the FreeToNormal conversion of the page title. The FreeToNormal conversion works by converting spaces to "_", upper casing the first letter, eliding runs of "___" into a single "_" and removing "_" from around "/" (when subpages are used).
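A minimal Python sketch of the FreeToNormal conversion as described in the paragraph above. It is written from that prose description only, not from the UseModWiki Perl source, so the order of the steps and the edge-case behaviour are assumptions. The names free_to_normal and use_subpages are hypothetical, and "runs" is taken to mean two or more consecutive underscores.

```python
import re


def free_to_normal(title, use_subpages=True):
    """Approximate the FreeToNormal title conversion (assumed behaviour)."""
    name = title.replace(" ", "_")        # spaces become underscores
    name = name[:1].upper() + name[1:]    # upper-case the first letter
    name = re.sub(r"_{2,}", "_", name)    # collapse runs of underscores
    if use_subpages:
        # drop underscores on either side of the subpage separator
        name = re.sub(r"_*/_*", "/", name)
    return name


if __name__ == "__main__":
    # The keep file for a page would then be named <result>.kp
    print(free_to_normal("spong") + ".kp")
```

Under these assumptions a title such as "main  page" maps to "Main_page", and "foo / bar" maps to "Foo/bar".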
File Format
Formal syntax for keep files (.kp ending) (supposedly in RFC2234 ABNF syntax):
keep-file = *revision
revision = FS1 kv2-list
kv2-list = kv2 ; also called Section in the code.
kv2-list =/ kv2-list FS2 kv2
kv2 = key FS2 value
value = simple-value
value =/ kv3-list
kv3-list = kv3
kv3-list =/ kv3-list FS3 kv3
kv3 = key FS3 simple-value
simple-value ; any sequence of characters not including "\xb3"
The strings FS1, FS2, FS3 are (in C syntax, not RFC2234 syntax):
FS1 = "\xb31" (the character with decimal code 179 (hex b3) followed by ASCII 1)
FS2 = "\xb32"
FS3 = "\xb33"
The following kv2-list keys appear to be used:
username - the wiki username of the user who created this revision.
name - used by OpenKeptRevisions to select only certain revisions. However, only "text_default" is used in the code, and I haven't seen any others in the file.
version - version number of the data format.
ts - time of edit. Seconds since UNIX epoch.
ip - IP address.
data - mostly the text and some (more) meta information.
tscreate - time of creation (i.e. revision 1). Seconds since UNIX epoch.
keepts - obviously something to do with keeping and expiring, but I don't know exactly what yet.
id - user id of the user making this revision.
host - hostname of the machine that the user is connecting from.
revision - the user-visible wiki revision of the document.
Only the data key has a kv3-list value (aka Text hash in the code); the other keys have simple values. Data sub-keys:
newauthor - 0 or 1. I _think_ this is 1 when the user making this revision is different from the one that made the previous revision.
minor - 0 or 1.
text - the complete text for this revision.
summary - the summary typed in by the user.
Numbers appear as printed decimals, strings appear as themselves.
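The format described above could be read with something like the following Python sketch. It is an illustration under stated assumptions, not a tested converter: the names parse_keep and pairs_to_dict are hypothetical, the file is assumed to be read as raw bytes, and it is decoded as Latin-1 purely so that byte 0xb3 maps to exactly one character (the real text encoding is not stated in these notes).

```python
FS = "\xb3"  # separator lead character (decimal 179, hex b3)
FS1, FS2, FS3 = FS + "1", FS + "2", FS + "3"


def pairs_to_dict(fields):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    return dict(zip(fields[0::2], fields[1::2]))


def parse_keep(raw):
    """Parse the bytes of a .kp file into a list of revision dicts."""
    text = raw.decode("latin-1")  # assumption: one byte per character
    revisions = []
    for chunk in text.split(FS1):
        if not chunk:
            continue  # the file starts with FS1, so the first piece is empty
        section = pairs_to_dict(chunk.split(FS2))
        if "data" in section:
            # only the data key holds a nested FS3-separated list
            section["data"] = pairs_to_dict(section["data"].split(FS3))
        revisions.append(section)
    return revisions
```

All values come back as strings, since numbers are stored as printed decimals. With a file opened in binary mode, parse_keep(f.read()) would give one dict per kept revision, so the author of each revision would be available as rev["username"] and the page text as rev["data"]["text"].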
The following awk script prints the revision numbers held in a file:
    awk -F "$(printf '\263.')" '{for(i=1;i<NF;++i)if($i=="revision")print $(i+1)}' FILE
WARNING: This will be slow if you use it multiple times (mostly because of evaluating printf '\263.' on every invocation).
Note: octal 263 == decimal 179 == hexadecimal b3
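For scanning many keep files in one process, the same idea could be sketched in Python as below. This mirrors the awk approach (split on the separator character followed by any one character, then report the field after each "revision" key); the function name revision_numbers is hypothetical, and the digit check is an added guard, not something the awk script does.

```python
import re

# byte 0xb3 followed by any single byte, like the awk field separator
FIELD_SEP = re.compile(rb"\xb3.", re.DOTALL)


def revision_numbers(raw):
    """Return the values that follow each 'revision' key in raw .kp bytes."""
    fields = FIELD_SEP.split(raw)
    numbers = []
    for i in range(len(fields) - 1):
        # keep only values that look like printed decimals
        if fields[i] == b"revision" and fields[i + 1].isdigit():
            numbers.append(int(fields[i + 1]))
    return numbers
```

Because the pattern is compiled once, calling revision_numbers on the contents of each file avoids the per-invocation cost mentioned in the warning above.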
My random notes about the code and other things:
in "sub OpenKeptList":
    @KeptList = split(/$FS1/, $data, -1)
so there is a list of things separated by FS1.
OpenKeptRevisions iterates over each FS1-separated thing, splitting it on FS2:
    %tempSection = split(/$FS2/, $_, -1)
(into a hash table, note), i.e. a series of key-value pairs, each separated from the other by FS2.
In message ILENJBCFHGHELEBFPPBGMEAHCFAA.Magnus.Manske@epost.de, "Magnus Manske" writes:
OK, let's try this without a flame war ;)
The history of pages has been amputated when changing to the new software, in other words, changes made before the new PHP software are not accessible. I see two problems with this:
- Lost authorship.
- Lost changes.
- As Wikipedia articles have no author, there's no authorship.
There is an author (as I think Jimbo has pointed out); when I write text in an article I am the author of that text and the owner of the copyright. When I hit submit I, as the copyright holder, license it to be used under the terms of the GNU FDL.
The fact that the community doesn't really recognise authors doesn't affect the law.
- The reason why I didn't convert the article history as well as the
"current" versions is simply that I didn't figure out how to read the UseModWiki history format. I asked several times, even on the 'pedia, for help, but...
Hmm. I have submitted what I have learnt from an hour or two of eyeballing the code in another message. You should probably badger me if I can be of any more help.
Cheers, drj
wikipedia-l@lists.wikimedia.org