In message 20020129094250.M23243@joey.bomis.com, Jimmy Wales writes:
Magnus Manske wrote:
- The reason why I didn't convert the article history as well as the
"current" versions is simply that I didn't figure out how to read the UseModWiki history format. I asked several times, even on the 'pedia, for help, but...
*nod* I don't know the answer to this. I've tried it myself and did not succeed.
Maybe Clifford Adams could help us?
Maybe.
Does the following help? Sorry it is so scrappy, but I have other things to do and beggars can't be choosers.
2002-01-30 drj@pobox.com
The UseModWiki stored format
Introduction
This document is my attempt to describe the stored file formats used by UseModWiki. UseModWiki is a CGI Perl script that implements a wiki.
Directory Structure
wiki/usemod/zwisdb/...
    diff_log
    keep
    page
    rclog
    temp
    user
keep stores the older versions (possibly all kept versions).
keep/A/...
    /B/...
A file in a subdirectory of keep is called Spong.kp where Spong is the FreeToNormal conversion of the page title. The FreeToNormal conversion works by converting spaces to "_", upper-casing the first letter, collapsing runs of "_" into a single "_" and removing "_" from around "/" (when subpages are used).
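The conversion described above can be sketched in Python roughly as follows. This is only an illustration of the steps as listed here, not a port of the Perl code: the function name free_to_normal and the use_subpages flag are made up for the example, and it assumes a "run" means two or more consecutive "_".

```python
import re


def free_to_normal(title: str, use_subpages: bool = True) -> str:
    """Approximate the FreeToNormal conversion (hypothetical helper)."""
    name = title.replace(" ", "_")        # spaces become "_"
    name = name[:1].upper() + name[1:]    # upper-case the first letter only
    name = re.sub(r"_{2,}", "_", name)    # assumption: a run is 2 or more "_"
    if use_subpages:
        name = re.sub(r"_*/_*", "/", name)  # drop "_" on either side of "/"
    return name


# The keep file for a page would then be named after the converted title.
print(free_to_normal("main  page") + ".kp")
```

Which subdirectory of keep the file lands in is not covered by this sketch.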
File Format
Formal syntax for keep files (.kp ending) (supposedly in RFC2234 ABNF syntax):
    keep-file    = *revision
    revision     = FS1 kv2-list
    kv2-list     = kv2                  ; also called Section in the code.
    kv2-list     =/ kv2-list FS2 kv2
    kv2          = key FS2 value
    value        = simple-value
    value        =/ kv3-list
    kv3-list     = kv3
    kv3-list     =/ kv3-list FS3 kv3
    kv3          = key FS3 simple-value
    simple-value = <any sequence of characters not including "\xb3">
The strings FS1, FS2, FS3 are (in C syntax, not RFC2234 syntax):
    FS1 = "\xb31"   (the character with decimal code 179 (hex b3) followed by ASCII 1)
    FS2 = "\xb32"
    FS3 = "\xb33"
The following kv2-list keys appear to be used:
    username - the wiki username of the user who created this revision.
    name     - used by OpenKeptRevisions to select only certain revisions. However, only "text_default" is used in the code, and I haven't seen any others in the file.
    version  - version number of the data format.
    ts       - time of edit. Seconds since UNIX epoch.
    ip       - IP address.
    data     - mostly the text and some (more) meta information.
    tscreate - time of creation (i.e. revision 1). Seconds since UNIX epoch.
    keepts   - obviously something to do with keeping and expiring but I don't know exactly what yet.
    id       - user id of the user making this revision.
    host     - hostname of the machine that the user is connecting from.
    revision - the user-visible wiki revision of the document.
Only the data key has a kv3-list value (aka Text hash in the code); the other keys have simple values. Data sub-keys:
    newauthor - 0 or 1. I _think_ this is 1 when the user making this revision is different from the one that made the previous revision.
    minor     - 0 or 1.
    text      - the complete text for this revision.
    summary   - the summary typed in by the user.
Numbers appear as printed decimals, strings appear as themselves.
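As a rough illustration of the grammar above, here is a minimal Python sketch that splits the bytes of a .kp file into revisions, then into key/value pairs, then splits the data value into its sub-keys. The names parse_keep and pairs_to_dict are invented for this example. It assumes keys are plain ASCII and leaves all values as raw bytes, since the character encoding of the text is not something I know.

```python
FS = b"\xb3"
FS1, FS2, FS3 = FS + b"1", FS + b"2", FS + b"3"


def pairs_to_dict(fields):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(fields) % 2:
        raise ValueError("odd number of fields; expected key/value pairs")
    return {
        fields[i].decode("ascii"): fields[i + 1]  # assumption: ASCII keys
        for i in range(0, len(fields), 2)
    }


def parse_keep(raw: bytes):
    """Parse the contents of a .kp file into a list of revision dicts."""
    revisions = []
    # Every revision starts with FS1, so the piece before the first FS1
    # is empty and is skipped.
    for chunk in raw.split(FS1):
        if not chunk:
            continue
        section = pairs_to_dict(chunk.split(FS2))
        if "data" in section:
            # Only the data key holds a nested FS3-separated list.
            section["data"] = pairs_to_dict(section["data"].split(FS3))
        revisions.append(section)
    return revisions
```

With a real file this would be called as parse_keep(open(path, "rb").read()); numeric values such as ts or revision can then be passed to int().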
The following awk script prints the revision numbers held in a file:
    awk -F "$(printf '\263.')" '{for(i=1;i<NF;++i)if($i=="revision")print $(i+1)}' FILE
WARNING: This will be slow if you use it multiple times (mostly because of evaluating printf '\263.' on every invocation).
Note: octal 263 == decimal 179 == hexadecimal b3
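For scanning many files, the same field scan can be sketched in Python, which avoids re-evaluating the separator each time. This is an approximation of the awk one-liner rather than an exact equivalent: it reads the whole file at once instead of line by line, and the name revision_numbers is made up for the example.

```python
import re

# Byte 0xb3 followed by any single byte, like the awk field separator.
SEP = re.compile(rb"\xb3.", re.DOTALL)


def revision_numbers(raw: bytes):
    """Return the value following every field that is exactly "revision"."""
    fields = SEP.split(raw)
    return [
        int(fields[i + 1])
        for i in range(len(fields) - 1)
        if fields[i] == b"revision"
    ]
```

Like the awk version, this does not look at the structure at all, so a field elsewhere in the file that happens to be exactly "revision" would confuse it.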
My random notes about the code and other things:
in "sub OpenKeptList"
    @KeptList = split(/$FS1/, $data, -1)
so there is a list of things separated by FS1.
OpenKeptRevisions iterates over each FS1-separated thing, splitting it on FS2:
    %tempSection = split(/$FS2/, $_, -1)
(into a hash table, note), i.e. a series of key/value pairs, each separated from the other by FS2.