keep file stored format.
David Jones
djones at zoonami.com
Wed Jan 30 13:08:45 UTC 2002
In message <20020129094250.M23243 at joey.bomis.com>, Jimmy Wales writes:
> Magnus Manske wrote:
> > * The reason why I didn't convert the article history as well as the
> > "current" versions is simply that I didn't figure out how to read the
> > UseModWiki history format. I asked several times, even on the 'pedia, for
> > help, but...
>
> *nod* I don't know the answer to this. I've tried it myself and did not
> succeed.
>
> Maybe Clifford Adams could help us?
Maybe.
Does the following help? Sorry it is so scrappy, but I have other
things to do and beggars can't be choosers.
2002-01-30
drj at pobox.com
The UseModWiki stored format
Introduction
This document is my attempt to describe the store file formats used
by UseModWiki. UseModWiki is a cgi perl script that implements a
wiki.
Directory Structure
wiki/usemod/zwisdb/...
    diff_log
    keep
    page
    rclog
    temp
    user
keep stores the older versions (possibly all kept versions).
keep/A/...
    /B/...
A file in a subdirectory of keep is called Spong.kp where Spong is the FreeToNormal conversion of the page title. The FreeToNormal conversion works by converting spaces to "_", upper casing the first letter, eliding runs of "___" into a single "_" and removing "_" from around "/" (when subpages are used).
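As a rough illustration of the conversion just described, here is a minimal Python sketch. It follows only the prose description above and has not been checked against the Perl source; the function name free_to_normal is my own, not something from UseModWiki.

```python
import re


def free_to_normal(title):
    """Approximate the FreeToNormal conversion as described in the text."""
    name = title.replace(" ", "_")         # spaces become "_"
    name = name[:1].upper() + name[1:]     # upper-case the first letter
    name = re.sub(r"__+", "_", name)       # collapse runs of "_" into one
    name = re.sub(r"_*/_*", "/", name)     # drop "_" around "/" (subpages)
    return name
```

Under these assumptions a title such as "main / sub page" would map to "Main/sub_page", and the keep file would be that name with ".kp" appended.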
File Format
Formal syntax for keep files (.kp ending) (supposedly in RFC2234 ABNF syntax):
keep-file    = *revision
revision     = FS1 kv2-list
kv2-list     = kv2                  ; also called Section in the code.
kv2-list     =/ kv2-list FS2 kv2
kv2          = key FS2 value
value        = simple-value
value        =/ kv3-list
kv3-list     = kv3
kv3-list     =/ kv3-list FS3 kv3
kv3          = key FS3 simple-value
simple-value = <any sequence of characters not including "\xb3">
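The strings FS1, FS2, FS3 are: (in C syntax, not RFC2234 syntax)
FS1 = "\xb3" "1" (the character with decimal code 179 (hex b3)
followed by the ASCII digit 1; written as two adjacent
literals because C would read "\xb31" as a single escape)
FS2 = "\xb3" "2"
FS3 = "\xb3" "3"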
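To make the grammar and separators concrete, the following Python sketch builds the bytes of one revision from nested dictionaries. It is only an illustration of the syntax above, not how UseModWiki itself writes the file; the names join_pairs and encode_revision are hypothetical, and I am assuming the file is handled as raw bytes.

```python
FS = b"\xb3"  # separator lead byte (decimal 179, hex b3)
FS1, FS2, FS3 = FS + b"1", FS + b"2", FS + b"3"


def join_pairs(sep, mapping):
    """Flatten {k1: v1, k2: v2} to k1 sep v1 sep k2 sep v2."""
    fields = []
    for key, value in mapping.items():
        fields += [key, value]
    return sep.join(fields)


def encode_revision(section):
    """Encode one revision: FS1, then an FS2-separated kv2-list."""
    flat = {}
    for key, value in section.items():
        # A nested dict stands for a kv3-list value, joined with FS3.
        flat[key] = join_pairs(FS3, value) if isinstance(value, dict) else value
    return FS1 + join_pairs(FS2, flat)
```

A whole keep-file would then be the concatenation of such encoded revisions.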
The following kv2-list keys appear to be used:
username - the wiki username of the user who created this revision.
name - used by OpenKeptRevisions to select only certain revisions. However,
only "text_default" is used in the code, and I haven't seen any others
in the file.
version - version number of the data format.
ts - time of edit. Seconds since UNIX epoch.
ip - ip address
data - mostly the text and some (more) meta information
tscreate - time of creation (i.e. revision 1). Seconds since UNIX epoch.
keepts - obviously something to do with keeping and expiring but I don't
know exactly what yet.
id - user id of user making this revision.
host - hostname of machine that user is connecting from.
revision - the user visible wiki revision of the document.
Only the data key has a kv3-list value (aka Text hash in the code), the
other keys have simple values.
Data sub-keys:
newauthor - 0 or 1, I _think_ this is 1 when the user making this revision
is different from the one that made the previous revision.
minor - 0 or 1
text - The complete text for this revision
summary - the summary typed in by the user.
Numbers appear as printed decimals, strings appear as themselves.
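Putting the pieces together, a keep file could be read along these lines. This is a minimal Python sketch under the assumptions stated so far (raw bytes, only the data key holding a nested list); parse_keep and to_pairs are names I made up, and nothing here is taken from the Perl code.

```python
FS = b"\xb3"  # separator lead byte (decimal 179, hex b3)
FS1, FS2, FS3 = FS + b"1", FS + b"2", FS + b"3"


def to_pairs(fields):
    """Turn [k1, v1, k2, v2, ...] into a dict of key/value pairs."""
    return dict(zip(fields[0::2], fields[1::2]))


def parse_keep(raw):
    """Return one dict per revision found in the bytes of a .kp file."""
    revisions = []
    for chunk in raw.split(FS1):
        if not chunk:            # nothing precedes the first FS1
            continue
        section = to_pairs(chunk.split(FS2))
        if b"data" in section:   # only "data" holds a nested kv3-list
            section[b"data"] = to_pairs(section[b"data"].split(FS3))
        revisions.append(section)
    return revisions
```

Values stay as byte strings; since numbers are stored as printed decimals, something like int(rev[b"ts"]) would give the edit time in seconds since the epoch for a parsed revision rev.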
The following awk script prints the revision numbers held in a file:
awk -F "$(printf '\263.')" '{for(i=1;i<NF;++i)if($i=="revision")print $(i+1)}' FILE
WARNING: This will be slow if you use it multiple times (mostly because of evaluating printf '\263.' every invocation).
Note: octal 263 == decimal 179 == hexadecimal b3
My random notes about the code and other things:
in "sub OpenKeptList"
@KeptList = split(/$FS1/, $data, -1)
so there is a list of things separated by FS1.
OpenKeptRevisions iterates over each FS1-separated item,
splitting it on FS2:
%tempSection = split(/$FS2/, $_, -1)
(note: into a hash table)
i.e., a series of key-value pairs, each separated from the next by FS2