keep file stored format.

David Jones drj at pobox.com
Wed Jan 30 13:26:50 UTC 2002


In message <20020129094250.M23243 at joey.bomis.com>, Jimmy Wales writes:
> Magnus Manske wrote:
> > * The reason why I didn't convert the article history as well as the
> > "current" versions is simply that I didn't figure out how to read the
> > UseModWiki history format. I asked several times, even on the 'pedia, for
> > help, but...
> 
> *nod*  I don't know the answer to this.  I've tried it myself and did not
> succeed.
> 
> Maybe Clifford Adams could help us?

Maybe.

Does the following help?  Sorry it is so scrappy, but I have other
things to do and beggars can't be choosers.

2002-01-30
drj at pobox.com

The UseModWiki stored format


Introduction

This document is my attempt to describe the stored file formats used
by UseModWiki.  UseModWiki is a CGI Perl script that implements a
wiki.


Directory Structure

wiki/usemod/zwisdb/...
  diff_log
  keep
  page
  rclog
  temp
  user

keep stores the older versions (possibly all kept versions).

keep/A/...
    /B/...

A file in a subdirectory of keep is named Spong.kp, where Spong is the FreeToNormal conversion of the page title.  The FreeToNormal conversion converts spaces to "_", upper-cases the first letter, collapses runs of underscores (e.g. "___") into a single "_", and removes "_" from either side of "/" (when subpages are used).
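A minimal Python sketch of the FreeToNormal conversion as described above (written from the description, not ported from the Perl). The keep_file_path helper is hypothetical: it assumes, from the keep/A/..., keep/B/... listing, that the subdirectory is named after the first letter of the converted title.

```python
import re


def free_to_normal(title: str, use_subpages: bool = True) -> str:
    """Sketch of FreeToNormal as described in the text above."""
    name = title.replace(" ", "_")
    # Upper-case only the first character (Perl's ucfirst, not str.capitalize).
    name = name[:1].upper() + name[1:]
    # Collapse runs of underscores into a single "_".
    name = re.sub(r"_{2,}", "_", name)
    if use_subpages:
        # Remove "_" immediately before or after "/".
        name = name.replace("_/", "/").replace("/_", "/")
    return name


def keep_file_path(root: str, title: str) -> str:
    """Hypothetical helper. ASSUMPTION: the keep subdirectory is the
    first letter of the converted name, as the directory listing suggests."""
    name = free_to_normal(title)
    return f"{root}/keep/{name[:1]}/{name}.kp"
```

For example, free_to_normal("foo bar / baz") gives "Foo_bar/baz".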


File Format


Formal syntax for keep files (files ending in .kp), roughly in RFC 2234 ABNF syntax:


keep-file =  *revision

revision  =  FS1 kv2-list

kv2-list  =  kv2		; also called Section in the code.
          =/ kv2-list FS2 kv2

kv2       =  key FS2 value

value     =  simple-value
          =/ kv3-list

kv3-list  =  kv3
          =/ kv3-list FS3 kv3

kv3       =  key FS3 simple-value

key       =  simple-value

simple-value =  *(%x00-B2 / %xB4-FF)  ; any sequence of characters not including "\xb3"

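The strings FS1, FS2, FS3 are (in Perl string syntax, where \x takes at
most two hex digits; not RFC 2234 syntax):
  FS1 = "\xb31" (the character with decimal code 179
      (hex b3) followed by the ASCII digit "1", hex 31)
  FS2 = "\xb32"
  FS3 = "\xb33"
In RFC 2234 terms, FS1 = %xB3.31, FS2 = %xB3.32 and FS3 = %xB3.33.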
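To make the grammar concrete, here is a hedged Python sketch that serialises one revision according to it. The function names are mine and the Latin-1 encoding is an assumption.  Key order should not matter, since the reading code loads the pairs into a hash.

```python
FS = b"\xb3"
FS1, FS2, FS3 = FS + b"1", FS + b"2", FS + b"3"


def encode_simple(value) -> bytes:
    """Encode a simple-value: numbers as decimal text, strings as bytes."""
    raw = str(value).encode("latin-1")  # ASSUMPTION: Latin-1 storage
    if FS in raw:
        raise ValueError("a simple-value must not contain \\xb3")
    return raw


def encode_revision(section: dict, data: dict) -> bytes:
    """Build one `revision` production: FS1 followed by a kv2-list.

    `section` holds the simple-valued keys (revision, ts, ...);
    `data` holds the pairs that form the kv3-list under the "data" key.
    """
    # kv3-list: key FS3 value FS3 key FS3 value ...
    kv3 = FS3.join(encode_simple(x) for pair in data.items() for x in pair)
    # kv2-list: key FS2 value FS2 key FS2 value ...
    kv2 = [encode_simple(x) for pair in section.items() for x in pair]
    kv2 += [b"data", kv3]
    return FS1 + FS2.join(kv2)
```

Since keep-file = *revision, appending the output of encode_revision to a .kp file should still match the grammar.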


The following kv2-list keys appear to be used:
username - the wiki username of the user who created this revision.
name     - used by OpenKeptRevisions to select only certain revisions.
           However, only "text_default" is used in the code, and I
           haven't seen any others in the file.
version  - version number of the data format.
ts       - time of edit, in seconds since the UNIX epoch.
ip       - IP address (presumably of the machine the user is
           connecting from).
data     - the text of the revision plus some further meta information.
tscreate - time of creation (i.e. of revision 1), in seconds since the
           UNIX epoch.
keepts   - presumably something to do with keeping and expiring, but I
           don't know exactly what yet.
id       - user id of the user making this revision.
host     - hostname of the machine the user is connecting from.
revision - the user-visible wiki revision number of the page.


Only the data key has a kv3-list value (the Text hash in the code); the
other keys have simple values.
Data sub-keys:
newauthor - 0 or 1.  I _think_ this is 1 when the user making this
            revision is different from the one who made the previous one.
minor     - 0 or 1.
text      - the complete text of this revision.
summary   - the summary typed in by the user.
Numbers are stored as decimal text; strings are stored as themselves.
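As a sketch of how the two key lists fit together, the Python below takes one revision that has already been split into a dict of raw bytes (kv2 keys to values), splits its data value on FS3, and converts the decimal-text numbers.  The function names and the Latin-1 decoding are assumptions, not anything taken from UseModWiki.

```python
from datetime import datetime, timezone

FS3 = b"\xb3" + b"3"


def decode_data(raw: bytes) -> dict:
    """Split the kv3-list stored under the "data" key into a dict."""
    parts = raw.split(FS3)
    # Alternating key, value, key, value ...
    return dict(zip(parts[0::2], parts[1::2]))


def describe_revision(section: dict) -> dict:
    """Turn a kv2 dict (bytes keys and values) into friendlier types.
    ASSUMPTION: strings are decoded as Latin-1."""
    data = decode_data(section[b"data"])
    return {
        "revision": int(section[b"revision"]),
        # ts is seconds since the UNIX epoch.
        "ts": datetime.fromtimestamp(int(section[b"ts"]), tz=timezone.utc),
        "username": section.get(b"username", b"").decode("latin-1"),
        "minor": data.get(b"minor") == b"1",
        "summary": data.get(b"summary", b"").decode("latin-1"),
        "text": data.get(b"text", b"").decode("latin-1"),
    }
```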


The following awk script prints the revision numbers held in a keep file:
awk -F $(printf '\263.') '{for(i=1;i<NF;++i)if($i=="revision")print $(i+1)}' FILE
WARNING: This will be slow if you use it multiple times (mostly because of evaluating printf '\263.' every invocation).
Note: octal 263 == decimal 179 == hexadecimal b3
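For comparison, a rough Python equivalent of the awk one-liner.  Like the awk version it splits on "\xb3 followed by any byte" and prints the field after any field equal to "revision", so it assumes no other field is exactly "revision"; unlike awk, it reads the whole file at once rather than line by line.

```python
import re
import sys

# Any separator: \xb3 followed by one more byte (FS1, FS2 or FS3).
SEP = re.compile(rb"\xb3.", re.DOTALL)


def revision_numbers(raw: bytes) -> list[int]:
    """Return the value following each "revision" field."""
    fields = SEP.split(raw)
    return [int(fields[i + 1]) for i in range(len(fields) - 1)
            if fields[i] == b"revision"]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for n in revision_numbers(f.read()):
            print(n)
```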


My random notes about the code and other things:

In "sub OpenKeptList":
  @KeptList = split(/$FS1/, $data, -1);
so the file is split into a list of revisions separated by FS1.
OpenKeptRevisions then iterates over each FS1-separated element and
splits it on FS2, assigning the result to a hash table:
  %tempSection = split(/$FS2/, $_, -1);
i.e. each revision is a series of key/value pairs, with every key and
value separated from the next by FS2.
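The two Perl splits above translate fairly directly into Python.  This is a sketch, not a port of OpenKeptRevisions: it works on raw bytes, skips empty elements (a file that begins with FS1 yields an empty first element), and ignores the "name" filtering mentioned earlier.  Python's bytes.split, like Perl's split with a limit of -1, keeps trailing empty fields.

```python
FS1 = b"\xb3" + b"1"
FS2 = b"\xb3" + b"2"


def open_kept_list(raw: bytes) -> list[bytes]:
    # Perl: @KeptList = split(/$FS1/, $data, -1);
    return raw.split(FS1)


def open_kept_revisions(raw: bytes) -> list[dict]:
    sections = []
    for chunk in open_kept_list(raw):
        if not chunk:  # e.g. the empty element before the leading FS1
            continue
        # Perl: %tempSection = split(/$FS2/, $_, -1);
        # The list alternates key, value, so pair it up into a dict.
        fields = chunk.split(FS2)
        sections.append(dict(zip(fields[0::2], fields[1::2])))
    return sections
```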



