On mar, 2003-01-07 at 14:40, Tomasz Wegrzanowski wrote:
I played a bit with database.
Here is something that looks like Wikipedia database schema but
compiles in Posgres (and no, Wikipedia script won't be able to use
that yet):
:)))) Yay!!!!!! I *like* it when suggestions come with actual attempts
at preliminary code.
Of course, this isn't "real" code, and
many changes will be necessary
to make it any useful. Anyway:
* user and old are reserved keywords in Postgres
Grrr! Well, 'user' can become 'users' easily enough. ;)
I'm also considering whether 'cur' and 'old' should be split up
differently, into 'pageinfo' (title, restrictions, counter, redirect)
and 'revisions' (text, comment, date, user, minor). This would perhaps
be less radical than Jonathan's suggestion of a raw title<->numeric id
table, but should simplify operations that work on both current and old
revisions. (May need a shortcut for grabbing the current revision,
perhaps a key from the pageinfo.)
* why are we using tinyblob(1) instead of bool for
boolean data ?
Good question.
* why are we creating so many indexes ? are all of
them necessary ?
On cur and old, we need indexes on title (to look up individual pages),
timestamp (for sorting), and user (for contribs). Fulltext indexes for
searching on the special index fields. Namespace presumably helps along
with title?
On link tables, we're keying to indexed fields, and have a *lot* of rows
which we may search from either end (whatlinkshere, orphan checks,
wantedpages, etc etc).
Indexes on ipblocks table are probably useless, as it's very small.
User table has index on user name; helps with sorting the big user list.
(What about lookup of options on every page view by logged in user? Is
this number or name?)
Size, name, and timestamp fields in the image table may be used for
sorting the huge and growing list o' files.
* shouldn't we use database-provided timestamps
instead of varchar(14) ?
As long as we know how to manipulate them programmatically and format
them for the selected language and time zone...
* rights and restrictions don't seem to be
represented very efficiently.
using integer flag for that seems like a better idea to me
Maybe. It's not really a bottleneck, though, and legibility isn't
something to be hated.
* it might be good idea to consider making our own
searching system
instead of relying on something provided by database
Hell, why rely on a database? Why not write our own storage system? And
filesystem... and virtual memory... and process manager... :)
We shouldn't have to be a wheel factory; if a good, fast, flexible
search mechanism is available in our database we should use it.
-- brion vibber (brion @
pobox.com)