On mar, 2003-01-07 at 14:40, Tomasz Wegrzanowski wrote:
I played a bit with database.
Here is something that looks like Wikipedia database schema but compiles in Posgres (and no, Wikipedia script won't be able to use that yet):
:)))) Yay!!!!!! I *like* it when suggestions come with actual attempts at preliminary code.
Of course, this isn't "real" code, and many changes will be necessary to make it any useful. Anyway:
- user and old are reserved keywords in Postgres
Grrr! Well, 'user' can become 'users' easily enough. ;)
I'm also considering whether 'cur' and 'old' should be split up differently, into 'pageinfo' (title, restrictions, counter, redirect) and 'revisions' (text, comment, date, user, minor). This would perhaps be less radical than Jonathan's suggestion of a raw title<->numeric id table, but should simplify operations that work on both current and old revisions. (May need a shortcut for grabbing the current revision, perhaps a key from the pageinfo.)
- why are we using tinyblob(1) instead of bool for boolean data ?
Good question.
- why are we creating so many indexes ? are all of them necessary ?
On cur and old, we need indexes on title (to look up individual pages), timestamp (for sorting), and user (for contribs). Fulltext indexes for searching on the special index fields. Namespace presumably helps along with title?
On link tables, we're keying to indexed fields, and have a *lot* of rows which we may search from either end (whatlinkshere, orphan checks, wantedpages, etc etc).
Indexes on ipblocks table are probably useless, as it's very small.
User table has index on user name; helps with sorting the big user list. (What about lookup of options on every page view by logged in user? Is this number or name?)
Size, name, and timestamp fields in the image table may be used for sorting the huge and growing list o' files.
- shouldn't we use database-provided timestamps instead of varchar(14) ?
As long as we know how to manipulate them programmatically and format them for the selected language and time zone...
- rights and restrictions don't seem to be represented very efficiently. using integer flag for that seems like a better idea to me
Maybe. It's not really a bottleneck, though, and legibility isn't something to be hated.
- it might be good idea to consider making our own searching system instead of relying on something provided by database
Hell, why rely on a database? Why not write our own storage system? And filesystem... and virtual memory... and process manager... :)
We shouldn't have to be a wheel factory; if a good, fast, flexible search mechanism is available in our database we should use it.
-- brion vibber (brion @ pobox.com)