On Fri, 2002-11-15 at 23:14, Jonathan Walther wrote:
If someone can answer some of the following questions,
that would be
great. I'm going to paste each SQL table definition, then ask questions
about it. I'll number each question so I can keep track of which ones
still aren't answered at the end of the week. :-)
I'm sure Ed Poor will have more insightful and relevant questions than
the ones I'm posting below; Brion and you other developers, please don't
be offended at the tone of the questions. I seek knowledge only.
Disclaimer: I'm not sure how all of this works myself. I came here to
work on localization, all this database stuff is an ugly side trip for
me.
See also 'schema.doc' in the 'docs' subdirectory in the source tarball,
which lists the usage of some stuff.
CREATE TABLE archive (
ar_namespace tinyint(2) unsigned NOT NULL default '0',
ar_title varchar(255) binary NOT NULL default '',
ar_text mediumtext NOT NULL,
ar_comment tinyblob NOT NULL,
ar_user int(5) unsigned NOT NULL default '0',
ar_user_text varchar(255) binary NOT NULL default '',
ar_timestamp varchar(14) binary NOT NULL default '',
ar_minor_edit tinyint(1) NOT NULL default '0',
ar_flags tinyblob NOT NULL
) TYPE=MyISAM PACK_KEYS=1;
1) What is the "archive" table for? Is that where articles are stored?
'archive' is where deleted articles are stored. They can be viewed or
undeleted by a sysop via Special:Undelete or through manual queries.
2) Why is there a separate ar_user_text property? Is
it because MySQL
doesn't support subqueries?
Probably.
3) What is ar_flags used for? And why is it a
tinyblob?
I believe the flags field is not used for any purpose.
4) Could we save a lot of disk space by replacing
ar_title with
article_id, and having all the article titles and their corresponding
ids in a separate table?
Not significantly for that table. Might be a useful trick elsewhere,
though.
CREATE TABLE brokenlinks (
bl_from int(8) unsigned NOT NULL default '0',
bl_to varchar(255) binary NOT NULL default '',
KEY bl_from (bl_from),
KEY bl_to (bl_to(10))
) TYPE=MyISAM;
5) Why is a broken link table necessary? Couldn't it be sufficient to
do a query to find out which links are broken? Or is this another "we
have no subqueries" issue?
I believe the primary use of the brokenlinks table is for generating the
'most wanted' list.
6) I notice bl_from is an integer, while bl_to is a
string; why the
difference? I thought articles didn't have identifying id numbers; if
so, what is bl_from then?
All articles have ID numbers; that's cur_id in the cur table. bl_from
and l_to reference cur_id numbers. Articles that don't exist don't have
cur_id numbers of course, so cannot be referenced by id, but only by
title.
CREATE TABLE cur (
cur_id int(8) unsigned NOT NULL auto_increment,
cur_namespace tinyint(2) unsigned NOT NULL default '0',
cur_title varchar(255) binary NOT NULL default '',
cur_text mediumtext NOT NULL,
cur_comment tinyblob NOT NULL,
cur_user int(5) unsigned NOT NULL default '0',
cur_user_text varchar(255) binary NOT NULL default '',
cur_timestamp varchar(14) binary NOT NULL default '',
cur_restrictions tinyblob NOT NULL,
cur_counter bigint(20) unsigned NOT NULL default '0',
cur_ind_title varchar(255) NOT NULL default '',
cur_ind_text mediumtext NOT NULL,
cur_is_redirect tinyint(1) unsigned NOT NULL default '0',
cur_minor_edit tinyint(1) unsigned NOT NULL default '0',
cur_is_new tinyint(1) unsigned NOT NULL default '0',
UNIQUE KEY cur_id (cur_id),
KEY cur_namespace (cur_namespace),
KEY cur_title (cur_title(20)),
KEY cur_timestamp (cur_timestamp),
FULLTEXT KEY cur_ind_title (cur_ind_title),
FULLTEXT KEY cur_ind_text (cur_ind_text)
) TYPE=MyISAM PACK_KEYS=1;
7) What is the table cur for? I find myself understanding nothing about
it; I even looked at the 3 default entries that are inserted, and they
didn't clue me in.
The cur table contains the current revision of every article in the
database. It's kind of important. ;)
The default entries aren't very informative because an empty database
is, well, pretty empty!
CREATE TABLE image (
img_name varchar(255) binary NOT NULL default '',
img_size int(8) unsigned NOT NULL default '0',
img_description tinyblob NOT NULL,
img_user int(5) unsigned NOT NULL default '0',
img_user_text varchar(255) binary NOT NULL default '',
img_timestamp varchar(14) binary NOT NULL default '',
KEY img_name (img_name(10)),
KEY img_size (img_size),
KEY img_timestamp (img_timestamp)
) TYPE=MyISAM PACK_KEYS=1;
8) Is img_user_text present for the "MySQL has no subquery
functionality" reason?
Prolly.
9) What format are timestamps in?
Text strings formatted like so: 'YYYYMMDDHHMMSS'. Note that we will have
a year 10000 problem unless we upgrade the software first! ;)
In phase II they were TIMESTAMP fields, which I believe came out
formatted that way by default. This was annoying as TIMESTAMPs are
updated automatically every time you touch the row unless you explicitly
set them to themselves. (That's why you'll occasionally see some legacy
cruft in update queries that set cur_timestamp=cur_timestamp or
whatnot.)
There's probably a better way to do that.
10) Is there special PHP code that ensures an image
with the same name,
but differing descriptions or img_user properties are present in the
table? Does MySQL not have the "UNIQUE" keyword which could eliminate
all need for that code, or is it wanted to have a record of all changes
to the description of an image? If so, I would recommend a slightly
different solution, similar to that for the articles table.
The current solution is more or less like that for the articles tables.
There's 'image' ~= 'cur' and 'oldimage' ~= 'old'.
Uploading a file with
the same name as an existing one is considered creating a new revision
of the same file; the old one is pushed back into the old list, and the
new one put in the current place.
CREATE TABLE imagelinks (
il_from varchar(255) binary NOT NULL default '',
il_to varchar(255) binary NOT NULL default '',
KEY il_from (il_from(10)),
KEY il_to (il_to(10))
) TYPE=MyISAM;
11) Surely we could save space by using integer article ids?
The main use of this table is generating lists of pages which link to a
given image. No subqueries again...
CREATE TABLE ipblocks (
ipb_address varchar(40) binary NOT NULL default '',
ipb_user int(8) unsigned NOT NULL default '0',
ipb_by int(8) unsigned NOT NULL default '0',
ipb_reason tinyblob NOT NULL,
ipb_timestamp varchar(14) binary NOT NULL default '',
KEY ipb_address (ipb_address),
KEY ipb_user (ipb_user)
) TYPE=MyISAM PACK_KEYS=1;
12) No questions, but why is the KEY keyword used? Is this a MySQL
trigger for making indexes? Also, theres a bunch of things that could
use the UNIQUE keyword to save php code.
Sounds like.
CREATE TABLE links (
l_from varchar(255) binary NOT NULL default '',
l_to int(8) unsigned NOT NULL default '0',
KEY l_from (l_from(10)),
KEY l_to (l_to)
) TYPE=MyISAM;
13) Similar to the question about the orphanedlinks table, why is l_from
a string, while l_to is an integer?
The primary use of the links table is to generate the 'Whatlinkshere'
listing. No subqueries, and we need the titles to list the links, not
the id numbers.
CREATE TABLE old (
old_id int(8) unsigned NOT NULL auto_increment,
old_namespace tinyint(2) unsigned NOT NULL default '0',
old_title varchar(255) binary NOT NULL default '',
old_text mediumtext NOT NULL,
old_comment tinyblob NOT NULL,
old_user int(5) unsigned NOT NULL default '0',
old_user_text varchar(255) binary NOT NULL default '',
old_timestamp varchar(14) binary NOT NULL default '',
old_minor_edit tinyint(1) NOT NULL default '0',
old_flags tinyblob NOT NULL,
UNIQUE KEY old_id (old_id),
KEY old_title (old_title(20)),
KEY old_timestamp (old_timestamp)
) TYPE=MyISAM PACK_KEYS=1;
14) What is table "old" used for? Deleted articles? Versions of
articles that are no longer "the most current"?
Versions of articles that are no longer 'the most current', yes. It
would be more efficient probably to dump the title and namespace fields
and simply reference the cur_id.
CREATE TABLE oldimage (
oi_name varchar(255) binary NOT NULL default '',
oi_archive_name varchar(255) binary NOT NULL default '',
oi_size int(8) unsigned NOT NULL default '0',
oi_description tinyblob NOT NULL,
oi_user int(5) unsigned NOT NULL default '0',
oi_user_text varchar(255) binary NOT NULL default '',
oi_timestamp varchar(14) binary NOT NULL default '',
KEY oi_name (oi_name(10))
) TYPE=MyISAM PACK_KEYS=1;
15) So, history IS stored for each image? Then I recommend the scheme
of having a separate table for unique image names, and their
corresponding "image id"
...
CREATE TABLE random (
ra_current tinyint(1) unsigned NOT NULL default '0',
ra_title varchar(255) binary NOT NULL default ''
) TYPE=MyISAM PACK_KEYS=1;
16) What is this "random" table for?
Stores ~1000 randomly selected page names. When someone clicks 'Random
page', one of those is selected rather than one of the whole bunch;
every once in a while the list is regenerated. Allegedly this improves
performance.
CREATE TABLE recentchanges (
rc_timestamp varchar(14) binary NOT NULL default '',
rc_cur_time varchar(14) binary NOT NULL default '',
rc_user int(10) unsigned NOT NULL default '0',
rc_user_text varchar(255) binary NOT NULL default '',
rc_namespace tinyint(3) unsigned NOT NULL default '0',
rc_title varchar(255) binary NOT NULL default '',
rc_comment varchar(255) binary NOT NULL default '',
rc_minor tinyint(3) unsigned NOT NULL default '0',
rc_new tinyint(3) unsigned NOT NULL default '0',
rc_cur_id int(10) unsigned NOT NULL default '0',
rc_this_oldid int(10) unsigned NOT NULL default '0',
rc_last_oldid int(10) unsigned NOT NULL default '0'
) TYPE=MyISAM PACK_KEYS=1;
17) Could the recent changes table be replaced with an appropriately
defined "view"? Or are views something MySQL doesn't do very well?
Yes, probably. It's all very hackish, and if it's not helping much, a
cleaner and well-performaning replacement is welcome.
CREATE TABLE site_stats (
ss_row_id int(8) unsigned NOT NULL default '0',
ss_total_views bigint(20) unsigned default '0',
ss_total_edits bigint(20) unsigned default '0',
ss_good_articles bigint(20) unsigned default '0',
UNIQUE KEY ss_row_id (ss_row_id)
) TYPE=MyISAM;
18) What is ss_row_id and ss_good_articles? How does this table work?
Does it contain stats for each article? Shouldn't that go in the table
that already contains the info for an article? That would lessen the
number of SQL updates, collapsing two UPDATE statements into one.
This contains a single row for the whole database. The good articles
count is generated by faulty math anyway, though.
CREATE TABLE user (
user_id int(5) unsigned NOT NULL auto_increment,
user_name varchar(255) binary NOT NULL default '',
user_rights tinyblob NOT NULL,
user_password tinyblob NOT NULL,
user_newpassword tinyblob NOT NULL,
user_email tinytext NOT NULL,
user_options blob NOT NULL,
user_watch mediumblob NOT NULL,
user_newtalk tinyint(1) unsigned NOT NULL default '1',
UNIQUE KEY user_id (user_id),
KEY user_name (user_name(10))
) TYPE=MyISAM PACK_KEYS=1;
19) Does user_rights correspond to ar_flags?
No. user_rights is a comma-separated list of priveledges. Currently the
only ones defined are 'sysop' and 'developer', so possible values are:
'' (normal users)
'sysop' ("sysops" or "administrators")
'sysop,developer' ("developers")
20) What is "newpassword" for?
If you click 'e-mail me a new password', a new password is generated and
stored in this field. Then, you can log in either with this new
temporary password (in which case it becomes your regular password until
you manually change it) or with your current password. This prevents you
from having your account locked out if you accidentally (or someone else
maliciously) clicks the button with your user name given, but you don't
intend to use (or can't get it) it.
21) Are all the "options" really necessary?
Yes?
22) It might be the SQL way to make a
"watchlist" table, where each row
contains a user id, and an article id. Or maybe this would be useless
in a database that doesn't have subqueries.
Could do.
23) What is the user_newtalk field?
Set to 1 if your user talk page is edited, sets that annoying asterisk
in the corner of your screen that doesn't get cleared till you look at
the darn page.
-- brion vibber (brion @
pobox.com)