On 1/22/07, Domas Mituzas <midom.lists(a)gmail.com> wrote:
Magnus,
Yes, but that would require users to go our X
million articles and
retag them. Granted, voluntary user slave labor is cheap ;-) but I
wanted to use the already existing data from template inclusions.
The concept (and code) for parsing the template information was
written back at Frankfurt's Wikimania. The problem - how to
efficiently index and query that afterwards.
Really? Who did that? Wasn't me, right? (creepy thoughts of that
disease whose name I can't remember - something starting with A... ;-)
IMHO we can safely limit this to existing article => existing
template, so template_variable table could be:
tv_article_id int ; // Article ID
tv_template_id int ; // Template ID
tv_key varchar[255] ; // Alternatively int, refering to a secondary
table with unique words
tv_value varchar[255] ; // Indexes better than MEDIUMBLOB, and values
longer than 250 bytes are too long anyway ;-)
As queries would usually be like
WHERE tv_template="123" AND tv_key="year" AND tv_value LIKE
"%1984%"
tv_article doesn't necessarily have to be indexed.
To save indices, we could also extend tv_key to be "TEMPLATE|KEY" as
"|" is not allowed in either article or variable title. We could then
not index tv_template as well and go like
WHERE tv_key="Persondata|year" AND tv_value LIKE "%1984%"
Either way, the LIKE clause would only be invoked a rather limited
number of times, as most of the work would be done by
tv_key="something", which is an index.
My 2c.
Magnus