On 1/22/07, Domas Mituzas midom.lists@gmail.com wrote:
Magnus,
Yes, but that would require users to go our X million articles and retag them. Granted, voluntary user slave labor is cheap ;-) but I wanted to use the already existing data from template inclusions.
The concept (and code) for parsing the template information was written back at Frankfurt's Wikimania. The problem - how to efficiently index and query that afterwards.
Really? Who did that? Wasn't me, right? (creepy thoughts of that disease whose name I can't remember - something starting with A... ;-)
IMHO we can safely limit this to existing article => existing template, so template_variable table could be: tv_article_id int ; // Article ID tv_template_id int ; // Template ID tv_key varchar[255] ; // Alternatively int, refering to a secondary table with unique words tv_value varchar[255] ; // Indexes better than MEDIUMBLOB, and values longer than 250 bytes are too long anyway ;-)
As queries would usually be like WHERE tv_template="123" AND tv_key="year" AND tv_value LIKE "%1984%" tv_article doesn't necessarily have to be indexed.
To save indices, we could also extend tv_key to be "TEMPLATE|KEY" as "|" is not allowed in either article or variable title. We could then not index tv_template as well and go like WHERE tv_key="Persondata|year" AND tv_value LIKE "%1984%"
Either way, the LIKE clause would only be invoked a rather limited number of times, as most of the work would be done by tv_key="something", which is an index.
My 2c.
Magnus