Hi,
for example, the extension AJAXPoll adds and uses two new database tables to a MediaWiki installation. This specific extension could be rewritten to use only one new table.
My questions:
1. Is there a policy or convention that more than one new table should be avoided in extensions?
2. Are two or more new tables tolerated?
On 17.04.2012, 11:05 Thomas wrote:
Hi,
for example, the extension AJAXPoll adds and uses two new database tables to a MediaWiki installation. This specific extension could be rewritten to use only one new table.
My questions:
- Is there a policy or convention that more than one new table should be avoided in extensions?
What for?
- Are two or more new tables tolerated?
Far more than two, even in WMF-deployed extensions.
On Apr 17, 2012, at 9:05 AM, Thomas Gries wrote:
Hi,
for example, the extension AJAXPoll adds and uses two new database tables to a MediaWiki installation. This specific extension could be rewritten to use only one new table.
My questions:
- Is there a policy or convention that more than one new table should be avoided in extensions?
- Are two or more new tables tolerated?
If it is required, then sure, it's tolerated. Some of the extensions currently deployed on Wikipedia have many more tables.
Of course, if you can reduce the number of tables without sacrificing performance, then by all means go for it.
If you could merge the tables and make it still perform well with the right database indexes, why not :)
On the other hand, if it means the table will be significantly larger, then it may be better to keep them separate. For example, I'd say it's better to use two tables (say, 'group' and 'item', where item.it_group refers to group.gr_id), so that you don't have to repeat all information about the group in each item row, and if the group has to change, there is no need to change all item rows.
-- Krinkle
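To illustrate the group/item layout Krinkle describes, here is a minimal sketch using Python's sqlite3 module. Only it_group and gr_id come from the mail; the table names (demo_group, demo_item), the other columns and the sample data are assumptions made up for the example, and SQLite stands in for whatever database the wiki actually runs on.

```python
import sqlite3

# Hypothetical schema: "group" is a reserved word in SQL, so the tables are
# called demo_group and demo_item here. Only gr_id and it_group are from the mail.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_group (
        gr_id    INTEGER PRIMARY KEY,
        gr_title TEXT NOT NULL
    );
    CREATE TABLE demo_item (
        it_id    INTEGER PRIMARY KEY,
        it_group INTEGER NOT NULL REFERENCES demo_group (gr_id),
        it_text  TEXT NOT NULL
    );
    -- Index the reference column so lookups by group stay cheap.
    CREATE INDEX demo_item_group ON demo_item (it_group);
""")

conn.execute("INSERT INTO demo_group (gr_id, gr_title) VALUES (1, 'Favourite colour?')")
conn.executemany(
    "INSERT INTO demo_item (it_group, it_text) VALUES (?, ?)",
    [(1, "red"), (1, "green"), (1, "blue")],
)

# Changing the group touches exactly one row; no item row is rewritten.
cur = conn.execute("UPDATE demo_group SET gr_title = 'Preferred colour?' WHERE gr_id = 1")
rows_changed = cur.rowcount

# Every item sees the new title through the join.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT gr_title FROM demo_item JOIN demo_group ON it_group = gr_id"
    )
]
```

With everything in a single table, the title would be stored in each of the three item rows and the same change would have to rewrite all of them.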
On Tue, Apr 17, 2012 at 10:51 PM, Krinkle krinklemail@gmail.com wrote:
If it is required, then sure, it's tolerated. Some of the extensions currently deployed on Wikipedia have many more tables.
Of course, if you can reduce the number of tables without sacrificing performance, then by all means go for it.
If you could merge the tables and make it still perform well with the right database indexes, why not :)
On the other hand, if it means the table will be significantly larger, then it may be better to keep them separate. For example, I'd say it's better to use two tables (say, 'group' and 'item', where item.it_group refers to group.gr_id), so that you don't have to repeat all information about the group in each item row, and if the group has to change, there is no need to change all item rows.
-- Krinkle
Am I reading this right as a suggestion and encouragement of database denormalisation in extensions?
On Tue, Apr 17, 2012 at 5:37 PM, Martijn Hoekstra martijnhoekstra@gmail.com wrote:
Am I reading this right as a suggestion and encouragement of database denormalisation in extensions?
You're right, I was thinking the same thing. I don't know why we'd suggest such a thing :)
-Chad
On Wed, Apr 18, 2012 at 12:16 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Tue, Apr 17, 2012 at 5:37 PM, Martijn Hoekstra martijnhoekstra@gmail.com wrote:
Am I reading this right as a suggestion and encouragement of database denormalisation in extensions?
Ignore what Krinkle said. We DO NOT encourage denormalization, except where necessary for performance reasons.
Your extension should have a sanely designed database schema which is normalized in as far as it makes sense. Don't feel bad about creating too many or too few tables, just try to design the schema the sanest way you can.
Who said anything about denormalization[1]? Maybe I'm missing something here, but I think we're saying the same thing.
What I meant (and thought I made clear) was that one should put a little bit of thinking into the database design, using as many or as few tables as it needs to work well. Preferably without duplication of information by splitting it into separate logical tables (such as the 'group' / 'item' example I mentioned, which is quite common in MediaWiki and in pretty much any other major SQL-backed web application). Maybe my description of the "merge" was a bit too vague, but let me elaborate on what I meant.
I wanted to add to the discussion that creating separate tables is not inherently good or bad in itself. Sometimes it makes sense to use fewer tables, sometimes it makes sense to use more. In the mail cited above I mentioned a group/item relation where it is best to keep them in separate logical tables. Here is an example where not splitting it up might make sense: a system for managing lists with items of a certain type (where the types are variable). Then it may make more sense to have a single table for the list items (with a column indicating the item type), a table for lists and a table for types. So there is only one table for items, with a column indicating the item type, rather than a separate item table for each item type. Again, it depends on the situation and on how variable "variable" is.
-- Krinkle
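The list/type/item arrangement from this mail could look roughly like the sketch below, again using Python's sqlite3 module. All table and column names (demo_list, demo_type, demo_item, li_*, ty_*, it_*) and the sample rows are hypothetical; the mail only describes the shape: one table for lists, one for types, and a single item table with a type column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_list (
        li_id   INTEGER PRIMARY KEY,
        li_name TEXT NOT NULL
    );
    CREATE TABLE demo_type (
        ty_id   INTEGER PRIMARY KEY,
        ty_name TEXT NOT NULL UNIQUE
    );
    -- One item table for every type; it_type says which type a row has.
    CREATE TABLE demo_item (
        it_id    INTEGER PRIMARY KEY,
        it_list  INTEGER NOT NULL REFERENCES demo_list (li_id),
        it_type  INTEGER NOT NULL REFERENCES demo_type (ty_id),
        it_value TEXT NOT NULL
    );
    CREATE INDEX demo_item_list_type ON demo_item (it_list, it_type);
""")

conn.execute("INSERT INTO demo_list (li_id, li_name) VALUES (1, 'reading list')")
conn.executemany(
    "INSERT INTO demo_type (ty_id, ty_name) VALUES (?, ?)",
    [(1, "book"), (2, "article")],
)
conn.executemany(
    "INSERT INTO demo_item (it_list, it_type, it_value) VALUES (?, ?, ?)",
    [(1, 1, "SQL basics"), (1, 2, "Indexing notes")],
)

# Adding a new item type is a row insert, not a schema change.
conn.execute("INSERT INTO demo_type (ty_id, ty_name) VALUES (3, 'video')")
conn.execute(
    "INSERT INTO demo_item (it_list, it_type, it_value) VALUES (1, 3, 'Schema design talk')"
)

# Count items per type across the single item table.
counts = dict(
    conn.execute(
        "SELECT ty_name, COUNT(*) FROM demo_item "
        "JOIN demo_type ON it_type = ty_id GROUP BY ty_name"
    )
)
```

A separate item table per type would instead need a new CREATE TABLE every time a type is added, which is the situation the mail argues against when the set of types is variable.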
On Tue, Apr 17, 2012 at 8:58 PM, Krinkle krinklemail@gmail.com wrote:
Who said anything about denormalization[1]? Maybe I'm missing something here, but I think we're saying the same thing.
Now that you've elaborated I can tell that we are. It's just that "If you could merge the tables and make it still perform well with the right database indexes, why not :)" sounded like it was encouraging denormalization, or at least 3 people (Martijn, Chad and myself) parsed it that way.
Glad to see this is all a misunderstanding :)
Roan
On Tue, Apr 17, 2012 at 2:37 PM, Martijn Hoekstra martijnhoekstra@gmail.com wrote:
Am I reading this right as a suggestion and encouragement of database denormalisation in extensions?
Ignore what Krinkle said. We DO NOT encourage denormalization, except where necessary for performance reasons.
Your extension should have a sanely designed database schema which is normalized in as far as it makes sense. Don't feel bad about creating too many or too few tables, just try to design the schema the sanest way you can.
Roan
I suggest that all extension tables should have at least the extension's name as a common prefix.
On Wed, Apr 18, 2012 at 3:49 PM, Thomas Gries mail@tgries.de wrote:
I suggest that all extension tables should have at least the extension's name as a common prefix.
I think/hope that this is already common practice.
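One way to check the prefix convention suggested above is a small helper like the following Python sketch. The function name missing_prefix and the table names passed to it are hypothetical, chosen only for illustration; the lowercase "<extension>_" form is likewise an assumption, not a documented MediaWiki rule.

```python
def missing_prefix(extension_name, table_names):
    """Return the table names that do not start with '<extension>_'."""
    prefix = extension_name.lower() + "_"
    return [name for name in table_names if not name.lower().startswith(prefix)]


# Hypothetical table names, for illustration only.
unprefixed = missing_prefix("AJAXPoll", ["ajaxpoll_info", "ajaxpoll_vote", "poll_vote"])
```

Here unprefixed would contain only "poll_vote", the one name that does not carry the extension's name.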
wikitech-l@lists.wikimedia.org