On Monday, July 11, 2016, Daniel Kinzler <daniel.kinzler(a)wikimedia.de>
wrote:
It seems there is disagreement about what the correct
interpretation of
NULL in
the rev_content_model column is. Should NULL there
mean
(a) "the current page content model, as recorded in page_content_model"
or should it mean
(b) "the default for this title, no matter what page_content_model says"?
Kunal and I have had an unintentional edit war about this question in
Revision.php:
Kunal changed it from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/222043/
I later changed it from (b) to (a) in
https://gerrit.wikimedia.org/r/#/c/297787/
Kunal reverted me from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/298239/
So, which way do we want it?
The conflict seems to arise from (at least) three competing use cases:
I) re-interpreting page content. For instance, a user may move a misnamed
User:Foo.jss to User:Foo.js. In this case, the content should be
re-interpreted
as JavaScript, including all old revisions. This would
be in favor of
behavior
(a), though it still works with (b), because the
default model changes
based on
the suffix ".js". I think it would however
be better to only rely on title
parsing magic once, when creating the page, not later, when rendering old
revisions.
II) converting page content. For instance, if a talk page gets converted
to
using Flow, new revisions (and page_content_model)
will have the Flow
model,
while old revisions need to keep their original
wikitext model (even
though
their rev_content_model is null). That would need
behavior (b).
III) changing a namespace's default content model. E.g. when installing an
extension that changes the default content model of a namespace (such as
Wikibase with Items in the main namespace, or Flow-per-default for Talk
pages),
existing pages that were already in that namespace
should still be
readable.
With (b), this would fail: even though
page_content_model has the correct
model
for reading the page, rev_content_model is null, so
the new namespace
default is
used, which will fail. With (a), this would simply
work: the page will be
rendered according to page_content_model.
In all cases it's possible to resolve the issue by replacing the NULL
entries
for all revisions of a page with the current model id.
The question is
just when
and how we do that, and when and how we can even
detect that this needs
doing.
There is also an in-between option, let's call it a/b: fall back to
page_content_model for the latest revision (that should *always* be
right), but
to ignore page_content_model for older revisions. That
would cater to use
case
III at least in so far as it would be possible to view
the "misplaced"
pages.
But viewing old revisions or diffs would still fail
with a nasty error.
This
option may look better on the surface, but I fear it
will just add to the
confusion.
There's another fix: never write null into rev_content_model. Always put
the
actual model ID there. That's pretty wasteful, but
it's robust and
reliable.
When we decided to use null as a placeholder for the
default, we assumed
the
default would never change. But as we now see, it
sometimes does...
So, what should it be, option (a) or (b)? And how do we address the use
case
that is then broken? What should we write into
rev_content_model in the
future?
I personally think that option (a) makes more sense, because the
resolutions of
defaults is then local to the database. It could even
be done within the
SQL
query. It's easier to maintain consistency that
way. For use case II,
that would
require us to "fill in" all the
rev_content_model fields in old revisions
when
converting a page. I think it would be a good thing to
do that. If we
have the
content model change between revisions, it seems
prudent to record it
explicitly.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
To me, (b) makes more sense, as all the other fields in page represent the
info for the current revision. Additionally all the fields in revision
(except rev_deleted) are immutable and never change, and definitely dont
change interpretation based on other db fields. Having old revisions have a
dependency on the page table (especially a dependency going in the
direction revision->page) seems wrong to me.
--
bawolff