On Monday, July 11, 2016, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
It seems there is disagreement about what the correct interpretation of
NULL in
the rev_content_model column is. Should NULL there mean
(a) "the current page content model, as recorded in page_content_model"
or should it mean
(b) "the default for this title, no matter what page_content_model says"?
Kunal and I have had an unintentional edit war about this question in
Revision.php:
Kunal changed it from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/222043/
I later changed it from (b) to (a) in
https://gerrit.wikimedia.org/r/#/c/297787/
Kunal reverted me from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/298239/
So, which way do we want it?
The conflict seems to arise from (at least) three competing use cases:
I) re-interpreting page content. For instance, a user may move a misnamed User:Foo.jss to User:Foo.js. In this case, the content should be
re-interpreted
as JavaScript, including all old revisions. This would be in favor of
behavior
(a), though it still works with (b), because the default model changes
based on
the suffix ".js". I think it would however be better to only rely on title parsing magic once, when creating the page, not later, when rendering old
revisions.
II) converting page content. For instance, if a talk page gets converted
to
using Flow, new revisions (and page_content_model) will have the Flow
model,
while old revisions need to keep their original wikitext model (even
though
their rev_content_model is null). That would need behavior (b).
III) changing a namespace's default content model. E.g. when installing an extension that changes the default content model of a namespace (such as Wikibase with Items in the main namespace, or Flow-per-default for Talk
pages),
existing pages that were already in that namespace should still be
readable.
With (b), this would fail: even though page_content_model has the correct
model
for reading the page, rev_content_model is null, so the new namespace
default is
used, which will fail. With (a), this would simply work: the page will be rendered according to page_content_model.
In all cases it's possible to resolve the issue by replacing the NULL
entries
for all revisions of a page with the current model id. The question is
just when
and how we do that, and when and how we can even detect that this needs
doing.
There is also an in-between option, let's call it a/b: fall back to page_content_model for the latest revision (that should *always* be
right), but
to ignore page_content_model for older revisions. That would cater to use
case
III at least in so far as it would be possible to view the "misplaced"
pages.
But viewing old revisions or diffs would still fail with a nasty error.
This
option may look better on the surface, but I fear it will just add to the
confusion.
There's another fix: never write null into rev_content_model. Always put
the
actual model ID there. That's pretty wasteful, but it's robust and
reliable.
When we decided to use null as a placeholder for the default, we assumed
the
default would never change. But as we now see, it sometimes does...
So, what should it be, option (a) or (b)? And how do we address the use
case
that is then broken? What should we write into rev_content_model in the
future?
I personally think that option (a) makes more sense, because the
resolutions of
defaults is then local to the database. It could even be done within the
SQL
query. It's easier to maintain consistency that way. For use case II,
that would
require us to "fill in" all the rev_content_model fields in old revisions
when
converting a page. I think it would be a good thing to do that. If we
have the
content model change between revisions, it seems prudent to record it
explicitly.
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
To me, (b) makes more sense, as all the other fields in page represent the info for the current revision. Additionally all the fields in revision (except rev_deleted) are immutable and never change, and definitely dont change interpretation based on other db fields. Having old revisions have a dependency on the page table (especially a dependency going in the direction revision->page) seems wrong to me.
-- bawolff