Addendum, after sleeping on this:
Do we really want to manage something that is essentially configuration, namely the set of available content models and formats, in a database table? How would it be maintained?
For context:
* As per T113034, we are moving away from managing interwiki prefixes in the database, in favor of configuration files.
* Namespace IDs are defined in LocalSettings.php.
The original design of ContentHandler used integer IDs for content models and formats in the DB. A mapping to human-readable names is only needed for logging and error messages anyway. Such a mapping could be maintained in LocalSettings.php, just like we do for namespaces. This would also help avoid ID clashes. My idea back then was to have a sort of registry on mediawiki.org where extensions could reserve an ID for themselves, so that the same ID would stand for the same model everywhere.
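A minimal sketch of what such a configuration-side mapping could look like, assuming each extension declares its reserved ID when it is set up. The class and method names are hypothetical and not part of MediaWiki. Only the integer would go into the database; the name is looked up for log and error messages, and a clash is caught when the configuration is loaded rather than silently corrupting data.

```python
class ContentModelRegistry:
    """Hypothetical registry mapping integer content model IDs (stored in the
    DB) to human-readable names (used only for logging and error messages)."""

    def __init__(self) -> None:
        self._name_by_id: dict[int, str] = {}
        self._id_by_name: dict[str, int] = {}

    def register(self, model_id: int, name: str) -> None:
        # Re-registering the same (id, name) pair is harmless; anything else
        # is a clash that should fail loudly at configuration time.
        if model_id in self._name_by_id and self._name_by_id[model_id] != name:
            raise ValueError(
                f"ID {model_id} is already taken by {self._name_by_id[model_id]!r}"
            )
        if name in self._id_by_name and self._id_by_name[name] != model_id:
            raise ValueError(
                f"{name!r} is already registered with ID {self._id_by_name[name]}"
            )
        self._name_by_id[model_id] = name
        self._id_by_name[name] = model_id

    def name_of(self, model_id: int) -> str:
        return self._name_by_id[model_id]

    def id_of(self, name: str) -> int:
        return self._id_by_name[name]


registry = ContentModelRegistry()
registry.register(1, "wikitext")
# Hypothetical ID an extension might have reserved in a central registry.
registry.register(1000, "wikibase-item")
```

The same registry could be populated from LocalSettings.php-style settings, much like namespace constants, so every wiki agrees on what a given ID means.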
The disadvantage is of course that the model and format are not obvious when eyeballing the result of an SQL query. It also makes database dumps more brittle, since they cannot be interpreted without knowledge of the format and model identifiers. That's an argument for having these in the DB.
Still, configuration in the database is nasty to maintain by hand, and also annoying for extensions that define content models. Should we introduce a simple hook that makes sure the content model and format get registered in the database?
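Such a hook would essentially be a "get or create" on the content_model table. A minimal sketch, using SQLite for illustration; the helper name and the column names (model_id, model_name) are assumptions, not an existing schema. The UNIQUE constraint on the name makes repeated or concurrent registration idempotent.

```python
import sqlite3


def ensure_content_model(conn: sqlite3.Connection, name: str) -> int:
    """Hypothetical helper a registration hook could call: return the ID
    for a content model name, inserting a row the first time it is seen."""
    # INSERT OR IGNORE relies on the UNIQUE constraint on model_name, so
    # calling this again for a known model does not create a duplicate row.
    conn.execute(
        "INSERT OR IGNORE INTO content_model (model_name) VALUES (?)", (name,)
    )
    row = conn.execute(
        "SELECT model_id FROM content_model WHERE model_name = ?", (name,)
    ).fetchone()
    return row[0]


# Illustrative table; the real schema would be defined by the RFC.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_model ("
    " model_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " model_name TEXT NOT NULL UNIQUE)"
)
```

With this approach, nobody edits the table by hand: the first time a wiki sees a model, it gets a local ID. The trade-off is that IDs are then only meaningful per wiki, unlike the centrally reserved IDs described earlier.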
On 11.07.2016 at 21:26, Daniel Kinzler wrote:
Hi Jaime, thanks for the pointer! I had completely forgotten about that.
A few thoughts about that RFC:
- I have long thought that content_format is pretty pointless and redundant. I haven't seen any content model that uses different serialization formats (I wrote a few that support two, but only ever used one). If the serialization does need to change for some reason, it's usually easy to detect from the first few bytes.
- What we need instead is versioning on the content model. It happens quite often that the data structure you store changes slightly. Knowing what version you are dealing with is quite helpful when deserializing and processing. These differences are much harder to auto-detect than the serialization format.
- Per-page and per-revision content model will become redundant with Multi-Content-Revisions. We will instead have this info in the revision_slot table (multiple per revision). The same design still applies, but changing the page and revision tables would be pointless. We would just ignore the content model (and format) in the page and revision tables, and rely on the info in the slot table instead. At some point, we can then drop this info from page and revision.
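To illustrate the point about versioning: if the model version is stored next to the content model, the deserializer can dispatch on it directly instead of guessing the structure from the data. A minimal sketch, assuming a hypothetical JSON-based model whose version 1 stored a single (implicitly English) label and whose version 2 keys labels by language code; the structures and function names are invented for the example.

```python
import json
from typing import Any, Callable, Dict

CURRENT_VERSION = 2


def _v1_to_v2(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical change: v1 stored one label, implicitly English;
    # v2 keys labels by language code.
    return {"labels": {"en": data["label"]}}


# Maps each old version to the function that upgrades it by one step.
UPGRADES: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {1: _v1_to_v2}


def deserialize(blob: str, version: int) -> Dict[str, Any]:
    """Parse a stored blob, then upgrade it step by step to CURRENT_VERSION.

    The version comes from metadata stored alongside the blob (assumed here
    to be a column next to the content model), not from sniffing the data."""
    if version > CURRENT_VERSION:
        raise ValueError(f"unknown future version {version}")
    data = json.loads(blob)
    while version < CURRENT_VERSION:
        data = UPGRADES[version](data)
        version += 1
    return data
```

Chaining one-step upgrades keeps each change small and means old revisions never have to be rewritten in storage.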
I propose to introduce the content_model (and maybe also content_format) tables, but not touch the page and revision tables for now. Instead, we introduce revision_slots for Multi-Content-Revisions first, using the content_model table, and introduce model versioning; maybe drop the format in the process.
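A rough sketch of how the proposed tables could fit together, using SQLite for illustration. All column names, the role value and the blob address are hypothetical. There is deliberately no format column (per "maybe drop the format"), and the model version sits next to the model ID.

```python
import sqlite3

SCHEMA = """
CREATE TABLE content_model (
    model_id   INTEGER PRIMARY KEY,
    model_name TEXT NOT NULL UNIQUE
);
CREATE TABLE revision_slots (
    slot_revision      INTEGER NOT NULL,  -- rev_id of the owning revision
    slot_role          TEXT    NOT NULL,  -- e.g. 'main'
    slot_model         INTEGER NOT NULL REFERENCES content_model (model_id),
    slot_model_version INTEGER NOT NULL DEFAULT 1,
    slot_address       TEXT    NOT NULL,  -- where the blob is stored
    PRIMARY KEY (slot_revision, slot_role)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO content_model (model_id, model_name) VALUES (1, 'wikitext')")
conn.execute(
    "INSERT INTO revision_slots VALUES (?, ?, ?, ?, ?)",
    (42, "main", 1, 1, "blob:1234"),
)

# The model name is only needed for display, so it is joined in on demand.
rows = conn.execute(
    "SELECT s.slot_role, m.model_name, s.slot_model_version "
    "FROM revision_slots s JOIN content_model m ON m.model_id = s.slot_model "
    "WHERE s.slot_revision = ?",
    (42,),
).fetchall()
```

Since the page and revision tables are left alone, existing code keeps working while new code reads the model from the slot rows.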
What do you think?
On 11.07.2016 at 14:27, Jaime Crespo wrote:
On Mon, Jul 11, 2016 at 2:07 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
It seems there is disagreement about what the correct interpretation of NULL in the rev_content_model column is. Should NULL there mean
What should we write into rev_content_model in the future
Content model handling is pending a refactoring (https://www.mediawiki.org/wiki/Requests_for_comment/Content_model_storage). Once that happens, they should never be NULL.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l