On 27.03.2012 00:37, MZMcBride wrote:
> It's an ancient assumption that's built in to many parts of MediaWiki (and many outside tools and scripts). Is there any kind of assessment about the level of impact this would have?
Not formally, just my own poking at the code base. There are a lot of places in the code that access revision text and do something with it; not all of them can easily be found or changed (this is especially true for extensions).
My proposal includes a compatibility layer that causes legacy code to simply see an empty page when it tries to access the contents of a non-wikitext page. Only code aware of content models will see any non-wikitext content. This should avoid most problems and ensure that things keep working as before, at least for everything that is wikitext.
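A minimal sketch of what that compatibility layer might look like (all names here are hypothetical and only for illustration; the proposal on meta has the authoritative details):

    // Hypothetical sketch (names invented): legacy code keeps calling
    // Revision::getText() and never sees non-wikitext content.
    class Revision {
        // ... existing members ...

        // Old interface, kept for code that is not aware of content models.
        public function getText() {
            if ( $this->getContentModel() !== 'wikitext' ) {
                return ''; // a non-wikitext page just looks empty
            }
            return $this->getContent()->serialize();
        }

        // New interface for content-model-aware code: returns a Content
        // object instead of a flat string.
        public function getContent() { /* ... */ }
        public function getContentModel() { /* ... */ }
    }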
> For example, would the diff engine need to be rewritten so that people can monitor these pages for vandalism?
A diff engine needs to be implemented for each content model. The existing engine(s) do not need to be rewritten; they will continue to be used for all wikitext pages.
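Schematically, I imagine each content handler providing a diff engine for its model, roughly like this (a hypothetical sketch; DifferenceEngine is the existing wikitext diff class, the other names are invented):

    // Hypothetical sketch: each handler can supply a diff engine for its
    // content model; the default simply reuses the existing text diff.
    abstract class ContentHandler {
        public function createDifferenceEngine( $context ) {
            return new DifferenceEngine( $context ); // existing wikitext diff
        }
    }

    class WikidataItemHandler extends ContentHandler {
        public function createDifferenceEngine( $context ) {
            return new WikidataDifferenceEngine( $context ); // specialized diff
        }
    }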
> Will these pages be editable in the same way as current wikitext pages?
No. The entire point of this proposal is to be able to neatly supply specialized display, editing and diffing of different kinds of content.
> If not, will there be special editors for the various data types?
Indeed.
> What other parts of the MediaWiki codebase will be affected and to what extent?
A few classes (like Revision or WikiPage) need some major additions or changes; see the proposal on meta. Lots of places should eventually be changed to become aware of content models, but they don't need to be adapted immediately (see above).
> Will text still go in the text table or will separate tables and infrastructure be used?
Uh, did you read the proposal?...
All content is serialized just before storing it. It is stored into the text table using the same code as before. The content model and serialization format is recorded in the revision table.
Secondary data (index data, analogous to the link tables) may be extracted from the content and stored in separate database tables, or in some other service, as needed.
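Schematically, the save path could look roughly like this (a sketch based on my reading of the proposal; the helper functions and field names are placeholders):

    // Hypothetical sketch of saving a revision under the proposal.
    function saveContent( WikiPage $page, Content $content ) {
        $handler = $content->getContentHandler();

        // Serialize just before storing; the blob goes into the text table
        // using the same code as before.
        $blob   = $handler->serializeContent( $content );
        $textId = storeTextBlob( $blob ); // placeholder for the existing storage code

        // The content model and serialization format are recorded in the
        // revision table.
        insertRevisionRow( array(
            'rev_page'           => $page->getId(),
            'rev_text_id'        => $textId,
            'rev_content_model'  => $content->getModel(),          // e.g. "wikitext", "wikidata-item"
            'rev_content_format' => $handler->getDefaultFormat(),  // e.g. "text/x-wiki", "application/json"
        ) );

        // Secondary data (analogous to the link tables) is extracted and
        // written to separate tables or services as needed.
        $handler->updateSecondaryData( $page, $content );
    }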
> I'm reminded a little of LiquidThreads for some reason. This idea sounds good, but I'm worried about the implementation details, particularly as the assumption you seek to upend is so old and ingrained.
It's more like the transition to using MediaHandlers instead of assuming uploaded files to be images: existing concepts and actions are generalized to apply to more types of content.
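To make the analogy concrete: just like media types are mapped to MediaHandler classes today, content models could be mapped to handler classes through a registry, for example (hypothetical configuration, names invented):

    // Hypothetical registry, analogous to $wgMediaHandlers:
    $wgContentHandlers = array(
        'wikitext'      => 'WikitextContentHandler',
        'javascript'    => 'JavaScriptContentHandler',
        'css'           => 'CssContentHandler',
        'wikidata-item' => 'WikidataItemHandler',
    );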
LiquidThreads introduces new concepts (threads, conversations) and interactions (re-arranging, summarizing, etc.) and tries to integrate them with the concepts used for wiki pages. This seems far more complicated to me.
The background is that the Wikidata project needs a way to store structured data (JSON) on wiki pages instead of wikitext. A pluggable system would solve that problem along with several others: doing away with the special cases for JS/CSS pages, maintaining categories etc. separately from the body text, managing Gadgets sanely on wiki pages, and more (see the link below).
> How would this affect categories being stored in wikitext (alongside the rest of the page content text)? That part doesn't make any sense to me.
Imagine a data model that works like MIME multipart email: you have a wrapper that contains the "main" text as well as "attachments". The whole shebang gets serialized and stored in the text table, as usual. For displaying, editing and visualizing, you have code that is aware of the multipart nature of the content and puts the parts together nicely.
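As a rough sketch of that multipart idea (purely illustrative, names invented):

    // Hypothetical sketch: a content object wrapping a "main" part plus
    // named attachments; the whole thing is serialized into one blob.
    class MultipartContent {
        private $main;        // e.g. the wikitext body
        private $attachments; // map of name => content (categories, langlinks, ...)

        public function __construct( $main, array $attachments = array() ) {
            $this->main = $main;
            $this->attachments = $attachments;
        }

        public function getMain() {
            return $this->main;
        }

        // e.g. getAttachment( 'categories' ): editable separately from the
        // body text, but stored together with it.
        public function getAttachment( $name ) {
            return isset( $this->attachments[$name] ) ? $this->attachments[$name] : null;
        }
    }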
However, the category stuff is a use case I'm just mentioning because it has been requested so often in the past (namely, editing categories, interlanguage links, etc. separately from the wikitext); this mechanism is not essential to the concept of ContentHandlers, and it is not something I plan to implement for the Wikidata project. It's just something that will become much easier once we have ContentHandlers.
-- daniel