On 27.03.2012 00:37, MZMcBride wrote:
> It's an ancient assumption that's built in to many parts of MediaWiki (and
> many outside tools and scripts). Is there any kind of assessment about the
> level of impact this would have?
Not formally, just my own poking at the code base. There are a lot of places in
the code that access revision text and do something with it; not all of them can
easily be found or changed (this is especially true for extensions).
My proposal covers a compatibility layer that will cause legacy code to just see
an empty page when trying to access the contents of a non-wikitext page. Only
code aware of content models will see any non-wikitext content. This should
avoid most problems, and should ensure that things will work as before at least
for everything that is wikitext.
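The compatibility layer could look roughly like the following sketch. This is
illustrative Python, not MediaWiki's actual (PHP) code; the class and method
names are invented for the example:

```python
WIKITEXT = "wikitext"

class Revision:
    """Toy stand-in for a revision object with a content model."""

    def __init__(self, content, model=WIKITEXT):
        self.content = content
        self.model = model

    def get_text(self):
        # Legacy accessor: non-wikitext content is hidden, so code
        # that is unaware of content models just sees an empty page.
        if self.model != WIKITEXT:
            return ""
        return self.content

    def get_content(self):
        # Model-aware accessor: new code sees the real content.
        return self.content

json_rev = Revision('{"id": 42}', model="json")
assert json_rev.get_text() == ""            # legacy view: empty page
assert json_rev.get_content() == '{"id": 42}'

wiki_rev = Revision("''Hello'' world")
assert wiki_rev.get_text() == "''Hello'' world"  # wikitext unchanged
```

The point is that old call sites keep working unmodified for all wikitext
pages, and degrade gracefully (empty page) for everything else.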
> For example, would the diff engine need to be rewritten so that people can
> monitor these pages for vandalism?
A diff engine needs to be implemented for each content model. The existing
engine(s) do not need to be rewritten; they will be used for all wikitext pages.
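The per-model dispatch might be sketched like this (Python with invented names;
the stdlib differ stands in for MediaWiki's real wikitext diff engine):

```python
import difflib

def wikitext_diff(old, new):
    # Stand-in for the existing line-based wikitext diff engine.
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm=""))

# Registry mapping content model name -> diff engine for that model.
DIFF_ENGINES = {"wikitext": wikitext_diff}

def diff(model, old, new):
    try:
        engine = DIFF_ENGINES[model]
    except KeyError:
        raise NotImplementedError(f"no diff engine registered for {model!r}")
    return engine(old, new)
```

A new content model (say, JSON) would register its own specialized engine in
the same registry instead of changing the wikitext one.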
> Will these pages be editable in the same way as current wikitext pages?
No. The entire point of this proposal is to be able to neatly supply specialized
display, editing and diffing of different kinds of content.
> If not, will there be special editors for the various data types?
Indeed.
> What other parts of the MediaWiki codebase will be affected and to what
> extent?
A few classes (like Revision or WikiPage) need some major additions or changes,
see the proposal on meta. Lots of places should eventually be changed to become
aware of content models, but don't need to be adapted immediately (see above).
> Will text still go in the text table or will separate tables and
> infrastructure be used?
Uh, did you read the proposal?...
All content is serialized just before storing it. It is stored into the text
table using the same code as before. The content model and serialization format
are recorded in the revision table.
Secondary data (index data, analogous to the link tables) may be extracted from
the content and stored in separate database tables, or in some other service, as
needed.
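A toy sketch of that storage flow, with dicts standing in for the text and
revision tables (the field names here are illustrative, not the actual schema):

```python
import json

text_table = {}      # rev_id -> serialized blob (the text table)
revision_table = {}  # rev_id -> metadata row (the revision table)

def save_revision(rev_id, content, model, fmt):
    # Serialize just before storing; wikitext is stored as-is.
    if fmt == "application/json":
        blob = json.dumps(content)
    else:
        blob = content
    text_table[rev_id] = blob
    # Record model and format alongside the revision metadata.
    revision_table[rev_id] = {"content_model": model,
                              "content_format": fmt}

save_revision(1, {"label": "Berlin"}, "wikidata-item", "application/json")
save_revision(2, "''Hello'' world", "wikitext", "text/x-wiki")
```

Loading a revision would then consult the recorded model and format to pick
the right deserializer, while the blob itself lives in the text table as
before.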
> I'm reminded a little of LiquidThreads for some reason. This idea sounds
> good, but I'm worried about the implementation details, particularly as the
> assumption you seek to upend is so old and ingrained.
It's more like the transition to using MediaHandlers instead of assuming
uploaded files to be images: existing concepts and actions are generalized to
apply to more types of content.
LiquidThreads introduces new concepts (threads, conversations) and interactions
(re-arranging, summarizing, etc) and tries to integrate them with the concepts
used for wiki pages. This seems far more complicated to me.
>> The background is that the Wikidata project needs a way to store structured
>> data (JSON) on wiki pages instead of wikitext. Having a pluggable system
>> would solve that problem along with several others, like doing away with
>> the special cases for JS/CSS, the ability to maintain categories etc
>> separate from body text, manage Gadgets sanely on a wiki page, or several
>> other things (see the link below).
> How would this affect categories being stored in wikitext (alongside the
> rest of the page content text)? That part doesn't make any sense to me.
Imagine a data model that works like mime/multipart email: you have a wrapper
that contains the "main" text as well as "attachments". The whole
shebang gets
serialized and stored in the text table, as usual. For displaying, editing and
visualizing, you have code that is aware of the multipart nature of the content,
and puts the parts together nicely.
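The multipart idea might be sketched like so (hypothetical Python, just to
illustrate the shape of the thing; nothing like this is planned for
implementation as part of the Wikidata work):

```python
import json

class MultipartContent:
    """Wrapper holding a main text plus named 'attachments',
    serialized as a single blob for the text table."""

    def __init__(self, main, **parts):
        self.main = main
        self.parts = parts  # e.g. categories, interlanguage links

    def serialize(self):
        return json.dumps({"main": self.main, "parts": self.parts})

    @classmethod
    def deserialize(cls, blob):
        data = json.loads(blob)
        return cls(data["main"], **data["parts"])

page = MultipartContent("Some article text.",
                        categories=["Cities"],
                        interwiki=["de:Berlin"])
blob = page.serialize()                    # one blob, stored as usual
restored = MultipartContent.deserialize(blob)
assert restored.parts["categories"] == ["Cities"]
```

Display and edit code that knows about the multipart model can then offer the
parts (body text, categories, language links) as separately editable pieces.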
However, the category stuff is a use case I'm just mentioning because it has
been requested so often in the past (namely, editing categories, interlanguage
links, etc separately from the wiki text); this mechanism is not essential to
the concept of ContentHandlers, and not something I plan to implement for the
Wikidata project. It's just something that will become much easier once we have
ContentHandlers.
-- daniel