[Mediawiki-l] New diff feature for MediaWiki

Elliott F. Cable ecable at avxw.com
Wed Jun 7 22:47:49 UTC 2006


I'm forwarding this to MediaWiki-l also. I think it's more relevant  
there anyway.

I like that a lot, and think it should be implemented in MediaWiki
ASAP. I'm e-mailing him to ask why he only wants to contribute if
it's used in 'teh wikipedia'<!-- wikicrapia more like it -->; there
are lots of other wiki sites out there that need this and could
definitely use it. I'm aallll for this, and willing to help in any
way I can.

On Jun 6, 2006, at 2:13 PM, Roman Nosov wrote:
> Hi all,
>
> I've been directed here by Brion, Robchurch and others on  
> #wikimedia-tech.
> So I propose a new feature for Wikipedia which people on
> #wikimedia-tech mostly refer to as a "blame page" or "blame map". I
> would prefer to call it something like "Track contributions mode"
> (because of its similarity to the Track Changes mode in MS Word) or
> "Hall of fame", but whatever. I have a live prototype written in PHP
> and MySQL at
> http://217.147.83.36:9000/
> An example of a blame map can be seen at
> http://217.147.83.36:9000/history::171
> and two blame maps compared at
> http://217.147.83.36:9000/history::171=169
>
> For some reason, folks on #wikimedia-tech were mainly concerned with
> speed and almost nothing else, so I'll try to explain the performance
> issues as best I can.
>
> First of all, I DO NOT propose to recalculate diffs for all the
> zillions of edits Wikipedia already has. Diffs would only be
> calculated for new edits.
> Next, I want to explain in detail how I see this working. First, I
> propose to modify the revision table and add a flag with the following
> possible values: "Revision is too old to be diffed", "Revision is
> awaiting to be diffed", "Revision has been diffed". Another table
> should also be added to store the blame map for each revision. The
> blame map for each subsequent revision will be calculated
> incrementally from the previous one, so it doesn't really matter
> whether an article has 10 or 1000 revisions: we would only need the
> last blame map.
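A minimal sketch of the incremental update described above, written in Python rather than the prototype's PHP. It assumes a blame map is simply a list of (word, author) pairs covering the revision's text and uses the standard library's difflib.SequenceMatcher as the word-level diff; the names DiffState and next_blame_map are hypothetical, and the message does not say which representation or diff algorithm the prototype actually uses.

```python
from difflib import SequenceMatcher
from enum import Enum

# The three proposed flag values (the enum name is hypothetical).
class DiffState(Enum):
    TOO_OLD = "Revision is too old to be diffed"
    AWAITING = "Revision is awaiting to be diffed"
    DIFFED = "Revision has been diffed"

# Assumed representation: one (word, author) pair per word of the revision.
BlameMap = list[tuple[str, str]]


def next_blame_map(prev: BlameMap, new_text: str, author: str) -> BlameMap:
    """Build a revision's blame map from the previous map and the new text only."""
    old_words = [word for word, _ in prev]
    new_words = new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    result: BlameMap = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words keep whoever originally wrote them.
            result.extend(prev[i1:i2])
        elif tag in ("replace", "insert"):
            # New or rewritten words are credited to this edit's author.
            result.extend((word, author) for word in new_words[j1:j2])
        # "delete": words removed from the old text simply drop out.
    return result


m1 = next_blame_map([], "the quick fox", "Alice")
m2 = next_blame_map(m1, "the quick brown fox", "Bob")
print(m2)  # [('the', 'Alice'), ('quick', 'Alice'), ('brown', 'Bob'), ('fox', 'Alice')]
```

Because each call needs only the previous map and the new text, the work per edit does not grow with the number of earlier revisions.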
>
> I also propose to have separate dedicated diff server(s) whose sole
> job is to calculate diffs in the background. I.e. a diff server grabs
> a revision with the "Revision is awaiting to be diffed" flag and the
> last blame map from the database, calculates the diff, stores the new
> blame map in the database and changes the revision flag to "Revision
> has been diffed". Repeat.
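The background loop could look roughly like the following sketch. The schema (revision and blame_map tables with these columns, blame maps stored as JSON text) is a simplified, hypothetical stand-in, not MediaWiki's real schema, and SQLite replaces MySQL only so the example runs on its own; next_blame_map is a compact difflib-based word diff included so the block is self-contained.

```python
import json
import sqlite3
from difflib import SequenceMatcher

AWAITING = "Revision is awaiting to be diffed"
DIFFED = "Revision has been diffed"


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified tables; not MediaWiki's real schema.
    conn.executescript("""
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, page_id INTEGER,
                               author TEXT, text TEXT, diff_state TEXT);
        CREATE TABLE blame_map (rev_id INTEGER PRIMARY KEY, page_id INTEGER,
                                words TEXT);  -- JSON list of [word, author]
    """)


def next_blame_map(prev, new_text, author):
    """Word-level diff: unchanged words keep their author, new ones get `author`."""
    old = [word for word, _ in prev]
    new = new_text.split()
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes():
        if tag == "equal":
            out.extend(prev[i1:i2])
        elif tag != "delete":
            out.extend((word, author) for word in new[j1:j2])
    return out


def process_one(conn: sqlite3.Connection) -> bool:
    """Diff the oldest pending revision; return False if none is waiting."""
    row = conn.execute(
        "SELECT rev_id, page_id, author, text FROM revision "
        "WHERE diff_state = ? ORDER BY rev_id LIMIT 1", (AWAITING,)).fetchone()
    if row is None:
        return False
    rev_id, page_id, author, text = row
    # Only the page's latest blame map is needed, however long its history is.
    last = conn.execute(
        "SELECT words FROM blame_map WHERE page_id = ? "
        "ORDER BY rev_id DESC LIMIT 1", (page_id,)).fetchone()
    prev = [tuple(pair) for pair in json.loads(last[0])] if last else []
    new_map = next_blame_map(prev, text, author)
    with conn:  # store the map and flip the flag in one transaction
        conn.execute("INSERT INTO blame_map VALUES (?, ?, ?)",
                     (rev_id, page_id, json.dumps(new_map)))
        conn.execute("UPDATE revision SET diff_state = ? WHERE rev_id = ?",
                     (DIFFED, rev_id))
    return True


def drain(conn: sqlite3.Connection) -> int:
    """Process pending revisions until none remain (a real worker would then sleep and poll)."""
    count = 0
    while process_one(conn):
        count += 1
    return count
```

Taking pending revisions in rev_id order means a page's earlier edit is always diffed before its later one, so the "last blame map" read by process_one is the right base.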
>
> In addition, the article display logic should be altered. The module
> that displays an article should check the diff flag. If the flag is
> set to "Revision is too old to be diffed", no further changes are
> needed. If it is set to "Revision is awaiting to be diffed", a Credits
> section should be created that contains only the message "Calculation
> in progress". If it is set to "Revision has been diffed", a Credits
> section should be created that contains the list of contributors
> ordered by contribution size. The list of contributors in the correct
> order can be generated with a single SELECT on the blame map table,
> and this SELECT can be cached. A direct link to the blame map should
> be displayed too; if the user clicks on it, the corresponding blame
> map should be presented. Every blame map can be generated with a
> single SELECT and can be placed in the cache. Yawn.
>
> If you are still awake by now, here are some more thoughts on fault
> tolerance. Should the diff server die, crash, fail or whatever, the
> only side effect the end user will see is the "Calculation in
> progress" message right after the article body. That's it. No
> slowdown or anything. If the user still wants to see some kind of
> diff, he/she can still use the old diff engine. Because blame maps
> aren't calculated in real time, this feature is an impractical target
> for DoS attacks. However, I should point out that any real-time diff
> algorithm is one big fat target for DoS attacks on other wikis that
> run on a single server without some sort of acceleration.
>
> There is also a small Unicode issue. Due to crappy UTF-8 support in
> PHP, all non-Latin characters are currently ignored. I believe this
> could be solved either by enabling proper Unicode support in PHP or by
> writing custom code to separate words. But before that, I propose to
> test on the English Wikipedia first, because if it works for English
> it should work for other languages.
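One way to write the custom word-splitting code mentioned above, shown in Python for illustration: Python 3's re module treats \w as Unicode-aware for str patterns, so Cyrillic or other non-Latin words are kept rather than ignored. The pattern and the split_words name are assumptions, not the prototype's tokenizer, and scripts written without spaces (such as Chinese or Japanese) would still come out as one token per run of characters and would need a proper segmenter.

```python
import re

# In Python 3, \w in a str pattern matches Unicode letters, digits and "_",
# so non-Latin words are kept instead of being dropped.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_words(text: str) -> list[str]:
    """Split text into word and punctuation tokens for a word-level diff."""
    return TOKEN.findall(text)


print(split_words("Привет, мир!"))  # ['Привет', ',', 'мир', '!']
```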
>
> So I suggest the following practical steps. Dedicate one of the
> servers to be a diff playground; I will need a shell account on it.
> Install MediaWiki on it alongside the diff logic running in the
> background. Create a read-only MySQL account on the live database
> server. As a result, this diff server can grab new revisions from the
> live database, diff them and store the results locally. This way we
> can find out how many edits a single server can process and how many
> servers this feature will require in total (I don't think it will be
> more than 2-3, though).
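To turn the playground measurements into a server count, something as simple as the following would do. Both helpers are hypothetical: process_one stands for whatever diffs a single revision, and the peak edit rate and the default 1.5x headroom factor are inputs to be filled in from real measurements, not figures from this thread.

```python
import math
import time
from typing import Callable, Iterable


def measure_throughput(process_one: Callable[[object], None],
                       batch: Iterable[object]) -> float:
    """Revisions per second one diff server manages on a sample batch."""
    count = 0
    start = time.perf_counter()
    for item in batch:
        process_one(item)
        count += 1
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return count / elapsed


def servers_needed(peak_edits_per_sec: float, per_server_edits_per_sec: float,
                   headroom: float = 1.5) -> int:
    """Diff servers needed to keep up with the peak edit rate plus some headroom."""
    return math.ceil(peak_edits_per_sec * headroom / per_server_edits_per_sec)
```

With made-up numbers, a peak of 10 edits/s and a measured 4 edits/s per server give servers_needed(10, 4) == 4 (10 * 1.5 / 4 = 3.75, rounded up).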
>
> In conclusion, I'd like to say that in my opinion this feature will be
> useful and practical if implemented. It could also be a crucial
> building block for other interesting features. However, I want to
> stress that I'm not interested in doing this *unless* it is used on
> the English Wikipedia and I'm given appropriate credit. I can explain
> why I want that in a private e-mail.
>
> Thank you for reading this long and boring e-mail.
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l at wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l



