On Thu, Jan 6, 2011 at 11:01 AM, Jay Ashworth <jra(a)baylink.com> wrote:
>> From: "George Herbert" <george.herbert(a)gmail.com>
>>
>> I suspect that diffs are relatively rare events in the day to day WMF
>> processing, though non-trivial.
>
> Every single time you make an edit, unless I badly misunderstand the
> current architecture; that's how it's possible for multiple people
> editing the same article not to collide unless their edits actually
> collide at the paragraph level.
>
> Not to mention pulling old versions.
>
> Can someone who knows the current code better than me confirm or deny?
There are a few separate issues mixed up here, I think.
First: diffs for viewing and the external diff3 merging for resolving edit
conflicts are actually unrelated code paths and use separate diff engines.
(Nor does diff3 get used at all unless there actually is a conflict to
resolve -- if nobody else edited since your change, it's not called.)
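The conflict gate described above can be sketched in a few lines. This is an illustrative model, not MediaWiki's actual API: the function and field names are invented, and the real code shells out to external diff3 only on the slow path.

```python
# Sketch of the edit-conflict gate: the three-way merge (MediaWiki uses
# external diff3 for this) is only invoked when someone else has saved
# a revision since the editor loaded the page. All names here are
# hypothetical, for illustration only.

def save_edit(page, base_rev_id, new_text):
    head = page["head_rev_id"]
    if head == base_rev_id:
        # Fast path: nobody edited in between, diff3 is never called.
        page["text"] = new_text
        page["head_rev_id"] += 1
        return "saved"
    # Slow path: an intervening edit exists; only now would we run a
    # three-way merge of (base text, their text, our text).
    return "merge-needed"

page = {"text": "old", "head_rev_id": 5}
print(save_edit(page, 5, "new text"))      # prints "saved"
print(save_edit(page, 5, "stale base"))    # prints "merge-needed"
```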
Second: the notion that diffing a structured document must inherently be
very slow is, I think, not right.
A well-structured document should actually be pretty diff-friendly; our
diffs already work on two separate levels (paragraphs as a whole, then
words within matched paragraphs). In the most common cases, the diffing
might work much the same -- look for nodes that match, then move on to
nodes that don't; within changed nodes, look for sub-nodes that can be
highlighted. Comparisons between nodes may be slower than straight string
comparisons, but the basic algorithms don't need to be hugely different,
and the implementation can be in heavily-optimized C++, just as our text
diffs are today.
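The two-level scheme above can be sketched with Python's stdlib difflib as a stand-in diff engine (the production code is different; this just shows the paragraph-then-word structure):

```python
import difflib

def two_level_diff(old, new):
    """Paragraph-level pass first, then a word-level pass inside
    paragraphs that changed -- the two levels described above.
    A toy sketch, not MediaWiki's actual diff engine."""
    old_paras, new_paras = old.split("\n\n"), new.split("\n\n")
    out = []
    paras = difflib.SequenceMatcher(a=old_paras, b=new_paras)
    for tag, i1, i2, j1, j2 in paras.get_opcodes():
        if tag == "equal":
            continue  # matched paragraphs: skip, nothing to highlight
        for a, b in zip(old_paras[i1:i2], new_paras[j1:j2]):
            # Second level: word-by-word within a changed paragraph.
            words = difflib.SequenceMatcher(a=a.split(), b=b.split())
            changed = [op for op in words.get_opcodes() if op[0] != "equal"]
            out.append((tag, changed))
    return out
```

The same shape carries over to a structured document: match nodes at the top level, then recurse into the sub-nodes of the ones that differ.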
Third: the most common diff view cases are likely adjacent revisions of
recent edits, which smells like cache. :) Heck, these could be made once and
then simply *stored*, never needing to be recalculated again.
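That "compute once, store forever" property holds because a diff between two fixed revisions is immutable: the cache key (old revision id, new revision id) never needs invalidation. A toy memoized version, with an invented revision store and a deliberately naive word-compare standing in for the real diff:

```python
import functools

REVISIONS = {1: "a b c", 2: "a B c"}  # illustrative revision store
calls = []  # tracks how many times the diff is actually computed

@functools.lru_cache(maxsize=None)
def cached_diff(old_id, new_id):
    # Revisions are immutable, so (old_id, new_id) fully determines
    # the result: no invalidation logic is ever needed.
    calls.append((old_id, new_id))
    old, new = REVISIONS[old_id].split(), REVISIONS[new_id].split()
    return tuple((i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b)

cached_diff(1, 2)
cached_diff(1, 2)  # second call is a cache hit; no recomputation
```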
Fourth: the notion that diffing structured documents would be overwhelming
for the entire Wikimedia infrastructure... even if we assume such diffs are
much slower, I think this is not really an issue compared to the huge CPU
savings that it could bring elsewhere.
The biggest user of CPU has long been parsing and re-parsing of wikitext.
Every time someone comes along with different view preferences, we have to
parse again. Every time a template or image changes, we have to parse again.
Every time there's an edit, we have to parse again. Every time something
fell out of cache, we have to parse again.
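The re-parse triggers listed above amount to a parser cache keyed on the page, the viewer's preferences, and a timestamp that bumps whenever the page or anything it transcludes changes. A minimal sketch, with field names that are illustrative rather than MediaWiki's actual cache schema:

```python
# Hypothetical parser-cache sketch: any change to prefs or to the
# page's "touched" timestamp (edit, template change, image change)
# produces a new key, forcing a re-parse; eviction does the same.

def parser_cache_key(page_id, prefs, page_touched):
    return (page_id, tuple(sorted(prefs.items())), page_touched)

cache = {}

def render(page_id, prefs, page_touched, parse):
    key = parser_cache_key(page_id, prefs, page_touched)
    if key not in cache:
        cache[key] = parse()  # the expensive step we try to avoid
    return cache[key]
```

Every distinct preference set and every "touched" bump is a miss, which is why parse load dwarfs everything else.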
And that parsing is *really expensive* on large, complex pages. Much of the
history of MediaWiki's parser development has been in figuring out how to
avoid parsing quite as much, or setting limits to keep the worst corner
cases from bringing down the server farm.
We parse *way*, *wayyyyy* more than we diff.
[...]
Even if we diff on average 2-3x per edit, we're only doing on the order
of ten edits a second across the projects, right? I'm not going to dig up
the current stats, but that's what I remember from the last time I looked.
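The back-of-envelope arithmetic behind that claim, using the figures quoted above (both are rough recollections, not measured stats):

```python
# ~10 edits/sec project-wide, 2-3 diffs per edit: the total diff
# rate stays tiny compared to a parse that fires on every edit,
# preference variant, template change, and cache eviction.
edits_per_sec = 10
diffs_per_edit = (2, 3)
diff_rate = tuple(edits_per_sec * d for d in diffs_per_edit)
print(diff_rate)  # prints (20, 30) -- roughly 20-30 diffs/sec
```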
So: the priority remains cleanup of the parser and of the syntax actually
in use, from a sanity point of view (being able to describe the syntax
usefully, and in a way that allows multiple parsers to be written), with
diff management as a distant, low-impact priority...
--
-george william herbert
george.herbert(a)gmail.com