Re: [Wikitech-l] dirty diffs and VE

24 Jul 2013


On 07/24/2013 09:58 AM, Roan Kattouw wrote:
...
> There are a few things I wish it tested, but they're mostly about how
> it tests things rather than what data is collected. For instance, it
> would be nice if the round-trip tests could round-trip from wikitext
> to HTML *string* and back, rather than to HTML *DOM* and back. This
> would help catch cases where the DOM doesn't cleanly round-trip
> through the HTML parser (foster-parenting for instance). It may be
> that this is already implemented, or that it was considered and
> rejected; I don't know.

Yes, we've been considering this for a while now. It just hasn't been
done yet, since until now we haven't had a chance to work on the testing
infrastructure for over six months.
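Foster-parenting, the case Roan mentions, can also be flagged statically by walking the DOM and looking for children that an HTML parser would move out of a table. The sketch below does that in Python. It is a stand-in for the string round-trip, not the same thing. It assumes the DOM can be loaded as well-formed markup by `xml.etree.ElementTree`, which applies no HTML fix-ups. The rule table is a simplified subset of the HTML5 table-parsing rules, and `foster_parent_candidates` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified subset of the HTML5 table-parsing rules (not a full implementation).
# Children listed here stay inside the table when the serialized HTML is
# re-parsed. The parser may still add implicit tbody/tr wrappers, which this
# sketch does not check. Other children, and non-whitespace text, are
# foster-parented: moved out to just before the table.
KEPT_IN_TABLE = {
    "table": {"caption", "colgroup", "col", "thead", "tbody", "tfoot", "tr", "td", "th"},
    "thead": {"tr", "td", "th"},
    "tbody": {"tr", "td", "th"},
    "tfoot": {"tr", "td", "th"},
    "tr": {"td", "th"},
}
ALWAYS_KEPT = {"script", "style", "template"}


def foster_parent_candidates(root):
    """Return (parent_tag, child) pairs that a fresh HTML parse would move
    out of their table context; "#text" marks stray non-whitespace text."""
    problems = []
    for parent in root.iter():
        kept = KEPT_IN_TABLE.get(parent.tag)
        if kept is None:
            continue  # not a table-structure element; its content is unrestricted
        if parent.text and parent.text.strip():
            problems.append((parent.tag, "#text"))
        for child in parent:
            if child.tag not in kept and child.tag not in ALWAYS_KEPT:
                problems.append((parent.tag, child.tag))
            # ElementTree stores text that follows a child in child.tail
            if child.tail and child.tail.strip():
                problems.append((parent.tag, "#text"))
    return problems


# Well-formed markup that ElementTree keeps as-is, mimicking a DOM built in
# memory: the <p> sits directly inside <table>.
dom = ET.fromstring("<div><table><tr><td>ok</td></tr><p>moved</p></table></div>")
print(foster_parent_candidates(dom))  # [('table', 'p')]
```

A check like this only covers the table cases encoded in the rule table. A real string round-trip through an HTML5 parser would also catch other parser fix-ups.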
...
> Additionally, it might be helpful to have some tests looking for null
> DSRs or other broken data-parsoid stuff (because this breaks selser),
> and/or some sort of selser testing in general (though off the top of
> my head I'm not sure what that would look like). Another fun
> serialization test that could be done is stripping all data-parsoid
> attributes and asserting that this doesn't result in any semantic
> diffs (you'll get lots of syntactic diffs of course).

We've talked on and off about whether we could mimic editing on real
pages and test the correctness of the resulting wikitext, but it is
unclear how to do that at this time, so it hasn't happened yet.
Also, a null DSR (* see below for what a DSR is) is not by itself a
serious problem: it just means that that particular DOM node will go
through regular serialization (and *might* introduce dirty diffs). We
also don't want to add a lot of noise to the test results without
having a way to filter the useful things out of it.
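Both the null-DSR check and the strip-data-parsoid experiment come down to walking the data-parsoid attributes. Below is a minimal Python sketch. It assumes the DOM is loaded into `xml.etree.ElementTree`. It also assumes data-parsoid is a JSON object whose `"dsr"` value is a four-element array `[start, end, openWidth, closeWidth]`, with `null` for unknown entries. `find_null_dsrs` and `strip_data_parsoid` are hypothetical helpers, not Parsoid APIs. Reporting a per-page count of hits, rather than failing on each one, would keep the output filterable.

```python
import json
import xml.etree.ElementTree as ET


def find_null_dsrs(root):
    """Yield (tag, dsr) for elements whose data-parsoid lacks a usable DSR.

    Assumes data-parsoid holds a JSON object whose "dsr" entry looks like
    [start, end, openWidth, closeWidth], with null for unknown values.
    Only start/end are checked, since those locate the original wikitext.
    """
    for el in root.iter():
        raw = el.get("data-parsoid")
        if raw is None:
            continue  # no data-parsoid at all; a real test might flag this too
        dsr = json.loads(raw).get("dsr")
        if not dsr or dsr[0] is None or dsr[1] is None:
            yield el.tag, dsr


def strip_data_parsoid(root):
    """Remove every data-parsoid attribute in place, so a subsequent
    serialization has no source ranges or syntax hints to reuse."""
    for el in root.iter():
        el.attrib.pop("data-parsoid", None)
    return root


doc = ET.fromstring(
    """<div data-parsoid='{"dsr":[0,12,0,0]}'>"""
    """<p data-parsoid='{"dsr":[null,null,0,0]}'>x</p>"""
    """<b data-parsoid='{}'>y</b></div>"""
)
print(list(find_null_dsrs(doc)))  # [('p', [None, None, 0, 0]), ('b', None)]
```

The semantic-diff half of the stripping test is not shown. It would require re-serializing the stripped DOM and comparing rendered output, which depends on the serializer in use.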
But we could brainstorm ways of doing this on IRC.
Subbu.
* DSR: DOM Source Range. Given a DOM node, a DSR tells you what range of
wikitext generated that piece of HTML. While this seems simple,
calculating it accurately is quite tricky: wikitext is string-based,
the DOM is structural, and there is no clean mapping between the two,
especially in the presence of templates that generate fragments of an
HTML string (e.g. part of an HTML tag such as a style attribute,
several table cells, or several attributes). Selective serialization,
which is how we avoid dirty diffs, relies crucially on the accuracy of
this mapping.
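The Python sketch below illustrates the DSR mapping and the fallback Subbu describes. It assumes DSRs of the form `[start, end, openWidth, closeWidth]`, given as character offsets into the page's wikitext. For `'''bold''' text`, a bold node with DSR `[0, 10, 3, 3]` maps back to characters 0 to 10 (`'''bold'''`), and its content maps to 3 + 3 = 3 through 10 - 3 = 7 (`bold`). `Node`, `source_slice`, `inner_slice` and `selective_serialize` are hypothetical. Real selective serialization works from a DOM diff and must also handle separators between nodes, which this toy version ignores.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed DSR layout: [start, end, openWidth, closeWidth], entries may be None.
Dsr = Optional[List[Optional[int]]]


@dataclass
class Node:
    # Hypothetical stand-in for a top-level DOM node: its tag, its DSR
    # (or None), and a flag an editor would set when the node changes.
    tag: str
    dsr: Dsr
    modified: bool = False


def source_slice(src: str, dsr: Dsr) -> Optional[str]:
    """Wikitext that generated the node, or None if the DSR is unusable."""
    if not dsr or dsr[0] is None or dsr[1] is None:
        return None
    start, end = dsr[0], dsr[1]
    if not 0 <= start <= end <= len(src):
        return None  # out-of-range DSR: treat as unknown rather than trust it
    return src[start:end]


def inner_slice(src: str, dsr: Dsr) -> Optional[str]:
    """Wikitext between the node's opening and closing syntax."""
    if source_slice(src, dsr) is None or dsr[2] is None or dsr[3] is None:
        return None
    return src[dsr[0] + dsr[2]:dsr[1] - dsr[3]]


def selective_serialize(src: str, nodes: List[Node],
                        serialize: Callable[[Node], str]) -> str:
    """Reuse the original wikitext for unmodified nodes with a usable DSR;
    send everything else through the regular serializer."""
    out = []
    for node in nodes:
        original = None if node.modified else source_slice(src, node.dsr)
        out.append(original if original is not None else serialize(node))
    return "".join(out)


src = "'''bold''' text"
nodes = [Node("b", [0, 10, 3, 3]), Node("#text", [10, 15, 0, 0])]
print(inner_slice(src, nodes[0].dsr))                           # bold
print(selective_serialize(src, nodes, lambda n: f"[{n.tag}]"))  # '''bold''' text
nodes[0].modified = True
print(selective_serialize(src, nodes, lambda n: f"[{n.tag}]"))  # [b] text
```

A null or out-of-range DSR behaves like an edit here: the node goes through the regular serializer, which is where dirty diffs can creep in.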
