On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how
it tests things rather than what data is collected. For instance, it
would be nice if the round-trip tests could round-trip from wikitext
to HTML *string* and back, rather than to HTML *DOM* and back. This
would help catch cases where the DOM doesn't cleanly round-trip
through the HTML parser (foster-parenting for instance). It may be
that this is already implemented, or that it was considered and
rejected; I don't know.
Yes, we've been considering this for a while now. It just hasn't been
done yet, since until now we haven't had a chance to work on the testing
infrastructure in over 6 months.
Additionally, it might be helpful to have some tests looking for null
DSRs or other broken data-parsoid stuff (because this breaks selser),
and/or some sort of selser testing in general (though off the top of
my head I'm not sure what that would look like). Another fun
serialization test that could be done is stripping all data-parsoid
attributes and asserting that this doesn't result in any semantic
diffs (you'll get lots of syntactic diffs of course).
We've talked on and off about whether we could mimic editing on real
pages and test the correctness of the resulting wikitext, but it is
still unclear at this time, so it hasn't happened yet.
Also, a null DSR (* see below for what a DSR is) by itself is not a
serious problem -- it just means that that particular DOM node will go
through regular serialization (and *might* introduce dirty diffs). We
also don't want to add a lot of noise to the testing results without
having a way to filter the useful things out of them.
But, we could brainstorm ways of doing this on IRC.
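As a starting point for that discussion, here is a rough sketch of what a
null-DSR check over serialized HTML could look like. It is not how the test
infrastructure does anything today. It assumes each node's data-parsoid
attribute holds a JSON object with a "dsr" entry of the form
[start, end, ...], and the names NullDsrScanner and find_null_dsrs are made
up for illustration.

```python
import json
from html.parser import HTMLParser


class NullDsrScanner(HTMLParser):
    """Collects tags whose data-parsoid JSON has a missing or null dsr."""

    def __init__(self):
        super().__init__()
        self.missing = []  # list of (tag name, (line, column)) pairs

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-parsoid")
        if raw is None:
            return  # no data-parsoid at all; not counted by this sketch
        try:
            info = json.loads(raw)
        except json.JSONDecodeError:
            info = None  # unparseable data-parsoid is reported as well
        dsr = info.get("dsr") if isinstance(info, dict) else None
        # Assumption: dsr is [start, end, ...]; a null start or end means
        # the source range for this node is unknown.
        if not dsr or dsr[0] is None or dsr[1] is None:
            self.missing.append((tag, self.getpos()))


def find_null_dsrs(html):
    """Return (tag, position) for every element with a null or missing dsr."""
    scanner = NullDsrScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.missing
```

Reporting only a count per page (len(find_null_dsrs(html))) rather than every
node would be one way to keep the noise down.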
Subbu.
* DSR: DOM Source Range. Given a DOM node, its DSR tells you what range of
wikitext generated that piece of HTML. While seemingly simple,
calculating this accurately is quite tricky: wikitext is string-based
while the DOM is structural, so there is no clean mapping between the
two, especially in the presence of templates that generate fragments of
an HTML string (e.g. part of an HTML tag such as a style attribute,
multiple table cells, or multiple attributes). Selective serialization
(selser), which avoids dirty diffs, relies crucially on the accuracy of
this mapping.
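
To make the role of the DSR concrete, here is a toy sketch of the idea, not
the actual serializer. The Node class, its "normalized" field (standing in
for what regular serialization would emit) and the selective_serialize
function are all hypothetical; the DSR is simplified to a (start, end) pair
of offsets into the original wikitext.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    normalized: str                 # stand-in for regular serialization output
    dsr: Optional[Tuple[int, int]]  # (start, end) offsets into the source
    modified: bool = False          # True if an editor touched this node


def selective_serialize(nodes: List[Node], source: str) -> str:
    """Reuse original wikitext for untouched nodes that have a usable DSR."""
    out = []
    for node in nodes:
        if not node.modified and node.dsr is not None:
            start, end = node.dsr
            out.append(source[start:end])  # copy the source verbatim
        else:
            # Null DSR or edited node: fall back to regular serialization,
            # which may not reproduce the original syntax (a dirty diff).
            out.append(node.normalized)
    return "".join(out)


source = "<i>foo</i> and '''bar'''"
nodes = [
    Node("''foo''", (0, 10)),
    Node(" and ", (10, 15)),
    Node("'''bar'''", (15, 24)),
]
```

With all three DSRs intact, selective_serialize(nodes, source) returns the
source unchanged. If the first node's DSR were null, its text would come from
the fallback path and the output would differ from the source even though
nothing was edited, which is the dirty diff mentioned above.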