On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to HTML *DOM* and back. This would help catch cases where the DOM doesn't cleanly round-trip through the HTML parser (foster-parenting for instance). It may be that this is already implemented, or that it was considered and rejected, I don't know.
Yes, we've considered this for a while now. It just hasn't been done yet, since we haven't had a chance to work on the testing infrastructure for over 6 months, until now.
Additionally, it might be helpful to have some tests looking for null DSRs or other broken data-parsoid stuff (because this breaks selser), and/or some sort of selser testing in general (though off the top of my head I'm not sure what that would look like). Another fun serialization test that could be done is stripping all data-parsoid attributes and asserting that this doesn't result in any semantic diffs (you'll get lots of syntactic diffs of course).
We've talked on and off about whether we could mimic editing on real pages and test the correctness of the resulting wikitext -- how to do that is unclear at this time. So, it hasn't happened yet.
Also, a null DSR (* see below for what a DSR is) is not by itself a serious problem -- it just means that the particular DOM node will go through regular serialization (and *might* introduce dirty diffs). We also don't want to add a lot of noise to the testing results without having a way to filter the useful things out of it.
But, we could brainstorm ways of doing this on IRC.
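As a possible starting point for that discussion, here is a rough Python sketch of a null-DSR check. It assumes that data-parsoid is a JSON object whose "dsr" key holds a list starting with the start and end offsets; the names NullDsrFinder and find_null_dsr are made up for illustration and are not part of Parsoid.

```python
import json
from html.parser import HTMLParser


class NullDsrFinder(HTMLParser):
    """Collects tag names whose data-parsoid has no usable DSR (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("data-parsoid")
        if raw is None:
            return  # no data-parsoid at all; not what this check looks for
        try:
            dsr = json.loads(raw).get("dsr")
        except (ValueError, AttributeError):
            dsr = None  # unparsable or non-object JSON counts as broken
        # Assumption: a usable DSR is a list with non-null start and end offsets.
        usable = (
            isinstance(dsr, list)
            and len(dsr) >= 2
            and dsr[0] is not None
            and dsr[1] is not None
        )
        if not usable:
            self.missing.append(tag)


def find_null_dsr(html):
    """Return the tag names, in document order, that carry a null or missing DSR."""
    finder = NullDsrFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

A report like this would still need the filtering mentioned above before it is useful in test results, since a null DSR only matters when it actually leads to a dirty diff.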
Subbu.
* DSR: DOM Source Range. Given a DOM node, its DSR tells you what range of wikitext generated that piece of HTML. While seemingly simple, calculating this accurately is quite tricky: wikitext is string-based while the DOM is structural, so there is no clean mapping between the two, especially in the presence of templates that generate fragments of an HTML string (e.g. part of an HTML tag such as a style attribute, multiple table cells, or multiple attributes). Selective serialization, which avoids dirty diffs, relies crucially on the accuracy of this mapping.
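To make the idea concrete, here is a toy Python sketch of how selective serialization could use a DSR. It is not Parsoid's implementation: the Node class, the normalize function, and the flat node list are all assumptions made for illustration. Unmodified nodes with a DSR reuse the original wikitext verbatim; modified nodes, or nodes with a null DSR, fall back to regular serialization.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                       # "bold", "italic" or "text" (toy node types)
    text: str                       # text content of the node
    dsr: Optional[Tuple[int, int]]  # (start, end) offsets into the source wikitext
    modified: bool = False          # True if an editor changed this node


def normalize(node: Node) -> str:
    """Stand-in for regular serialization: always emits canonical wikitext."""
    if node.kind == "bold":
        return "'''" + node.text + "'''"
    if node.kind == "italic":
        return "''" + node.text + "''"
    return node.text


def selser(source: str, nodes: List[Node]) -> str:
    """Reuse source[start:end] for untouched nodes; serialize the rest normally."""
    out = []
    for node in nodes:
        if not node.modified and node.dsr is not None:
            start, end = node.dsr
            out.append(source[start:end])
        else:
            out.append(normalize(node))
    return "".join(out)


source = "<b>bold</b> and ''italic''"
nodes = [
    Node("bold", "bold", (0, 11)),
    Node("text", " and ", (11, 16)),
    Node("italic", "italic", (16, 26)),
]
```

With these values, selser(source, nodes) returns the source unchanged. If only the italic node is edited, the `<b>bold</b>` span is still copied from the source rather than being rewritten as `'''bold'''`. If the bold node's DSR were null, it would go through normalize and produce exactly that kind of dirty diff.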