On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that the most significant "bugs" (bugs, not missing features) that need fixing (and are being fixed) are those that have to do with usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
My intention in that post was also not to put some distance between us and the complaints, but just to clarify that we are fixing things as fast as we can, and that this can be seen in the recent changes stream.
John: specific answers to the edit diffs you highlighted in your post. I acknowledge your intention to make sure we don't make false claims about VE/Parsoid's usability. Thanks for taking the time to dig them up. My answers below are made with the intention of figuring out what the issues are so they can be fixed where they need to be.
On 07/23/2013 02:50 AM, John Vandenberg wrote:
On Tue, Jul 23, 2013 at 4:32 PM, Subramanya Sastry ssastry@wikimedia.org wrote:
On 07/22/2013 10:44 PM, Tim Starling wrote:
Round-trip bugs, and bugs which cause a given wikitext input to give different HTML in Parsoid compared to MW, should have been detected during automated testing, prior to beta deployment. I don't know why we need users to report them.
500+ edits are being done per hour using Visual Editor [1] (less at this time given that it is way past midnight -- I have seen about 700/hour at times). I did go and click on over 100 links and examined the diffs. I did that twice in the last hour. I am happy to report clean diffs on all edits I checked both times.
I did run into a couple of nowiki insertions, which are, strictly speaking, not erroneous since they are based on user input, but are more of a usability issue.
What is a dirty diff? One that inserts junk unexpectedly, unrelated to the user's input?
That is correct. Strictly speaking, yes: any changes to the wikitext markup that arose from parts of the page the user didn't change.
The broken table injection bugs are still happening.
https://en.wikipedia.org/w/index.php?title=Sai_Baba_of_Shirdi&curid=1441...
If the parser isn't going to be fixed quickly to ignore tables it doesn't understand, we need to find the templates and pages with these broken tables -- preferably using SQL and heuristics -- and fix them. The same needs to be done for all the other wikis, otherwise they are going to have the same problems happening randomly, causing lots of grief.
This may be related to https://bugzilla.wikimedia.org/show_bug.cgi?id=51217, and I have a tentative fix for it as of yesterday.
Fixes are of course appreciated. The pace of bugfixes is not the problem ...
VE and Parsoid devs have put in a lot of effort to recognize broken wikitext source, fix it or isolate it,
My point was that you don't appear to be doing analysis of how much of all Wikipedia content is broken; at least I don't see a public document listing which templates and pages are causing the parser problems, so the communities on each Wikipedia can fix them ahead of deployment.
I believe there is a bug about automated testing of the parser against existing pages, which would identify problems.
I scanned the Spanish 'visualeditor' tag's 50 recentchanges earlier and found a dirty diff, which I believe hasn't been raised in bugzilla yet.
https://bugzilla.wikimedia.org/show_bug.cgi?id=51909
50 VE edits on eswp is more than one day of recentchanges. Most of the top 10 wikis have roughly the same level of testing going on. That should be a concern. The number of VE edits is about to increase on another nine Wikipedias, with very little real impact analysis having been done. That is a shame, because the enwp deployment has provided us with a list of problems which will impact those wikis if they are using the same syntax, be it weird or broken or otherwise troublesome.
and protect it across edits, and roundtrip it back in original form to prevent corruption. I think we have been largely successful, but we still have more cases to go that are being exposed here, which we will fix. But, occasionally, these kinds of errors do show up -- and we ask for your patience as we fix these. Once again, this is not a claim to perfection, but a claim that this is not a significant source of corrupt edits. But, yes, even a 0.1% error rate does mean a big number in the absolute when thousands of pages are being edited -- and we will continue to pare this down.
Is 0.1% a real data point, or a stab in the dark? Because I found two in 100 on enwp; Robert found at least one in 200 on enwp; and I found 1 in 50 on eswp.
In addition to nowikis, there are also wikilinks that are not what the user intended
https://en.wikipedia.org/w/index.php?title=Ben_Tre&curid=1822927&dif...
https://en.wikipedia.org/w/index.php?title=Celton_Manx&curid=28176434&am...
You are correct, but this is not a dirty diff. I don't want to claim this is a user error entirely -- but a combination of user and software error.
fwiw, I wasn't claiming these or the ones that followed were dirty diffs; these are other problems which the software contributes to, *other* than the nowiki cases we know so well.
Here are three edits trying to add a section header and a sentence, with a wikilink in the section header. (In the process they added other junk into the page, probably unintentionally.)
https://en.wikipedia.org/w/index.php?title=Port_of_Davao&action=history&...
What is the problem here exactly? (that is a question, not a challenge). The user might have entered those newlines as well.
The VE UI is confusing, and did many silly things during those edits. The user had to resort to editing in source editor to clean it up. Step through the diffs.
On Tue, Jul 23, 2013 at 6:28 PM, John Vandenberg jayvdb@gmail.com wrote:
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that the most significant "bugs" (bugs, not missing features) that need fixing (and are being fixed) are those that have to do with usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content?
Yes -- or at least a large random subset of wp content comprising 160,509 articles across a dozen or so different languages.
http://www.mediawiki.org/wiki/Parsoid/Roundtrip
--scott
On 07/23/2013 05:28 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that the most significant "bugs" (bugs, not missing features) that need fixing (and are being fixed) are those that have to do with usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Till late March, we used to run round trip testing on 100K enwp pages. We then moved to a mix of pages from different WPs to catch language and wiki-specific issues and fix them.
So, this is our methodology for catching parse and roundtrip errors on real WP pages and regressions.
I won't go into great detail about what the 3 numbers mean and the nuances.
But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
However, note that this is because the serialization behaves as if the entire document is edited (which lets us stress-test our serialization system), which is not the real behavior in production. In production, our HTML-to-wikitext serialization is smarter and attempts to only serialize modified segments, using the original wikitext for unmodified segments of the DOM (called selective serialization). So, in reality, the corruption percentage should be much smaller than even the 0.4%, and the dirty diffs will be way smaller as well (but you are still finding 1 in 200 or more) -- and this is separate from the nowiki issues.
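To illustrate the idea, here is a rough conceptual sketch of selective serialization (the types and names are illustrative only, not Parsoid's actual API):

  // Each top-level node records whether the edit touched it and, if not,
  // which byte range of the original wikitext produced it.
  interface RtNode {
    modified: boolean;
    dsr?: [number, number];   // [start, end) offsets into the original wikitext
  }

  function serializePage(
    nodes: RtNode[],
    originalWt: string,
    serializeFull: (n: RtNode) => string   // the full (normalizing) serializer
  ): string {
    return nodes.map(node =>
      !node.modified && node.dsr
        ? originalWt.slice(node.dsr[0], node.dsr[1])  // reuse original source verbatim
        : serializeFull(node)                          // re-serialize only what changed
    ).join('');
  }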
We are not solely dependent on users to find bugs for us, no, but in production, if there are corruptions that show up, it would be helpful if we are alerted.
Does that clarify?
VE and Parsoid devs have put in a lot of effort to recognize broken wikitext source, fix it or isolate it,
My point was that you don't appear to be doing analysis of how much of all Wikipedia content is broken; at least I don't see a public document listing which templates and pages are causing the parser problems, so the communities on each Wikipedia can fix them ahead of deployment.
Unfortunately, this is much harder to do. What we can consider is periodically swapping out our test pages for a fresh batch of pages so that new kinds of problems show up in automated testing. In some cases, detecting problems automatically is equivalent to being able to fix them up automatically as well.
Gabriel is currently on a (well-deserved) vacation and once he is back, we'll discuss this issue and see what can be done. But whenever we find problems, we've been fixing templates (about 3 or 4 fixed so far) or fixing the broken wikitext as well.
We also have this desirable enhancement/tool that we could build: https://bugzilla.wikimedia.org/show_bug.cgi?id=46705
I believe there is a bug about automated testing of the parser against existing pages, which would identify problems.
I scanned the Spanish 'visualeditor' tag's 50 recentchanges earlier and found a dirty diff, which I believe hasn't been raised in bugzilla yet.
https://bugzilla.wikimedia.org/show_bug.cgi?id=51909
50 VE edits on eswp is more than one day of recentchanges. Most of the top 10 wikis have roughly the same level of testing going on. That should be a concern. The number of VE edits is about to increase on another nine Wikipedias, with very little real impact analysis having been done. That is a shame, because the enwp deployment has provided us with a list of problems which will impact those wikis if they are using the same syntax, be it weird or broken or otherwise troublesome.
As indicated earlier, we have done automated RT testing on 20K pages on different WPs and fixed various problems, but yes, this will not catch all problematic scenarios.
and protect it across edits, and roundtrip it back in original form to prevent corruption. I think we have been largely successful, but we still have more cases to go that are being exposed here, which we will fix. But, occasionally, these kinds of errors do show up -- and we ask for your patience as we fix these. Once again, this is not a claim to perfection, but a claim that this is not a significant source of corrupt edits. But, yes, even a 0.1% error rate does mean a big number in the absolute when thousands of pages are being edited -- and we will continue to pare this down.
Is 0.1% a real data point, or a stab in the dark? Because I found two in 100 on enwp; Robert found at least one in 200 on enwp; and I found 1 in 50 on eswp.
Sorry -- I should have phrased that better. I just picked 0.1% as an arbitrary number to make the observation that even when it is as low as 0.1%, in absolute numbers, it can still be noticeable.
Subbu.
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
On 07/23/2013 05:28 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that the most significant "bugs" (bugs, not missing features) that need fixing (and are being fixed) are those that have to do with usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Fantastic! How frequently are those tests re-run? Could you add a last-run-date on that page?
Was a regression testsuite built using the issues encountered during the last parser rewrite?
-- John Vandenberg
On 07/23/2013 06:13 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
On 07/23/2013 05:28 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed sufficient bugs at this point that the most significant "bugs" (bugs, not missing features) that need fixing (and are being fixed) are those that have to do with usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Fantastic! How frequently are those tests re-run? Could you add a last-run-date on that page?
The tests are re-run after a bunch of commits that we think should be regression tested -- usually updated one or more times a day (when a lot of patches are being merged) or after a few days (during periods of low activity). The last code update was Thursday.
http://parsoid.wmflabs.org:8001/commits gives you the list of commits (and the date when the code was updated).
http://parsoid.wmflabs.org:8001/topfails gives you individual test results on every tested page, for more detail.
Currently we are updating our rt testing infrastructure to gather performance numbers as well (this has been on the cards for a long time, but never got the attention it needed). Marco is working on that part of our codebase as we speak: https://bugzilla.wikimedia.org/show_bug.cgi?id=46659 and other related bugs.
We do not deploy to production before we have run tests on a subset of pages in rt-testing. Given the nature of how tests are run, it is usually sufficient to run on about 1000 pages to know if there are serious regressions; sometimes we run on a larger subset of pages.
Was a regression testsuite built using the issues encountered during the last parser rewrite?
We also continually update a parser tests file (in the code repository) with minimized test cases based on regressions and odd wikitext usage. About 1100 tests so far that run in 4 modes (wt2html, wt2wt, html2wt, html2html) plus 14000 randomly generated edits to the tests to mimic edits and test our selective serializer. This is our first guard against bad code.
Subbu.
On Tue, Jul 23, 2013 at 7:13 PM, John Vandenberg jayvdb@gmail.com wrote:
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Fantastic! How frequently are those tests re-run? Could you add a last-run-date on that page?
The git sha1 displayed on the page can be turned into a timestamp. For example, it's currently showing git d5fe6c9052c23bcc0b63a4d0d1b3e5b68fd2ef37 and https://git.wikimedia.org/commit/mediawiki%2Fextensions%2FParsoid/d5fe6c9052... says that commit was authored on Fri Jul 19 10:20:39 2013 -0700. So, less than a week old (it takes a few days to crank through all the pages in its set).
Was a regression testsuite built using the issues encountered during the last parser rewrite?
Yes, mediawiki/core/tests/parser/parserTests.txt (which predates parsoid) has been continuously updated throughout the development process. --scott
On Tue, Jul 23, 2013 at 7:24 PM, C. Scott Ananian cananian@wikimedia.orgwrote:
Was a regression testsuite built using the issues encountered during the last parser rewrite?
Yes, mediawiki/core/tests/parser/parserTests.txt (which predates parsoid) has been continuously updated throughout the development process.
If you'd like to see the set of tests:
https://git.wikimedia.org/blob/mediawiki%2Fcore/master/tests%2Fparser%2Fpars... (git.wikimedia.org was temporarily down when I wrote my previous email.) --scott
On 07/23/2013 06:02 PM, Subramanya Sastry wrote:
On 07/23/2013 05:28 PM, John Vandenberg wrote:
VE and Parsoid devs have put in a lot of effort to recognize broken wikitext source, fix it or isolate it,
My point was that you don't appear to be doing analysis of how much of all Wikipedia content is broken; at least I don't see a public document listing which templates and pages are causing the parser problems, so the communities on each Wikipedia can fix them ahead of deployment.
Unfortunately, this is much harder to do. What we can consider is periodically swapping out our test pages for a fresh batch of pages so that new kinds of problems show up in automated testing. In some cases, detecting problems automatically is equivalent to being able to fix them up automatically as well.
Actually, we do have the beginnings of a page for this that I had forgotten about: http://www.mediawiki.org/wiki/Parsoid/Broken_wikitext_tar_pit I don't think it is very helpful at this time, or quite what you are asking for, but I am pointing it out for the record that we've thought about it some.
Some of these cases we are actually beginning to address:
* fostered content in top-level pages (we handle fostering from templates)
* handling of templates that produce part of a table cell, or multiple cells, or multiple attributes of an image.
Ideally, we would not have to support these kinds of use cases, but given what we are seeing in production now, we might try to deal with some of them ... Interestingly enough, we do a much better job of protecting against unclosed tables, fostered content out of tables, etc. when they come from templates than when such wikitext occurs in the page content itself. We have a couple of DOM analysis passes to detect those problems and protect them from editing ... but that needs to be extended to top-level page content.
Subbu.
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Very minor point .. there are ~400 missing pages on the list; is that intentional ? ;-)
One is 'Mos:time' which is in NS 0, and does actually exist as a redirect to the WP: manual of style: https://en.wikipedia.org/wiki/Mos:time
... But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
So 15% is 24000 pages which can bust, but may not if the edit doesn't touch the bustable part.
Does /topfails cycle through all 24000, 40 pages at a time?
Could you provide a dump of the list of 24000 bustable pages? Split by project? Each community could then investigate those pages for broken tables, and more critically .. templates which emit broken wikisyntax that is causing your team grief.
Do you have stats on each of those eight wikipedias? i.e. are there noticeable differences in the percentages on different wikipedias? If so, can you report those percentages for each project? I'm guessing Chinese is an example where there are higher percentages..?
-- John Vandenberg
On 07/23/2013 06:55 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Very minor point .. there are ~400 missing pages on the list; is that intentional ? ;-)
One is 'Mos:time' which is in NS 0, and does actually exist as a redirect to the WP: manual of style: https://en.wikipedia.org/wiki/Mos:time
1. Some pages get deleted and then go 404 (http://parsoid.wmflabs.org:8001/failedFetches).
2. There are some (known) bugs in our rt testing infrastructure around recording results -- should be fixed once our testing infrastructure is updated and moved to MySQL (from sqlite).
... But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
So 15% is 24000 pages which can bust, but may not if the edit doesn't touch the bustable part.
No, 15% of pages aren't bust. 15% of pages introduce meaning-preserving (hence purely syntactic) dirty diffs depending on what piece of the page is edited. Whitespace diffs and the addition of " around attribute values are the most common ones.
For an example, see this: http://parsoid.wmflabs.org:8001/result/d5fe6c9052c23bcc0b63a4d0d1b3e5b68fd2e...
0.4% of pages (~640) are classified as having semantic diffs. We assign a numerical score in base 1000 (digit 3 = # errors, digit 2 = # semantic errors, digit 1 = # syntactic errors). When results are sorted in reverse order of score, this gives us the most egregious pages to focus on (crashers first, semantic errors next, then purely syntactic dirty diffs).
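To make the scoring concrete, here is a minimal sketch (the function and variable names are illustrative, and the capping of each digit at 999 is an assumption, not necessarily what the real code does):

  function pageScore(errors: number, semanticDiffs: number, syntacticDiffs: number): number {
    // Cap each base-1000 "digit" so a large count in a lower-severity
    // category can never outrank a higher-severity one (assumed here).
    const cap = (n: number) => Math.min(n, 999);
    return cap(errors) * 1000 * 1000 + cap(semanticDiffs) * 1000 + cap(syntacticDiffs);
  }

  // Sorting by this score in descending order surfaces crashers first, then
  // pages with semantic diffs, then purely syntactic dirty diffs.
  const pages = [
    { title: 'A', score: pageScore(0, 0, 3) },  // syntactic-only dirty diffs
    { title: 'B', score: pageScore(0, 2, 0) },  // semantic diffs
    { title: 'C', score: pageScore(1, 0, 0) },  // crasher / error
  ].sort((a, b) => b.score - a.score);          // order: C, B, A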
So, going to http://parsoid.wmflabs.org:8001/topfails and paging through that will give you what you are looking for. 16 pages with 40 entries each. We hang out on #mediawiki-parsoid and can help editors make sense of the diffs if anyone wants to look for broken wikitext and fix them.
Subbu.
On Tue, Jul 23, 2013 at 7:55 PM, John Vandenberg jayvdb@gmail.com wrote:
On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Very minor point .. there are ~400 missing pages on the list; is that intentional ? ;-)
One is 'Mos:time' which is in NS 0, and does actually exist as a redirect to the WP: manual of style: https://en.wikipedia.org/wiki/Mos:time
I think it's an artifact of the changing article set on the wikis. We created the original page set months ago, and we haven't changed it since so that our results are still comparable over time. Since then 1) some pages have been deleted/moved, and 2) we fixed parsoid not to automatically follow redirects (bug 45808).
But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
So 15% is 24000 pages which can bust, but may not if the edit doesn't touch the bustable part.
Subbu covered this in his email. "Yes", but only if you consider an extra unrendered newline (etc.) a "bust". Syntactic diffs are wikitext differences which do *not* lead to visible differences. *Semantic* diffs are the ones which lead to visible differences. So 0.4% of the pages will "bust" iff the bustable part is touched.
Does /topfails cycle through all 24000, 40 pages at a time?
yes.
Could you provide a dump of the list of 24000 bustable pages? Split by project? Each community could then investigate those pages for broken tables, and more critically .. templates which emit broken wikisyntax that is causing your team grief.
We could do that. Usually there will be a very small number of broken templates which end up reused in lots of places. So it's probably best to just look at the first few pages, fix the issues there, and then retest.
Do you have stats on each of those eight wikipedias? i.e. are there noticeable differences in the percentages on different wikipedias? If so, can you report those percentages for each project? I'm guessing Chinese is an example where there are higher percentages..?
http://parsoid.wmflabs.org:8001/stats/en gives results just for en, etc. There are 10k titles from each of en de nl fr it ru es sv pl ja ar he hi ko zh is. (Of course, some titles have been deleted/moved as described above.) --scott
On Wed, Jul 24, 2013 at 1:55 AM, John Vandenberg jayvdb@gmail.com wrote:
Could you provide a dump of the list of 24000 bustable pages? Split by project? Each community could then investigate those pages for broken tables, and more critically .. templates which emit broken wikisyntax that is causing your team grief.
As Subbu said, I'm currently working on improving the round-trip test server, mostly on porting it from sqlite to MySQL but also on expanding the stats kept (with things like performance, etc.). If you think of some other data we should track, or any new report we could add, we certainly welcome suggestions :) Please open a new bug or add to the existing one: https://bugzilla.wikimedia.org/show_bug.cgi?id=46659
Or just drop by #wikimedia-parsoid, I'm marcoil there.
Cheers, Marc
On Wed, Jul 24, 2013 at 3:10 AM, Marc Ordinas i Llopis marcoil@wikimedia.org wrote:
As Subbu said, I'm currently working on improving the round-trip test server, mostly on porting it from sqlite to MySQL but also on expanding the stats kept (with things like performance, etc.). If you think of some other data we should track, or any new report we could add, we certainly welcome suggestions :) Please open a new bug or add to the existing one: https://bugzilla.wikimedia.org/show_bug.cgi?id=46659
Thanks for working on this! The Parsoid testing infrastructure is pretty awesome.
There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to HTML *DOM* and back. This would help catch cases where the DOM doesn't cleanly round-trip through the HTML parser (foster-parenting for instance). It may be that this is already implemented, or that it was considered and rejected, I don't know.
Additionally, it might be helpful to have some tests looking for null DSRs or other broken data-parsoid stuff (because this breaks selser), and/or some sort of selser testing in general (though off the top of my head I'm not sure what that would look like). Another fun serialization test that could be done is stripping all data-parsoid attributes and asserting that this doesn't result in any semantic diffs (you'll get lots of syntactic diffs of course).
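Something along those lines might look like the following rough sketch (it assumes domino's standard DOM API; the serializeToWikitext and semanticDiffCount names in the commented driver are hypothetical stand-ins, not real Parsoid functions):

  import domino = require('domino');

  // Remove every data-parsoid attribute from an HTML string and return the
  // stripped markup, so it can be serialized and compared against the
  // unstripped version.
  function stripDataParsoid(html: string): string {
    const doc = domino.createDocument(html);
    const nodes = doc.querySelectorAll('[data-parsoid]');
    for (let i = 0; i < nodes.length; i++) {
      nodes[i].removeAttribute('data-parsoid');
    }
    return doc.body.innerHTML;
  }

  // Hypothetical test driver:
  //   const wtA = serializeToWikitext(html);
  //   const wtB = serializeToWikitext(stripDataParsoid(html));
  //   assert(semanticDiffCount(wtA, wtB) === 0);  // syntactic diffs are expected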
Or just drop by #wikimedia-parsoid, I'm marcoil there.
The channel is #mediawiki-parsoid :)
Roan
On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to HTML *DOM* and back. This would help catch cases where the DOM doesn't cleanly round-trip through the HTML parser (foster-parenting for instance). It may be that this is already implemented, or that it was considered and rejected, I don't know.
Yes, we've considered this for a while now. It's just not done yet, since we haven't had a chance to work on the testing infrastructure for over 6 months, until now.
Additionally, it might be helpful to have some tests looking for null DSRs or other broken data-parsoid stuff (because this breaks selser), and/or some sort of selser testing in general (though off the top of my head I'm not sure what that would look like). Another fun serialization test that could be done is stripping all data-parsoid attributes and asserting that this doesn't result in any semantic diffs (you'll get lots of syntactic diffs of course).
We've on and off talked about whether we could mimic editing on real pages and test the correctness of the resulting wikitext -- it is unclear at this time how best to do it, so it hasn't happened yet.
Also, a null DSR (* see below for what a DSR is) by itself is not a serious problem -- it just means that that particular DOM node will go through regular serialization (and *might* introduce dirty diffs). We also don't want to add a lot of noise to the testing results without having a way to filter useful things out of it.
But, we could brainstorm ways of doing this on IRC.
Subbu.
* DSR: DOM Source Range. Given a DOM node, a DSR tells us what range of wikitext generated that piece of HTML. While seemingly simple, calculating this accurately without introducing errors is quite tricky, given that wikitext is string-based and the DOM is structural and there is no clean mapping between them, especially in the presence of templates that generate fragments of an HTML string (ex: generating part of an HTML tag like a style attribute, generating multiple table cells, or multiple attributes, etc.). Selective serialization for avoiding dirty diffs relies crucially on the accuracy of this mapping.
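As a toy illustration of the kind of mapping this involves (the offsets and the {{style-tpl}} template are invented for the example):

  // Wikitext (29 characters): <div {{style-tpl}}>text</div>
  //   whole <div> element  -> wikitext range [0, 29)
  //   "text" text node     -> wikitext range [19, 23)
  //   {{style-tpl}}        -> wikitext range [5, 18), expanding to a tag
  //                           fragment such as style="color:red"
  // The style attribute therefore has no wikitext span of its own; the
  // serializer has to treat it as template-generated rather than directly
  // editable source, and an error in these ranges would make selective
  // serialization copy the wrong bytes.
  const dsrExample: { [node: string]: [number, number] } = {
    div: [0, 29],
    text: [19, 23],
  };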
On Wed, Jul 24, 2013 at 11:20 AM, Subramanya Sastry ssastry@wikimedia.orgwrote:
On 07/24/2013 09:58 AM, Roan Kattouw wrote:
There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to HTML *DOM* and back. This would help catch cases where the DOM doesn't cleanly round-trip through the HTML parser (foster-parenting for instance). It may be that this is already implemented, or that it was considered and rejected, I don't know.
Yes, we've considered this for a while now. Just not done yet since we haven't had a chance to work on the testing infrastructure in over 6 months till now.
For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
The foster parenting issues mostly arise in the wikitext->parsoid DOM phase. Basically, the wikitext is tokenized into a HTML tag soup and then a customized version of the standard HTML parser is used to assemble the soup into a DOM, mimicking the process by which a browser would parse the tag soup emitted by the current PHP parser. So the existing test suite does expose these foster-parenting issues already. --scott
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian cananian@wikimedia.org wrote:
For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
I didn't mean it as a test of Domino, I meant it as a test of Parsoid: does it generate things that are then foster-parented out, or other things that a compliant DOM parser won't round-trip? It's also a more realistic test, because the way that Parsoid is actually used by VE in practice is that it serializes its DOM, sends it over the wire to VE, which then does things with it and gives an HTML string back, which is then parsed through Domino. So even in normal operation, ignoring the fact that VE runs stuff through the browser's DOM parser, Parsoid itself already round-trips the HTML through Domino, effectively.
The foster parenting issues mostly arise in the wikitext->parsoid DOM phase. Basically, the wikitext is tokenized into a HTML tag soup and then a customized version of the standard HTML parser is used to assemble the soup into a DOM, mimicking the process by which a browser would parse the tag soup emitted by the current PHP parser. So the existing test suite does expose these foster-parenting issues already.
Does it really? There were a number of foster-parenting issues a few months ago where Parsoid inserted <meta> tags in places where they can't be put (e.g. <tr>s), and no one in the Parsoid team seemed to have noticed until I tracked down a few VE bugs to that problem.
Roan
On 07/25/2013 01:03 PM, Roan Kattouw wrote:
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian cananian@wikimedia.org wrote:
For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://www.w3.org/html/wg/wiki/Testing I believe). So although the above is probably worth doing as a low-priority task, it's really a test of the third-party library, not of Parsoid. (Although, since I'm a co-maintainer of domino, I'd be very interested in fixing any bugs which it did turn up.)
I didn't mean it as a test of Domino, I meant it as a test of Parsoid: does it generate things that are then foster-parented out, or other things that a compliant DOM parser won't round-trip? It's also a more realistic test, because the way that Parsoid is actually used by VE in practice is that it serializes its DOM, sends it over the wire to VE, which then does things with it and gives an HTML string back, which is then parsed through Domino. So even in normal operation, ignoring the fact that VE runs stuff through the browser's DOM parser, Parsoid itself already round-trips the HTML through Domino, effectively.
We use two different libraries for different things:
* html5 library for building a DOM from a tag soup
* domino for serializing DOM --> HTML and for parsing HTML --> DOM
When doing a WT2WT roundtrip test, there are 2 ways to do this:
1. wikitext --> tag soup --> DOM (in-memory tree) --> wikitext
2. wikitext --> tag soup --> DOM (in-memory tree) --> HTML (string) --> DOM --> wikitext
We currently do 1. in our wt2wt testing. If there are foster-parenting bugs in the HTML5 library, then they will get hidden if we use path 1. However, when using VE and serializing its result back to wikitext, we are effectively using path 2.
And, both Roan and Scott are correct. Pathway 2 would be a test of external libraries (HTML5 and Domino, not just domino). And, we did have bugs in the HTML5 parsing library we used (which I fixed based on reports from Roan) and then added them to parser tests.
But, if we use path 2. for all our RT testing for wp pages, other latent bugs with fostered content will show up.
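A rough sketch of the two paths, with declared stand-ins for the real pipeline stages (these are not the actual function names):

  // Stand-in declarations (hypothetical names) for the pipeline stages.
  declare function wikitextToDom(wt: string): Document;     // tokenizer + HTML5 tree builder
  declare function domToHtmlString(dom: Document): string;  // DOM --> HTML string (domino)
  declare function htmlStringToDom(html: string): Document; // HTML string --> DOM (domino)
  declare function domToWikitext(dom: Document): string;    // the wikitext serializer

  // Path 1: the DOM never leaves memory, so HTML string round-trip bugs stay hidden.
  function rtPath1(wt: string): string {
    return domToWikitext(wikitextToDom(wt));
  }

  // Path 2: what effectively happens with VE -- the DOM is serialized to an HTML
  // string and re-parsed, so foster-parenting and similar issues can surface.
  function rtPath2(wt: string): string {
    return domToWikitext(htmlStringToDom(domToHtmlString(wikitextToDom(wt))));
  }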
Hope this clarifies the issue.
Subbu.
On Thu, Jul 25, 2013 at 2:19 PM, Subramanya Sastry ssastry@wikimedia.orgwrote:
And, both Roan and Scott are correct. Pathway 2 would be a test of external libraries (HTML5 and Domino, not just domino). And, we did have bugs in the HTML5 parsing library we used (which I fixed based on reports from Roan) and then added them to parser tests.
If you're playing along at home, the domino bug was: https://github.com/fgnass/domino/pull/36
Hopefully there aren't too many more of those lurking. --scott
On Wed, Jul 24, 2013 at 4:58 PM, Roan Kattouw roan.kattouw@gmail.comwrote:
Or just drop by #wikimedia-parsoid, I'm marcoil there.
The channel is #mediawiki-parsoid :)
Yes, sorry… I hadn't had enough coffee :)