Hi Bin,
Thanks for the reports. Please feel free to add yourselves to the
relevant bug reports to track progress. You can also file additional bug
reports against Parsoid here:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=Parsoid
On 12/02/2013 01:39 AM, Bin Li (李斌) wrote:
Hi Parsoid developers,
I have compared Wikipedia HTML and Parsoid HTML (same title and
oldid) for 500 random samples, and found some bug examples and
difference patterns that may help you. We hope the bugs can be
fixed. Thanks! Below are the examples:
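For reference, the pair of URLs being compared for one sample can be built as in the sketch below. It assumes the two URL shapes that appear in the examples in this thread; the function name `build_urls` is made up for illustration, and fetching or diffing the HTML is left out.

```python
from urllib.parse import quote

# URL shapes copied from the examples in this thread.
WIKIPEDIA_URL = "http://en.wikipedia.org/w/index.php?title={title}&oldid={oldid}"
PARSOID_URL = "http://parsoid-lb.eqiad.wikimedia.org/enwiki/{title}?oldid={oldid}"


def build_urls(title, oldid):
    """Return (wikipedia_url, parsoid_url) for the same title and oldid.

    `title` is the page title with underscores, e.g. "1913_Gettysburg_reunion".
    The title is percent-encoded as UTF-8 so that non-ASCII characters and
    parentheses survive in both URLs.
    """
    encoded = quote(title, safe="")
    return (
        WIKIPEDIA_URL.format(title=encoded, oldid=oldid),
        PARSOID_URL.format(title=encoded, oldid=oldid),
    )


if __name__ == "__main__":
    for url in build_urls("Étincelles_(Moszkowski)", 555997335):
        print(url)
```

Using the same encoded title and oldid for both URLs keeps the comparison on the same revision of the same page.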
Bug examples:
1. In
http://parsoid-lb.eqiad.wikimedia.org/enwiki/1913_Gettysburg_reunion?oldid=…,
Reference 18 is “(Pennsylvania Department of Health).
http://books.google.com/books?id=swkTAAAAYAAJ&pg=PA72PA72. Retrieved
2011-02-06.”. But in
http://en.wikipedia.org/w/index.php?title=1913_Gettysburg_reunion&oldid…1478,
it’s “(Pennsylvania Department of Health). Retrieved 2011-02-06.”
Looks like some differences in Cite template processing. We'll
investigate and file a bug.
2. The first external link in
http://en.wikipedia.org/w/index.php?title=...From_the_Hungry_i&oldid=55…
is “The Kingston Trio Liner Notes album entry.”, but in
http://parsoid-lb.eqiad.wikimedia.org/enwiki/...From_the_Hungry_i?oldid=555…
it’s
“[http://www.lazyka.com/linernotes/trio_01(Guard,Rynolds,Shane)/recrdngs/LP_T1107.htm#.%20.%20.%20From%20the%20hungry%20i
<http://www.lazyka.com/linernotes/trio_01%28Guard,Rynolds,Shane%29/recrdngs/LP_T1107.htm#.%20.%20.%20From%20the%20hungry%20i>:
The Kingston Trio Liner Notes album entry.]”. It’s an obvious bug.
We will investigate and file a bug.
Bug 53139 (https://bugzilla.wikimedia.org/show_bug.cgi?id=53139) --
duplicates (53927, 57266). We'll probably have to tackle this sooner
rather than later.
Related to Bug 53139.
There are various image parsing bugs in Bugzilla (Marc pasted the URL
for them in an earlier email) that we haven't gotten around to fixing yet.
We'll investigate and file a bug.
Parsoid doesn't generate a table of contents or edit links yet. We may not
generate edit links in Parsoid and may rely on JS for rendering them. As
for the latter two, we are thinking of dealing with wiki-specific styles
by relying on CSS/JS rather than generating different HTML for different
rendering styles, so that core Parsoid code is not cluttered with these
stylistic differences, which are not really core parse output issues.
5. The audio player component may differ between
http://en.wikipedia.org/w/index.php?title=%C3%89tincelles_(Moszkowski)&…
<http://en.wikipedia.org/w/index.php?title=%C3%89tincelles_%28Moszkowski%29&oldid=555997335>
(See Problems playing this file?) and
http://parsoid-lb.eqiad.wikimedia.org/enwiki/%C3%89tincelles_(Moszkowski)?o…
<http://parsoid-lb.eqiad.wikimedia.org/enwiki/%C3%89tincelles_%28Moszkowski%29?oldid=555997335>.
I haven't looked closely, but this could be Bug 49896
(https://bugzilla.wikimedia.org/show_bug.cgi?id=49896) and is on our
list of things to fix.
Subbu.