Hi,
In the last few days I've been looking for reasons for the appearance of unnecessary <nowiki> tags. This mostly happens because of various VisualEditor and Parsoid issues. The developers have been very good at fixing them, and now it happens very rarely, but there are still lots of these useless tags lurking in pages.
Two examples are:
* '''<nowiki/>''' - this doesn't do anything at all. I couldn't reproduce it in any way, so it's probably a bug that was fixed.
* <nowiki> </nowiki> in the beginning of a paragraph. This was added in the past to avoid putting the paragraph in <pre>, but it's entirely useless, because the spaces are trimmed. Now they are pre-trimmed, so this is also a fixed bug, but a lot of pages still have it.
There may be more - I'm still looking for these.
It would be easy to write bots to fix such easy common cases, but they would have to run on every project. Would it make sense to write them as maintenance scripts that update them everywhere when people upgrade VE?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Starting by replacing <nowiki>\s</nowiki> would be safe enough imho.
Vito
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Fri, Jun 19, 2015 at 4:38 AM, Amir E. Aharoni < amir.aharoni@mail.huji.ac.il> wrote:
- '''<nowiki/>''' - this doesn't do anything at all. I couldn't reproduce it in any way, so it's probably a bug that was fixed.
Beware that in some cases, <nowiki/> is doing something important. For example, [<nowiki/>[[sic]]] produces a wikilink to "sic" inside of square brackets. Removing it there will break the link.
Here is another practical example where <nowiki/> may be required: https://www.mediawiki.org/wiki/Extension:Arrays#Using_arrayprint
DanB
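To make the caveat above concrete, here is a minimal Python sketch that removes nothing: it only flags each empty <nowiki/> whose neighbouring character is wikitext markup, so that a human can review it before any bot touches it. The function name and the set of markup characters are assumptions chosen for illustration; they do not come from Parsoid or from any existing bot.

```python
import re

# Matches <nowiki/> and <nowiki /> (empty, self-closing form only).
EMPTY_NOWIKI = re.compile(r"<nowiki\s*/>")

# Hypothetical set of characters that can start or end wikitext markup.
MARKUP_CHARS = set("[]{}'")


def flag_empty_nowikis(wikitext):
    """Return a list of (offset, needs_review) for every empty nowiki tag.

    needs_review is True when the character just before or just after the
    tag is a markup character, i.e. removing the tag could change parsing.
    """
    results = []
    for match in EMPTY_NOWIKI.finditer(wikitext):
        before = wikitext[match.start() - 1] if match.start() > 0 else ""
        after = wikitext[match.end()] if match.end() < len(wikitext) else ""
        needs_review = before in MARKUP_CHARS or after in MARKUP_CHARS
        results.append((match.start(), needs_review))
    return results
```

With this heuristic, the tag in [<nowiki/>[[sic]]] is flagged for review, while a tag surrounded by plain text is not. The check is deliberately conservative: it also flags '''<nowiki/>''', which may well be removable.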
I still see too many pointless <nowiki> in https://it.wikipedia.org/w/index.php?title=Speciale:UltimeModifiche&tagf...
I'm currently cleaning up [^']<nowiki>(\s{,8})</nowiki> (more than 8 spaces means there could be something terribly wrong), using the June 2nd dump. When finished, I hope I'll be able to estimate whether the fixes did the trick.
Vito
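A minimal Python sketch of how the pattern quoted above could be applied to page text taken from a dump. The extra capture group around [^'], the function name, and the choice to keep the whitespace are assumptions for illustration; the thread does not say which replacement the actual cleanup uses.

```python
import re

# The pattern from the message above, with one assumed change: the leading
# [^'] is captured so the character before the tag can be put back.
WHITESPACE_NOWIKI = re.compile(r"([^'])<nowiki>(\s{,8})</nowiki>")


def strip_whitespace_nowiki(text):
    """Drop the tags, keep the preceding character and the whitespace.

    Returns (new_text, number_of_replacements), so the count can be used
    to estimate how many occurrences a dump contains.
    """
    return WHITESPACE_NOWIKI.subn(r"\1\2", text)
```

Because [^'] must consume one character, a tag at the very start of the text is not matched, and runs of more than 8 whitespace characters are left alone for manual inspection.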
On Fri, 19 Jun 2015 10:38:01 +0200, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
- '''<nowiki/>''' - this doesn't do anything at all. I couldn't reproduce it in any way, so it's probably a bug that was fixed.
This sounds very much like https://phabricator.wikimedia.org/T95730 which was fixed recently.
It would be easy to write bots to fix such easy common cases, but they would have to run on every project. Would it make sense to write them as maintenance scripts that update them everywhere when people upgrade VE?
It should be easier to both write and run these as on-wiki bots, and I doubt anyone other than Wikimedia has used VisualEditor enough to need such cleanup. I doubt that you'd have problems just running the bot on all projects, as very few of them seek out and ban helpful automated editors who do not jump through the necessary hoops; English Wikipedia, sadly, is one of these. You could always get a global bot flag: https://meta.wikimedia.org/wiki/Steward_requests/Bot_status
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=insou...
Just be careful with the replacements, as Dan and Jack raise good points about weird syntax sometimes being intentional. Happy botting!
On Friday, June 19, 2015 at 1:38 AM, Amir E. Aharoni wrote:
There may be more - I'm still looking for these.
If you find any, please propose them on Parsoid’s normalization talk page [0]. I’ve added the ones you’ve mentioned so far.
We’ve documented [1] what’s currently been implemented.
A few months back, Subbu solicited feedback [2] on what style norms should be enforced. We’ve since added a `scrubWikitext` parameter to Parsoid’s API that clients (like VE) can benefit from.
Cleaning up our past transgressions is great. Helping to prevent their continued existence is even better.
I was reading the discussion on gradually enabling VE for new accounts [3] and Kww writes there,
"Further, we still have issues with stray nowiki tags being scattered across articles. Until those are addressed, the notion that VE doesn't cause extra work for experienced editors is simply a sign that the metrics used to analyze effort were wrong. Jdforrester, can you explain how a study that was intended to measure whether VE caused extra work failed to note that even with the current limited use, it corrupts articles at this kind of volume [4]? Why would we want to encourage such a thing?”
Makes me sad.
[0] https://www.mediawiki.org/wiki/Talk:Parsoid/Normalizations
[1] https://www.mediawiki.org/wiki/Parsoid/Normalizations
[2] https://lists.wikimedia.org/pipermail/wikitech-l/2015-April/081453.html
[3] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29#Gradual...
[4] https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&offset=&...
On 06/20/2015 11:45 AM, Arlo Breault wrote:
On Friday, June 19, 2015 at 1:38 AM, Amir E. Aharoni wrote:
There may be more - I'm still looking for these.
I was reading the discussion on gradually enabling VE for new accounts [3] and Kww writes there,
"Further, we still have issues with stray nowiki tags being scattered across articles. Until those are addressed, the notion that VE doesn't cause extra work for experienced editors is simply a sign that the metrics used to analyze effort were wrong. Jdforrester, can you explain how a study that was intended to measure whether VE caused extra work failed to note that even with the current limited use, it corrupts articles at this kind of volume [4]? Why would we want to encourage such a thing?”
Makes me sad.
User:Whatamidoing (WMF) (Sherry Snyder) has noted there that User:Kww might have been confused by the fact that the filter includes nowikis added by editors during normal wikitext editing (see https://phabricator.wikimedia.org/T53421). In reality, nowikis from VE edits are a minor fraction of the nowiki entries there (which I verified a couple of days back by clicking through to the diffs in the AbuseLog), and the couple of sources of those nowiki insertions might have already been fixed, as noted earlier in this thread.
All that said, yes, we do want to and will continue fixing any sources of nowikis that Parsoid is introducing.
Subbu.
Subramanya Sastry wrote on 2015/06/20 at 10:49:
User:Whatamidoing (WMF) (Sherry Snyder) has noted there that User:Kww might have been confused by the fact that the filter includes nowikis added by editors during normal wikitext editing (see https://phabricator.wikimedia.org/T53421). In reality, nowikis from VE edits are a minor fraction of the nowiki entries there (which I verified a couple of days back by clicking through to the diffs in the AbuseLog), and the couple of sources of those nowiki insertions might have already been fixed, as noted earlier in this thread.
All that said, yes, we do want to and will continue fixing any sources of nowikis that Parsoid is introducing.
Last I looked, the edits from Visual Editor were still a trivially small percentage of edits, much smaller than the "minor fraction" noted here. Where is the actual percentage tracked these days?
KWW
On Sat, Jun 20, 2015 at 12:18 PM, Kevin Wayne Williams < kwwilliams@kwwilliams.com> wrote:
Last I looked, the edits from Visual Editor were still a trivially small percentage of edits, much smaller than the "minor fraction" noted here. Where is the actual percentage tracked these days? KWW
Hi Kww.
Re: the metrics: they are at pages such as http://ee-dashboard.wmflabs.org/dashboards/enwiki-metrics for Enwiki, and similar pages for some of the other larger Wikipedias (De, Es, Fr, He, It, Nl, Pl, Pt, Ru, Sv).
Re: the various types of nowiki: As far as I know, the quickest way to find out how many nowiki tags are introduced by VE is actually checking the log of recent edits made with VE,[1] and looking for the nowiki tag on it. However, this still doesn’t tell us whether the nowiki tag is the result of a bug, if it’s a relic of a nowiki bug which is actually fixed but still occurring due to cache issues (e.g. phab:T68628), if it was introduced manually by the user either by typing it (ignoring a warning) or by copy/pasting wikitext in VE, or whether it was correctly added [2], etc.
[1] https://en.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&a...
[2] See e.g. https://en.wikipedia.org/w/index.php?title=Beauty_and_the_Beast_%28Disney_so...
Hope that helps, Quiddity / Nick
Thanks Arlo. I added a few.
But I'm not sure that it answers my original question: Will this be done every time a page happens to be edited in VE and saved, or will it be done globally on all pages in all wikis as some kind of a maintenance job?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
It would be nice to have a global cleanup at some point, but it won't be able to handle every situation. I don't think relying on VE to clean up is a good idea:
- First, it will take a long time before all articles are edited with VE (maybe never).
- Second, I'm not a big fan of VE changing wikitext in parts not modified by the user: experience shows that it messes up the diffs, and makes watching what VE is doing a lot more difficult. It has been requested several times that VE not modify wikitext in places not modified by the user.
Things that are probably safe to fix automatically:
- Whitespace characters between nowiki tags at the beginning of a line: remove everything, including the whitespace characters.
- Whitespace characters between nowiki tags not at the beginning of a line: remove the tags, keep the whitespace characters.
- Some characters (letters, digits, ...) between nowiki tags: remove the tags, keep the characters.
- In a table, cell content with only a dash between nowiki tags: remove the tags, add a whitespace character before the dash.
<nowiki /> tags are more difficult to fix automatically, I think:
- Between quotes: allows mixing a real quote with italics formatting.
- After the end of a wikilink: prevents the wikilink from extending into the following text (often an error due to a bug in VE, but sometimes it may be intentional).
- ...
Nico
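The four "probably safe" cases in the list above can be sketched as an ordered set of regular-expression substitutions. This is a minimal Python illustration under stated assumptions, not a tested bot: the rule order, the restriction of "whitespace" to spaces and tabs, the reading of "letters, digits" as Unicode word characters without the underscore, and the function name are all hypothetical choices. Empty <nowiki/> tags are deliberately left untouched.

```python
import re

# Each entry is (compiled pattern, replacement); rules are applied in order.
SAFE_RULES = [
    # Table cell holding only a dash: drop the tags and put a space before
    # the dash, so the line is not read as a row separator ("|-").
    (re.compile(r"^\|[ \t]*<nowiki>-</nowiki>[ \t]*$", re.M), "| -"),
    # Whitespace-only nowiki at the start of a line: remove it entirely.
    (re.compile(r"^<nowiki>[ \t]+</nowiki>", re.M), ""),
    # Whitespace-only nowiki elsewhere: drop the tags, keep the whitespace.
    (re.compile(r"<nowiki>([ \t]+)</nowiki>"), r"\1"),
    # Letters or digits only: drop the tags, keep the characters.
    (re.compile(r"<nowiki>([^\W_]+)</nowiki>"), r"\1"),
]


def apply_safe_rules(wikitext):
    """Apply the four substitutions in order and return the new text."""
    for pattern, replacement in SAFE_RULES:
        wikitext = pattern.sub(replacement, wikitext)
    return wikitext
```

The sketch does not look at the surrounding text. For example, letters unwrapped directly after a closing ]] would become part of the link, which is the same concern raised above for <nowiki/>, so a real cleanup would need an extra guard there.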
On Mon, Jun 22, 2015 at 11:14 AM, Nicolas Vervelle nvervelle@gmail.com wrote:
- Second, I'm not a big fan of VE changing wikitext in parts not modified by the user: experience shows that it messes up the diffs, and makes watching what VE is doing a lot more difficult. It has been requested several times that VE not modify wikitext in places not modified by the user.
In case it wasn't clear, this is already the case. Parsoid/VE uses "selective serialization" to avoid touching unmodified content. This feature has been present since the beginning.
--scott
On Tue, Jun 30, 2015 at 10:31 PM, C. Scott Ananian cananian@wikimedia.org wrote:
In case it wasn't clear, this is already the case. Parsoid/VE uses "selective serialization" to avoid touching unmodified content. This feature has been present since the beginning.
Yes, I'm aware of that, but I was answering this because it was suggested previously in the discussion to use VE to do the cleanup...
Nico
Recently a little bird told me "Main roundtrip quality target achieved" for Parsoid, with a clean round-trip rate above 99.95%. Given this, I would expect that we can use Parsoid to clean up its own (previous) mess, based on the many bug fixes made over time. Even if it can't, attempting this with Parsoid would be a kind of verification of the bug fixes. (Can we get to 99.95% clean round trips for such cases?)
Sometimes Parsoid doesn't have to deal with its own mess, but in this case it may be a good idea to attach to each bug fix a maintenance script that repairs the earlier damage (similar to the requirement to attach a unit test), rather than writing bots that work on specific wikis. The problems arise on many wikis, and bots require other developers to spend time understanding the bug and coming up with their own magic regex, which may not be fully consistent with the fix.
On Sunday, June 21, 2015 at 11:43 AM, Amir E. Aharoni wrote:
Thanks Arlo. I added a few.
But I'm not sure that it answers my original question: Will this be done every time a page happens to be edited in VE and saved, or will it be done globally on all pages in all wikis as some kind of a maintenance job?
Oh, sorry if I wasn’t clear.
The normalizations we’re adding will be applied to new content added through VE. A global cleanup of all the past unnecessary nowiki’ing is still necessary and desired.
2015-06-30 22:16 GMT+03:00 Arlo Breault abreault@wikimedia.org:
Oh, sorry if I wasn’t clear.
The normalizations we’re adding will be applied to new content added through VE. A global cleanup of all the past unnecessary nowiki’ing is still necessary and desired.
Good to know. Is it something that could happen this year or a kind of a faraway dream? Is there a Phab ticket about it?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
wikitech-l@lists.wikimedia.org