https://bugzilla.wikimedia.org/show_bug.cgi?id=70607
Bug ID: 70607
Summary: replace.py does not work
Product: Pywikibot
Version: core (2.0)
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Other scripts
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: jan.dudik@gmail.com
Web browser: ---
Mobile Platform: ---
In compat:

replace.py -regex -nocase -file:aa.log "==\s*Externí odkazy(.*?)\r\n{{Commonscat" "== Externí odkazy\1\n* {{Commonscat" -summary:"řádková verze {{Commonscat}}"

Getting 60 pages from wikipedia:cs...
...
No changes were necessary in [[Roman Polák (lední hokejista)]]
>>> Roman Polanski <<<
- {{Commonscat|Roman Polanski}}
+ * {{Commonscat|Roman Polanski}}
In core, the same command:

pwb.py replace -regex -nocase -file:aa.log "==\s*Externí odkazy(.*?)\r\n{{Commonscat" "== Externí odkazy\1\n* {{Commonscat" -summary:"řádková verze {{Commonscat}}"

Retrieving 50 pages from wikipedia:cs.
...
No changes were necessary in [[Roman Polanski]]
No changes were necessary in [[Roman Polák (lední hokejista)]]
No changes were necessary in [[Roman Romaněnko]]
Why?
--- Comment #1 from JAn Dudík jan.dudik@gmail.com ---
After some testing: core does not recognize \r\n, only \n.
JAn Dudík jan.dudik@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|replace.py does not work    |replace.py does not
                   |                            |recognize "\r\n" pattern
Merlijn van Deen valhallasw@arctus.nl changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |Lowest
                 CC|                            |valhallasw@arctus.nl
           Severity|normal                      |enhancement
--- Comment #2 from Merlijn van Deen valhallasw@arctus.nl ---
There is a bug in compat's PreloadingpageGenerator that makes it (incorrectly) return page content with '\r\n' instead of '\n'. compat's page.get() /does/ return '\n' by default.
I think using \n makes much more sense (and note that this works for both \n *and* \r\n due to Python's universal newlines system), so I'm not even sure whether we should support \r\n at all.
Marking it as a low-priority feature request for now.
Fabian CommodoreFabianus@gmx.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |CommodoreFabianus@gmx.de
--- Comment #3 from Fabian CommodoreFabianus@gmx.de ---
Is that a problem of the bot then? Shouldn't it suffice to edit the regex (and if you want to be sure, you could use (?:\r|\r\n|\n) instead of exactly \r\n)?
--- Comment #4 from Merlijn van Deen valhallasw@arctus.nl ---
Okay, I'm a bit confused about the newlines now, as
re.match(r'\n', '\r\n')
does not work. However,
python replace.py -lang:cs -regex -nocase -page:"Roman Polanski" '==\s*Externí odkazy(.*?)\n{{Commonscat' '== Externí odkazy\1\n* {{Commonscat' -summary:"řádková verze {{Commonscat}}"
*did* work in compat (i.e. the variant without \r in it). I'm not sure why, though.
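One plausible explanation, sketched below under the assumption that the compat page text contained '\r\n' line endings: re.match anchors at the first character, which is '\r', while in the full pattern the lazy group (.*?) can absorb the '\r' because '.' matches every character except '\n'. The sample text is made up for illustration; only the pattern comes from the command above.

```python
import re

# Assumed sample text with Windows-style line endings (not taken from the wiki).
text = "== Externí odkazy ==\r\n{{Commonscat|Roman Polanski}}"

# re.match anchors at position 0, where the character is '\r', not '\n'.
anchored = re.match(r"\n", "\r\n")
print(anchored)

# '.' matches any character except '\n', so the lazy group swallows the '\r'
# and the literal \n in the pattern still lines up with the '\n' in the text.
found = re.search(r"==\s*Externí odkazy(.*?)\n{{Commonscat", text, re.IGNORECASE)
print(repr(found.group(1)))
```

If this is what happened, the variant without \r matched in compat only because the '\r' ended up inside group 1 and was written back by the \1 in the replacement.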
xqt info@gno.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |info@gno.de
         Resolution|---                         |WORKSFORME
--- Comment #5 from xqt info@gno.de ---
compat retrieves \r\n as the line ending via special export, whereas core always gets \n. See also the config.line_separator variable.
You may use \r?\n in the regex for both framework branches.
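A minimal Python sketch of the \r?\n suggestion, assuming the page text is available as a plain string. The helper name add_bullet and the two sample texts are hypothetical; the pattern and replacement are the ones from the report, with \r\n relaxed to \r?\n.

```python
import re

# Pattern from the report, with \r?\n so it accepts both line endings.
# re.IGNORECASE stands in for the -nocase option here.
PATTERN = re.compile(r"==\s*Externí odkazy(.*?)\r?\n{{Commonscat", re.IGNORECASE)
REPLACEMENT = r"== Externí odkazy\1\n* {{Commonscat"


def add_bullet(text):
    """Hypothetical helper: move {{Commonscat}} onto a bulleted line."""
    return PATTERN.sub(REPLACEMENT, text)


# Assumed sample texts: one as compat would see it, one as core would.
crlf_text = "== Externí odkazy ==\r\n{{Commonscat|Roman Polanski}}"
lf_text = "== Externí odkazy ==\n{{Commonscat|Roman Polanski}}"

print(repr(add_bullet(crlf_text)))
print(repr(add_bullet(lf_text)))
```

Both inputs produce the same result, so the same command line can be used with either branch.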