The Wikimedia Phabricator team needs help from someone familiar with Perl.
The Bugzilla API has a bug, which we tried to fix with a patch, but now that patch creates another problem. Now we either break comments or binary attachments. The details:
Upstream Bugzilla XML-RPC API issue creates invalid XML https://phabricator.wikimedia.org/T815
Your help is welcome! It doesn't seem to be too complicated. The task doesn't require any background on Phabricator or Bugzilla.
Isn't Marc an expert? :P
I will have a look as well...
On Fri, Oct 24, 2014 at 10:55 PM, Quim Gil qgil@wikimedia.org wrote:
The Wikimedia Phabricator team needs help from someone familiar with Perl.
The Bugzilla API has a bug, which we tried to fix with a patch, but now that patch creates another problem. Now we either break comments or binary attachments. The details:
Upstream Bugzilla XML-RPC API issue creates invalid XML https://phabricator.wikimedia.org/T815
Your help is welcome! It doesn't seem to be too complicated. The task doesn't require any background on Phabricator or Bugzilla.
-- 
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
After a short investigation, the answer is pretty straightforward and is explained in https://bugzilla.mozilla.org/show_bug.cgi?id=839023
quoting:
U+0000-U+001F are illegal in HTML 4.0 and XML 1.0 (except the characters HT, LF and CR). And it's not permitted to use character references to them either (although that is permitted in XML 1.1, except for NUL): http://www.w3.org/International/questions/qa-controls
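To make the quoted rule concrete, here is a minimal sketch of a check for the characters XML 1.0 rejects. The function name has_xml10_illegal_control is made up for illustration and is not part of Bugzilla; the character class simply spells out "everything below U+0020 except HT, LF and CR".

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bugzilla function): returns 1 if $text contains
# a character that XML 1.0 forbids, i.e. anything below U+0020 other than
# HT (0x09), LF (0x0A) and CR (0x0D); returns 0 otherwise.
sub has_xml10_illegal_control {
    my ($text) = @_;
    return $text =~ /[\x00-\x08\x0b\x0c\x0e-\x1f]/ ? 1 : 0;
}

print has_xml10_illegal_control("line one\nline two\ttabbed"), "\n";  # prints 0
print has_xml10_illegal_control("bell\x07inside"), "\n";              # prints 1
```

A check like this could be used to find the affected comments before deciding which of the fixes to apply.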
possible fixes:
* Run an SQL query that finds and replaces these characters
* Patch Bugzilla so that it replaces them during XML conversion
Inside Bugzilla/WebService/Server/XMLRPC.pm, in _strip_undefs, at the end of the function (around line 250):
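    if (ref($initial) eq '') {
        # Plain scalar: escape the control characters XML 1.0 forbids
        # (0x00-0x08, 0x0b, 0x0c, 0x0e-0x1f) as a literal backslash-x sequence.
        $initial =~ s/([\x00-\x08\x0b\x0c\x0e-\x1f])/sprintf('\\x%02x', ord($1))/ge;
    }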
should do the trick, but that does indeed damage some binaries. Do we actually want to export them? XML is not a good format for exporting binary files, since it doesn't allow some characters. What about getting them out with an SQL query? Why do we even need to use XML? Is it the only way to import into Phab?
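To illustrate the binary damage, here is a small standalone sketch. escape_controls is a hypothetical helper, not a Bugzilla function; it applies a variant of the substitution above, with the character class assumed to cover the full range XML 1.0 forbids. The sample data is the standard eight-byte PNG signature, which contains the control byte 0x1a.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): replaces each
# XML 1.0-forbidden control character with a literal backslash-x escape.
sub escape_controls {
    my ($value) = @_;
    $value =~ s/([\x00-\x08\x0b\x0c\x0e-\x1f])/sprintf('\\x%02x', ord($1))/ge;
    return $value;
}

# The first eight bytes of a PNG file; 0x1a is a forbidden control character,
# while CR and LF are left alone by the substitution.
my $png_signature = "\x89PNG\r\n\x1a\n";
my $escaped       = escape_controls($png_signature);

# prints "before: 8 bytes, after: 11 bytes"
printf "before: %d bytes, after: %d bytes\n",
    length($png_signature), length($escaped);
```

The single 0x1a byte becomes four characters, so the file no longer matches what was uploaded. That is harmless for comment text but corrupts an attachment, which is why the same substitution cannot safely be applied to both.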
On Fri, Oct 24, 2014 at 11:12 PM, Petr Bena benapetr@gmail.com wrote:
Isn't Marc expert? :P
I will have a look as well...