On Tuesday, January 22, 2013 at 2:28 AM, David Schoonover wrote:
Yeah, irregularities like that are obviously an issue.
I believe the byte offset (and thus the tab character) is present only as an
artifact of the Kafka2Hadoop importer; it's certainly not intended to be included in
the files. The use of semicolon+space in extended headers like "Content-Type:
text/plain; charset=utf8;" is in-spec, but the edge caches should obviously be
escaping the space so that it can't be mistaken for a field separator.
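
As a stopgap on the consumer side, something like the following rough sketch could normalize both irregularities. It is not what the importer or the edge actually does. It assumes the byte offset shows up as a leading run of digits followed by a single tab, and it assumes "%20" is an acceptable escape for the space; the function names are made up for illustration.

```python
import re

# Assumption: the importer's byte offset is a leading run of digits followed
# by exactly one tab. Log fields themselves are space-separated, so a tab in
# this position should only ever come from the importer.
_OFFSET_PREFIX = re.compile(r"^\d+\t")


def strip_byte_offset(line: str) -> str:
    """Drop the leading '<offset><TAB>' prefix if present (hypothetical helper)."""
    return _OFFSET_PREFIX.sub("", line, count=1)


def escape_spaces(value: str) -> str:
    """Escape spaces inside a single field value so it stays one field.

    Assumption: '%20' is the desired escape; the real fix belongs at the edge.
    """
    return value.replace(" ", "%20")


if __name__ == "__main__":
    print(strip_byte_offset("1024\tcp1001.example 200"))
    print(escape_spaces("text/plain; charset=utf8"))
```

Lines without the prefix pass through strip_byte_offset unchanged, so it should be safe to apply once the importer is fixed.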
I filed a bug to track this issue here:
https://bugzilla.wikimedia.org/show_bug.cgi?id=44236
--
Ori Livneh