On Tuesday, January 22, 2013 at 2:28 AM, David Schoonover wrote:
Yeah, irregularities like that are obviously an issue. I believe the byte-offset (and thus the tab character) is there only as an artifact of the Kafka2Hadoop importer; it's certainly not intended to be included in the files. The use of semicolon+space in extended headers like "Content-Type: text/plain; charset=utf8;" is in-spec, but the edge should be escaping the space.
I filed a bug to track this issue here: https://bugzilla.wikimedia.org/show_bug.cgi?id=44236
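Until that is fixed, anyone reading these files could work around both problems on their side. Below is a rough sketch only, not what the importer or the edge actually does. It assumes the importer prefix is a decimal byte-offset followed by a single tab, that the log lines are otherwise space-delimited, and that "%20" is an acceptable escape for a space. The names strip_offset and escape_field are made up for illustration.

```python
import re

# Assumption: the importer prepends "<decimal byte offset>\t" to each line.
_OFFSET_PREFIX = re.compile(r"^\d+\t")


def strip_offset(line: str) -> str:
    """Drop a leading byte-offset and tab, if present; otherwise return the line unchanged."""
    return _OFFSET_PREFIX.sub("", line, count=1)


def escape_field(value: str) -> str:
    """Replace spaces with %20 so the value stays a single space-delimited field.

    Assumption: %20 is the escape the log format expects; the source does not say.
    """
    return value.replace(" ", "%20")


if __name__ == "__main__":
    print(strip_offset("1234\tsome log line"))
    print(escape_field("text/plain; charset=utf8;"))
```

Note that escape_field is applied to a single header value before it is written into a line; applying it to a whole line would also escape the field separators.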
-- Ori Livneh