Hi Oliver,
On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
> And, an additional point; I don't understand why, if dupes is the problem, the Hive query was not hit as badly by this as the equivalent UDF.
Just shooting in the dark, since you did not provide your query, but if you had accidentally been querying the
wmf_raw.webrequest
(database name ending in “_raw”) table instead of
wmf.webrequest
(no “_raw” in the database name), the difference you described would be plausible (and given the patching of GHOST, it would even be expected).
Have fun, Christian