We should address automatic duplicate cleaning very soon, as Christian warned a while ago. He has manually cleaned up duplicates a few times, but we know it's a problem that needs a proper solution.
On Mon, Feb 23, 2015 at 6:22 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Oliver,
On Sun, Feb 22, 2015 at 06:46:37PM -0500, Oliver Keyes wrote:
And an additional point: I don't understand why, if dupes are the problem, the Hive query was not hit as badly by this as the equivalent UDF.
Just shooting in the dark, since you did not provide your query, but if you by accident had been querying the wmf_raw.webrequest table (database name ending in “_raw”) instead of wmf.webrequest (no “_raw” in the database name), the difference you described would be plausible (and given the patching of GHOST, they'd even be expected).
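To illustrate how the same count can differ between a table that still contains duplicates and a deduplicated one, here is a minimal Python sketch. It is not the actual cleaning job: the sample rows, the host names, and the choice of (hostname, sequence) as the key that identifies a record are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (hostname, sequence, uri). Not real webrequest data.
raw_rows = [
    ("cp1001", 1, "/wiki/A"),
    ("cp1001", 2, "/wiki/B"),
    ("cp1001", 2, "/wiki/B"),  # same record delivered twice
    ("cp1002", 1, "/wiki/A"),
    ("cp1002", 1, "/wiki/A"),  # same record delivered twice
]


def deduplicate(rows):
    """Keep the first row seen for each assumed (hostname, sequence) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row[0], row[1])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def hits_per_uri(rows):
    """Count rows per URI, as a simple stand-in for an aggregate query."""
    return Counter(uri for _, _, uri in rows)


raw_counts = hits_per_uri(raw_rows)                  # duplicates inflate the totals
clean_counts = hits_per_uri(deduplicate(raw_rows))   # each record counted once

print("raw:  ", dict(raw_counts))
print("clean:", dict(clean_counts))
```

Running the same aggregate over both inputs gives different totals, which is the kind of discrepancy you would see if one query read the table with duplicates and the other read the cleaned one.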
Have fun, Christian
-- 
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics