Hi Andrew,
On Mon, Feb 23, 2015 at 09:35:48AM -0500, Andrew Otto wrote:
https://gerrit.wikimedia.org/r/#/c/177522/
Seeing as this was merged on Jan 26, it is possible that it was not deployed on Jan 27, when Oliver is noticing duplicates.
That should not be the case.
Once you decided that deduplication should happen while refining wmf_raw.webrequest into wmf.webrequest, and the above change was implemented, all of 2015 was deduped and backfilled in wmf.webrequest.
So all of 2015 in wmf.webrequest is deduped (with the known limitations).
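For readers unfamiliar with the idea, here is a rough sketch of what deduplicating during a refine step can look like. It is only an illustration, not the actual refinery code: the key fields (hostname, sequence), the sample records, and the function name dedupe are assumptions made for this example, and it does not try to model the known limitations.

```python
def dedupe(records, key_fields=("hostname", "sequence")):
    """Keep the first record seen for each key and drop later repeats.

    key_fields is an assumed dedup key, chosen for illustration only.
    """
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            continue  # same key already kept once, so skip this copy
        seen.add(key)
        unique.append(record)
    return unique


# Hypothetical raw rows; the second one is a duplicate of the first.
raw = [
    {"hostname": "cp1001", "sequence": 1, "uri_path": "/wiki/A"},
    {"hostname": "cp1001", "sequence": 1, "uri_path": "/wiki/A"},
    {"hostname": "cp1002", "sequence": 1, "uri_path": "/wiki/B"},
]
refined = dedupe(raw)
print(len(refined))
```

Anything that reads from the refined output, rather than from the raw input, then sees each key at most once.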
Have fun,
Christian
P.S.: And all the wmf.webrequest based jobs from
https://commons.wikimedia.org/w/index.php?title=File:Refinery-oozie-overview...
that exist for 2015 got re-run on this deduped data too.
So no dupes for the corresponding legacy tsvs, pagecounts-all-sites, ...