Thank you, Dan, and everyone else who's been
involved in fixing this.
Dan
On 15 September 2015 at 19:23, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
I confirmed this on IRC, but I'm just feeding the archives here. I'm also
convinced that the client IP hashing bug we just found explains this
problem. It's good we took a look at the other problems, but the main one
seems to be the IP hashing. We'll brain bounce more tomorrow on how to fix that.
On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <okeyes(a)wikimedia.org>
wrote:
> Update; I read Dan's thread about hashing, read this thread, and a
> penny dropped ;).
>
> This is totally explainable by the fact that we /expect/ to see
> multiple pageIDs per IP. And we are! The hashing problem just means
> those aren't /appearing/ to be the same IP.
>
> On 15 September 2015 at 18:05, Erik Bernhardson
> <ebernhardson(a)wikimedia.org> wrote:
> > We've deployed the change to bucketing, but we are still seeing the same
> > issue in the collected data.
> >
> > Again we are generating a unique 64 bit random number when the user gets to
> > the page. We are seeing this same 64 bit unique number being reported by
> > multiple ip addresses.
> >
> > Since deploying the new schema number with the updated bucket selection we
> > have seen 13 distinct tokens coming from 42 distinct ip addresses. This
> > shouldn't be possible.
> >
> > mysql:research@analytics-store.eqiad.wmnet [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;
> > +--------------------------+
> > | count(distinct clientIp) |
> > +--------------------------+
> > |                       42 |
> > +--------------------------+
> > 1 row in set (0.00 sec)
> >
> > mysql:research@analytics-store.eqiad.wmnet [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;
> >
> > +-------------------------------------+
> > | count(distinct event_pageViewToken) |
> > +-------------------------------------+
> > |                                  13 |
> > +-------------------------------------+
> > 1 row in set (0.00 sec)
> >
> >
> >
> > My best guess at this point is that something has changed in the way these
> > clientIp values are collected, and that the collection is now incorrect.
> >
> >
> > On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
> > <ebernhardson(a)wikimedia.org> wrote:
> >>
> >> Thanks for taking a look over this. I've incorporated your suggestions
> >> into a patch[1] and if all looks good will send that out in SWAT. We should
> >> be able to look at the data collected overnight and see if things are more
> >> sane tomorrow.
> >>
> >> [1] https://gerrit.wikimedia.org/r/#/c/238306/
> >>
> >> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <gtisza(a)wikimedia.org>
> >> wrote:
> >>>
> >>> You are queueing a logging callback every time a request is sent (which
> >>> is roughly every time the user types another character in the search box)
> >>> until the tracking module finishes loading and mw.searchSuggest.request is
> >>> restored. On a slow connection the user might type several characters and
> >>> trigger several log events by then. If you filter for queries from the same
> >>> non-unique IP, you will probably see something like "a", "ab", "abc"...
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> Analytics(a)lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation