FYI, yesterday we deployed a fix that makes eventlogging look up the hash salt in etcd.  This allows multiple processes to hash IPs against the same salt, so hashing now remains consistent between parallelized server-side and client-side processors, and is also consistent across eventlogging restarts.
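
In case it helps, here is a rough sketch of the idea (not the actual eventlogging code; the etcd key path and the python-etcd client below are assumptions for illustration only):

    import hashlib
    import hmac

    import etcd  # python-etcd; assumed client library, not necessarily what eventlogging uses

    def get_shared_salt(client, key='/eventlogging/ip_hash_salt'):
        # Hypothetical key path: every processor reads the same salt from etcd.
        return client.read(key).value

    def hash_ip(ip, salt):
        # Keyed hash of the client IP; identical in every process and across
        # restarts, as long as they all read the same salt.
        return hmac.new(salt.encode('utf-8'), ip.encode('utf-8'), hashlib.sha256).hexdigest()

    client = etcd.Client(host='127.0.0.1', port=2379)
    salt = get_shared_salt(client)
    print(hash_ip('192.0.2.1', salt))  # same digest no matter which processor runs this

Because every processor reads the same salt, parallelizing no longer splits one client across several hashed IPs.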

This morning I re-provisioned 12 parallel client-side processors.  A preliminary check of IP hash frequencies suggests everything is working.  Let us know if you see any problems!

Thanks!
-Ao


On Wed, Sep 16, 2015 at 9:40 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Andrew just deployed the change to go back to a single eventlogging processor.  So as of right now-ish, IPs should be hashed consistently.  Going forward, we'll only add parallel processors when we can ensure consistent hashing across them.

On Wed, Sep 16, 2015 at 10:53 AM, Andrew Otto <otto@wikimedia.org> wrote:
How urgent is this?  An easy fix right now would be to turn off the parallelized processors and run just one.  We haven't yet increased traffic, so we can run eventlogging like we did before, with only one processor.  If it's not urgent, we will implement a proper fix; if it is urgent, this short-term fix is easy for us to make.

On Tue, Sep 15, 2015 at 11:51 PM, Dan Garry <dgarry@wikimedia.org> wrote:
Thank you, Dan, and everyone else who's been involved in fixing this.

Dan

On 15 September 2015 at 19:23, Dan Andreescu <dandreescu@wikimedia.org> wrote:
I confirmed this on IRC, but I'm just feeding the archives here.  I'm also convinced that the client IP hashing bug we just found explains this problem.  It's good we took a look at the other problems, but the main one seems to be the IP hashing.  We'll brain bounce more tomorrow on how to fix that.

On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Update: I read Dan's thread about hashing, read this thread, and the
penny dropped ;).

This is totally explained by the fact that we /expect/ to see
multiple pageIDs per IP. And we are! The hashing problem just means
those requests aren't /appearing/ to come from the same IP.
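
To spell it out with a toy example (hypothetical salts and plain hashlib here, not the real eventlogging code): if two processors salt the same client IP differently, one real address shows up as two hashed ones, which is exactly how a handful of tokens can appear to come from many "IPs".

    import hashlib

    def hash_ip(ip, salt):
        # Salted hash of a client IP, truncated for readability.
        return hashlib.sha256((salt + ip).encode('utf-8')).hexdigest()[:12]

    ip = '192.0.2.1'
    print(hash_ip(ip, 'salt-used-by-processor-1'))  # one hashed "IP"
    print(hash_ip(ip, 'salt-used-by-processor-2'))  # a different hashed "IP" for the same client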

On 15 September 2015 at 18:05, Erik Bernhardson
<ebernhardson@wikimedia.org> wrote:
> We've deployed the change to bucketing, but we are still seeing the same
> issue in the collected data.
>
> Again, we are generating a unique 64-bit random number when the user gets to
> the page. We are seeing this same 64-bit number being reported by
> multiple IP addresses.
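>
> (A concrete sketch of a token like that, in Python rather than the actual
> client-side JavaScript, just to show the shape of the value:
>
>     import os
>
>     def new_page_view_token():
>         # 8 random bytes = 64 bits, rendered as a 16-hex-digit string
>         return os.urandom(8).hex()
>
>     print(new_page_view_token())
>
> The real token is generated in the browser; this is only an illustration.)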
>
> Since deploying the new schema revision with the updated bucket selection, we
> have seen 13 distinct tokens coming from 42 distinct IP addresses. This
> shouldn't be possible.
>
> mysql:research@analytics-store.eqiad.wmnet [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;
> +--------------------------+
> | count(distinct clientIp) |
> +--------------------------+
> |                       42 |
> +--------------------------+
> 1 row in set (0.00 sec)
>
> mysql:research@analytics-store.eqiad.wmnet [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;
> +-------------------------------------+
> | count(distinct event_pageViewToken) |
> +-------------------------------------+
> |                                  13 |
> +-------------------------------------+
> 1 row in set (0.00 sec)
>
>
>
> My best guess at this point is that something has changed in the way these
> clientIps are collected, and that the change is incorrect.
>
>
> On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
> <ebernhardson@wikimedia.org> wrote:
>>
>> Thanks for taking a look over this. I've incorporated your suggestions
>> into a patch[1], and if all looks good I'll send that out in SWAT. We should
>> be able to look at the data collected overnight and see if things are more
>> sane tomorrow.
>>
>> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>>
>> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <gtisza@wikimedia.org>
>> wrote:
>>>
>>> You are queueing a logging callback every time a request is sent (which
>>> is roughly every time the user types another character in the search box)
>>> until the tracking module finishes loading and mw.searchSuggest.request is
>>> restored. On a slow connection the user might type several characters and
>>> trigger several log events by then. If you filter for queries from the same
>>> non-unique IP, you will probably see something like "a", "ab", "abc"...
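>>>
>>> Roughly, as a Python analogy of that JavaScript flow (the names below are
>>> made up for illustration; only mw.searchSuggest.request is from the real code):
>>>
>>>     # Each keystroke queues another logging callback while the tracking
>>>     # module is still loading; once it loads, all of them fire at once.
>>>     pending = []
>>>     module_loaded = False
>>>
>>>     def on_search_request(query):
>>>         if not module_loaded:
>>>             pending.append(lambda q=query: print('log event for', q))
>>>
>>>     for q in ['a', 'ab', 'abc']:  # slow connection: three keystrokes before the module loads
>>>         on_search_request(q)
>>>
>>>     module_loaded = True
>>>     for callback in pending:      # module loaded: all queued callbacks fire
>>>         callback()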
>>>
>>>
>>
>
>
>



--
Oliver Keyes
Count Logula
Wikimedia Foundation







--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation







_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics