>Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other,

I assume you mean Schema:Echo? :)

On Tue, Mar 1, 2016 at 11:58 PM, Roan Kattouw <rkattouw@wikimedia.org> wrote:
[Reviving old thread]

I was looking at our EventLogging data today, and discovered that Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other, and if we really care about that we can measure it separately without producing nine gigs of unused data.

Feel free to delete the data associated with Schema:Echo (but not Schema:EchoInteraction! We do use that one) with extreme prejudice. I've also written a config patch to stop us from producing these events ( https://gerrit.wikimedia.org/r/#/c/274345/ ) which I will deploy in the SWAT on Thursday.

I also found that a long-standing issue with duplicate events in Schema:EchoInteraction wasn't fixed yet, so I wrote a patch for that too: https://gerrit.wikimedia.org/r/274342

On Tue, Dec 15, 2015 at 11:16 AM, Jonathan Morgan <jmorgan@wikimedia.org> wrote:
Hi Nuria!

Speaking for my own particular scenario, that solution sounds like it will be fine, since I don't plan on immediately performing research with these data.

But it's obviously still the Collab team's call here--they likely have needs I know nothing about. Cc'ing Joe Matazzoni in case he's not following this already...

J



On Tue, Dec 15, 2015 at 9:50 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:

>We could blacklist this schema from the mysql database, and still keep producing it.  It would be available in Hadoop either way.

Right, but I would also like to drop the table if it is not being used. If the data is not going to be looked at soonish, there is no point in storing it, as it will likely be deleted before anyone looks at it.

Thanks, 

Nuria

On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto <aotto@wikimedia.org> wrote:
We could blacklist this schema from the mysql database, and still keep producing it.  It would be available in Hadoop either way.


On Dec 15, 2015, at 12:22, Jonathan Morgan <jmorgan@wikimedia.org> wrote:

Hi Nuria,

FWIW: Although I'm not using this right now, I could see it being useful for understanding the impact of new notification updates that are coming down the pike.[1][2]

What are the costs involved in keeping this schema up?

Best,

On Tue, Dec 15, 2015 at 8:22 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
Roan:

The data for the Echo schema (https://meta.wikimedia.org/wiki/Schema:Echo) is quite large, and we are not sure it is even used.

Can you confirm either way? If it is no longer used we will stop collecting it.


Thanks, 

Nuria

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation

--
Neil P. Quinn, product analyst
Wikimedia Foundation