*Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other,*
I assume you mean Schema:Echo? :)
On Tue, Mar 1, 2016 at 11:58 PM, Roan Kattouw rkattouw@wikimedia.org wrote:
[Reviving old thread]
I was looking at our EventLogging data today, and discovered that Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other, and if we really care about that we can measure it separately without producing nine gigs of unused data.
Feel free to delete the data associated with Schema:Echo (but not Schema:EchoInteraction! We do use that one) with extreme prejudice. I've also written a config patch to stop us from producing these events ( https://gerrit.wikimedia.org/r/#/c/274345/ ) which I will deploy in the SWAT on Thursday.
I also found that a long-standing issue with duplicate events in Schema:EchoInteraction wasn't fixed yet, so I wrote a patch for that too: https://gerrit.wikimedia.org/r/274342
On Tue, Dec 15, 2015 at 11:16 AM, Jonathan Morgan jmorgan@wikimedia.org wrote:
Hi Nuria!
Speaking for *my own particular scenario*, that solution sounds like it will be fine, since I don't plan on immediately performing research with these data.
But it's obviously still the Collab team's call here--they likely have needs I know nothing about. Cc'ing Joe Matazzoni in case he's not following this already...
J
On Tue, Dec 15, 2015 at 9:50 AM, Nuria Ruiz nuria@wikimedia.org wrote:
*We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way.*
Right, but I would also like to drop the table if it is not being used. If the data is not going to be looked at soonish, there is no point in storing it, as it will likely be deleted before it gets looked at.
Thanks,
Nuria
On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto aotto@wikimedia.org wrote:
We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way.
On Dec 15, 2015, at 12:22, Jonathan Morgan jmorgan@wikimedia.org wrote:
Hi Nuria,
FWIW: Although I'm not using this right now, I could see it being useful for understanding the impact of new notification updates that are coming down the pike.[1][2]
What are the costs involved in keeping this schema up?
Best, J
1. https://meta.wikimedia.org/wiki/Research:Cross-wiki_notifications_user_resea...
2. https://phabricator.wikimedia.org/T116741
On Tue, Dec 15, 2015 at 8:22 AM, Nuria Ruiz nuria@wikimedia.org wrote:
Roan:
The data for the Echo schema (https://meta.wikimedia.org/wiki/Schema:Echo) is quite large and we are not sure it is even used.
Can you confirm either way? If it is no longer used we will stop collecting it.
Thanks,
Nuria
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)