+analytics
On Wed, Nov 13, 2013 at 5:09 PM, Jon Robson jrobson@wikimedia.org wrote:
Thanks so much, Juliusz, for exploring this, and great work fixing the schema (apologies for me not predicting that might be an issue). Sorry for all the pain this must have caused you.
We can't be the only team using Limn in the Foundation. It might be worth pulling everyone together. Am I right in thinking that Limn is a child of the analytics team? Maybe we should at least spend some time with them getting our use case resolved. I guess this is why we have an analytics department? I can raise this issue in the next Scrum of Scrums if it is not resolved by then.
On Wed, Nov 13, 2013 at 3:54 PM, Juliusz Gonera jgonera@wikimedia.org wrote:
For the past few days (or more) graphs at http://mobile-reportcard.wmflabs.org/ stopped updating. The dashboard consists of two parts: Limn, which displays the data, and backend scripts that generate the graph data based on Event Logging data. The issue was caused by two independent problems in the second component:
1. A change of MobileWebEditing schema was incorrectly addressed in the scripts' config and caused the script to throw an exception.
2. Backend scripts are stupid and not optimized at all.
The first thing is fixed. To work around the second thing I had to disable updates of "Editors registered on mobile who made 5+ edits on enwiki (mobile+desktop)" graph [1] for now (the query was timing out and causing an exception too) and removed the performance graph, since we'll be using ganglia (and soon graphite) for that [2]. Graphs should get updated soon.
So why are those backend scripts stupid? Because they run every hour and recalculate _all_ the values for every single graph. For example, even though total unique editors for June 2013 will never change, they are still recalculated every hour. This was a quick and easy solution for generating graphs, but as Event Logging tables keep growing, we add more graphs, and those graphs show more and more data, it no longer performs.
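A minimal sketch of what the incremental version could look like (hypothetical file and function names, not the actual scripts; note also Matt Flaschen's point later in this thread that deletion drift can retroactively change "closed" months):

    import json
    import os
    from datetime import date

    CACHE_FILE = "monthly_totals.json"  # hypothetical cache of finished months

    def update_monthly_totals(months, compute_month):
        """Recompute only the months that can still change.

        compute_month(month) stands in for the expensive SQL query.
        """
        cache = {}
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as f:
                cache = json.load(f)
        current = date.today().strftime("%Y-%m")
        for month in months:
            if month not in cache or month == current:
                cache[month] = compute_month(month)  # only new or open months hit the DB
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
        return cache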
I discussed this briefly with Ori and I think we agree on the general direction. We should definitely schedule some time for working on this. We could start with a spike investigating if there is a framework for aggregating the sums that we could use and asking what other teams in the foundation use for generating their graph data. The results of this spike and possible follow-up work could be useful not only for the mobile team.
[1] https://gerrit.wikimedia.org/r/#/c/95298/
[2] http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&tab=v&vn=Mobile+Web&hide-hf=false
-- Juliusz
Hi all --
Limn is indeed the child of the Analytics team and we are happy to work with you to resolve these issues. Always feel free to reach out to this list if you have any questions/issues.
A couple of specific comments -- right now Limn will graph what you tell it to, so while we can certainly help optimize the processing, the scripts themselves are custom to the data being graphed.
We have hired an engineer to focus on Limn specifically who will start in December. We're in the process of putting together some requirements and use cases for her to work on and I will circulate this broadly once I have incorporated feedback from our quarterly reviews.
We'll also reach out to Mobile directly.
-Toby
It's true that we're hoping to re-focus on Limn. It hasn't seen any work since last March. However, I think this issue has little to do with Limn and more with how we're generating data for it. I would love to talk about how we're currently generating this data and how we can improve; I've been thinking a lot about the problem, and I think Toby is making solutions to it an upcoming priority.
This message by me went only to mobile-tech:
Sure, we should raise it. Limn is made by analytics, but Limn itself is not the problem here. Still, analytics might have some suggestions of what we could use for the backend.
Hi,
On Wed, Nov 13, 2013 at 05:14:51PM -0700, Arthur Richards wrote:
[...]
On Wed, Nov 13, 2013 at 3:54 PM, Juliusz Gonera jgonera@wikimedia.org wrote:
Because they run every hour and recalculate _all_ the values for every single graph. For example, even though total unique editors for June 2013 will never change, they are still recalculated every hour.
Several of our jobs had to overcome the same problem. The solution there was the same as you proposed: a container to store aggregated, historic data, which is then reused when generating the graphs.
Adding yesterday's data to the container is one cron job. Generating the graphs from the data in the container is a separate cron job. This separation proved to be useful on many occasions.
For some jobs the container itself is a separate database (e.g. geowiki), and for other jobs the container is a set of plain files (e.g. Wikipedia Zero). Both approaches come with the obvious (dis-)advantages: querying a database is efficient and easy, but putting data under version control and monitoring changes when having to rerun aggregation for, say, the last two weeks is easier when working with plain files.
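A minimal sketch of that split, with made-up paths and a placeholder aggregation function (the real geowiki and Wikipedia Zero jobs are more involved):

    import csv
    from datetime import date, timedelta

    CONTAINER = "aggregates/daily_editors.tsv"  # hypothetical plain-file container

    def append_yesterday(aggregate_day):
        """Cron job 1: append yesterday's aggregate to the container."""
        day = date.today() - timedelta(days=1)
        with open(CONTAINER, "a") as f:
            csv.writer(f, delimiter="\t").writerow([day.isoformat(), aggregate_day(day)])

    def render_graph_data(out_path="datafiles/daily_editors.csv"):
        """Cron job 2: turn the container into the datafile the dashboard reads."""
        with open(CONTAINER) as f, open(out_path, "w") as out:
            out.write("date,value\n")
            for day, value in csv.reader(f, delimiter="\t"):
                out.write("%s,%s\n" % (day, value))

Keeping the container in a plain file like this also makes it easy to put under version control, as noted above.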
We could start with a spike investigating if there is a framework for aggregating the sums [...]
Our approaches are hard-wired into our legacy code, so we do not use a common, solid framework for it.
I haven't done any research on whether or not such frameworks exist, but if you find a good framework, please let us know; it would certainly be interesting.
Best regards, Christian
There are certainly products like Cassandra and Spark that make working with big (or bunches of small) data easy and fast.
There are more sophisticated but less mature products like Druid that work with dimensional data.
We have solid options; we just have to decide that this is a priority and move on it. The pageviews API sprint was nice, but we abandoned it after a week of work because of changed priorities.
Analytics folks, what would you say to setting aside some time to collaborate on this? Maybe pair a mobile web engineer with an analytics engineer for a sprint or two? It would be great if we can get this figured out sooner rather than later since we rely so heavily on the data and its presentation.
I'm for it, and would love to help. You'd have to talk to my boss though. Toby? :)
Also, in the meantime, we started experimenting with 10% time. I'd be happy to have at least an initial meeting and hack session one day next week. If anyone on the mobile team is interested, just pick a day.
Dan
Waddaya say Toby?
Almost the entire mobile web team is traveling this week and next; however, Jon Robson is around and may have some time. What do you think, Jon?
Hi Dan,
Happy to pair, but how the Python scripts get used is a little black magic to me and I'm only vaguely familiar with them. That said, I'm sure if we combined heads we would be able to work this out.
Do you want to send me a calendar invite this week to organise an hour to do this?
Jon
K, I made it between Scrum of Scrums and lunch - short notice but makes sense. Feel free to reschedule.
On 11/13/2013 07:14 PM, Arthur Richards wrote:
So why are those backend scripts stupid? Because they run every hour and recalculate _all_ the values for every single graph. For example, even though total unique editors for June 2013 will never change, they are still recalculated every hour.
Is it really true that they will never change? I think many of the metrics are written such that when a page is deleted, it reduces edits in the past. So if I delete a page today (November 2013) that happened to be edited in June 2013, that affects the June 2013 edit counts.
That isn't intuitive anyway, but if there's a change in this regard, it needs to be communicated.
Matt Flaschen
Yeah, we refer to that as deletion drift. Dario is heading an effort to make these metrics more standard and intuitive ( https://meta.wikimedia.org/wiki/Research:Refining_the_definition_of_monthly_...). We'll have to see what we need for these dashboards and if the new definitions would help.
K, so a quick follow-up on this. Jon and I worked today and identified two short-term problems:
1. http://mobile-reportcard.wmflabs.org/graphs/edits-monthly-5plus-editors no longer updates because the query used for it takes too long to finish
2. the scripts run hourly even for graphs that only need to be updated daily
For 1, I fiddled with the SQL until it performed a little better. It was also not correct, as I believe it was getting "the number of people who created an account in month X and made >= 5 edits anytime". I changed it to what I assumed we wanted, which is "the number of people who created an account and made >= 5 edits in month X". This new query (https://gist.github.com/milimetric/7554108) takes 4 minutes to run 3 months' worth. Juliusz, any idea what the timeout is on that job? I'm running the query now for 13 months and if it's < timeout, we can just deploy it. Otherwise, we can maybe run one month at a time and concat results. Let me know what you think and I'll make a Change.
For 2, Jon made a Change: https://gerrit.wikimedia.org/r/#/c/96315/ and once merged, things will run at their configured frequencies
For the bigger picture, I'll be in SF in mid-December. We should totally get together and figure out how to do this in the general case. For example, notice in my query above I'm materializing all active editors for all months as a sub-query. I think that would be a hugely useful materialized view (in Hive, MySQL, etc.). Basically everyone would use it, and we could do the same thing for any standardized metric.
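For instance (a sketch only; the rollup table and its columns are invented, not an existing schema), the open month could be refreshed in place while closed months stay untouched:

    # Hypothetical rollup table monthly_editor_edits(month, user_id, edits)
    # with PRIMARY KEY (month, user_id). Graph queries would join against
    # this small table instead of re-scanning the revision table.
    REFRESH_OPEN_MONTH = """
        REPLACE INTO monthly_editor_edits (month, user_id, edits)
        SELECT LEFT(rev_timestamp, 6), rev_user, COUNT(*)
        FROM revision
        WHERE rev_timestamp >= DATE_FORMAT(NOW(), '%Y%m01000000')
        GROUP BY LEFT(rev_timestamp, 6), rev_user
    """

    def refresh(conn):
        # Run once per update cycle; only the current month is recomputed.
        cur = conn.cursor()
        cur.execute(REFRESH_OPEN_MONTH)
        conn.commit()
        cur.close()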
awesome Dan -- thanks for the help!
Thanks Dan!
This is actually not correct - the query is supposed to be who created an account in month X and made >= 5 edits ever. But, if we can't make that kind of query performant enough... are you ok with this change, Kenan?
see https://meta.wikimedia.org/wiki/Research:Mobile_editor_engagement/Mobile_act...
Oh that's really interesting. Both Jon and I thought this must've been a mistake. I think people are likely to confuse this with the active editor metric. Well, either way, neither is more performant than the other. I can try to tune the query as originally intended if you'd like.
That would be good. It is supposed to be in line with the active editor metric.
Thanks for the work Dan and Jon!
Wait, so it *is* supposed to be like the active editor metric? To be very clear, we have two possible questions we can be answering here, and only one of them is like the active editor metric. Both of these questions will show results for X <- {each of the last 12 months}
1. How many people, of those that created their account on mobile, made at least 5 edits in month X? [THIS IS LIKE ACTIVE EDITOR]
2. How many people created their account on mobile in month X, and made at least 5 edits since then?
I've implemented both, but there is a pretty big difference so we need to figure that out before moving on.
Dan
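To make the difference concrete, here are the two questions as rough SQL sketches, assuming a hypothetical mobile_accounts(user_id, reg_ts) table already extracted from the EventLogging account-creation data:

    # 1. Active-editor style: one data point per month in which a
    #    mobile-created account made at least 5 edits that month.
    DEF_ONE = """
        SELECT month, COUNT(*) FROM (
            SELECT m.user_id, LEFT(r.rev_timestamp, 6) AS month
            FROM mobile_accounts m
            JOIN revision_userindex r ON r.rev_user = m.user_id
            GROUP BY m.user_id, month
            HAVING COUNT(*) >= 5
        ) t GROUP BY month
    """

    # 2. Cohort style: each user counted once, in their registration month,
    #    if they made at least 5 edits any time since registering.
    DEF_TWO = """
        SELECT reg_month, COUNT(*) FROM (
            SELECT m.user_id, LEFT(m.reg_ts, 6) AS reg_month
            FROM mobile_accounts m
            JOIN revision_userindex r
              ON r.rev_user = m.user_id AND r.rev_timestamp >= m.reg_ts
            GROUP BY m.user_id, reg_month
            HAVING COUNT(*) >= 5
        ) t GROUP BY reg_month
    """

Under #1 the same user can appear in many months; under #2 each user appears at most once.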
Dario -- can you please provide some input? Let's nail down what we're trying to measure here and if possible make it consistent.
thanks,
-Toby
We have two options:
1) if this is supposed to become a canonical definition of “mobile active editors” (i.e. a definition that we would be comfortable exposing on the official reportcard), we cannot nail down the exact query until we’ve made a decision based on:
[1] https://meta.wikimedia.org/wiki/Research:Refining_the_definition_of_monthly_...
[2] https://meta.wikimedia.org/wiki/Research:Mobile_editor_engagement/Mobile_act...
We need to evaluate the pros and cons of different definitions (and once we’ve settled on a canonical definition, we won’t be changing it again). I don’t expect this to happen within a few hours or days.
2) if instead we want to restore data for the mobile dashboards ASAP based on an interim definition (with the understanding that this may change once there is consensus on the canonical one), I recommend we go for Dan’s definition #1 and follow the assumptions described in the Mobile active editor page above (option C). Someone correct me if I am wrong, but I don’t think Dan’s definition #2 is a use case I’ve heard of from the Mobile team (cumulative lifetime 5+ editors for cohorts defined by the month of registration).
Dario
2) if instead we want to restore data for the mobile dashboards ASAP based on an interim definition [...] I recommend we go for Dan's definition #1 [...]
I would vote for this option, and we can migrate to the new definition once it's ready. As far as the difference between my #1 and #2, the facts are:
* the query was initially doing #2
* The Mingle card uses my #1: https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1261
* Arthur replied a while ago that my #2 definition was correct
Kenan / Arthur, can you guys chime in and disambiguate?
Dan
Dario and I chatted about this. The original purpose of this dashboard was to be more of an acquisition dashboard. To fit that requirement, Dario suggested that we use an activation metric.
i.e. How many users in month X reached 5 edits within 30 days.
Is that possible?
I confirmed with Kenan that the metric is similar to the activation metric used in WikiMetrics.
-Toby
The metric in WikiMetrics could be used for this, but the cohort would have to be re-generated all the time. I implemented the new logic that Kenan described. The problem is, this query runs crazy slow on the slaves:
https://gist.github.com/milimetric/7554108#file-active-editors-from-accounts...
I'm gonna try generating a huge cohort, uploading it to wikimetrics, and seeing where I get.
OK, so to sum this up:
* wikimetrics won't help us yet because timeseries isn't implemented for threshold
* the query is just friggin slow, and I don't think there's anything we can do about it
* SILVER LINING: with this new metric definition, we don't have to re-compute the entire history every day. So my proposal is to do this daily:
1. dump that day's worth of data into a temp table in enwiki_p
2. join that to revision_userindex and run the query I wrote
3. concatenate the results manually, maybe with some simple python
It's a bitch, but it's all we can do until we tackle this problem the right way (big data, data warehouse, etc.)
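A skeleton of that daily run might look like this (the two SQL strings are placeholders for the real temp-table fill and the gist query, which aren't reproduced here):

    import csv
    from datetime import date, timedelta

    def run_daily(conn, fill_temp_sql, join_sql, out_path="datafiles/5plus-editors.csv"):
        """One incremental run: fill a temp table with yesterday's data,
        join it to revision_userindex, and append the results."""
        yesterday = (date.today() - timedelta(days=1)).strftime("%Y%m%d")
        cur = conn.cursor()
        # 1. dump that day's worth of data into a temp table
        cur.execute("CREATE TEMPORARY TABLE day_accounts (user_id INT UNSIGNED PRIMARY KEY)")
        cur.execute(fill_temp_sql, (yesterday,))
        # 2. join it to revision_userindex (the real query goes here)
        cur.execute(join_sql)
        rows = cur.fetchall()
        cur.close()
        # 3. concatenate: append the new rows to the running datafile
        with open(out_path, "a") as f:
            csv.writer(f).writerows(rows)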
Heh, besides the fact that we don't have threshold, this ran on wikimetrics in like 10 seconds:
https://metrics.wmflabs.org/reports/result/c06d3de0-060f-4434-9bd1-ba5606e15...
Which confirms my suspicion that it's not the mediawiki tables that are problematic, but the cross-db join.
OK let me talk to Dario to make sure but I think we should do:
How many people created their account on mobile in month X, and made at least 5 edits in month X + the month after?
Dan here is what I'm looking for:
How many users registered on enwiki in month X and reached 5 edits within 30 days?
I talked with Dario and we're hoping that restricting it to enwiki solves the cross-db join issue that you were facing.
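Restricted to a single wiki, that could look roughly like the following (a sketch against the enwiki replica only; the mobile-registration filter is deliberately left out here, since that data lives in the EventLogging database, which is the cross-db join being avoided):

    # Accounts registered since Nov 2012, grouped by registration month,
    # counting those that reached 5 edits within 30 days of registering.
    # Run with no parameters, e.g. cursor.execute(ACTIVATION_SQL).
    ACTIVATION_SQL = """
        SELECT reg_month, COUNT(*) AS activated_editors
        FROM (
            SELECT u.user_id, LEFT(u.user_registration, 6) AS reg_month
            FROM user u
            JOIN revision_userindex r
              ON r.rev_user = u.user_id
             AND r.rev_timestamp >= u.user_registration
             AND r.rev_timestamp < DATE_FORMAT(
                   STR_TO_DATE(u.user_registration, '%Y%m%d%H%i%S')
                     + INTERVAL 30 DAY,
                   '%Y%m%d%H%i%S')
            WHERE u.user_registration >= '20121101000000'
            GROUP BY u.user_id, reg_month
            HAVING COUNT(*) >= 5
        ) activated
        GROUP BY reg_month
    """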
Dan,
shall we touch base to make sure that we’re using consistent definitions (I’m working on a threshold-based metric proposal for new user activation [1])?
Dario
[1] https://meta.wikimedia.org/wiki/Research:New_editor
On Wed, Nov 27, 2013 at 12:18 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
Thank you. I'll see if I can tune the query to do this efficiently. The cross-db issue comes from joining the Event Logging table with the mediawiki table. If my tuning doesn't yield results, the only viable solution is to import the event logging stuff into a temp table in labsdb/enwiki_p. Then they'll be on the same database and the query should fly. Is that possible with the schema you're capturing for mobile registrations? In other words, can that data be shared publicly?
On Thu, Nov 28, 2013 at 3:17 AM, Ori Livneh ori@wikimedia.org wrote:
It doesn't make sense to do it that way. Instead of inferring that something must have happened by cross-referencing conditions across datasets, just do the following: in MediaWiki, every time a user makes an edit, check their registration date and edit count. If the date is within the last thirty days and the edit count is 5, log an event. Doing it this way will easily scale to the entire cluster, not just enwiki, and to any number of bins, not just 5 edits.
Patch at https://gerrit.wikimedia.org/r/#/c/98079/; you can take it from there if you like.
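The actual patch is PHP in the WikimediaEvents extension, but the hook logic is simple enough to sketch in Python. Every name below (on_edit_saved, log_event, the user attributes) is hypothetical; the point is only that the check runs once per edit and reads counters MediaWiki already maintains, so it never scans history. The milestone bins follow the set discussed later in the thread.

def log_event(schema, event):
    # Stand-in for an EventLogging client call; the real code goes
    # through the EventLogging extension.
    print(schema, event)


MILESTONES = {1, 5, 10, 25, 50, 100}  # edit-count bins worth logging


def on_edit_saved(user, now):
    # Hypothetical post-save hook: if this edit lands within 30 days of
    # registration and the user's new edit count is a milestone, log it.
    days_since_registration = (now - user.registration_date).days
    if days_since_registration <= 30 and user.edit_count in MILESTONES:
        log_event("NewEditorMilestone", {
            "userId": user.id,
            "milestone": user.edit_count,
        })

# e.g. a user whose edit count just reached 5 on day 12 after signup
# would emit exactly one NewEditorMilestone event with milestone = 5.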
On Mon, Dec 2, 2013 at 1:35 PM, Arthur Richards arichards@wikimedia.org wrote:
Thanks Ori - this sounds and looks viable to me, and seems like a better solution. Kenan, Jon, Dario, Dan, et al - can we move forward with this?
On Mon, Dec 2, 2013 at 5:45 PM, Kenan Wang kwang@wikimedia.org wrote:
It sounds good to me. Dario, Dan?
On Mon, Dec 2, 2013 at 2:52 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
I'm ok with this. I do see it as a temporary measure though. What Ori says here, "inferring that something must have happened", is sort of the whole reason SQL exists. In my opinion, the problem is that these two data sources can't be joined efficiently to do analytics work on them. But since that's a harder problem at the moment, I agree with Ori's solution.
Jon/Arthur, who set up your Event Logging solution and do you need help reviewing / merging this Change? I don't know much about Event Logging but I'm happy to learn and help if you need.
Dan
On Mon, Dec 2, 2013 at 6:55 PM, Jon Robson jrobson@wikimedia.org wrote:
If Kenan schedules a task, we can update the schema to record this for newly created data; given the issues with this, it seems like a good idea.
That said, we will have a lot of historic data that will still need to be joined and saved as a new table... via a UNION, I guess?
On Mon, Dec 2, 2013 at 4:09 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
You can backfill the events according to Ori's new logic. Then your query is simple going forward.
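A hedged sketch of what that backfill could look like: replay existing history once and emit a synthetic milestone event per user per bin. The join mirrors the hypothetical names used earlier; insert_event is a stand-in for whatever writes rows into the agreed destination table.

MILESTONES = (1, 5, 10, 25, 50, 100)


def backfill_milestones(conn, insert_event):
    """One-off replay of existing history: for each user, work out how many
    edits they made within 30 days of registration, and emit one synthetic
    event per milestone reached. Table names are invented here."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT u.user_id, COUNT(r.rev_id) AS edits_30d "
            "FROM mobile_created_accounts u "
            "JOIN revision_userindex r "
            "  ON r.rev_user = u.user_id "
            " AND r.rev_timestamp < u.created + INTERVAL 30 DAY "
            "GROUP BY u.user_id"
        )
        for user_id, edits_30d in cur.fetchall():
            for m in MILESTONES:
                if edits_30d >= m:
                    insert_event(user_id, m)  # writes into the backfill store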
On Mon, Dec 2, 2013 at 4:34 PM, Ori Livneh ori@wikimedia.org wrote:
+1
On Mon, Dec 2, 2013 at 5:58 PM, Kenan Wang kwang@wikimedia.org wrote:
OK I'll add a task!
We're planning to tackle this in our upcoming iteration starting on Monday. Dan, just to clarify - are you able to backfill the data, or will we need to do that?
On Wed, Dec 4, 2013 at 2:38 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
I can help write the script, but I don't think I have write access to the Event Logging table where the results would go. The script shouldn't be the hard part though, testing Ori's patch is what needs to be scrutinized the most here. Let me know if you guys need any help with that too.
To write data into SQL, we can use the prod db on db1047/s1-analytics.
On Wed, Dec 4, 2013 at 6:27 PM, Ori Livneh ori@wikimedia.org wrote:
I can pass along the credentials, in the interest of furthering the EventLogging handover. Ping me whenever.
It doesn’t strike me as a great idea to write non-EL data into the log DB. Can we host these tables in prod or staging instead? You’ll be able to perform UNIONs with tables in log.
On 12/05/2013 09:49 AM, Dan Andreescu wrote:
I think the plan is to change this EL instance to log the event "this user has now made X edits within their first 30 days", where X is in {1, 5, 10, 25, 50, 100}. That will start happening once this patch is merged: https://gerrit.wikimedia.org/r/#/c/98079/1/WikimediaEvents.php. So my idea is to backfill these milestone events so that the new query can just be:
select month, count(*) from EL_table where event = 'NewEditorMilestone' and milestone = 5 group by month
On Thu, Dec 5, 2013 at 4:18 PM, Matthew Flaschen mflaschen@wikimedia.org wrote:
Yeah, but if it's backfilled, it's no longer an actual EL event. I agree with Dario it's cleaner to have a separate table (holding only the backfilled data), and do a UNION.
Matt Flaschen
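Under the separate-table scheme Matt and Dario describe, the read side might look like this: one real EL table, one plain backfill table with matching columns, stitched together at query time. All table names here are placeholders; EL tables are conventionally named Schema_revisionId with event fields prefixed "event_", but check the actual table before relying on that.

# Backfilled rows live in a separate table (e.g. in staging), keeping the
# log DB pure EL data; a UNION ALL merges the two at query time.
MONTHLY_ACTIVATIONS = """
SELECT LEFT(timestamp, 6) AS month, COUNT(*) AS activations
FROM (
    SELECT timestamp FROM log.NewEditorMilestone_1234567
    WHERE event_milestone = 5
    UNION ALL
    SELECT timestamp FROM staging.NewEditorMilestone_backfill
    WHERE event_milestone = 5
) AS all_events
GROUP BY month
ORDER BY month
"""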
On Fri, Dec 6, 2013 at 6:39 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
I guess it's up to the mobile web team, but I disagree on YAGNI grounds. What's an example of a situation in which you'd care that these milestone EL events are "actual" vs. back-filled?
On Fri, Dec 6, 2013 at 10:17 AM, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
Some thoughts:
• All data in the log DB strictly follows https://meta.wikimedia.org/wiki/Schema:EventCapsule. This includes fields such as seqId and uuid that allow recovery of data from the raw JSON dumps. Should something catastrophic happen, we could restore the entire DB by re-importing raw JSON data, which is guaranteed to match the EventCapsule specs. This wouldn’t apply to any custom table created in the DB with data from a different source. (The capsule format is sketched after this list.)
• For the same reason, should a global change apply to EventCapsule (for example https://bugzilla.wikimedia.org/show_bug.cgi?id=52295 ) all tables would need to have their schema updated. Hosting custom tables with arbitrary schemas not matching EventCapsule specs would make global updates unnecessarily complicated.
• Writing of data into the log DB is intentionally restricted to the eventlog user, which was created for the unique purpose to autogenerate tables and write data into SQL when new schemas are deployed in production. Making an exception to this principle sets a precedent whereby humans and other scripts can arbitrarily manipulate data or create tables in the DB, which is a first step towards turning the log db into the same zoo that the staging db is.
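For readers unfamiliar with the capsule, the raw JSON looks roughly like the sketch below. This is abridged and from memory; see the Schema:EventCapsule page for the authoritative field list, and note that the schema name is the hypothetical one from this thread. A row backfilled straight into SQL has no such JSON counterpart, which is exactly why it could not be recovered by replaying the dumps.

# Abridged sketch of a capsule-wrapped event in the raw JSON logs.
capsule = {
    "uuid": "0a1b2c3d4e5f...",         # unique event id
    "seqId": 12345,                    # sequence id used for recovery
    "timestamp": 1386300000,
    "wiki": "enwiki",
    "schema": "NewEditorMilestone",    # hypothetical schema from this thread
    "revision": 1,
    "event": {"milestone": 5},         # schema-specific payload
}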
/me quickly and quietly creeps back into the corner /me came from
That makes sense Dario. Guys, feel free to set this up however you best see fit, and let me know if you need any help.
Ah, my mistake - I was going off the information in the story card for this (https://wikimedia.mingle.thoughtworks.com/projects/mobile/cards/1253), which states that it "doesn't matter when the previous 4 edits were made, doesn't matter where any of the edits were made" - I didn't realize our metric had changed. Apologies for the confusion.