Thanks Dan. I told Toby he should subscribe to this list :)
Regarding #1, another option could be to:
• only allow user_ids as keys in JSON output (after all, most JSON consumers prefer to work
with user_ids), but add user_names as attributes;
• return both user_ids and user_names as separate columns in flat CSVs.
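A minimal sketch of the two output shapes proposed above, assuming a hypothetical in-memory result mapping (user_id to user_name plus metric values) and a made-up metric called `edits`. The function names `to_json` and `to_csv` are illustrative only; this is not Wikimetrics' actual report format.

```python
import csv
import io
import json

# Hypothetical per-user results: user_id -> (user_name, metric values).
# Field and metric names are illustrative, not Wikimetrics' real schema.
results = {
    101: ("Alice", {"edits": 12}),
    202: ("Bob", {"edits": 3}),
}


def to_json(results):
    """JSON keyed by user_id (as a string key), with user_name as an attribute."""
    return json.dumps(
        {str(uid): {"user_name": name, **metrics}
         for uid, (name, metrics) in results.items()},
        sort_keys=True,
    )


def to_csv(results, metric_names):
    """Flat CSV with user_id and user_name as separate columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "user_name", *metric_names])
    for uid, (name, metrics) in sorted(results.items()):
        writer.writerow([uid, name, *(metrics[m] for m in metric_names)])
    return buf.getvalue()


print(to_json(results))
print(to_csv(results, ["edits"]))
```

Keeping user_id as the key preserves a stable identifier across renames, while the extra user_name attribute or column lets people look up individual contributors without a separate id-to-name conversion.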
Either way, it sounds like this would be a great addition for Wikimetrics customers.
Dario
On Nov 26, 2013, at 5:01 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
These things have definitely been discussed before, so it's time to get them prioritized.
I've CC-ed Toby directly so he can follow up:
1. Wikimetrics should allow user_name to be the key in report outputs. Right now, only
user_id is allowed, and this is not great. LiAnna, Jaime, and Jessie are definitely
interested in this and have mentioned it a few times.
2. Wikimetrics should allow "generated cohorts" as implemented by the UserMetrics
API. These are cohorts defined by reports on other cohorts. For example, if we run
report R on cohort C, then the generated cohort GC would be: GC = {user | user in C and
R(user) is true}. Dario is definitely interested in this, and Jaime might be as well.
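The set definition GC = {user | user in C and R(user) is true} translates directly into a filter. A minimal sketch, assuming the report R can be reduced to a boolean predicate per user_id; the predicate `made_five_edits` and the edit counts are hypothetical, not a real Wikimetrics report.

```python
from typing import Callable, Iterable, Set


def generated_cohort(cohort: Iterable[int], report: Callable[[int], bool]) -> Set[int]:
    """GC = {user | user in C and R(user) is true}."""
    return {user for user in cohort if report(user)}


# Hypothetical report R: "made at least 5 edits in the window".
edit_counts = {101: 12, 202: 3, 303: 7}


def made_five_edits(user_id: int) -> bool:
    return edit_counts.get(user_id, 0) >= 5


cohort_c = [101, 202, 303]
gc = generated_cohort(cohort_c, made_five_edits)
print(sorted(gc))  # [101, 303]
```

Because GC is itself a set of user_ids, it can be passed to another report, which covers the case of rerunning an analysis only for members of a previous cohort who meet specific criteria.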
On Tue, Nov 26, 2013 at 6:59 PM, LiAnna Davis <ldavis(a)wikimedia.org> wrote:
I would LOVE it if the output gave user names instead of user IDs. Often the data makes
me want to investigate the individual stories of contributors who added a lot of
content/made a lot of edits/etc., but there's no way of doing that with user IDs since
I can't convert user IDs to usernames.
On Tue, Nov 26, 2013 at 2:46 PM, Dario Taraborelli <dtaraborelli(a)wikimedia.org> wrote:
thanks for the clarification Jaime – if this is the main cause of the problem, it sounds
like we should consider adding user_names to the output instead of building functionality
at the input to deal with it. Dan, any thoughts?
BTW, this notion of rerunning cohort analysis for members of a previous cohort who meet
specific criteria is a use case that Product/Editor Engagement is also interested in. We
used to call these “generated cohorts” in the old design plans for UserMetrics, and I’d
love it if we revisited this feature request and its relative priority.
D
On Nov 26, 2013, at 2:35 PM, Jaime Anstee <janstee(a)wikimedia.org> wrote:
Missed the question back to me, sorry. Mixed cohorts might occur because the output is
in user IDs while collection is of usernames. Say someone runs a repeating event and has
a CSV output of data for the new users who were retained at a certain activity level from
Point A to Point B. Then new cohort members opt in at Point B, but for examining at a
later Point C they only want to include the users who already survived from Point A plus
the new Point B members. Without usernames in the output to build the active Point B
cohort separately, the Point C cohort would be a mix of qualified user IDs and new
usernames. There are several ways of dealing with this; it was just the first scenario I
could think of that could cause it. It seems we still need to revisit the possibility of
getting usernames as output, also for matching to other data points, since most users and
most program leaders do not know user IDs. - Jaime
--
Jaime Anstee, Ph.D
Program Evaluation Specialist
Wikimedia Foundation
+1.415.839.6885 ext 6869
www.wikimediafoundation.org
Imagine a world in which every single human being can freely share in the sum of all
knowledge. Help us make it a reality!
https://donate.wikimedia.org
On Fri, Nov 22, 2013 at 4:04 PM, Dario Taraborelli <dtaraborelli(a)wikimedia.org> wrote:
that works for me, thanks!
Jaime, can you give us more details on the use case for mixed cohorts that you had in
mind?
On Nov 22, 2013, at 3:28 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
So, for now, until I figure out how to fix this, it will always prefer user_names over
user_ids.
I think this is an argument for making users specify up front whether it's names or ids,
and not allowing mixtures. Assuming it might be a mixture and looking for names first is
almost certain to produce inaccurate results at some point. We have ids precisely to avoid
collisions between names, to allow for renaming users, and to handle other cases.
Yep, I just learned this the hard way and made a fool of myself in front of a bunch of
people I admire, so I'd be glad if I'm the only one this happens to. If nobody objects,
I'm going to let the user select whether their cohort contains user_ids OR user_names,
and strictly prohibit mixtures.
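A minimal sketch of the "declare up front, no mixtures" rule, assuming a cohort upload arrives as a list of text lines plus a user-selected kind. `CohortKind`, `MixedCohortError`, and `parse_cohort` are hypothetical names for illustration, not Wikimetrics' actual upload code.

```python
import re
from enum import Enum
from typing import List, Union


class CohortKind(Enum):
    USER_IDS = "user_ids"
    USER_NAMES = "user_names"


class MixedCohortError(ValueError):
    """Raised when an upload does not match the kind the user declared."""


_ID_PATTERN = re.compile(r"[0-9]+")


def parse_cohort(lines: List[str], kind: CohortKind) -> List[Union[int, str]]:
    # Drop blank lines and surrounding whitespace.
    entries = [line.strip() for line in lines if line.strip()]
    if kind is CohortKind.USER_IDS:
        # Every entry must be a plain ASCII integer; anything else is rejected
        # instead of being silently treated as a user_name.
        bad = [e for e in entries if not _ID_PATTERN.fullmatch(e)]
        if bad:
            raise MixedCohortError(f"non-numeric entries in a user_id cohort: {bad}")
        return [int(e) for e in entries]
    # user_names: every entry is taken literally as a name and never
    # reinterpreted as an id, even if it happens to look numeric.
    return entries
```

The design choice is that the parser never guesses: the declared kind alone decides how each entry is read, so the "look for names first" ambiguity cannot arise.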
_______________________________________________
Wikimetrics mailing list
Wikimetrics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimetrics
--
LiAnna Davis
Wikipedia Education Program Communications Manager
Wikimedia Foundation
http://education.wikimedia.org
(415) 839-6885 x6649
ldavis(a)wikimedia.org