Dear Wikimetrics users,
I've just deployed asynchronous cohort upload. This is feature #818:
https://mingle.corp.wikimedia.org/projects/analytics/cards/818. It lets you upload larger cohorts because validation now happens in the background instead of during the upload request. I'll go over how the new functionality works here, and will rely on one of you to point me to the appropriate on-wiki place to update the documentation.
Visiting /cohorts and clicking "Upload Cohort" works as before. But once you click "Upload CSV", your form is validated and processed, and you're taken back to the cohorts page. Your new cohort is created immediately but is not yet validated. While it validates, you'll see the validation status and have a few options:
* Remove Cohort. This is destructive and will remove this cohort from your list. Use this in case you made a mistake, uploaded the wrong file, etc.
* Validate Again. This will run validation again. One possible use: say you upload a cohort with some *very* newly registered users, and because of replication lag to the labsdb databases, most of them come up invalid. Once replication has caught up, you can run validation again.
* Refresh. This just refreshes the status of the validation and will update the counts that show up below.
You will not have the "Create Report" option until validation is done. And when you do create a report, only valid users will be considered and used in the output.
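The two rules above can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not the actual Wikimetrics code; the names `CohortUser`, `can_create_report`, and `users_for_report` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CohortUser:
    """Hypothetical record for one uploaded user and its validation result."""
    name: str
    valid: bool


def can_create_report(validation_done: bool) -> bool:
    # "Create Report" is only offered once validation has finished.
    return validation_done


def users_for_report(users: List[CohortUser], validation_done: bool) -> List[str]:
    # Refuse to build a report for a cohort that is still validating.
    if not can_create_report(validation_done):
        raise ValueError("cohort is still validating; no report can be created yet")
    # Only users that passed validation are used in the output.
    return [u.name for u in users if u.valid]
```

For example, a cohort of three users where one failed validation would produce a report over the remaining two.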
One caveat: validation is still slow, and the time limit for the asynchronous task is set to 1 hour. I have some ideas for making this faster by batching, and I can increase the time limit per task (but that has other repercussions). For now, just keep in mind that the theoretical maximum cohort size you should upload is roughly 18,000 users. I would love some feedback about whether it's ok to increase the time limit or if people want me to focus on making validation faster.
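A rough sketch of the arithmetic behind that figure, in Python. It assumes validation time scales linearly with cohort size, which the two numbers above imply but do not state outright; the function names are hypothetical and not part of Wikimetrics.

```python
TIME_LIMIT_SECONDS = 60 * 60   # 1 hour limit on the asynchronous task
MAX_COHORT_SIZE = 18_000       # rough maximum cohort size quoted above


def implied_users_per_second(time_limit_seconds: int = TIME_LIMIT_SECONDS,
                             max_users: int = MAX_COHORT_SIZE) -> float:
    # Throughput implied by the two figures, assuming linear scaling.
    return max_users / time_limit_seconds


def max_cohort_size(time_limit_seconds: int, users_per_second: float) -> int:
    # Largest cohort that finishes validating within the given time limit.
    return int(time_limit_seconds * users_per_second)


rate = implied_users_per_second()
print(rate)                                        # users validated per second
print(max_cohort_size(2 * TIME_LIMIT_SECONDS, rate))  # limit if the timeout were doubled
```

Under that assumption, the current limit works out to about 5 users per second (0.2 seconds per user), so doubling the time limit would roughly double the maximum cohort size, while faster validation would raise it without touching the limit.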
Dan