Hi everyone,
Tried Wikimetric today and it looks like a good start to me. Some feedback:
* Google/Twitter account, should be something WMF like the labs/Gerrit LDAP
* Should use https by default
* O wait, invalid certificate, filed bug at https://bugzilla.wikimedia.org/show_bug.cgi?id=53892
* Only English? It should be multilingual like all our software. The people at translatewiki will be happy to translate for you
* Upload csv user lists is not very convenient. Are you planning to come up with a easier/better system?
* Project "en" is a bit weird. You're probably using <project>wiki_p for the database. Can you add a link to available projects? Or how to construct it? Say for example I want the German Wikivoyage.
* Description seems to be missing for some fields at http://metrics.wmflabs.org/metrics/
* You could probably grab namespaces on the fly from the Mediawiki api
* Can you add an option to give output per time period (month would be nice)?
* Can you add bytes uploaded as a metric?
* Can you split out the result per namespace?
* http://metrics.wmflabs.org/support contains a to the empty page http://www.mediawiki.org/wiki/Wikimetrics/FAQ . Can you make that link https by default?
* Where is the code? Can we submit new metrics? See for example http://toolserver.org/~reports/?wiki=nl.wikipedia.org for a similar service
* Are you planning to offer some visual output besides csv/json? See for example https://toolserver.org/~emijrp/wlm/stats.php
* I see you have sql queries. What tables are available? All (non-private) tables like on the Toolserver and Toollabs?
* Do you have some metrics on the usage of wikimetrics? :-)
Maarten
Hey Maarten, thanks for the feedback! replies inside.
On Sat, Sep 7, 2013 at 8:14 AM, Maarten Dammers maarten@mdammers.nl wrote:
Hi everyone,
Tried Wikimetric today and it looks like a good start to me. Some feedback:
- Google/Twitter account, should be something WMF like the labs/Gerrit LDAP
Yes, we will migrate to Mediawiki OAuth once it's stable.
- Should use https by default
Agree
- O wait, invalid certificate, filed bug at https://bugzilla.wikimedia.org/show_bug.cgi?id=53892
Yes, because WMF has not yet figured out a policy for SSL certificates in Labs.
- Only English? It should be multilingual like all our software. The
people at translatewiki will be happy to translate for you
Yes, but rather wait with that until we have reached a stable version 1.0 but tracking at https://mingle.corp.wikimedia.org/projects/analytics/cards/1143
- Upload csv user lists is not very convenient. Are you planning to come
up with a easier/better system?
What is not convenient?
- Project "en" is a bit weird. You're probably using <project>wiki_p for
the database. Can you add a link to available projects? Or how to construct it? Say for example I want the German Wikivoyage.
Yes, some explanation on how to construct it would be useful.
- Description seems to be missing for some fields at http://metrics.wmflabs.org/metrics/
Which fields in particular?
- You could probably grab namespaces on the fly from the Mediawiki api
Sure, but why? we do have a validation step that should verify whether the report you want to run is valid.
- Can you add an option to give output per time period (month would be
nice)?
We are working on roll-up of results, should be released shortly.
- Can you add bytes uploaded as a metric?
Created https://mingle.corp.wikimedia.org/projects/analytics/cards/1141; I can't promise this because there are more urgent metrics that we would like to implement first.
- Can you split out the result per namespace?
Tracking at https://mingle.corp.wikimedia.org/projects/analytics/cards/1142
- http://metrics.wmflabs.org/support contains a to the empty page http://www.mediawiki.org/wiki/Wikimetrics/FAQ . Can you make that link https by default?
Sure, and please help us in creating the FAQ :)
- Where is the code? Can we submit new metrics? See for example http://toolserver.org/~reports/?wiki=nl.wikipedia.org for a similar service
Code is available at https://git.wikimedia.org/log/analytics%2Fwikimetrics/HEAD and https://github.com/wikimedia/analytics-wikimetrics
- Are you planning to offer some visual output besides csv/json? See for example https://toolserver.org/~emijrp/wlm/stats.php
I would love to see integration with Limn but we have not yet made commitments to do so. The output format should be general enough to make it easy to use for a visualizations.
- I see you have sql queries. What tables are available? All (non-private)
tables like on the Toolserver and Toollabs?
We are querying the labsdb databases, so all tables in those databases are available.
- Do you have some metrics on the usage of wikimetrics? :-)
Not yet :(
Maarten
D
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Hi,
On Sat, Sep 7, 2013 at 3:14 PM, Diederik van Liere dvanliere@wikimedia.org wrote:
- Upload csv user lists is not very convenient. Are you planning to come up with a easier/better system?
What is not convenient?
I haven't played with the tool yet, but I would imagine that in certain situations (e.g. the cohort is small, or the data is not being exported from a database, or transcribed from a physical piece of paper), having a text input field on the create cohort page that allowed a list of usernames to be pasted in (e.g. from a wiki signup page) would be much more convenient than creating a csv file. (I imagine that once unified usernames really work, perhaps the requirement to add the project name to the end of each line could be eliminated and instead the project to produce metrics for could be selected on the "Create Analysis Report" page.)
Best regards, Bence
Bence Damokos, 07/09/2013 15:31:
Hi,
On Sat, Sep 7, 2013 at 3:14 PM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
* Upload csv user lists is not very convenient. Are you planning to come up with a easier/better system?
What is not convenient?
I haven't played with the tool yet, but I would imagine that in certain situations (e.g. the cohort is small, or the data is not being exported from a database, or transcribed from a physical piece of paper), having a text input field on the create cohort page that allowed a list of usernames to be pasted in (e.g. from a wiki signup page) would be much more convenient than creating a csv file. (I imagine that once unified usernames really work, perhaps the requirement to add the project name to the end of each line could be eliminated and instead the project to produce metrics for could be selected on the "Create Analysis Report" page.)
By the way, does CSV here mean any CSV or is it picky about what kind of newlines, separators, field quoting and encoding you can use? CSV is a rather poorly defined format, it's quick and dirty but easy to mess it up.
Nemo
newlines: both Windows & Unix style
separators: comma
encoding: always utf-8
field quoting: not supported IIRC.
You think people won't mess up XML?
D
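To make the rules above concrete, here is a minimal Python sketch of a parser that follows them: UTF-8 only, Windows or Unix newlines, comma as separator, no field quoting. The layout of one user per line with the project code as the last field is an assumption pieced together from this thread, and `parse_cohort_csv` is a hypothetical name, not the actual Wikimetrics code.

```python
def parse_cohort_csv(raw_bytes):
    """Parse an uploaded cohort file into (username, project) pairs.

    Assumed layout (not confirmed by the thread): one user per line,
    with the project code as the last comma-separated field.
    """
    text = raw_bytes.decode("utf-8")  # encoding: always utf-8
    # Accept both Windows (\r\n) and Unix (\n) line endings.
    lines = text.replace("\r\n", "\n").split("\n")
    rows = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # No field quoting: split on the LAST comma only, so a comma
        # inside the username stays part of the username.
        username, sep, project = line.rpartition(",")
        if not sep:
            raise ValueError("line %d has no comma separator" % number)
        # str.strip() also removes Unicode whitespace in Python 3.
        rows.append((username.strip(), project.strip()))
    return rows
```

Splitting on the last comma is one way to live without quoting; it only works as long as project codes never contain a comma.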
Diederik van Liere, 07/09/2013 16:51:
newlines: both Windows & Unix style
separators: comma
encoding: always utf-8
field quoting: not supported IIRC.
You think people won't mess up XML?
I don't dare suggesting what format could be better, I just had past experiences make my internal bells ring at the word "CSV" and wondered if the correct format is documented. Thanks for the clarification.
Nemo
On Sat, Sep 7, 2013 at 5:13 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Diederik van Liere, 07/09/2013 16:51:
You think people won't mess up XML?
I don't dare suggesting what format could be better, I just had past experiences make my internal bells ring at the word "CSV" and wondered if the correct format is documented. Thanks for the clarification.
It would depend on who the audience is. If it is for people with little research experience and with little experience in computer programming, then yes, csv is a must. It is compatible with OpenOffice and Microsoft Excel. It allows for a certain type of user to interact with it. I do not believe xml renders as nicely in either program as csv. Using a more complex file format makes it difficult for that audience to use it. I would assume that other users would be less likely to need a tool that gets this data because they could build their own and customize the output to their specific needs. Thus, their concerns seem like they should be secondary.
My problem now is the output names are complete garbage, and I cannot tell what the heck the file is from the random string generated.
On Sat, Sep 7, 2013 at 11:33 AM, Laura Hale laura@fanhistory.com wrote:
My problem now is the output names are complete garbage, and I cannot tell what the heck the file is from the random string generated.
Can you describe your problem in more detail? Perhaps email me the cohort that you are trying to upload as well.
D
-- twitter: purplepopple blog: ozziesport.com
On Sat, Sep 7, 2013 at 5:37 PM, Diederik van Liere dvanliere@wikimedia.org wrote:
Can you describe your problem in more detail? Perhaps email me the cohort that you are trying to upload as well.
I have no problem specifically other than the file output names. If you want to give meaning to 1113d4e4-ca39-457d-af97-52338e7388e4.csv so that just looking at the output, I have some idea what it means, that would be great. What does it contain compared to 8e59ef55-826f-4d87-830c-629bd67c83cb.csv ? Also 0eb786c7-ff4e-46bc-8489-0ca1c1c383cf.csv ? I have four cohorts uploaded. I have no idea what those files are based on file name.
But I would prefer csv as the output because I think for the intended audience who would derive the most benefit from this, it is the best file format. That is, people who have little experience in doing research and little experience in computer programming who suddenly need to produce metric data to produce reports to justify funding from the FDC, IEG and other WMF grant programs. If there is a different intended audience, then this needs to be made more explicitly clear so people doing work with that particular cohort of users can steer people away from this tool and towards one more suitable for their needs.
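As an illustration of what more readable output names could look like, here is a small Python sketch that builds a file name from the cohort name, the metric, and the date range, keeping a short piece of the original random id for uniqueness. Every name in it (`report_filename`, the `bytes_added` metric, the "Commons admins" cohort) is hypothetical and only meant to show the idea, not how Wikimetrics names its files.

```python
import re
from datetime import date


def slug(text):
    """Lowercase a label and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()


def report_filename(cohort, metric, start, end, report_id):
    """Build a human-readable csv name for a report.

    All parameters are illustrative; report_id is the random id the
    tool already generates, shortened to its first eight characters.
    """
    parts = [
        slug(cohort),
        slug(metric),
        start.isoformat(),
        end.isoformat(),
        report_id[:8],
    ]
    return "_".join(parts) + ".csv"


name = report_filename(
    "Commons admins",
    "bytes_added",
    date(2013, 8, 1),
    date(2013, 9, 1),
    "1113d4e4-ca39-457d-af97-52338e7388e4",
)
print(name)
```

With four cohorts uploaded, a name like this would at least say which cohort, which metric, and which period a file covers.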
Hi everyone! Great to see people interested, and thank you very much for your feedback :)
* Upload csv user lists is not very convenient. Are you planning to
come up with a easier/better system? What is not convenient?
I haven't played with the tool yet, but I would imagine that in certain situations (e.g. the cohort is small, or the data is not being exported from a database, or transcribed from a physical piece of paper), having a text input field on the create cohort page that allowed a list of usernames to be pasted in (e.g. from a wiki signup page) would be much more convenient than creating a csv file. (I imagine that once unified usernames really work, perhaps the requirement to add the project name to the end of each line could be eliminated and instead the project to produce metrics for could be selected on the "Create Analysis Report" page.)
By the way, does CSV here mean any CSV or is it picky about what kind of newlines, separators, field quoting and encoding you can use? CSV is a rather poorly defined format, it's quick and dirty but easy to mess it up.
Diederik is right, and he remembers correctly, no field quoting support. I've put the majority of the effort here in sanitizing user names (trimming unicode whitespace, handling commas and special characters in usernames, etc.). The reason I haven't messed with the format is because I agree, we will make a cohort creation page where you can enter cohorts one at a time. However, this isn't as useful for some people as directly linking to lists of users involved in, say, a class from the Education Program Extension. So we might focus on that first.
And YES! on the visualization :) Limn requires like 6 lines of code to handle the JSON format from wikimetrics, and I'll get to that as soon as we can.
For anyone interested in contributing code, chat me up over email or in IRC and we can get started. I've made a start at writing up a tutorial here: https://github.com/wikimedia/analytics-wikimetrics#development-environment, let me know how I can improve it or just submit a patch :)
Dan
Hi Diederik,
Op 7-9-2013 15:14, Diederik van Liere schreef:
Hey Maarten, thanks for the feedback! replies inside.
You're welcome. I hope this helps to improve the tooling.
On Sat, Sep 7, 2013 at 8:14 AM, Maarten Dammers <maarten@mdammers.nl> wrote:
Hi everyone,
Tried Wikimetric today and it looks like a good start to me. Some feedback:
* Google/Twitter account, should be something WMF like the labs/Gerrit LDAP
Yes, we will migrate to Mediawiki OAuth once it's stable.
* Should use https by default
Agree
* O wait, invalid certificate, filed bug at https://bugzilla.wikimedia.org/show_bug.cgi?id=53892
Yes, because WMF has not yet figured out a policy for SSL certificates in Labs.
Where is the ball in this case?
* Only English? It should be multilingual like all our software. The people at translatewiki will be happy to translate for you
Yes, but rather wait with that until we have reached a stable version 1.0 but tracking at https://mingle.corp.wikimedia.org/projects/analytics/cards/1143
This is the reply I've seen with many many projects before you. Always something like "yes, we're going to add it, but first.....". Just add it and don't worry about maybe people doing one or two extra translations because you changed some part of the code.
* Upload csv user lists is not very convenient. Are you planning to come up with a easier/better system?
What is not convenient?
I have to compile a file in some ancient ill documented format. What I would like to have: Automatically generated cohorts. Let's take me on the English Wikipedia (https://en.wikipedia.org/wiki/User:Multichill).
* I want to have a cohort for all users who are in a certain category (for example https://en.wikipedia.org/wiki/Category:Wikimedia_Commons_administrators)
* I want to have a cohort for all users who use a certain template (for example https://en.wikipedia.org/w/index.php?title=Special%3AWhatLinksHere&targe...)
* I want to have a cohort for all users who are linked from a certain page (for example https://en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Register_of_His...)
* I want to have a cohort for all users who have edited a certain page (for example https://en.wikipedia.org/w/index.php?title=History_of_Arsenal_F.C._%281886%E...)
These are simple queries, you could probably make up some more. You could also go more complex by combining aspects, for example users who edited a page in a certain category. These queries might explode. From a privacy point of view I might not even see what users are part of the cohort, just the end results.
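The first of these (all users in a category) could be sketched roughly as below, using the MediaWiki API's `list=categorymembers` query restricted to the User namespace. The function names are made up for illustration, the sample response is invented, and continuation (`cmcontinue`) for large categories is left out; this is a sketch of the idea, not something Wikimetrics provides.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, api_url=API_URL):
    """Build the API request for user pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmnamespace": 2,   # namespace 2 is User
        "cmlimit": "max",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def usernames_from_response(data):
    """Turn a decoded categorymembers response into a list of usernames.

    Subpages such as "User:Name/Sandbox" are folded into "Name", and
    duplicates are dropped while keeping the original order.
    """
    names = []
    for page in data.get("query", {}).get("categorymembers", []):
        name = page["title"].split(":", 1)[1].split("/", 1)[0]
        if name not in names:
            names.append(name)
    return names


# Invented sample in the shape the API returns, for illustration only.
sample = {"query": {"categorymembers": [
    {"ns": 2, "title": "User:Multichill"},
    {"ns": 2, "title": "User:Multichill/Sandbox"},
    {"ns": 2, "title": "User:Example"},
]}}
print(category_members_url("Wikimedia_Commons_administrators"))
print(usernames_from_response(sample))
```

The other three cases would use different API lists (template usage, links from a page, page history) but could feed the same kind of username list into a cohort.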
* Project "en" is a bit weird. You're probably using <project>wiki_p for the database. Can you add a link to available projects? Or how to construct it? Say for example I want the German Wikivoyage.
Yes, some explanation on how to construct it would be useful.
Please expand at https://www.mediawiki.org/wiki/Wikimetrics/FAQ#What_is_the_project_code.3F
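For reference, a rough Python sketch of how the replicated database names are put together: language code, then a suffix for the project family, then `_p`. Whether the Wikimetrics project field expects exactly this string without the `_p` is an assumption here, and `replica_db_name` is a hypothetical helper.

```python
# Suffix used in database names for each project family.
FAMILY_SUFFIX = {
    "wikipedia": "wiki",
    "wikivoyage": "wikivoyage",
    "wiktionary": "wiktionary",
    "wikibooks": "wikibooks",
    "wikinews": "wikinews",
    "wikiquote": "wikiquote",
    "wikisource": "wikisource",
    "wikiversity": "wikiversity",
}


def replica_db_name(lang, family="wikipedia"):
    """Return the public replica database name, e.g. enwiki_p."""
    suffix = FAMILY_SUFFIX[family]
    # Hyphens in language codes become underscores in database names.
    return lang.replace("-", "_") + suffix + "_p"


print(replica_db_name("en"))                # English Wikipedia
print(replica_db_name("de", "wikivoyage"))  # German Wikivoyage
```

So the German Wikivoyage would be `dewikivoyage_p` on the database side; special wikis such as Commons do not follow the language-plus-family pattern and are not covered by this sketch.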
* Description seems to be missing for some fields at http://metrics.wmflabs.org/metrics/
Which fields in particular?
The fields where description is empty? So that's Start Date, End Date, Positive Only Sum, Negative Only Sum, Absolute Sum and Net Sum
* You could probably grab namespaces on the fly from the Mediawiki api
Sure, but why? we do have a validation step that should verify whether the report you want to run is valid.
Because most people wouldn't know the namespace numbers and would have to look them up. Do you know the id of the campaign namespace on Commons?
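Something like the following Python sketch is what I have in mind: ask the API for `meta=siteinfo` with `siprop=namespaces` and turn the answer into an id-to-name table for a dropdown. The helper name and the sample data are made up for illustration; the sample only includes a few well-known namespaces.

```python
# Parameters for the siteinfo request (sent to the wiki's api.php).
SITEINFO_PARAMS = {
    "action": "query",
    "meta": "siteinfo",
    "siprop": "namespaces",
    "format": "json",
}


def namespace_names(siteinfo):
    """Map namespace id to display name from a decoded siteinfo response."""
    result = {}
    for ns in siteinfo["query"]["namespaces"].values():
        # The default JSON format stores the local name under "*";
        # fall back to it when there is no "name" key.
        name = ns.get("name", ns.get("*", ""))
        # The main namespace has an empty name, so give it a label.
        result[ns["id"]] = name or "(Main)"
    return result


# Invented sample in the shape of the default JSON output.
sample = {"query": {"namespaces": {
    "0": {"id": 0, "case": "first-letter", "*": ""},
    "2": {"id": 2, "case": "first-letter", "*": "User", "canonical": "User"},
    "6": {"id": 6, "case": "first-letter", "*": "File", "canonical": "File"},
}}}
print(namespace_names(sample))
```

That way the form can show names and submit the numeric id, and nobody has to look the numbers up.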
* Can you add an option to give output per time period (month would be nice)?
We are working on roll-up of results, should be released shortly.
* Can you add bytes uploaded as a metric?
Created https://mingle.corp.wikimedia.org/projects/analytics/cards/1141; I can't promise this because there are more urgent metrics that we would like to implement first.
* Can you split out the result per namespace?
Tracking at https://mingle.corp.wikimedia.org/projects/analytics/cards/1142
* http://metrics.wmflabs.org/support contains a to the empty page http://www.mediawiki.org/wiki/Wikimetrics/FAQ . Can you make that link https by default?
Sure, and please help us in creating the FAQ :)
Created the page.
* Where is the code? Can we submit new metrics? See for example http://toolserver.org/~reports/?wiki=nl.wikipedia.org for a similar service
Code is available at https://git.wikimedia.org/log/analytics%2Fwikimetrics/HEAD and https://github.com/wikimedia/analytics-wikimetrics
* Are you planning to offer some visual output besides csv/json? See for example https://toolserver.org/~emijrp/wlm/stats.php
I would love to see integration with Limn but we have not yet made commitments to do so. The output format should be general enough to make it easy to use for a visualizations.
* I see you have sql queries. What tables are available? All (non-private) tables like on the Toolserver and Toollabs?
We are querying the labsdb databases, so all tables in those databases are available.
* Do you have some metrics on the usage of wikimetrics? :-)
Not yet :(
Ooooh, the shame ;-)
Op 7-9-2013 16:57, Dan Andreescu schreef:
For anyone interested in contributing code, chat me up over email or in IRC and we can get started. I've made a start at writing up a tutorial here: https://github.com/wikimedia/analytics-wikimetrics#development-environment, let me know how I can improve it or just submit a patch :)
+1 on making documentation, -2 for the location. Documentation should *always* be on https://www.mediawiki.org/ .
Maarten
Maarten,
the use case of "generated cohorts" you refer to (cohorts created incrementally by running queries against the DB) is a high priority for data analysis at WMF too. We need to have automatically generated/updated cohorts for classes of users like "all mobile registrations", "commons uploaders", "VE first-time editors" etc.
In UserMetrics we used to call these cohorts "dynamic cohorts" (as opposed to fixed-membership or "static cohorts"). We haven't sorted out yet how these are going to be implemented in Wikimetrics, but this is definitely on our radar. A related idea that we're currently discussing is to expose Wikimetrics functionality via APIs that would allow authenticated clients to programmatically generate reports for arbitrarily defined list of users.
Dario
We actually do have an idea on how to implement this in Wikimetrics :) a proposal is drafted at https://mingle.corp.wikimedia.org/projects/analytics/cards/814
D