Dear Jennifer and the Product Analytics Team,

It is really nice that you prepared and updated this dataset! Thank you very much.

My initial feedback:
1) I don't understand "overall size rank". According to the definitions, it is the "count of unique devices which visited the wiki during that month". What does this have to do with the size of the wiki?
2) I miss the size of the database (main namespace/content pages). Together with the number of content pages, it would give an impression of the mean article size (I know this is not perfect, but it is better than nothing).
3) I have many more wishes for additional metrics :D
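To make point 2 concrete: if the dataset had a column with the total size of the content (main namespace) pages in bytes, the mean article size would be that total divided by the number of content pages. The sketch below is only an illustration under that assumption; `content_bytes` is a hypothetical metric that the dataset does not currently contain, and the figures are made up.

```python
def mean_article_size(content_bytes: int, content_pages: int) -> float:
    """Approximate mean article size in bytes.

    content_bytes is a hypothetical metric (total size of main-namespace
    content pages); it is not a column in the current dataset.
    """
    if content_pages <= 0:
        raise ValueError("content_pages must be positive")
    return content_bytes / content_pages


# Made-up figures for illustration only.
print(mean_article_size(12_000_000, 4_000))  # 3000.0
```

A mean hides skew (a few very long articles can dominate it), which is one reason this is only a rough impression of article size.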

Best regards,
Samat


On Wed, 24 Feb 2021 at 10:55, Goran Milovanovic <goran.milovanovic_ext@wikimedia.de> wrote:
Hi Kate,

and thank you very much for your feedback.

I think I forgot to mention that the Wiki comparison 2020 dataset is so great that, as of today, I will start using it in my R programming language classes to help people learn about hypothesis testing and join operations across dataframes : )

Thank you for all the hard work!

> We'll keep an eye toward consistency, but we have not made the data extraction into a fully automated process.
I have seen the code, and I know the pain too well... After all the work, there is always some additional detail at the end that was not considered at the beginning; similar things made me cry in the past in my work on Wikidata... I sympathise with you and the team, and I wish you all the best in your future work!

And by the way... The differences in column names are really not such a big deal; the variable semantics are so obvious that the columns match easily. Good work!

With best wishes,
Goran

Goran S. Milovanović, PhD
Data Scientist, Software Department
Wikimedia Deutschland

------------------------------------------------
"It's not the size of the dog in the fight,
it's the size of the fight in the dog."
- Mark Twain
------------------------------------------------



On Tue, Feb 23, 2021 at 11:38 PM Kate Zimmerman <kzimmerman@wikimedia.org> wrote:
Hi Goran,

We'll keep an eye toward consistency, but we have not made the data extraction into a fully automated process.

We identified 3 columns that had slightly different names, and we'll fix them:
overall SIZE rank (2020) vs. overall size rank (2018, 2019)
second month editor retention (2020) vs. second-month new editor retention (2018, 2019)
monthly structured discussions messages (2020) vs. monthly structured discussions (Flow) messages (2018, 2019)

The "project code" column was duplicated in 2020; the duplicate has now been removed.

Finally, in 2019 we added 3 new columns that we hadn't tracked in 2018: content pages, cumulative content edits, and edits per content page. Please be aware that we may add or change columns in the future as needs evolve.
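For anyone combining the 2018, 2019 and 2020 tables before the fix is published, here is a minimal pure-Python sketch of harmonizing them. It assumes each year's table has already been read into a header list and a list of rows; the helper names `harmonize` and `align`, the choice to map the 2020 names back to the 2018/2019 names, and the toy values are all hypothetical, since the thread does not say which spelling will be kept.

```python
# Hypothetical mapping from the 2020 column names to the 2018/2019 names.
# The direction is an assumption; the dataset may settle on either spelling.
RENAMES_2020 = {
    "overall SIZE rank": "overall size rank",
    "second month editor retention": "second-month new editor retention",
    "monthly structured discussions messages":
        "monthly structured discussions (Flow) messages",
}


def harmonize(header, rows, renames=RENAMES_2020):
    """Rename columns and drop duplicated ones, keeping the first copy."""
    keep, seen = [], set()
    for i, name in enumerate(header):
        name = renames.get(name, name)
        if name not in seen:
            seen.add(name)
            keep.append(i)
    new_header = [renames.get(header[i], header[i]) for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


def align(header, rows, columns):
    """Reorder rows to `columns`, using None where a year lacks a column
    (e.g. the three columns added in 2019 are absent from 2018)."""
    index = {name: i for i, name in enumerate(header)}
    return [[row[index[c]] if c in index else None for c in columns]
            for row in rows]


# Toy 2020 table with the duplicated "project code" column.
header_2020 = ["project code", "project code", "overall SIZE rank"]
rows_2020 = [["example-wiki", "example-wiki", 1]]
print(harmonize(header_2020, rows_2020))
# (['project code', 'overall size rank'], [['example-wiki', 1]])
```

Once every year shares the same header, the tables can be aligned to a common column list with `align` and stacked or joined on "project code".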

Warm regards,
Kate

On Tue, Feb 23, 2021 at 12:37 PM Goran Milovanovic <goran.milovanovic_ext@wikimedia.de> wrote:
Well, it would be desirable to maintain consistent column names across the years...

Best,
Goran




On Tue, Feb 23, 2021 at 2:42 AM Jennifer Wang <jwang@wikimedia.org> wrote:
Hi all, 

For your reference, we have updated the wiki comparison dataset with 2020 data. If you have any feedback or suggestions, please let us know via product-analytics@wikimedia.org.

Regards,
Jennifer & Product Analytics
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics