Hi Kate,

and thank you very much for your feedback.

I think I've forgotten to mention how the Wiki comparison 2020 dataset is so great that I will start using it in my R programming language classes as of today to help people learn more about hypothesis testing and join operations across the dataframes : )

Thank you for all the hard work!

> We'll keep an eye toward consistency, but we have not made the data extraction into a fully automated process.

I have seen the code, I know the pain too well... All the work and then in the end there is always an additional detail that was maybe not considered in the beginning, similar things made me cry in the past in my work on Wikidata... I sympathise with you and the team and I wish you all the best in your future work!

And by the way... The differences in column names are really not such a big deal, the variable semantics are so obvious so they match easily. Good work!

With best wishes,
Goran

Wikimedia Deutschland
Goran S. Milovanović, PhD
Data Scientist, Software Department
------------------------------------------------
"It's not the size of the dog in the fight,
it's the size of the fight in the dog."
- Mark Twain
------------------------------------------------

On Tue, Feb 23, 2021 at 11:38 PM Kate Zimmerman <kzimmerman@wikimedia.org> wrote:

Hi Goran,

We'll keep an eye toward consistency, but we have not made the data extraction into a fully automated process.

We identified 3 columns that had slightly different names and we'll fix them:

overall SIZE rank (2020) vs. overall size rank (2018, 2019)
second month editor retention (2020) vs. second-month new editor retention (2018, 2019)
monthly structured discussions messages (2020) vs. monthly structured discussions (Flow) messages (2018, 2019)

The "project code" column was duplicated in 2020; the duplicate has now been removed.

Finally, in 2019 we had added 3 new columns that we hadn't tracked in 2018: content pages, cumulative content edits, edits per content page. Please be aware that we may add or change columns in the future as needs evolve.

Warm regards,
Kate

On Tue, Feb 23, 2021 at 12:37 PM Goran Milovanovic <goran.milovanovic_ext@wikimedia.de> wrote:

Well, it would be desirable to maintain consistent column names across the years...

Best,
Goran

Wikimedia Deutschland
Goran S. Milovanović, PhD
Data Scientist, Software Department
------------------------------------------------

On Tue, Feb 23, 2021 at 2:42 AM Jennifer Wang <jwang@wikimedia.org> wrote:

Hi all,

For your reference, we have updated the wiki comparison dataset with 2020 data. If you have any feedback or suggestions, please let us know via product-analytics@wikimedia.org.

Regards,
Jennifer & Product Analytics

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
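
For anyone who wants to combine the yearly files before the column names are fixed, the three differences Kate lists can be handled with a small rename step. The following is only a minimal sketch in plain Python. It assumes each year's table has been read into a list of dicts keyed by column name, and it assumes the 2020 names should be mapped onto the 2018/2019 spellings (the thread does not say which spelling will be kept). The helper names harmonize_row and column_differences are hypothetical, and the sample rows are placeholders, not values from the dataset.

```python
# Column names are the ones quoted in Kate's message.
# Assumption: 2020 names are renamed to the 2018/2019 spellings.
RENAMES_2020 = {
    "overall SIZE rank": "overall size rank",
    "second month editor retention": "second-month new editor retention",
    "monthly structured discussions messages":
        "monthly structured discussions (Flow) messages",
}


def harmonize_row(row, renames=RENAMES_2020):
    """Return a copy of `row` with known 2020 column names renamed.

    Columns that are not in `renames` are passed through unchanged.
    """
    return {renames.get(name, name): value for name, value in row.items()}


def column_differences(rows_a, rows_b):
    """Return (columns only in rows_a, columns only in rows_b), each sorted."""
    cols_a = set().union(*(row.keys() for row in rows_a))
    cols_b = set().union(*(row.keys() for row in rows_b))
    return sorted(cols_a - cols_b), sorted(cols_b - cols_a)


# Placeholder rows for illustration only, not real values from the dataset.
rows_2019 = [{"project code": "aa", "overall size rank": 1}]
rows_2020 = [{"project code": "aa", "overall SIZE rank": 1}]

# Before renaming, the two years disagree on one column name.
print(column_differences(rows_2019, rows_2020))
# (['overall size rank'], ['overall SIZE rank'])

# After renaming, the column sets match.
rows_2020_fixed = [harmonize_row(row) for row in rows_2020]
print(column_differences(rows_2019, rows_2020_fixed))
# ([], [])
```

The same check also surfaces columns that exist in only one year, such as the three columns added in 2019 that were not tracked in 2018, so they can be handled explicitly before any join.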