Sum of count per month, by direction (prev --> curr):
date | Climate_change --> Global_warming | Global_warming --> Climate_change | Total Result |
2017-11 | 3904 | 950 | 4854 |
2017-12 | 3549 | 780 | 4329 |
2018-01 | 4508 | 1011 | 5519 |
2018-02 | 3548 | 998 | 4546 |
2018-03 | 3462 | 745 | 4207 |
2018-04 | 3726 | 755 | 4481 |
2018-05 | 3730 | 810 | 4540 |
2018-06 | 2971 | 862 | 3833 |
2018-07 | 3500 | 1602 | 5102 |
2018-08 | 4546 | 1644 | 6190 |
2018-09 | 3962 | 1472 | 5434 |
2018-10 | 6155 | 3048 | 9203 |
2018-11 | 5865 | 2617 | 8482 |
2018-12 | 5491 | 2227 | 7718 |
2019-01 | 5774 | 2911 | 8685 |
2019-02 | 6311 | 2845 | 9156 |
2019-03 | 6858 | 2514 | 9372 |
2019-04 | 6824 | 2199 | 9023 |
Sum of count per month, by direction (prev --> curr):
date | Air_pollution --> Smog | Smog --> Air_pollution | Total Result |
2017-11 | 82 | 263 | 345 |
2017-12 | 200 | 184 | 384 |
2018-01 | 65 | 140 | 205 |
2018-02 | 82 | 98 | 180 |
2018-03 | 418 | 149 | 567 |
2018-04 | 295 | 137 | 432 |
2018-05 | 215 | 95 | 310 |
2018-06 | 245 | 85 | 330 |
2018-07 | 233 | 70 | 303 |
2018-08 | 36 | 62 | 98 |
2018-09 | 45 | 81 | 126 |
2018-10 | 66 | 96 | 162 |
2018-11 | 128 | 135 | 263 |
2018-12 | 50 | 90 | 140 |
2019-01 | 68 | 92 | 160 |
2019-02 | 50 | 68 | 118 |
2019-03 | 49 | 72 | 121 |
2019-04 | 33 | 51 | 84 |
Total Result | 2360 | 1968 | 4328 |
Hi all,
I've got a question about the completeness of the clickstream dataset. I downloaded the dumps for 2018 from https://dumps.wikimedia.org/other/clickstream/ (English Wikipedia only). When I filter all of 2018 for the article pair "Climate change" and "Global warming" (in either direction, i.e. with either article as prev and the other as curr), this is what I get:
prev curr type n month
<chr> <chr> <chr> <dbl> <chr>
1 Global_warming Climate_change link 755 2018-04
2 Global_warming Climate_change link 810 2018-05
3 Climate_change Global_warming link 3730 2018-05
4 Climate_change Global_warming link 3962 2018-09
5 Climate_change Global_warming link 5865 2018-11
6 Climate_change Global_warming link 5491 2018-12
7 Global_warming Climate_change link 2227 2018-12
The visit numbers seem plausible. But why is there no data for January to March, for example? And why is there data for both directions in May and December, but only one direction in the other months? This seems implausible given the popularity of the articles.
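For reference, the filter described above can be sketched roughly as follows. This is a minimal Python sketch, not the code that produced the output above. It assumes the monthly dump is a headerless, tab-separated file with the four columns prev, curr, type and n. The function names `filter_pair` and `filter_dump` are made up for illustration, and the sample rows are the May 2018 values quoted in this thread.

```python
import csv
import gzip


def filter_pair(lines, a, b, month):
    """Yield (prev, curr, type, n, month) for rows linking a and b in either direction.

    Assumes each line has exactly four tab-separated fields: prev, curr, type, n.
    """
    # QUOTE_NONE treats quote characters in titles literally, so a stray
    # quote cannot cause several lines to be merged into one field.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip lines that do not match the assumed layout
        prev, curr, link_type, n = row
        if {prev, curr} == {a, b}:
            yield prev, curr, link_type, int(n), month


def filter_dump(path, a, b, month):
    """Apply filter_pair to one gzip-compressed monthly dump (path is an assumption)."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        yield from filter_pair(fh, a, b, month)


# Small in-memory check using May 2018 values quoted in this thread.
sample = [
    "Climate_change\tGlobal_warming\tlink\t3730\n",
    "Air_pollution\tSmog\tlink\t215\n",
    "Global_warming\tClimate_change\tlink\t810\n",
]
rows = list(filter_pair(sample, "Climate_change", "Global_warming", "2018-05"))
for row in rows:
    print(row)
```

For a real file, something like `filter_dump("clickstream-enwiki-2018-05.tsv.gz", "Climate_change", "Global_warming", "2018-05")` would be the equivalent call, where the file name is an assumption about how the downloaded dump is stored locally.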
Here's another example:
prev curr type n month
<chr> <chr> <chr> <dbl> <chr>
1 Smog Air_pollution link 140 2018-01
2 Air_pollution Smog link 82 2018-02
3 Air_pollution Smog link 295 2018-04
4 Air_pollution Smog link 215 2018-05
5 Smog Air_pollution link 85 2018-06
6 Air_pollution Smog link 233 2018-07
7 Air_pollution Smog link 45 2018-09
8 Smog Air_pollution link 96 2018-10
9 Smog Air_pollution link 90 2018-12
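To make the gaps explicit, the months without a row can be listed per direction. The sketch below is a hedged illustration in Python. The helper `missing_months` is a made-up name, and the rows are copied from the Smog / Air_pollution output above.

```python
def missing_months(rows, a, b, months):
    """Return, for each direction between a and b, the months that have no row."""
    seen = {(prev, curr, month) for prev, curr, _type, _n, month in rows}
    gaps = {}
    for prev, curr in ((a, b), (b, a)):
        gaps[(prev, curr)] = [m for m in months if (prev, curr, m) not in seen]
    return gaps


# Rows copied from the filtered output shown above.
rows = [
    ("Smog", "Air_pollution", "link", 140, "2018-01"),
    ("Air_pollution", "Smog", "link", 82, "2018-02"),
    ("Air_pollution", "Smog", "link", 295, "2018-04"),
    ("Air_pollution", "Smog", "link", 215, "2018-05"),
    ("Smog", "Air_pollution", "link", 85, "2018-06"),
    ("Air_pollution", "Smog", "link", 233, "2018-07"),
    ("Air_pollution", "Smog", "link", 45, "2018-09"),
    ("Smog", "Air_pollution", "link", 96, "2018-10"),
    ("Smog", "Air_pollution", "link", 90, "2018-12"),
]
months = [f"2018-{m:02d}" for m in range(1, 13)]

gaps = missing_months(rows, "Air_pollution", "Smog", months)
for (prev, curr), absent in gaps.items():
    print(f"{prev} --> {curr}: no row for {', '.join(absent)}")
```

In this output no month has rows for both directions, and March, August and November have no row at all.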
Am I missing something here?
Thanks in advance,
Simon
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics