Hi,
we are currently bringing the device property and platform computations back to life outside of Hadoop. Data for the last few days has been computed, and the jobs are running.
However, I am not sure about the old data that we have. Should we blend that in?
* For device properties, I found that http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props seems to contain property data for 2013-03-01 until 2013-05-15. Since this data already stops in mid-May, I assume we have more data to blend in (end of May, June, July) in a different place. Do we have such data? Do we know whether the above data is good, or is it just a relic from test runs?
* For platform data, I found that http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_p... has platform data from 2013-04-14 until 2013-07-20.
However, I am not sure which of this data is valid. Naive, uneducated plausibility checks fail badly [1]. Do we know if/which data is good? Do we have better or other sources for the platform job?
Best regards, Christian
[1] For example, when looking only at the last few Android data points for Tuesdays, we get [2]:
2013-04-16:  6438000
2013-04-23:  6300000
2013-04-30:  6559000
2013-05-06:  7267000
2013-05-13:  6954000
2013-05-27: 33335000
2013-06-04: 14388000
2013-06-11:  8563000
2013-06-18: 10241000
2013-06-25:  6896000
2013-07-09:  3454000
2013-07-16:  7206000
The highest value (33M) is about 10 times as high as the lowest (3M), within only three months. Even when treating those two data points as outliers (and we have readings that are even further out, ranging from 1M to 37M for Android), the lowest remaining data point (6.3M) is less than half the highest remaining one (14.4M). All on the same weekday! This looks suspicious.
[2] There is no data for 2013-05-20 and 2013-07-02.
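
For anyone who wants to redo the naive check from [1], here is a minimal Python sketch. It uses only the Android readings listed above. The helper names (spread, trimmed) are made up for illustration and are not part of any existing job, and dropping just the single highest and single lowest reading is only my assumption of what "outlier" means here.

```python
# Android readings copied from footnote [1] (date -> request count).
readings = {
    "2013-04-16": 6438000,
    "2013-04-23": 6300000,
    "2013-04-30": 6559000,
    "2013-05-06": 7267000,
    "2013-05-13": 6954000,
    "2013-05-27": 33335000,
    "2013-06-04": 14388000,
    "2013-06-11": 8563000,
    "2013-06-18": 10241000,
    "2013-06-25": 6896000,
    "2013-07-09": 3454000,
    "2013-07-16": 7206000,
}


def spread(values):
    """Return the ratio of the highest to the lowest value."""
    values = list(values)
    return max(values) / min(values)


def trimmed(values):
    """Drop the single highest and single lowest value (assumed outliers)."""
    return sorted(values)[1:-1]


# Ratio over all readings, then again with the two extremes removed.
full_ratio = spread(readings.values())
trimmed_ratio = spread(trimmed(readings.values()))

print(f"max/min over all readings:      {full_ratio:.1f}")
print(f"max/min without the two extremes: {trimmed_ratio:.1f}")
```

A ratio well above 2 for readings that should be comparable is what makes me doubt the data. The threshold for "suspicious" is a judgment call, not something the sketch decides.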