Hi,
we are currently bringing the device property and platform computations back to life outside of Hadoop. Data for the last few days has been computed, and the jobs are running.
However, I am not sure about the old data that we have. Should we blend that in?
* For device properties, I found that http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props seems to contain property data from 2013-03-01 until 2013-05-15. Since this data already stops in mid-May, I assume we have more data to blend in (end of May, June, July) somewhere else. Do we have such data? Do we know whether the above data is good or just a relic of test runs?
* For platform data, I found that http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_p... has platform data from 2013-04-14 until 2013-07-20 in
However, I am not sure which of this data is valid. Naive, uneducated plausibility checks fail badly [1]. Do we know if/which data is good? Do we have better or other sources for the platform job?
Best regards, Christian
[1] For example, looking only at the last few data points for Android on Tuesdays, we get [2]:
2013-04-16: 6438000
2013-04-23: 6300000
2013-04-30: 6559000
2013-05-06: 7267000
2013-05-13: 6954000
2013-05-27: 33335000
2013-06-04: 14388000
2013-06-11: 8563000
2013-06-18: 10241000
2013-06-25: 6896000
2013-07-09: 3454000
2013-07-16: 7206000
The highest value (33M) is about 10 times as high as the lowest (3M), within only three months. Even when treating those two data points as outliers (and we have readings that are even further out, ranging from 1M to 37M for Android), the lowest remaining data point is still less than half the highest one. All on the same weekday! This looks suspicious.
[2] There is no data for 2013-05-20 and 2013-07-02.
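The plausibility check from footnote [1] can be reproduced with a few lines of Python. This is only a sketch: the values are the Android readings listed above, and treating exactly the single highest and single lowest reading as "outliers" is an assumption about what the check meant. It also prints the calendar weekday of each date, since the argument relies on comparing like with like.

```python
from datetime import date

# Android readings from footnote [1]; 2013-05-20 and 2013-07-02 are missing.
android = {
    date(2013, 4, 16): 6438000,
    date(2013, 4, 23): 6300000,
    date(2013, 4, 30): 6559000,
    date(2013, 5, 6): 7267000,
    date(2013, 5, 13): 6954000,
    date(2013, 5, 27): 33335000,
    date(2013, 6, 4): 14388000,
    date(2013, 6, 11): 8563000,
    date(2013, 6, 18): 10241000,
    date(2013, 6, 25): 6896000,
    date(2013, 7, 9): 3454000,
    date(2013, 7, 16): 7206000,
}

# date.weekday() returns 0 for Monday, ..., 6 for Sunday.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def spread(values):
    """Ratio between the largest and the smallest of some positive counts."""
    values = list(values)
    return max(values) / min(values)


# Spread over all readings.
full_ratio = spread(android.values())

# Assumption: only the single highest and single lowest reading are outliers.
trimmed_ratio = spread(sorted(android.values())[1:-1])

# Calendar weekday of every reading, keyed by ISO date string.
weekdays = {d.isoformat(): DAY_NAMES[d.weekday()] for d in android}

print(f"max/min, all readings:     {full_ratio:.2f}")
print(f"max/min, extremes dropped: {trimmed_ratio:.2f}")
for day, name in sorted(weekdays.items()):
    print(day, name)
```

On these numbers the full spread is roughly 9.65x and the trimmed spread roughly 2.28x, so the lowest remaining reading is under half of the highest. The weekday listing also shows that, by the calendar, 2013-05-06, 2013-05-13 and 2013-05-27 fall on Mondays rather than Tuesdays, which may be worth checking against the source files.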
Yeah, I think let's keep it around, and if people need it we can try to vet it more thoroughly. I think Tomasz only cares about the ratio between the different apps and not so much about the total numbers. So let's just go with the new data and make sure that's accurate.
On Aug 21, 2013 6:13 AM, "Christian Aistleitner" <christian@quelltextlich.at> wrote:
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a
4040 Linz, Austria
Email: christian@quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
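Dan's point that the ratio between apps matters more than the totals can be illustrated with a small sketch: if a pipeline over- or undercounts every platform by the same factor, the per-platform shares do not change, whereas an error that hits only one platform does shift them. The platform names and counts below are made up for illustration and are not taken from the real data sets.

```python
import math


def shares(counts):
    """Return each platform's fraction of the total count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Illustrative numbers only; NOT taken from the real data sets.
counts = {"android": 600, "ios": 300, "other": 100}

# A pipeline that systematically sees only half of all requests:
# the totals halve, but every platform's share stays the same.
undercounted = {name: n * 0.5 for name, n in counts.items()}
assert sum(undercounted.values()) == sum(counts.values()) / 2
for name in counts:
    assert math.isclose(shares(counts)[name], shares(undercounted)[name])

# If the error affects only one platform, the shares do move.
skewed = dict(counts, android=counts["android"] * 0.5)

print(shares(counts))
print(shares(skewed))
```

The caveat is the last case: relying on ratios is only safe as long as any counting error affects all platforms roughly equally.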
Hi,
On Wed, Aug 21, 2013 at 06:05:01AM -0700, Dan Andreescu wrote:
> Yeah, I think let's keep it around and if people need it we can try to vet it more thoroughly.
OK, thanks. I blended the old data in and added a warning about the different sources to the directory's README, so we should get the best of both worlds.
For those who are interested, the complete data set is available at: http://stat1001.wikimedia.org/public-datasets/analytics/mobile/
However, the data still contains some anomalies that need further discussion, so the data from mid-July onwards might still change a bit.
Best regards, Christian
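The thread does not say how the old and new data were combined. One plausible approach, sketched here under the assumption that each series is a mapping from date to count and that freshly computed values should win wherever both exist, is to merge the two and record where each value came from, so the older values can be flagged in the spirit of the README warning. The function name `blend` and the sample numbers are hypothetical.

```python
from datetime import date


def blend(old, new):
    """Merge two date-keyed series of counts (hypothetical helper).

    Values from `new` take precedence on dates present in both series.
    Returns the merged series sorted by date, plus a mapping telling
    for every date whether its value came from "old" or "new".
    """
    merged = {**old, **new}  # later unpacking wins on duplicate keys
    source = {day: ("new" if day in new else "old") for day in merged}
    ordered = dict(sorted(merged.items()))
    return ordered, source


# Made-up toy values, not real data.
old = {date(2013, 8, 18): 100, date(2013, 8, 19): 110}
new = {date(2013, 8, 19): 105, date(2013, 8, 20): 120}

merged, source = blend(old, new)
for day, value in merged.items():
    print(day.isoformat(), value, source[day])
```

Keeping the source label next to each value, rather than silently overwriting, makes it easy to exclude or re-vet the older data later if it turns out to be unreliable.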