If we don’t hear any objections by Dec 30th, we will move forward with the plan to no
longer generate this data.
On Dec 11, 2015, at 12:40, Andrew Otto
<aotto(a)wikimedia.org> wrote:
Hi all,
Soon, we will be merging the mobile web cache requests with the text cache requests.
The text caches will then serve requests for mobile web [1].
This means that the webrequest_source='mobile' partition in the webrequest table in Hive
will soon be empty, and all data that was previously in it will be found in the
webrequest_source='text' partition.
Only 3 datasets currently read exclusively from the webrequest_source='mobile'
partition:
- /a/log/webrequest/archive/mobile
- /a/log/webrequest/archive/5xx-mobile
- /a/log/webrequest/archive/zero
(These are paths on stat1002, but they also exist in HDFS.)
These datasets originally came from udp2log, but since early last year they have been
generated from Hadoop. With the upcoming cache merge, these jobs will have to scan
all text requests to find the mobile ones, which will add load to Hadoop.
Do we know if these are being used? Would anyone be upset if we no longer generated
these datasets?
Thanks!
-Andrew
[1] https://phabricator.wikimedia.org/T109286