Hi all! I haven't heard any objections, so I will be stopping the jobs
that generate these datasets today. I won't delete the existing data until
further notice.
On Mon, Dec 14, 2015 at 1:34 PM, Andrew Otto <aotto(a)wikimedia.org> wrote:
If we don’t hear any objections by Dec 30th, we will move forward with the
plan to no longer generate this data.
On Dec 11, 2015, at 12:40, Andrew Otto <aotto(a)wikimedia.org> wrote:
Hi all,
Soon, we will be merging the mobile web cache requests with the text cache
requests: the text caches will also serve mobile web requests[1].
This means that the webrequest_source='mobile' partition of the webrequest
table in Hive will soon be empty, and all data that previously went into it
will be found in the webrequest_source='text' partition.
Currently, only 3 datasets rely exclusively on the
webrequest_source='mobile' partition:
- /a/log/webrequest/archive/mobile
- /a/log/webrequest/archive/5xx-mobile
- /a/log/webrequest/archive/zero
(These are paths on stat1002, but they also exist in HDFS.)
These datasets originally came from udp2log, but since early last year
they have been generated from Hadoop. After the cache merge, these jobs
would have to scan all text requests to pick out the mobile ones, which
would put more load on Hadoop.
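To illustrate why the merge makes these jobs more expensive, here is a minimal Python sketch, not the actual job code. It assumes a simplified, hypothetical `Request` record (field names `uri_host` and `http_status` are only loosely modeled on the webrequest table) and assumes mobile web traffic can be recognized by the `.m.` hostname convention (e.g. en.m.wikipedia.org). Without a dedicated mobile partition, every text request has to be inspected:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Request:
    # Hypothetical, simplified webrequest record; the real Hive table has many more fields.
    uri_host: str
    uri_path: str
    http_status: int


def is_mobile_host(host: str) -> bool:
    # Assumption: mobile web hosts use the ".m." subdomain, e.g. "en.m.wikipedia.org".
    return ".m." in host


def mobile_requests(text_requests: Iterable[Request]) -> Iterator[Request]:
    # With no separate mobile partition, every text request must be examined
    # to find the mobile ones; this full scan is the extra Hadoop work.
    for req in text_requests:
        if is_mobile_host(req.uri_host):
            yield req


def mobile_5xx(text_requests: Iterable[Request]) -> Iterator[Request]:
    # Hypothetical analogue of a "5xx-mobile" style filter: mobile requests with server errors.
    return (r for r in mobile_requests(text_requests) if 500 <= r.http_status < 600)


sample = [
    Request("en.wikipedia.org", "/wiki/Main_Page", 200),
    Request("en.m.wikipedia.org", "/wiki/Main_Page", 200),
    Request("de.m.wikipedia.org", "/wiki/Berlin", 503),
]

print(len(list(mobile_requests(sample))))      # 2
print([r.uri_host for r in mobile_5xx(sample)])  # ['de.m.wikipedia.org']
```

Each filtered dataset still reads the whole merged text stream, so the cost grows with total text traffic rather than with mobile traffic alone.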
Do we know if these are being used? Would anyone be upset if we no longer
generated these datasets?
Thanks!
-Andrew
[1] https://phabricator.wikimedia.org/T109286