Hi all,
Soon, we will be merging the mobile web cache requests with the text cache requests, so the text caches will serve requests for mobile web as well [1].
This means that the webrequest_source='mobile' partition of the webrequest table in Hive will soon be empty, and all data that previously landed in it will be found in the webrequest_source='text' partition.
There are only 3 datasets that currently use only the webrequest_source='mobile' partition:
- /a/log/webrequest/archive/mobile
- /a/log/webrequest/archive/5xx-mobile
- /a/log/webrequest/archive/zero
(These are paths on stat1002, but they also exist in HDFS.)
These datasets originally came from udp2log, but since early last year they have been generated from Hadoop. With the upcoming cache merge, these jobs will have to parse through all text requests, which will make Hadoop busier.
Do we know if these are being used? Would anyone be upset if we no longer generated these datasets?
Thanks! -Andrew
[1] https://phabricator.wikimedia.org/T109286
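For anyone with queries pinned to the mobile partition, here is a minimal sketch of how the partition filter might change once the merge lands. The table name webrequest and the partition column webrequest_source come from the message above; the helper names and the year/month/day partition columns are assumptions made for illustration, not a description of the actual jobs.

```python
# Hypothetical helpers: function names and the date partition columns
# (year, month, day) are assumptions for illustration only.
def source_filter(after_merge: bool) -> str:
    """Return the webrequest_source predicate covering mobile web traffic."""
    # Before the merge, mobile web requests sit in their own partition;
    # afterwards they are found in the 'text' partition.
    source = "text" if after_merge else "mobile"
    return f"webrequest_source = '{source}'"


def build_query(after_merge: bool, year: int, month: int, day: int) -> str:
    """Build a HiveQL count query against the webrequest table."""
    return (
        "SELECT COUNT(*) FROM webrequest "
        f"WHERE {source_filter(after_merge)} "
        f"AND year = {year} AND month = {month} AND day = {day}"
    )


print(build_query(after_merge=True, year=2015, month=12, day=30))
```

Note that after the merge the 'text' partition also holds the requests that were already there, so this filter alone no longer isolates mobile web traffic.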
Not an answer to the question, but a question of my own: will the nature of the content being served still be available in /some/ field? FWIW, I've found it very helpful to be able to use webrequest_source to trivially distinguish mobile and desktop requests.
On 11 December 2015 at 12:40, Andrew Otto aotto@wikimedia.org wrote:
Hi all,
Soon, we will be merging the mobile web cache requests with the text cache requests. text caches will now serve requests for mobile web[1].
This means that the webrequest_source=‘mobile’ partition in the webrequest table in Hive will soon be empty, and all data that was previously in it will be found in the webrequest_source=‘text’ partition.
There are only 3 datasets that currently only use the webrequest_source=‘mobile’ partition:
- /a/log/webrequest/archive/mobile
- /a/log/webrequest/archive/5xx-mobile
- /a/log/webrequest/archive/zero
(These are paths on stat1002, but they also exist in HDFS.)
These datasets originally came from udp2log, but since early last year they have been generated from Hadoop. With the upcoming cache merge, these jobs will have to parse through all text requests, which will make Hadoop busier.
Do we know if these are being used? Would anyone be upset if we no longer generated these datasets?
Thanks! -Andrew
[1] https://phabricator.wikimedia.org/T109286
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
@Oliver: I think the closest we'll have is the access-method field, which can take the values desktop, mobile-web, and mobile-app.
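As a rough sketch of how that field could stand in for the old partition split, the snippet below tallies rows by access method. The dictionary key access_method and the exact value spellings are assumptions based on the wording above; the real column name and values may differ, and the sample rows are made up.

```python
from collections import Counter

# Assumed values, copied from the message above; the real field may
# spell them differently.
ACCESS_METHODS = ("desktop", "mobile-web", "mobile-app")


def count_by_access_method(rows):
    """Count request rows (dicts) per access method value."""
    counts = Counter()
    for row in rows:
        method = row.get("access_method")  # hypothetical key name
        # Bucket missing or unexpected values so nothing is silently dropped.
        counts[method if method in ACCESS_METHODS else "unknown"] += 1
    return counts


# Made-up sample rows, all from the merged 'text' partition.
sample = [
    {"webrequest_source": "text", "access_method": "desktop"},
    {"webrequest_source": "text", "access_method": "mobile-web"},
    {"webrequest_source": "text", "access_method": "mobile-web"},
    {"webrequest_source": "text"},
]
print(count_by_access_method(sample))
```

The "unknown" bucket is there because the distinction is only useful if the field is set on every request.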
On Sun, Dec 13, 2015 at 8:37 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Not an answer to the question, but a question of my own; will the nature of the content being served still be present as /some/ field? FWIW I've found it very helpful to be able to use webrequest_source to trivially distinguish mobile and desktop requests.
-- Oliver Keyes Count Logula Wikimedia Foundation
Gotcha! Long as it's set for every request, perfect :)
On 14 December 2015 at 04:50, Joseph Allemandou jallemandou@wikimedia.org wrote:
@Oliver: I think the closest we'll have is the access-method field, that can take values desktop, mobile-web, mobile-app.
-- Joseph Allemandou Data Engineer @ Wikimedia Foundation IRC: joal
If we don’t hear any objections by Dec 30th, we will move forward with the plan to no longer generate this data.
Hi all! I haven't heard any objections, so I will be stopping the jobs that generate these datasets today. I won't delete the existing data until further notice.
On Mon, Dec 14, 2015 at 1:34 PM, Andrew Otto aotto@wikimedia.org wrote:
If we don’t hear any objections by Dec 30th, we will move forward with the plan to no longer generate this data.