> Is there any download link available for the *webrequest* datasets?
No, sorry, there is no download of webrequest data, nor is it kept long
term.
As I mentioned before, the best dataset that might fit your needs is this
one:
https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
It is a different dataset from webrequest and includes only a subset of
the webrequest fields.
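
In case it helps with exploring the caching files, here is a minimal
Python sketch of reading the six fields listed in your message and
summing response sizes per content type. The tab delimiter, the column
order, the absence of a header row, and the sample records are all
assumptions made for illustration; please check them against the actual
files before relying on this.

```python
import csv
import io
from collections import defaultdict

# Field names as listed in this thread. Their order and the tab delimiter
# are assumptions, not a documented layout of the published files.
FIELDS = ["hashed_host_path", "uri_query", "content_type",
          "response_size", "time_firstbyte", "x_cache"]


def read_rows(lines, delimiter="\t"):
    """Yield one dict per record, converting the two numeric fields."""
    reader = csv.DictReader(lines, fieldnames=FIELDS, delimiter=delimiter)
    for row in reader:
        row["response_size"] = int(row["response_size"])
        row["time_firstbyte"] = float(row["time_firstbyte"])
        yield row


def bytes_by_content_type(rows):
    """Sum response_size for each content_type."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["content_type"]] += row["response_size"]
    return dict(totals)


# Made-up records, for illustration only (not real data).
SAMPLE = ("abc123\t\ttext/html\t1024\t0.05\thit\n"
          "def456\t?x=1\timage/png\t2048\t0.10\tmiss\n"
          "abc123\t\ttext/html\t512\t0.02\thit\n")

totals = bytes_by_content_type(read_rows(io.StringIO(SAMPLE)))
print(totals)
```

To run it on a downloaded file, pass an open file object to read_rows
instead of the in-memory sample.
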
On Wed, Apr 18, 2018 at 8:25 AM, Ta-Yuan Hsu <thsu4(a)uic.edu> wrote:
> Hi, Nuria:
>
> I reviewed the closest data to what I am looking for, Phabricator
> T128132, from
> https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
> and the *webrequest* datasets:
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest
> I still have a few questions.
>
> 1. Is `hashed_host_path' (in the cache dataset) the `hostname' or
> `uri_host'? Phabricator T128132 shows the two fields. However, the
> available data only shows `hashed_host_path'.
>
> 2. There are 6 fields in the caching dataset (hashed_host_path,
> uri_query, content_type, response_size, time_firstbyte, and x_cache),
> as shown in the attached screenshot.
> Does the caching dataset not include page_id? The *webrequest* dataset
> seems to contain page_id.
>
> 3. I didn't find the sequence field in the caching dataset. I learned
> that sequence replaces the timestamp. Is `sequence' the file name of
> the downloads in the caching dataset?
>
> 4. Does `dt' (in the *webrequest* dataset) mean a timestamp in ISO
> 8601 <https://en.wikipedia.org/wiki/en:ISO_8601> format? The
> *webrequest* dataset might be what I am looking for, if it can provide
> per-second access traces.
>
> 5. According to the descriptions on the *webrequest* webpage, the
> *webrequest* datasets should contain at least `hostname', `page_id',
> and `dt'. If true, the *webrequest* datasets seem to cover most of my
> requirements. Is there any download link available for the
> *webrequest* datasets?
>
> --
> Sincerely,
> TA-YUAN
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>