I reviewed the data closest to what I am looking for, described in Phabricator
T128132 and available from https://analytics.wikimedia.org/datasets/archive/publi,
as well as the *webrequest* datasets:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest. I still have a few questions.
1. Is `hashed_host_path' (in the caching dataset) the `hostname' or the `uri_host'?
Phabricator T128132 shows both fields, but the available data only contains
`hashed_host_path'.
2. The caching dataset has 6 fields (hashed_host_path, uri_query,
content_type, response_size, time_firstbyte, and x_cache), as shown in the
attached screenshot.
Does the caching dataset not include page_id? The *webrequest* dataset
seems to contain it.
3. I didn't find the `sequence' field in the caching dataset. I learned that
`sequence' replaces the timestamp. Does `sequence' correspond to the file names
of the downloads in the caching dataset?
4. Is `dt' (in the *webrequest* dataset) a timestamp in ISO 8601
<https://en.wikipedia.org/wiki/en:ISO_8601> format? If so, the
*webrequest* dataset might be what I am looking for, provided it offers
per-second access traces.
5. According to the descriptions on the *webrequest* webpage, the
datasets should contain at least `hostname', `page_id', and `dt'. If that is
true, the *webrequest* datasets seem to cover most of my requirements. Is
there a download link available for the *webrequest* datasets?
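To make questions 4 and 5 concrete, here is a minimal sketch of how I would build per-second access counts, assuming `dt' holds ISO 8601 UTC strings such as 2016-02-25T13:45:07Z. That format and the sample values are my own assumptions, not taken from the dataset, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def per_second_counts(dt_values):
    """Count requests per second from ISO 8601 `dt` strings.

    Assumes (hypothetically) values like "2016-02-25T13:45:07Z" in UTC.
    Returns a time-sorted list of (second, count) pairs.
    """
    counts = Counter()
    for dt in dt_values:
        # fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so rewrite it as an explicit UTC offset for older versions.
        ts = datetime.fromisoformat(dt.replace("Z", "+00:00"))
        # Truncate to whole seconds so requests fall into 1-second buckets.
        counts[ts.replace(microsecond=0)] += 1
    return [(ts.isoformat(), n) for ts, n in sorted(counts.items())]


if __name__ == "__main__":
    sample = ["2016-02-25T13:45:07Z", "2016-02-25T13:45:07Z", "2016-02-25T13:45:08Z"]
    for second, n in per_second_counts(sample):
        print(second, n)
```

If `dt' only had minute or hour resolution, the same bucketing would not yield per-second traces, which is why I am asking about its exact format.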