Hi, Nuria:
I reviewed the data closest to what I am looking for, Phabricator T128132, from https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
and the webrequest datasets: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest . I still have a few questions.

1. Is `hashed_host_path' (in the caching dataset) the `hostname' or the `uri_host'? Phabricator T128132 shows the two fields. However, the available data only shows `hashed_host_path'.
2. There are 6 fields in the caching dataset (hashed_host_path, uri_query, content_type, response_size, time_firstbyte, and x_cache), as shown in the attached screenshot. Does the caching dataset not include page_id? The webrequest dataset seems to contain page_id.
3. I didn't find the sequence field in the caching dataset. I learned that sequence replaces the timestamp. Is `sequence' the file name of the downloads in the caching dataset?
4. Does `dt' (in the webrequest dataset) mean a timestamp in ISO 8601 format? The webrequest dataset might be what I am looking for, if it can provide per-second access traces.
5. According to the description on the webrequest page, the webrequest datasets should contain at least `hostname', `page_id', and `dt'. If so, the webrequest datasets seem to cover most of my requirements. Is there a download link available for the webrequest datasets?

Sincerely,
TA-YUAN
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics