Forwarding to the analytics list for reference.
---------- Forwarded message ---------
From: Ho Chung chungho4865@gmail.com
Date: Mon, Mar 15, 2021 at 11:45 AM
Subject: Re: [Analytics] About: refine_webrequest.hql
To: Joseph Allemandou jallemandou@wikimedia.org
Hello
Thanks for your reply
I asked because I was researching your Analytics team's public discussion history and the Wikitech pages about the webrequest timestamp:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest
https://phabricator.wikimedia.org/T212529
At that time I was in doubt: you use Java technology, but your Hive version did not support Java before October 2018.
The wmf.webrequest table is located in Hive.
My question was whether, when readers' private data is collected, the timestamp comes from the reader's computer system clock rather than the Wikipedia server clock at the time the page is read and browsed.
Now it is clearer to me. On your team's public discussion page, Ottomata said that all times are in UTC.
It is just that you technicians do not want to unify how the timestamp format is written, but in fact all of them use UTC.
On Mon, Mar 15, 2021 at 16:14, Joseph Allemandou jallemandou@wikimedia.org wrote:
Hi,
the `dt` field is the time in UTC (no timezone specified) at which Varnish finishes processing the request.
Cheers
Joseph
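To illustrate what "UTC (no timezone specified)" means in practice: a `dt` string in the `yyyy-MM-dd'T'HH:mm:ss` pattern carries no `Z` or `+/-` offset, so whoever reads it has to attach UTC explicitly when parsing. Below is a minimal Python sketch of that idea. The sample value and the helper name `parse_dt_as_utc` are made up for illustration and are not taken from the webrequest pipeline.

```python
from datetime import datetime, timezone

# Python counterpart of the pattern "yyyy-MM-dd'T'HH:mm:ss" (no zone designator).
DT_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_dt_as_utc(dt: str) -> datetime:
    """Parse a zone-less dt string and label the result explicitly as UTC.

    Hypothetical helper: it assumes, as stated in the thread, that the
    value was recorded in UTC on the server side.
    """
    naive = datetime.strptime(dt, DT_FORMAT)  # no tzinfo attached yet
    return naive.replace(tzinfo=timezone.utc)  # same wall-clock time, marked as UTC


sample = "2021-03-15T08:36:00"  # made-up value, for illustration only
ts = parse_dt_as_utc(sample)
print(ts.isoformat())  # the +00:00 suffix makes the UTC assumption visible
```

The wall-clock digits are unchanged; only the zone label is added, so the result does not depend on the clock or time zone of the machine that runs the code.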
On Mon, Mar 15, 2021 at 8:36 AM Luca Toscano ltoscano@wikimedia.org wrote:
+analytics@lists.wikimedia.org (a mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics)
Hi!
I added the Analytics mailing list in Cc so other people can chime in. This is the canonical way to follow up with us and the community, so please avoid direct email if possible :)
Thanks!
Luca
On Sat, Mar 13, 2021 at 10:57 PM Ho Chung chungho4865@gmail.com wrote:
Hello
I have some questions about refine_webrequest.hql.
Does the timestamp in this file use UTC?
Does this file connect wmf_raw.webrequest and wmf.webrequest?
I ask because I cannot see anywhere in the code that adds a Z or a +/- time zone offset:
-- Hack to get a correct timestamp because of hive inconsistent conversion
CAST(unix_timestamp(dt, "yyyy-MM-dd'T'HH:mm:ss") * 1.0 as timestamp) as ts,
https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest...
I emailed the wiki legal team about this 3 months ago and they were not sure. Can you give me a clear answer?
If it does not use UTC, does it use your server clock or my computer clock?
--
Joseph Allemandou (joal) (he / him)
Staff Data Engineer
Wikimedia Foundation