Hi Adrian,
webrequest_source is a column by which the data is partitioned (in addition to the year/month/day/hour columns).
To my knowledge, WDQS-related requests go into the 'misc' partitions of the webrequest table (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest), which is the table this subset was extracted from. General traffic, such as Wikipedia pageviews, goes into the webrequest_source = 'text' partitions.
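For illustration, here is a minimal Python sketch of how filtering on the partition columns above restricts a query to a single hourly 'misc' partition. It assumes the table is named wmf.webrequest and that year/month/day/hour are integer partition columns. The helper names and the COUNT(*) query are hypothetical and are not taken from wdqs_extract.hql.

```python
from datetime import datetime


def partition_filter(source: str, ts: datetime) -> str:
    """Build a WHERE predicate for one hourly partition of webrequest.

    Constraining every partition column (webrequest_source, year, month,
    day, hour) lets Hive read only the matching partition instead of
    scanning the whole table.
    """
    # ASSUMPTION: year/month/day/hour are integer partition columns.
    return (
        f"webrequest_source = '{source}' "
        f"AND year = {ts.year} AND month = {ts.month} "
        f"AND day = {ts.day} AND hour = {ts.hour}"
    )


def hourly_query(source: str, ts: datetime, table: str = "wmf.webrequest") -> str:
    # ASSUMPTION: the table name and the COUNT(*) are illustrative only.
    return f"SELECT COUNT(*) FROM {table} WHERE {partition_filter(source, ts)}"


if __name__ == "__main__":
    # 'misc' holds WDQS-related requests; 'text' holds general traffic such as pageviews.
    print(hourly_query("misc", datetime(2018, 5, 14, 6)))
```

Swapping "misc" for "text" would target the general-traffic partitions for the same hour instead.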
- Mikhail
On Mon, May 14, 2018 at 6:07 AM, Adrian Bielefeldt <Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Thanks for the pointers. From what I can gather (especially from wdqs_extract.hql), my next questions are: a) what exactly does "webrequest_source = 'misc'" mean, and b) what source table was this extracted from?
On 08/05/18 09:22, Leila Zia wrote:
A couple of pointers, as Stas was not involved in the details of the extraction.
Adrian: you can dig into the history behind the extraction at https://phabricator.wikimedia.org/T146064
Please also check the code at https://gerrit.wikimedia.org/r/#/c/311964/ for details, specifically wdqs_extract.hql.
Best, Leila
On Mon, May 7, 2018, 18:15 Andrew Otto <otto@wikimedia.org> wrote:
CCing Stas, he might know more.
On Sun, May 6, 2018 at 9:58 AM, Adrian Bielefeldt <Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Hello everyone,
I wanted to ask if anyone can tell me what wmf.wdqs_extract contains. I know that, generally, it is the query log of the SPARQL endpoint. However, I do not know whether it contains all requests, only uncached requests, etc.
If anyone knows, or knows where I can read up on it, that would be great.
Greetings,
Adrian
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics