Hello everyone,
I wanted to ask if anyone can tell me what wmf.wdqs_extract contains. I know generally that it is the query log of the SPARQL endpoint. However, I do not know if it is all requests, only uncached requests etc.
If anyone knows or knows where I can read up on it that would be great.
Greetings,
Adrian
CCing Stas, he might know more.
On Sun, May 6, 2018 at 9:58 AM, Adrian Bielefeldt <Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Hello everyone,
I wanted to ask if anyone can tell me what wmf.wdqs_extract contains. I know generally that it is the query log of the SPARQL endpoint. However, I do not know if it is all requests, only uncached requests etc.
If anyone knows or knows where I can read up on it that would be great.
Greetings,
Adrian
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
A couple of pointers as Stas was not involved in the details of the extraction.
Adrian: you can dig into the history behind the extraction at https://phabricator.wikimedia.org/T146064
Please also check the code at https://gerrit.wikimedia.org/r/#/c/311964/ for details, specifically wdqs_extract.hql.
Best, Leila
On Mon, May 7, 2018, 18:15 Andrew Otto otto@wikimedia.org wrote:
CCing Stas, he might know more.
Adrian:
Please note that this table might disappear soon, as the research it was created for has finished. Also, we hope to roll out similar tables next quarter that split our large dataset into smaller ones. That work is still in progress.
Thanks,
Nuria
On Tue, May 8, 2018 at 12:22 AM, Leila Zia leila@wikimedia.org wrote:
A couple of pointers as Stas was not involved in the details of the extraction.
Adrian: you can dig into the history behind the extraction at https://phabricator.wikimedia.org/T146064
Please also check the code at https://gerrit.wikimedia.org/r/#/c/311964/ for details, specifically wdqs_extract.hql.
Best, Leila
Thanks for the pointers. From what I can gather (especially from wdqs_extract.hql), my next questions are: a) what exactly does "webrequest_source = 'misc'" mean, and b) which source table was this extracted from?
Hi Adrian,
webrequest_source is a column by which the data is partitioned (in addition to the year/month/day/hour columns).
To my knowledge, WDQS-related requests go into the 'misc' partitions of the webrequest table (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest), from which this subset was extracted. General traffic-related requests, such as Wikipedia pageviews, are in the webrequest_source = 'text' partitions.
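As a rough illustration of the partition filter described above, here is a minimal Python sketch that only builds a HiveQL query string for one hourly partition. It is not the logic of wdqs_extract.hql. The helper name build_wdqs_query is hypothetical, and the table name wmf.webrequest, the selected columns, and the uri_host value are assumptions that should be checked against the webrequest documentation before use.

```python
def build_wdqs_query(year: int, month: int, day: int, hour: int) -> str:
    """Build a HiveQL string that reads one hourly 'misc' partition.

    Assumptions (not taken from the thread): the table is wmf.webrequest,
    the columns uri_path, uri_query and http_status exist, and WDQS
    traffic can be recognised by uri_host = 'query.wikidata.org'.
    """
    return (
        "SELECT uri_path, uri_query, http_status\n"
        "FROM wmf.webrequest\n"
        # Partition columns: restricting them limits the scan to one hour
        # of the 'misc' source instead of the whole table.
        "WHERE webrequest_source = 'misc'\n"
        f"  AND year = {int(year)} AND month = {int(month)}\n"
        f"  AND day = {int(day)} AND hour = {int(hour)}\n"
        # Assumed host filter for the SPARQL endpoint.
        "  AND uri_host = 'query.wikidata.org'"
    )


# Example: the hour in which this thread was started.
print(build_wdqs_query(2018, 5, 6, 9))
```

Restricting all five partition columns is the main point of the sketch. The non-partition filter on uri_host is only a guess at how WDQS requests might be singled out.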
- Mikhail
On Mon, May 14, 2018 at 6:07 AM, Adrian Bielefeldt <Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Thanks for the pointers. From what I can gather (especially from wdqs_extract.hql), my next questions are: a) what exactly does "webrequest_source = 'misc'" mean, and b) which source table was this extracted from?