I believe in this case the data is being crunched in Hadoop, which is where the WDQS access logs are. I think the page Adrian wanted to load was https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples; at a guess, he is looking at how often these example queries are requested via the service.
On Mon, 15 May 2017 at 00:22 Nuria Ruiz nuria@wikimedia.org wrote:
(i.e. implying that we need to collect the data somewhere else, and move
to production for number crunching only)?
I think we should probably set up a sync-up so you get an overview of how this works, because this is a brief response. Data is harvested on some production machines, processed (on different production machines), and moved to the stats machines (also production, but a sheltered environment). We do not use the stats machines to harvest data. They just provide access to it and are sized so you can process and crunch data. This talk explains a bit of how this all works: https://www.youtube.com/watch?v=tx1pagZOsiM
We might be talking past each other here; if so, a meeting might help.
Nuria, what exactly do you have in mind when you say "a development
instance of Wikidata"?
If you need to look at a Wikidata query and see what it shows in the logs when you query x or y, that step should be done on a (Wikidata) *test environment* that logs the HTTP requests for your queries as received by the server. That way you can "test" your queries against a server and see how they are received.
Thanks,
Nuria
On Sun, May 14, 2017 at 1:10 PM, Adrian Bielefeldt < Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Hi Addshore, thanks for the advice, I can now connect.
Greetings,
Adrian
On 05/13/2017 05:47 PM, Addshore wrote:
You should be able to connect to query.wikidata.org via the webproxy.
https://wikitech.wikimedia.org/wiki/HTTP_proxy
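For a Java program, routing a request through an HTTP proxy could look roughly like the sketch below. It is only an illustration: `PROXY_HOST` and `PROXY_PORT` are hypothetical placeholders, and the real values should be taken from the wiki page linked above. The class and method names are likewise made up for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class ProxyExample {
    // Hypothetical placeholders: replace with the host and port
    // documented on the HTTP_proxy wiki page.
    static final String PROXY_HOST = "webproxy.example.invalid";
    static final int PROXY_PORT = 8080;

    // Builds an HTTP proxy descriptor without resolving the host name yet.
    static Proxy buildProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    // Prepares a connection that will go through the given proxy.
    // No network traffic happens until the connection is actually used.
    static HttpURLConnection openViaProxy(String target, Proxy proxy)
            throws IOException {
        URL url = new URL(target);
        return (HttpURLConnection) url.openConnection(proxy);
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = buildProxy(PROXY_HOST, PROXY_PORT);
        HttpURLConnection conn =
                openViaProxy("https://query.wikidata.org/", proxy);
        // The request is sent here; this fails unless the proxy is reachable.
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Passing the `Proxy` explicitly to `openConnection` keeps the setting local to this one request, rather than changing proxy behaviour for the whole JVM.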
On Sat, 13 May 2017 at 15:23 Adrian Bielefeldt < Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
Hello Nuria,
I'm working on a project (https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries) analyzing Wikidata SPARQL queries. We extract specific fields (e.g. uri_query, hour) from wmf.wdqs_extract, parse the queries with a Java program using open_rdf as the parser, and then analyze them for different metrics such as variable count, which entities are used, and so on.
At the moment I'm working on checking which entries equal one of the example queries at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples using the code at https://github.com/Wikidata/QueryAnalysis/blob/master/src/main/java/general/Main.java#L339-L376. Unfortunately the program cannot connect to the website, so I'm assuming I have to create an exception for this request or ask for one to be created.
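To illustrate the kind of check described here, the following is a minimal sketch of matching logged queries against a set of example queries. It is not the linked code: it assumes a plain text comparison after collapsing whitespace, whereas the actual program may compare parsed queries. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class ExampleQueryMatcher {
    // Collapses runs of whitespace so formatting differences are ignored.
    // Note: this also collapses whitespace inside string literals.
    static String normalize(String query) {
        return query.trim().replaceAll("\\s+", " ");
    }

    // Normalizes every example once so lookups are cheap.
    static Set<String> normalizeAll(Collection<String> examples) {
        Set<String> result = new HashSet<>();
        for (String example : examples) {
            result.add(normalize(example));
        }
        return result;
    }

    // True if the logged query equals one of the examples after normalization.
    static boolean isExample(String query, Set<String> normalizedExamples) {
        return normalizedExamples.contains(normalize(query));
    }
}
```

A textual comparison like this misses queries that differ only in variable names or prefix declarations, which is one reason comparing parsed queries may be preferable.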
Greetings,
Adrian
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics