I believe in this case the data is being crunched in Hadoop, which is where
the WDQS access logs are.
And I think the page in question that Adrian wanted to load was
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
(the same page linked in his mail below); at a guess, he is looking at how
often these example queries are requested via the service.
On Mon, 15 May 2017 at 00:22 Nuria Ruiz <nuria(a)wikimedia.org> wrote:
(i.e. implying that we need to collect the data somewhere else, and move
to production for number crunching only)?
I think we should probably set up a sync-up so you get an overview of how
this works, because this is only a brief response. Data is harvested on
some production machines, it is processed (on different production
machines) and moved to the stats machines (also production, but a sheltered
environment). We do not use the stats machines to harvest data; they just
provide access to it and are sized so you can process and crunch data. This
talk explains a bit how this all works:
https://www.youtube.com/watch?v=tx1pagZOsiM
We might be talking past each other here; if so, a meeting might help.
Nuria, what exactly do you have in mind when you say "a development
instance of Wikidata"?
If you need to look at a Wikidata query and see what it shows in the logs
when you query x or y, that step should be done on a (Wikidata) *test
environment* that logs the HTTP requests for your queries as received by
the server. That way you can "test" your queries against a server and see
how those are received.
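For example, a quick way to generate such a request from Java (a minimal
sketch; the hostname is a placeholder for whatever test instance gets set
up, not a real endpoint):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class QueryLogTest {
        public static void main(String[] args) throws IOException {
            String sparql = "SELECT ?item WHERE { ?item ?p ?o } LIMIT 1";
            // Placeholder host: substitute the actual test instance.
            URL url = new URL("https://wdqs-test.example.org/sparql?query="
                    + URLEncoder.encode(sparql, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // The test server's access log should now show this request's
            // uri_query exactly as the production WDQS logs would record it.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }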
Thanks,
Nuria
On Sun, May 14, 2017 at 1:10 PM, Adrian Bielefeldt <
Adrian.Bielefeldt(a)mailbox.tu-dresden.de> wrote:
Hi Addshore,
thanks for the advice, I can now connect.
Greetings,
Adrian
On 05/13/2017 05:47 PM, Addshore wrote:
You should be able to connect to
query.wikidata.org via the webproxy.
https://wikitech.wikimedia.org/wiki/HTTP_proxy
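If the requests are made from Java, one way to use it (a sketch; host and
port are taken from the wikitech page above, so please verify them there)
is to set the JVM proxy properties before opening any connections:

    public class ProxySetup {
        public static void main(String[] args) {
            // Route the JVM's HTTP(S) traffic through the cluster webproxy.
            // Host/port as documented on the wikitech page; verify before use.
            System.setProperty("http.proxyHost", "webproxy.eqiad.wmnet");
            System.setProperty("http.proxyPort", "8080");
            System.setProperty("https.proxyHost", "webproxy.eqiad.wmnet");
            System.setProperty("https.proxyPort", "8080");
        }
    }

The same settings can also be passed on the command line with
-Dhttps.proxyHost=... and -Dhttps.proxyPort=... .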
On Sat, 13 May 2017 at 15:23 Adrian Bielefeldt <
Adrian.Bielefeldt(a)mailbox.tu-dresden.de> wrote:
Hello Nuria,
I'm working on a project
<https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries>
analyzing the Wikidata SPARQL queries. We extract specific fields (e.g.
uri_query, hour) from wmf.wdqs_extract, parse the queries with a Java
program using OpenRDF as the parser, and then analyze them for different
metrics such as variable count, which entities are being used, and so on.
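(For context, the parsing step looks roughly like this; a minimal sketch
with OpenRDF's SPARQLParser, not the exact code of our program:)

    import org.openrdf.query.MalformedQueryException;
    import org.openrdf.query.parser.ParsedQuery;
    import org.openrdf.query.parser.sparql.SPARQLParser;

    public class ParseSketch {
        public static void main(String[] args) throws MalformedQueryException {
            SPARQLParser parser = new SPARQLParser();
            String query = "SELECT ?item WHERE { ?item ?p ?o } LIMIT 10";
            // parseQuery yields the query's algebra tree, from which metrics
            // such as the variable count can be read off.
            ParsedQuery parsed = parser.parseQuery(query, "http://www.wikidata.org/");
            System.out.println(parsed.getTupleExpr());
        }
    }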
At the moment I'm working on checking which entries match one of the
example queries at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
using this
<https://github.com/Wikidata/QueryAnalysis/blob/master/src/main/java/general/Main.java#L339-L376>
code. Unfortunately the program cannot connect to the website, so I'm
assuming I have to create an exception for this request or ask for one to
be created.
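(To illustrate the matching step, here is a hypothetical sketch that parses
both the logged query and each example with OpenRDF and compares the
resulting algebra trees, so whitespace and comment differences do not break
the match; the linked code takes its own approach:)

    import java.util.Set;

    import org.openrdf.query.MalformedQueryException;
    import org.openrdf.query.parser.sparql.SPARQLParser;

    public class ExampleMatcher {
        private final SPARQLParser parser = new SPARQLParser();

        // True if the logged query parses to the same algebra tree as
        // one of the example queries.
        public boolean isExampleQuery(String logged, Set<String> examples)
                throws MalformedQueryException {
            String loggedAlgebra =
                    parser.parseQuery(logged, null).getTupleExpr().toString();
            for (String example : examples) {
                String exampleAlgebra =
                        parser.parseQuery(example, null).getTupleExpr().toString();
                if (loggedAlgebra.equals(exampleAlgebra)) {
                    return true;
                }
            }
            return false;
        }
    }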
Greetings,
Adrian
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics