Re: [Analytics] Connect to wikidata.org from stat1002.eqiad.wmnet

13 May 2017


Hi Nuria and Adrian,
I also need to develop some Wikidata-related data science work for WMDE,
where I started working as a Data Analyst in March. I work entirely in R
(+SPARQL +SQL +etc). I may well run into a problem similar to Adrian's, so
I hope you won't mind if I hijack this discussion for a mail or two.
A question for Nuria: I understand that the production machines can be used
to run analyses, and that we should avoid doing development there (I have a
Labs instance where the development will take place), but I do not
understand the following advice you gave Adrian: "It seems like you would
benefit from querying a development instance of Wikidata and looking at
development logs to know what to expect." Nuria, what exactly do you have
in mind when you say "a development instance of Wikidata"?
Also, very important for me: are you implying that no attempts to access
the SPARQL endpoint from production should be made? If yes, why, and what
would be the alternative, suggested route to Wikidata from production? Or,
and this is my final attempt at the correct interpretation of your words,
do you want to say that we should use production for statistics
*exclusively*, in the sense that no datasets (except for, say, weblogs and
MariaDB replicas on eqiad) should be fetched from the production machines
(i.e. implying that we need to collect the data somewhere else, and move to
production for number crunching only)?
Thank you.
Best regards,
Goran Milovanović
Data Analyst, WMDE
On 13 May 2017 22:40, "Nuria Ruiz" nuria@wikimedia.org wrote:
Adrian,
...
At the moment I'm working on checking which entries equal one of the
example queries at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
using this
https://github.com/Wikidata/QueryAnalysis/blob/master/src/main/java/general/Main.java#L339-L376
code.
The stats machines are useful to analyze data but we do not use them to do
development. It seems like you would benefit from querying a development
instance of Wikidata and looking at development logs to know what to
expect. We strongly advise against doing development in production. Looking
at logs in a development environment would be synchronous, so you can get
your answers fast.
Thanks,
Nuria
On Sat, May 13, 2017 at 5:47 PM, Addshore addshorewiki@gmail.com wrote:
...
You should be able to connect to query.wikidata.org via the webproxy.
https://wikitech.wikimedia.org/wiki/HTTP_proxy
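For illustration, a minimal sketch of what sending a query through the
webproxy might look like, written in Python for brevity (for a Java program
such as Adrian's, the JVM proxy system properties would be the analogous
knob). The proxy address below is an assumption and should be checked
against the wikitech page above. The helper names make_proxy_opener,
build_request and run_query are hypothetical, as is the User-Agent string.

```python
import json
import urllib.parse
import urllib.request

# Assumption: proxy host and port; verify against the wikitech HTTP_proxy page.
PROXY = "http://webproxy.eqiad.wmnet:8080"
ENDPOINT = "https://query.wikidata.org/sparql"


def make_proxy_opener(proxy=PROXY):
    """Return an opener that routes http and https traffic via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def build_request(sparql, endpoint=ENDPOINT):
    """Build a GET request carrying the SPARQL text as the `query` parameter."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; use something that names your tool.
            "User-Agent": "query-analysis-example/0.1",
        },
    )


def run_query(sparql, proxy=PROXY, timeout=30):
    """Send the query through the proxy and return the decoded JSON result."""
    opener = make_proxy_opener(proxy)
    with opener.open(build_request(sparql), timeout=timeout) as resp:
        return json.load(resp)
```

Calling run_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1") from a stat
machine would then go out via the proxy rather than attempting a direct
connection.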
On Sat, 13 May 2017 at 15:23 Adrian Bielefeldt <
Adrian.Bielefeldt@mailbox.tu-dresden.de> wrote:
...
Hello Nuria,
I'm working on a project
https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
analyzing the Wikidata SPARQL queries. We extract specific fields (e.g.
uri_query, hour) from wmf.wdqs_extract, parse the queries with a Java
program using open_rdf as the parser and then analyze them for different
metrics like variable count, which entities are being used and so on.
At the moment I'm working on checking which entries equal one of the
example queries at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
using this
https://github.com/Wikidata/QueryAnalysis/blob/master/src/main/java/general/Main.java#L339-L376
code. Unfortunately the program cannot connect to the website, so I'm
assuming I have to create an exception for this request or ask for it to be
created.
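To make the comparison step concrete, here is a rough sketch in Python. It
is not the linked Java code, only an illustration under two assumptions:
that uri_query holds a URL query string with the SPARQL text in a `query`
parameter, and that collapsing whitespace is an acceptable notion of
"equal". All function names are hypothetical.

```python
import re
from urllib.parse import parse_qs


def sparql_from_uri_query(uri_query):
    """Return the decoded `query` parameter, or None if it is absent."""
    values = parse_qs(uri_query.lstrip("?")).get("query")
    return values[0] if values else None


def normalize(sparql):
    """Collapse whitespace runs so that layout differences do not matter.

    Crude on purpose: it also collapses whitespace inside string literals.
    """
    return re.sub(r"\s+", " ", sparql).strip()


def is_example(uri_query, examples):
    """True if the logged query matches one of the normalized examples."""
    sparql = sparql_from_uri_query(uri_query)
    return sparql is not None and normalize(sparql) in examples


# The example set would be fetched once from the wiki page and normalized.
examples = {normalize("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 }")}
```

Comparing parsed queries, as the open_rdf-based program presumably can,
would be more robust than this text comparison.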
Greetings,
Adrian
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
