>POST requests are more tricky, I suppose.
FYI, we have neither the POST data nor the responses for either GET or POST requests; we just store URLs and HTTP codes for both. Thus the body of a POST is not available either.

>However, I don't think we are logging the contents of responses at all. I
>suppose that would have to be built into BlazeGraph somehow.
You can instrument the code to report responses into the cluster, just as the search team does. Depending on how easily the instrumentation fits in, that can be a little or a lot of work. The MediaWiki API also does similar "custom" reporting.


James:
I think that before asking for a time estimate we would need more detail on your end as to which metrics you are interested in measuring. It would be best if you could describe your project on Meta. In case you are not familiar with Meta, this is an example of how research projects are described there: https://meta.wikimedia.org/wiki/Research:HTTPS_Transition_and_Article_Censorship



On Fri, Jul 1, 2016 at 1:33 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
On 01.07.2016 at 01:42, Nuria Ruiz wrote:
> Is this data always requested via http from an api endpoint that will hit a
> varnish cache? (Daniel can probably answer this)

Yes. Special:EntityData is a regular special page, and action=wbgetentities is a
regular MW web API request, as your example shows.

> If the data you are interested in can be inferred from these requests there is
> no additional data gathering needed.

Yay!

> Nor does it tell us how
>     often statements/RDF triples show up in the Wikidata Query Service.

I'm no expert on the query service, adding Stas for that. As far as I know,
SPARQL queries go through Varnish directly to BlazeGraph. In any case, they are
not processed by MediaWiki at all. Tracking how often an entity is mentioned in
a GET request to the SPARQL service should be possible based on the varnish
request logs, with a bit of regex magic. POST requests are more tricky, I suppose.
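
A minimal sketch of that regex approach, assuming the logged URL carries the SPARQL text in a `query` parameter and that entities appear as plain Q/P IDs (e.g. wd:Q146, wdt:P31). The `/sparql` path, the function names and the sample URLs are made up for illustration; they are not taken from the actual log format.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Q/P followed by digits, not embedded in a longer identifier.
# Assumption: this also matches IDs inside full entity URIs, and it can
# produce false positives (e.g. a SPARQL variable named ?Q1).
ENTITY_ID = re.compile(r"(?<![A-Za-z0-9_])([QP][1-9][0-9]*)(?![A-Za-z0-9_])")


def entities_in_url(url):
    """Return the set of entity IDs mentioned in the URL's `query` parameter."""
    # parse_qs percent-decodes the parameter values for us.
    sparql_texts = parse_qs(urlsplit(url).query).get("query", [])
    ids = set()
    for sparql in sparql_texts:
        ids.update(ENTITY_ID.findall(sparql))
    return ids


def count_mentions(urls):
    """Count in how many request URLs each entity ID is mentioned."""
    counts = Counter()
    for url in urls:
        counts.update(entities_in_url(url))
    return counts


if __name__ == "__main__":
    # Hypothetical logged URLs; real log lines would need the URL field
    # to be extracted first.
    sample = [
        "/sparql?query=SELECT%20%3Fitem%20WHERE%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ146%20%7D",
        "/sparql?query=ASK%20%7B%20wd%3AQ146%20%3Fp%20%3Fo%20%7D",
    ]
    for entity, n in count_mentions(sample).most_common():
        print(entity, n)
```

This only counts IDs that are literally written in the query text of GET requests, so it says nothing about POST requests.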

However, I don't think we are logging the contents of responses at all. I
suppose that would have to be built into BlazeGraph somehow. And even if we did
that, that would only tell us which entities were present in a result, not
which entities were used to answer a query. E.g. if you list all instances of a
class (including subclasses), the entities representing the classes are
essential to answering the query, but they are not present in the result (and
only the top-most class is present in the query).


--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.