> POST requests are more tricky, I suppose.
FYI, we do not have POST data, nor responses to either GET or POST
requests; we just store URLs and HTTP status codes for both GET and POST.
Thus the body of a POST request is not available either.
> However, I don't think we are logging the contents of responses at all. I
> suppose that would have to be built into BlazeGraph somehow.
You can instrument code to report responses into the cluster, just like the
search team does. Depending on how easy it is to fit in the instrumenting
code, that can be a little or a lot of work. The MediaWiki API is also doing
similar "custom" reporting.
James:
I think before asking for a time estimate we would need more detail on your
end as to what metrics you are interested in measuring. If you could
describe your project on Meta, that would be best. In case you are not
familiar with Meta, this is an example of how research projects are
described:
https://meta.wikimedia.org/wiki/Research:HTTPS_Transition_and_Article_Censo…
On Fri, Jul 1, 2016 at 1:33 AM, Daniel Kinzler <daniel.kinzler(a)wikimedia.de> wrote:
On 01.07.2016 at 01:42, Nuria Ruiz wrote:
> Is this data always requested via HTTP from an API endpoint that will hit a
> Varnish cache? (Daniel can probably answer this)
Yes. Special:EntityData is a regular special page, and action=wbgetentities
is a regular MW web API request, as your example shows.
> If the data you are interested in can be inferred from these requests,
> there is no additional data gathering needed.
Yay!
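As a rough illustration of inferring entity IDs from these requests, here is a minimal sketch. It assumes the request log exposes the full request URL, and that entity data is fetched either as `/wiki/Special:EntityData/<ID>[.<format>]` or as `/w/api.php?action=wbgetentities&ids=<ID>|<ID>...`. The helper name is hypothetical, and only Q and P IDs are handled.

```python
import re
from urllib.parse import parse_qs, unquote, urlsplit

# Assumed path shape: /wiki/Special:EntityData/Q42 or /wiki/Special:EntityData/Q42.json
ENTITY_DATA_PATH = re.compile(r"/wiki/Special:EntityData/([QP]\d+)(?:\.\w+)?$")
ENTITY_ID = re.compile(r"^[QP]\d+$")


def entity_ids_from_request(url: str) -> list:
    """Hypothetical helper: return the entity IDs one request URL asks for."""
    parts = urlsplit(url)
    path = unquote(parts.path)  # handles e.g. Special%3AEntityData

    # Case 1: Special:EntityData/<ID>[.<format>]
    match = ENTITY_DATA_PATH.search(path)
    if match:
        return [match.group(1)]

    # Case 2: api.php?action=wbgetentities&ids=Q1|Q2 (pipe-separated IDs)
    params = parse_qs(parts.query)
    if path.endswith("/api.php") and params.get("action") == ["wbgetentities"]:
        ids = params.get("ids", [""])[0]
        return [i for i in ids.split("|") if ENTITY_ID.match(i)]

    # Anything else is not an entity data request under these assumptions
    return []


if __name__ == "__main__":
    print(entity_ids_from_request("https://www.wikidata.org/wiki/Special:EntityData/Q42.json"))
    print(entity_ids_from_request(
        "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42|P31&format=json"))
```

Aggregating these lists over a day of request logs would give per-entity fetch counts without any new data gathering, as long as the URL is all that is needed.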
> Nor does it tell us how often statements/RDF triples show up in the
> Wikidata Query Service.
I'm no expert on the query service; adding Stas for that. As far as I know,
SPARQL queries go through Varnish directly to BlazeGraph. In any case, they
are not processed by MediaWiki at all. Tracking how often an entity is
mentioned in a GET request to the SPARQL service should be possible based on
the Varnish request logs, with a bit of regex magic. POST requests are more
tricky, I suppose.
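The "regex magic" could look roughly like this minimal sketch. It assumes each log record provides the URL of a GET request to the query service, with the SPARQL text in a `query` parameter. It counts `wd:`-prefixed IDs and full `http://www.wikidata.org/entity/` URIs, once per request. Other prefixes (e.g. `wdt:`) are deliberately ignored, which is a simplifying assumption. The function name is hypothetical. POST requests are not covered, since their bodies are not logged.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit

# Matches prefixed names like wd:Q42 and full URIs like <http://www.wikidata.org/entity/Q42>.
# wdt:P31 is not matched: after "wd" comes "t", not ":".
ENTITY_REF = re.compile(r"(?:\bwd:|http://www\.wikidata\.org/entity/)([QP]\d+)\b")


def count_entity_mentions(urls):
    """Hypothetical helper: count entities mentioned in SPARQL GET query strings."""
    counts = Counter()
    for url in urls:
        # parse_qs decodes percent-escapes and '+' back into the SPARQL text
        query_text = parse_qs(urlsplit(url).query).get("query", [""])[0]
        # Count each entity at most once per request
        counts.update(set(ENTITY_REF.findall(query_text)))
    return counts


if __name__ == "__main__":
    sparql = "SELECT ?item WHERE { ?item wdt:P31/wdt:P279* wd:Q5 }"
    url = "https://query.wikidata.org/sparql?" + urlencode({"query": sparql})
    print(count_entity_mentions([url]))
```

For the class query in the demo (P31 is "instance of", P279 "subclass of"), only `Q5` is counted. The subclasses that the `wdt:P279*` path traverses never appear in the URL.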
However, I don't think we are logging the contents of responses at all. I
suppose that would have to be built into BlazeGraph somehow. And even if we
did that, it would only tell us which entities were present in a result, not
which entities were used to answer a query. E.g. if you list all instances of
a class (including subclasses), the entities representing the classes are
essential to answering the query, but they are not present in the result
(and only the top-most class is present in the query).
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.