Adding analytics@, a public e-mail list where you can post questions such as this one.
> that doesn’t tell us how often entities are accessed through
> Special:EntityData or wbgetclaims
> Does this data already exist, even in the form of raw access logs?
Is this data always requested via HTTP from an API endpoint that will hit a Varnish cache? (Daniel can probably answer this.)
From what I see in our data, we have requests like the following:
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q633155
www.wikidata.org /w/api.php ?callback=jQuery11130020702992017004984_1465195743367&format=json&action=wbgetclaims&property=P373&entity=Q5296&_=1465195743368
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q573612
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q472729
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q349797
www.wikidata.org /w/api.php ?action=compare&torev=344163911&fromrev=344163907&format=json
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2356135
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2355988
www.wikidata.org /w/api.php ?action=compare&torev=344164023&fromrev=344163948&format=json
If the data you are interested in can be inferred from these requests, there is no additional data gathering needed.
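For example, here is a quick Python sketch of how per-entity counts could be tallied from such requests. It is illustrative only; it assumes the query strings look like the samples above, not any particular log schema:

    from collections import Counter
    from urllib.parse import parse_qs

    entity_hits = Counter()

    def tally(query_string):
        # e.g. "?action=wbgetclaims&format=json&entity=Q633155"
        params = parse_qs(query_string.lstrip("?"))
        action = params.get("action", [""])[0]
        if action == "wbgetclaims":
            entity_hits.update(params.get("entity", []))
        elif action == "wbgetentities":
            # ids can be a pipe-separated list, e.g. ids=Q1|Q2
            for ids in params.get("ids", []):
                entity_hits.update(ids.split("|"))

    tally("?action=wbgetclaims&format=json&entity=Q633155")
    tally("?action=wbgetentities&format=xml&ids=Q2356135")
    print(entity_hits.most_common(10))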
> If not, what effort would be required to gather this data? For the
> purposes of my proposal to the U.S. Census Bureau I am estimating around
> six weeks of effort for this for one person working full-time. If it
> will take more time I will need to know.
I think I have mentioned this before on an e-mail thread, but without knowing the details of what you want to do we cannot give you a time estimate. What are the exact metrics you are interested in? Is the project described anywhere on Meta?
Thanks,
Nuria
On Thu, Jun 30, 2016 at 11:45 AM, James Hare <james(a)hxstrategy.com> wrote:
Copying Lydia Pintscher and Daniel Kinzler (with whom I’ve discussed this very topic).
I am interested in metrics that describe how Wikidata is used. While we do have view counts for individual pages, that doesn’t tell us how often entities are accessed through Special:EntityData or wbgetclaims. Nor does it tell us how
often statements/RDF triples show up in the Wikidata Query Service. Does
this data already exist, even in the form of raw access logs? If not, what
effort would be required to gather this data? For the purposes of my
proposal to the U.S. Census Bureau I am estimating around six weeks of
effort for this for one person working full-time. If it will take more time
I will need to know.
Thank you,
James Hare
On Thursday, June 2, 2016 at 2:18 PM, Nuria Ruiz wrote:
James:
> My current operating assumption is that it would take one person,
> working on a full-time basis, around six weeks to go from raw access
> logs to a functioning API that would provide information on how many
> times a Wikidata entity was accessed through the various APIs and the
> query service. Do you believe this to be an accurate level of effort
> estimation based on your experience with past projects of this nature?
You are starting from the assumption that we do have the data you are interested in in the logs, which I am not sure is the case. Have you checked on this with the Wikidata developers?
Analytics 'automagically' collects data from logs about *page* requests; any other request collection (and it seems that yours fits this scenario) needs to be instrumented. I would send an e-mail to the analytics@ public list and the Wikidata folks to ask about how to harvest the data you are interested in. It doesn't sound like it is being collected at this time, so your project scope might be quite a bit bigger than you think.
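For what it's worth, here is a rough Python sketch of the kind of counting such instrumentation would need to do over raw request lines. It assumes each line contains the host, path, and query string as plain text, like the samples earlier in this thread; the actual log schema may well differ:

    import sys
    from collections import Counter
    from urllib.parse import parse_qs

    counts = Counter()
    for line in sys.stdin:
        # Keep only Wikidata api.php requests; this line format is an
        # assumption based on the samples, not the real log schema.
        if "www.wikidata.org" not in line or "/w/api.php" not in line:
            continue
        if "?" not in line:
            continue
        fields = line.split("?", 1)[1].split()
        if not fields:
            continue
        action = parse_qs(fields[0]).get("action", ["unknown"])[0]
        counts[action] += 1

    for action, n in counts.most_common():
        print(action, n)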
Thanks,
Nuria
On Thu, Jun 2, 2016 at 5:06 AM, James Hare <james(a)hxstrategy.com> wrote:
Hello Nuria,
I am currently developing a proposal for the U.S. Census Bureau to
integrate their datasets with Wikidata. As part of this, I am interested in
getting Wikidata usage metrics beyond the page view data currently
available. My concern is that the page views API gives you information only
on how many times a *page* is accessed – but Wikidata is not really used
in this way. More often, Wikidata’s information is accessed through the API endpoints (wbgetclaims etc.), through Special:EntityData, and through the Wikidata Query Service. If we had usage data for those mechanisms, that would give me a much better picture of how Wikidata is actually used.
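To make those access paths concrete, here is a small Python sketch against the public endpoints (Q42 is just an example entity, and the requests library is assumed; this is illustrative only, not part of the proposal):

    import requests

    # 1. API endpoint: claims for one entity via wbgetclaims
    claims = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetclaims", "entity": "Q42", "format": "json"},
    ).json()

    # 2. Special:EntityData: the full entity serialized as JSON
    entity = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
    ).json()

    # 3. Wikidata Query Service: statements consumed as RDF triples via SPARQL
    triples = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": "SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5",
                "format": "json"},
    ).json()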
To the extent these metrics are important to my prospective client, I am
willing to provide in-kind support to the analytics team to make this
information available, including expenses associated with the NDA process
(I understand that such a person may need to deal with raw access logs that
include PII.) My current operating assumption is that it would take one person, working on a full-time basis, around six weeks to go from raw
access logs to a functioning API that would provide information on how many
times a Wikidata entity was accessed through the various APIs and the query
service. Do you believe this to be an accurate level of effort estimation
based on your experience with past projects of this nature?
Please let me know if you have any questions. I am happy to discuss my
idea with you further.
Regards,
James Hare