Adding analytics@, a public e-mail list where you can post questions such as this one.
> that doesn’t tell us how often entities are accessed through
> Special:EntityData or wbgetclaims
> Does this data already exist, even in the form of raw access logs?
Is this data always requested via HTTP from an API endpoint that will hit a Varnish cache? (Daniel can probably answer this.)
From what I see in our data, we have requests like the following:
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q633155
www.wikidata.org /w/api.php ?callback=jQuery11130020702992017004984_1465195743367&format=json&action=wbgetclaims&property=P373&entity=Q5296&_=1465195743368
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q573612
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q472729
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q349797
www.wikidata.org /w/api.php ?action=compare&torev=344163911&fromrev=344163907&format=json
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2356135
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2355988
www.wikidata.org /w/api.php ?action=compare&torev=344164023&fromrev=344163948&format=json
If the data you are interested in can be inferred from these requests, there is no additional data gathering needed.
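For illustration only: if the logs really do contain lines shaped like the samples above, a first-pass tally of entity accesses is a few lines of standard URL parsing. This is a sketch against the pasted samples, not our production pipeline, and the whitespace-separated host/path/query layout is an assumption taken from the examples:

from collections import Counter
from urllib.parse import parse_qs

# Sample request lines shaped like the ones above: host, path, and query
# string separated by whitespace (an assumption based on the pasted examples).
raw_requests = [
    "www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q633155",
    "www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2356135",
]

entity_hits = Counter()
for line in raw_requests:
    host, path, query = line.split()
    params = parse_qs(query.lstrip("?"))
    # wbgetclaims names the item in "entity"; wbgetentities uses "ids",
    # which may carry several pipe-separated IDs.
    for key in ("entity", "ids"):
        for value in params.get(key, []):
            entity_hits.update(value.split("|"))

print(entity_hits.most_common())

That only covers action API requests like the samples; Special:EntityData and query service traffic would need their own parsing.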
> If not, what effort would be required to gather this data? For the
> purposes of my proposal to the U.S. Census Bureau I am estimating around
> six weeks of effort for this for one person working full-time. If it will
> take more time I will need to know.

I think I have mentioned this before on an e-mail thread, but without knowing the details of what you want to do we cannot give you a time estimate. What are the exact metrics you are interested in? Is the project described anywhere on Meta?
Thanks,
Nuria
On Thu, Jun 30, 2016 at 11:45 AM, James Hare james@hxstrategy.com wrote:
Copying Lydia Pintscher and Daniel Kinzler (with whom I’ve discussed this very topic).
I am interested in metrics that describe how Wikidata is used. While we do have views on individual pages, that doesn’t tell us how often entities are accessed through Special:EntityData or wbgetclaims. Nor does it tell us how often statements/RDF triples show up in the Wikidata Query Service. Does this data already exist, even in the form of raw access logs? If not, what effort would be required to gather this data? For the purposes of my proposal to the U.S. Census Bureau I am estimating around six weeks of effort for this for one person working full-time. If it will take more time I will need to know.
Thank you, James Hare
On Thursday, June 2, 2016 at 2:18 PM, Nuria Ruiz wrote:
James:
> My current operating assumption is that it would take one person,
> working on a full time basis, around six weeks to go from raw access logs
> to a functioning API that would provide information on how many times a
> Wikidata entity was accessed through the various APIs and the query
> service. Do you believe this to be an accurate level of effort estimation
> based on your experience with past projects of this nature?

You are starting from the assumption that we have the data you are interested in in the logs, which I am not sure is the case. Have you done your checks in this regard with the Wikidata developers?
Analytics 'automagically' collects data from logs about *page* requests; any other request collection (and it seems that yours fits this scenario) needs to be instrumented. I would send an e-mail to the analytics@ public list and to the Wikidata folks to ask how to harvest the data you are interested in. It doesn't sound like it is being collected at this time, so your project scope might be quite a bit bigger than you think.
Thanks,
Nuria
On Thu, Jun 2, 2016 at 5:06 AM, James Hare james@hxstrategy.com wrote:
Hello Nuria,
I am currently developing a proposal for the U.S. Census Bureau to integrate their datasets with Wikidata. As part of this, I am interested in getting Wikidata usage metrics beyond the page view data currently available. My concern is that the page views API gives you information only on how many times a *page* is accessed – but Wikidata is not really used in this way. More often is it the case that Wikidata’s information is accessed through the API endpoints (wbgetclaims etc.), through Special:EntityData, and the Wikidata Query Service. If we have information on usage through those mechanisms, that would give me much better information on Wikidata’s usage.
To the extent these metrics are important to my prospective client, I am willing to provide in-kind support to the analytics team to make this information available, including expenses associated with the NDA process (I understand that such a person may need to deal with raw access logs that include PII.) My current operating assumption is that it would take one person, working on a full time basis, around six weeks to go from raw access logs to a functioning API that would provide information on how many times a Wikidata entity was accessed through the various APIs and the query service. Do you believe this to be an accurate level of effort estimation based on your experience with past projects of this nature?
Please let me know if you have any questions. I am happy to discuss my idea with you further.
Regards, James Hare