Hi,

Thanks very much for the responses; they were very insightful.

Daniel, I have a follow-up question about the “Wikidata entities used in this page” section found at, for example: https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info
  1. In the “Wikidata entities used in this page” section, do the entities listed depend on, for example, the logic of the templates through which they are referenced? If entities are listed in this section, is the corresponding data on the page definitely always coming from Wikidata?
  2. Sometimes “other (statements)” is specified in the “Wikidata entities used in this page” section. Is it possible to determine what those statements are?

Thanks,
Andrew


On Nov 23, 2016, at 2:33 PM, Andrew Hall <hall1467@umn.edu> wrote:

Hi,

I’m a PhD student/researcher at the University of Minnesota who (along with Max Klein and another grad student/researcher) has been interested in understanding the extent to which Wikidata is used in (English, for now) Wikipedia.

There seems to be no easy way to determine Wikidata usage in Wikipedia pages, so I’ll describe the two approaches we’ve considered as our best attempts at solving this problem, along with the shortcomings of each.

The first approach involves analyzing Wikipedia templates to look for explicit references (i.e. “#property:P<some number>”) across all templates. For a given template containing a certain property reference, we then assume that the statement corresponding to the Wikidata property is used in all Wikipedia pages that transclude that template. However, there are two clear limitations to this approach:
  1. Assuming that the statement corresponding to the Wikidata property is used in every Wikipedia page that transcludes the template gives only a sort of upper bound on the number of actual property usages in Wikipedia. We have no sense of what the actual usage looks like, since each template has its own logic, and whether or not a given property gets rendered in Wikipedia depends on that (sometimes quite complicated) logic. A possible way to get a sense of usage would be to sample a small set of random pages (that use templates using Wikidata) and manually check whether the Wikidata statement for the given Wikidata item exactly matches what is rendered in the corresponding Wikipedia page. If it does, then we might assume the property is being used. Of course, this is not a perfect approach, since it's possible that a Wikidata statement is used in Wikipedia but is formatted differently in Wikidata than in Wikipedia (e.g. a date is rendered using a different format).
  2. This approach does not account for Lua modules, which can be referenced from within templates. The modules can (and sometimes do) contain code that supplies Wikidata data to the Wikipedia pages that transclude the templates containing the module references. Without understanding and accounting for the logic in all Lua modules that use Wikidata, it does not seem possible to know which Wikidata properties are being introduced to Wikipedia pages through this route.
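
To make the first approach concrete, here is a minimal sketch (in Python) of the template scan. The function name scan_template and both regular expressions are illustrative assumptions, not existing tooling. It only matches the numeric “#property:P<some number>” form, and it merely flags “#invoke” calls, since it cannot see into the Lua modules themselves (limitation 2 above).

```python
import re

# Explicit parser-function references such as {{#property:P569}}.
# Assumption: only the numeric P<number> form is matched.
PROPERTY_CALL = re.compile(r"\{\{\s*#property\s*:\s*(P\d+)", re.IGNORECASE)

# Lua module invocations such as {{#invoke:SomeModule|...}}.
# These are only flagged; the module code itself is not analyzed.
INVOKE_CALL = re.compile(r"\{\{\s*#invoke\s*:\s*([^|}]+)", re.IGNORECASE)


def scan_template(wikitext):
    """Return the property IDs and Lua module names referenced in template wikitext."""
    properties = {match.upper() for match in PROPERTY_CALL.findall(wikitext)}
    modules = {name.strip() for name in INVOKE_CALL.findall(wikitext)}
    return {
        # Sort numerically so that P18 comes before P569.
        "properties": sorted(properties, key=lambda pid: int(pid[1:])),
        "modules": sorted(modules),
    }
```

Every page that transcludes a template with a non-empty "properties" list would then be counted as a potential usage (the upper bound from limitation 1), and every template with a non-empty "modules" list would need separate inspection.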

The second approach involves expanding (using the MediaWiki API, see https://www.mediawiki.org/wiki/API:Expandtemplates) already transcluded templates into HTML tables in two ways: 1) in the context of the appropriate Wikipedia page and 2) out of the context of that page (e.g. in my own sandbox). It’s my understanding that if the Wikipedia page uses Wikidata, then the Wikidata data should show up in the expansion when the template is expanded in the context of its page, and not when it is expanded elsewhere (e.g. in my sandbox). We would then check for a difference between the two expansions by HTML diffing. The difference between the two expanded templates would presumably be due to Wikidata. Of course, there are limitations to this approach as well:
  1. It's possible that a Wikipedia contributor manually entered data (into a transcluded template) that exactly matches data in Wikidata, in which case the two expansions would be identical and the Wikidata usage would not be detectable by diffing.
  2. Once we identify (through diffing) where Wikidata is being used in expanded templates, it's not obvious which specific Wikidata properties/statements were used. In other words, "linking" Wikidata to the corresponding HTML (table) rows in an expanded template seems challenging.
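
A minimal sketch (in Python) of the comparison step in the second approach, under some assumptions: the helper names expand_params and changed_lines are hypothetical, the request parameters follow the API:Expandtemplates page linked above, and actually sending the request and extracting the expanded text from the response is left out. The diff uses Python's standard difflib line by line rather than a dedicated HTML diff.

```python
import difflib

API_URL = "https://en.wikipedia.org/w/api.php"


def expand_params(template_wikitext, context_title):
    """Build query parameters for an expandtemplates request.

    context_title is the page the template is expanded "as if" on:
    the real article for the in-context run, a sandbox page otherwise.
    """
    return {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",
        "title": context_title,
        "text": template_wikitext,
    }


def changed_lines(in_context, out_of_context):
    """Return the lines that differ between the two expansions.

    Lines prefixed "- " appear only in the out-of-context expansion,
    lines prefixed "+ " only in the in-context one.
    """
    diff = difflib.ndiff(out_of_context.splitlines(), in_context.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

Any "+ " line is then only a candidate for Wikidata-supplied content; as limitation 2 notes, the diff alone does not say which property or statement produced it.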

Any insight about how we can approach this problem would be greatly appreciated!

Thanks,
Andrew Hall