Hi,
Thanks very much for the responses. They were very insightful.
Daniel, I have a follow-up question about the “Wikidata entities used in this page” section
found at, for example:
In the “Wikidata entities used in this page” section, do the entities listed depend on,
for example, the logic of the templates through which they are referenced? If entities are
listed in this section, do they always come from Wikidata?
Sometimes “other (statements)” is specified in the “Wikidata entities used in this page”
section. Is it possible to determine what those statements are?
Thanks,
Andrew
On Nov 23, 2016, at 2:33 PM, Andrew Hall
<hall1467(a)umn.edu> wrote:
Hi,
I’m a PhD student/researcher at the University of Minnesota who (along with Max Klein and
another grad student/researcher) has been interested in understanding the extent to which
Wikidata is used in (English, for now) Wikipedia.
There seems to be no easy way to determine Wikidata usage in Wikipedia pages, so I’ll
describe the two approaches we’ve considered as our best attempts at solving this problem,
along with the shortcomings of each.
The first approach involves scanning all Wikipedia templates for explicit property
references (i.e. “#property:P<some number>”). For a given template containing a certain
property reference, we then assume that the statement corresponding to the Wikidata
property is used in all Wikipedia pages that transclude that template.
However, there are two clear limitations to this approach:
1. Assuming that the statement corresponding to the Wikidata property is used in every
Wikipedia page that transcludes the template gives only an upper bound on the number of
actual property usages in Wikipedia. We have no sense of what the actual usage looks like,
since each template has its own logic, and whether a given property gets rendered in
Wikipedia depends on that (sometimes quite complicated) logic. A possible way to get a
sense of usage would be to sample a small set of random pages (that use templates using
Wikidata) and manually check whether the Wikidata statement for the given Wikidata item
<https://www.wikidata.org/wiki/Help:Items> is exactly the same as the value rendered in
the corresponding Wikipedia page. If it is, then we might assume the property is being
used. Of course, this is not a perfect approach, since a Wikidata statement may be used in
Wikipedia but formatted differently there than in Wikidata (e.g. a date is rendered using
a different format).
2. This approach does not account for Lua modules, which can be referenced from within
templates. The modules can (and sometimes do) contain code that supplies Wikidata data to
the Wikipedia pages that transclude the templates containing the module references.
Without understanding and accounting for the logic in all Lua modules that use Wikidata,
it does not seem possible to know which Wikidata properties are being introduced to
Wikipedia pages through this route.
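
The first approach could be sketched roughly as follows. This is only an illustration, not the code we actually ran: the function names, the regular expression, and the two input dictionaries (template wikitext keyed by template name, and the pages transcluding each template) are assumptions, and the inputs would have to be built separately, for example from a dump. It only catches explicit "#property:P<number>" references, so it has both limitations described above.

```python
import re

# Matches explicit parser-function references such as {{#property:P36}}.
# Assumption: references by property label and Lua module calls are ignored.
PROPERTY_REF = re.compile(r"\{\{\s*#property\s*:\s*(P\d+)", re.IGNORECASE)


def property_refs(template_wikitext):
    """Return the set of property IDs explicitly referenced in a template."""
    return {pid.upper() for pid in PROPERTY_REF.findall(template_wikitext)}


def upper_bound_usage(template_sources, transclusions):
    """Map each property ID to the pages that transclude a template using it.

    template_sources: {template name: wikitext}        (hypothetical input)
    transclusions:    {template name: [page titles]}   (hypothetical input)
    The result is an upper bound: template logic may never render the value.
    """
    usage = {}
    for name, wikitext in template_sources.items():
        for pid in property_refs(wikitext):
            usage.setdefault(pid, set()).update(transclusions.get(name, ()))
    return usage
```

Counting the pages per property in the returned dictionary gives the upper bound on usage; the manual sampling step would still be needed to estimate how far actual usage falls below it.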
The second approach involves expanding (using the MediaWiki API, see
https://www.mediawiki.org/wiki/API:Expandtemplates) already transcluded templates
into HTML tables in two ways: 1) in the context of the appropriate Wikipedia page and 2)
out of context of the appropriate Wikipedia page (e.g. in my own sandbox). It’s my
understanding that if the Wikipedia page uses Wikidata, then that Wikidata should show up
in the expansion if the template is expanded in the context of its page, and not when
expanded elsewhere (e.g. in my sandbox). We would then diff the HTML of the two expansions
to check whether they differ. Any difference between the two expanded templates would
presumably be due to Wikidata. Of course, there are limitations to this approach as
well:
1. It's possible that a Wikipedia contributor manually entered data (into a transcluded
template) that exactly matches the data in Wikidata. The two expansions would then be
identical, and the Wikidata usage would not be detectable by diffing.
2. Once we identify (through diffing) where Wikidata is being used in expanded templates,
it's not obvious which specific Wikidata properties/statements were used. In other
words, "linking" Wikidata to the corresponding HTML (table) rows in an expanded
template seems challenging.
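
A minimal sketch of the comparison step in the second approach is below. It assumes the two expansions have already been fetched as strings. The helper names are hypothetical, and the parameter dictionary only shows how a request to action=expandtemplates might be set up, with "title" giving the page context (no request is actually sent here). A plain line diff is used in place of a real HTML diff.

```python
import difflib


def expandtemplates_params(wikitext, title):
    """Build query parameters for an action=expandtemplates request.

    `title` sets the page the text is expanded in the context of, so the
    same wikitext can be expanded for the article and for a sandbox page.
    """
    return {
        "action": "expandtemplates",
        "text": wikitext,
        "title": title,
        "prop": "wikitext",
        "format": "json",
    }


def changed_lines(in_context, out_of_context):
    """Return (lines only in the in-context expansion,
               lines only in the out-of-context expansion)."""
    only_in, only_out = [], []
    diff = difflib.ndiff(out_of_context.splitlines(), in_context.splitlines())
    for line in diff:
        if line.startswith("+ "):
            only_in.append(line[2:])    # present only when expanded in context
        elif line.startswith("- "):
            only_out.append(line[2:])   # present only when expanded elsewhere
    return only_in, only_out
```

Lines that appear only in the in-context expansion are the candidates for Wikidata-supplied values. As noted above, this neither detects values a contributor typed in that happen to match Wikidata, nor says which property a changed row came from.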
Any insight about how we can approach this problem would be greatly appreciated!
Thanks,
Andrew Hall