Hi,
I’m a PhD student/researcher at the University of Minnesota who (along with Max Klein and another grad student/researcher) has been interested in understanding the extent to which Wikidata is used in (English, for now) Wikipedia.
There seems to be no easy way to determine Wikidata usage in Wikipedia pages so I’ll describe two approaches we’ve considered as our best attempts at solving this problem. I’ll also describe shortcomings of each approach.
The first approach involves analyzing Wikipedia templates to look for explicit references (i.e. “#property:P<some number>”) across all templates. For a given template containing a certain property reference, we then assume that the statement corresponding to the Wikidata property is used in all Wikipedia pages that transclude that template. However, there are two clear limitations to this approach:
1. Assuming that the statement is used in every page that transcludes the template gives only a sort of upper bound on the number of actual property usages in Wikipedia. We have no sense of what the actual usage looks like, since each template has its own logic, and whether or not a given property gets rendered in Wikipedia depends on that (sometimes quite complicated) logic. A possible way to get a sense of usage would be to sample a small set of random pages (that use templates using Wikidata) and manually check whether the Wikidata statement for the given Wikidata item (https://www.wikidata.org/wiki/Help:Items) is exactly the same as what is rendered in the corresponding Wikipedia page. If it is, then we might assume the property is being used. Of course, this is not a perfect approach, since a Wikidata statement may be used in Wikipedia but formatted differently there than in Wikidata (e.g. a date is rendered using a different format).
2. This approach does not account for Lua modules, which can be referenced from within templates. The modules can (and sometimes do) contain code that supplies Wikidata to the Wikipedia pages that transclude the templates containing the module references. Without understanding and accounting for the logic in all Lua modules that use Wikidata, it does not seem possible to actually know which Wikidata properties are being introduced to Wikipedia pages through this method.
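To make the template scan in the first approach concrete, here is a minimal Python sketch, assuming the template wikitext has already been fetched as a string. The function name `find_property_ids` and the sample template are hypothetical, invented only for illustration, and the pattern covers only the numeric `#property:P<number>` form described above. As limitation 2 says, it cannot see Wikidata access that happens inside Lua modules.

```python
import re
from typing import Set

# Matches the parser-function form mentioned above, e.g. {{#property:P856}}.
# Only numeric P-ids are captured; any other way of reaching Wikidata
# (for example through a Lua module) is invisible to this pattern.
PROPERTY_CALL = re.compile(r"\{\{\s*#property\s*:\s*(P\d+)", re.IGNORECASE)


def find_property_ids(wikitext: str) -> Set[str]:
    """Return the set of property IDs referenced via #property in wikitext."""
    return {match.upper() for match in PROPERTY_CALL.findall(wikitext)}


# Hypothetical template source, invented for illustration only.
sample_template = """
{{Infobox
| website = {{#if:{{{website|}}}|{{{website}}}|{{#property:P856}}}}
| area    = {{#property:p2046}}
}}
"""

print(sorted(find_property_ids(sample_template)))
```

Note that the sample's `website` row only falls back to Wikidata when no local value is given, which is exactly why a match here is an upper bound rather than proof of use.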
The second approach involves expanding (using the MediaWiki API, see https://www.mediawiki.org/wiki/API:Expandtemplates) already transcluded templates into HTML tables in two ways: 1) in the context of the appropriate Wikipedia page and 2) out of context of the appropriate Wikipedia page (e.g. in my own sandbox). It’s my understanding that if the Wikipedia page uses Wikidata, then that Wikidata should show up in the expansion if the template is expanded in the context of its page, and not when expanded elsewhere (e.g. in my sandbox). We would then check to see if there is a difference between the two expansions by html diff-ing. The difference between the two expanded templates would presumably be due to Wikidata. Of course, there are limitations to this approach as well:
1. It's possible that a Wikipedia contributor manually entered data (into a transcluded template) that exactly matches data in Wikidata, and thus the expansions would be the same across the diff-ing; Wikidata would not be recognizable in this case.
2. Once we identify (through diff-ing) where Wikidata is being used in expanded templates, it's not obvious what specific Wikidata properties/statements were used. In other words, "linking" Wikidata to corresponding html (table) rows in an expanded template seems challenging.
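The diff-ing step of the second approach could be sketched roughly as follows in Python, using the standard-library `difflib`. The two expansion strings are hypothetical and invented for illustration; in practice they would be the two results obtained from the expandtemplates API. The helper name `changed_lines` is likewise an assumption, not part of any existing tool.

```python
import difflib
from typing import List


def changed_lines(in_context: str, out_of_context: str) -> List[str]:
    """Return the lines that appear only in the in-context expansion."""
    diff = difflib.ndiff(out_of_context.splitlines(), in_context.splitlines())
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical expansions of the same template call, invented for illustration.
expanded_in_sandbox = (
    "<tr><th>Website</th><td></td></tr>\n"
    "<tr><th>Type</th><td>Telescope</td></tr>"
)
expanded_on_page = (
    "<tr><th>Website</th><td>example.org</td></tr>\n"
    "<tr><th>Type</th><td>Telescope</td></tr>"
)

for line in changed_lines(expanded_on_page, expanded_in_sandbox):
    print(line)
```

If a contributor typed in a value that matches Wikidata exactly, both expansions are identical and the function returns an empty list, which is limitation 1 in code form. The output also says nothing about which property produced a changed row, which is limitation 2.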
Any insight about how we can approach this problem would be greatly appreciated!
Thanks, Andrew Hall
Great idea.
The first approach involves analyzing Wikipedia templates to look for
explicit references (i.e. “#property:P<some number>”) across all templates.
This syntax is rarely if ever used on fr.wiki (it's now even forbidden in the main namespace), where almost all calls to Wikidata are done through modules (we had an RFC which *kind of* made it mandatory).
Meanwhile on fr.wiki, we extensively use categories (automatically added) for tracking Wikidata, main cat is https://fr.wikipedia.org/wiki/Cat%C3%A9gorie:Page_utilisant_Wikidata_par_pro... ; I'm curious to have a comparison to see if a lot of Wikidata calls are untracked by categories.
Regards, ~nicolas
Hello,
That's a very interesting topic, I'm looking forward to seeing the results :)
You may already know that we're tracking entity usage on Wikipedia. Example: https://en.wikipedia.org/wiki/Barack_Obama?action=info (section "Wikidata entities used in this page"). Documentation and several ways to access this information: https://www.wikidata.org/wiki/Wikidata:Entity_Usage
If you have any questions, feel free to contact me. Best,
On 24 November 2016 at 14:52, Nicolas VIGNERON vigneron.nicolas@gmail.com wrote:
Great idea.
The first approach involves analyzing Wikipedia templates to look for
explicit references (i.e. “#property:P<some number>”) across all templates.
This syntax is rarely if ever used on fr.wiki (it's now even forbidden in the main namespace), where almost all calls to Wikidata are done through modules (we had an RFC which *kind of* made it mandatory).
Meanwhile on fr.wiki, we extensively use categories (automatically added) for tracking Wikidata, main cat is https://fr.wikipedia.org/wiki/Cat%C3%A9gorie:Page_utilisant_Wikidata_par_propri%C3%A9t%C3%A9 ; I'm curious to have a comparison to see if a lot of Wikidata calls are untracked by categories.
Regards, ~nicolas
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 23.11.2016 at 21:33, Andrew Hall wrote:
Hi,
I’m a PhD student/researcher at the University of Minnesota who (along with Max Klein and another grad student/researcher) has been interested in understanding the extent to which Wikidata is used in (English, for now) Wikipedia.
There seems to be no easy way to determine Wikidata usage in Wikipedia pages so I’ll describe two approaches we’ve considered as our best attempts at solving this problem. I’ll also describe shortcomings of each approach.
There are two pretty easy ways, which you may not have found because they were added only a couple of months ago:
You can look at the "page information" (action=info, linked from the sidebar), e.g. https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info. Near the bottom you can find "Wikidata entities used in this page".
The same information is available via an API module, https://en.wikipedia.org/w/api.php?action=query&prop=wbentityusage&titles=South_Pole_Telescope. See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage for documentation.
These URLs will list all direct and indirect usages, and also indicate what part or aspect of the entity was used.
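As a rough illustration of how the API module above might be used from a script, here is a Python sketch that builds the query URL and groups the reported aspects by entity. It makes no network call. The JSON sample is an assumption about the response shape (entity IDs mapped to a list of aspects), and its page title, entity IDs and aspect codes are invented, so it should be checked against the module's own documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def entity_usage_url(title: str) -> str:
    """Build a wbentityusage query URL for one page title."""
    params = {
        "action": "query",
        "prop": "wbentityusage",
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def entities_by_aspect(response: dict) -> dict:
    """Map each entity ID in the response to a sorted list of its aspects."""
    usage = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for entity_id, info in page.get("wbentityusage", {}).items():
            usage.setdefault(entity_id, set()).update(info.get("aspects", []))
    return {entity_id: sorted(aspects) for entity_id, aspects in usage.items()}


# Assumed response shape with invented values, for illustration only.
sample_response = json.loads("""
{"query": {"pages": {"1": {
    "pageid": 1,
    "title": "Example page",
    "wbentityusage": {
        "Q1": {"aspects": ["T", "S"]},
        "Q2": {"aspects": ["O"]}
    }
}}}}
""")

print(entity_usage_url("South_Pole_Telescope"))
print(entities_by_aspect(sample_response))
```

Fetching the printed URL with any HTTP client and passing the decoded JSON to `entities_by_aspect` would give a per-page summary that can be aggregated across many pages.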
HTH
Hi,
A related DBpedia GSoC project from this summer is described here http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg0782...
Some preliminary results that bootstrapped this project from ~1y ago are here https://lists.wikimedia.org/pipermail/wikidata/2015-December/007757.html
On Thu, Nov 24, 2016 at 3:07 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Supplying this information as a use case -- from my personal Wikipedia editing.
For editathons held in NYC via our local Wikipedia chapter, I do a scrub for edits that includes Commons and Wikidata edits.
I use the *Global contributions (GUC)* interface (https://tools.wmflabs.org/guc/?user=BrillLyle), which can be found on Wikipedia:
- Wikipedia's User page
- Left side selection: User contributions -- https://en.wikipedia.org/wiki/Special:Contributions/BrillLyle
- or on-Wiki link Special:Contributions/BrillLyle.
The GUC collocates usage data.
Regarding your item #2 (first section):
The thing I do a lot is update Authority control data, which shows up on Wikipedia as "only" the template {{Authority control}} but displays information that is updated on Wikidata.
Twitter links are also done this way: the data shown by {{Twitter}} is input on Wikidata, as is the data for quite a few other templates that were automatically migrated by bots.
Best,
- Erika
*Erika Herzog* Wikipedia *User:BrillLyle https://en.wikipedia.org/wiki/User:BrillLyle*
Hi,
Thanks very much for the responses. They were very insightful.
Daniel, I have some follow-up questions about using “Wikidata entities used in this page”, found at, for example: https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=i...
- In the “Wikidata entities used in this page” section, are the entities used dependent on, for example, the logic of the templates through which they are referenced? If entities are listed in this section, are they for sure always coming from Wikidata?
- Sometimes “other (statements)” is specified in the “Wikidata entities used in this page” section. Is it possible to determine what those statements are?
Thanks, Andrew
On 26.11.2016 at 23:33, Andrew Hall wrote:
- In the “Wikidata entities used in this page” section, are the entities used dependent on, for example, the logic of the templates through which they are referenced? If entities are listed in this section, are they for sure always coming from Wikidata?
Yes, *any* use is tracked and recorded, including accessing some part of the entity from a conditional somewhere in the Lua code. And all entities come from Wikidata -- we don't have any other Wikibase repo yet, and when we do, usage will be tracked separately for that.
- Sometimes “other (statements)” is specified in the “Wikidata entities used in this page” section. Is it possible to determine what those statements are?
No, that information is not recorded. There is no way to find out without tracing all templates, parameters, and Lua code. We may start tracking this in the future, but it's a lot of data.
I'm sure we had a ticket for changing this, but couldn't find it, so I made a new one: https://phabricator.wikimedia.org/T151717
Hi Daniel,
Thanks very much for the response again.
From this Phabricator ticket from September, https://phabricator.wikimedia.org/T145897, it seems that the syntax "{{Uses Wikidata|<property_ids>}}" applied to template documentation is a means through which entity usage tracking is introduced for that template and set of properties. However, as that ticket demonstrates, it is not always applied to template documentation. For example, P856 is not tracked for the English Wikipedia page “Windows_95”, but we believe that the P856 statement on the “Windows_95” Wikidata item is being used in the corresponding Wikipedia page. To further complicate things, in the South Pole Telescope example, the Wikidata property P2046 (“area”) was not included in a {{Uses Wikidata|…}} statement but still seems to be tracked. The "infobox telescope" code (https://en.wikipedia.org/w/index.php?title=Template:Infobox_telescope/doc&action=edit) has the following syntax to include “area” data from Wikidata: "{{Wikidata entity link|P2046}}".
It seems that there are different syntaxes for including Wikidata in Wikipedia templates, and that some do not necessarily result in tracking as of yet. Does this seem accurate? If so, how might we work around the issue?
Thanks, Andrew
On Nov 27, 2016, at 6:55 AM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.