Hi,
You may know me as the author of the reference book "Working with MediaWiki" (shameless plug - http://workingwithmediawiki.com). I'm also a MediaWiki extension developer who has focused on creating generic interfaces for editing and viewing structured data. Of these, the best known is the extension Page Forms, which displays user-editable forms for editing template calls and sections within pages:
https://www.mediawiki.org/wiki/Extension:Page_Forms
However, I've also created various applications that provide a "drill-down" interface for browsing data. There is Semantic Drilldown, which provides such an interface for Semantic MediaWiki's data:
https://www.mediawiki.org/wiki/Extension:Semantic_Drilldown
...Cargo, which provides browsing for its own data:
https://www.mediawiki.org/wiki/Extension:Cargo/Browsing_data
...and Miga, a JavaScript application that is not directly MediaWiki-related but was nonetheless originally intended to browse data from MediaWiki instances:
http://migadv.com
I've been thinking quite a bit recently about creating this kind of drill-down interface for the entirety of Wikidata's own data.
In terms of the interface, my idea is that this tool would most resemble Miga: like Miga, it would be an all-JavaScript "single-page application", and I think it makes sense to copy its general interface approach. You can see an example of Miga's browsing UI here - note the green bar at the top, holding the filter options:
http://migadv.com/miga/?fictional#_cat=Fictional%20nonhumans
The Wikidata browser could have a somewhat similar interface, though it would get its data via SPARQL queries rather than by querying data stored in the browser, as Miga does. Another difference would be how people would get to "classes" in the first place. I'm envisioning an interface where people start at the highest-level class ("Entity", I guess), then click down into child classes until they find the one they're looking for, then drill down from there. A text search could help with locating classes as well.
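To make that concrete, here's a rough JavaScript sketch of how the class tree could be populated - fetching the direct subclasses of a class from the public Wikidata Query Service endpoint. The function name and the result cap are my own inventions; P279 ("subclass of") and Q35120 ("entity") are the real IDs:

    // Rough sketch: fetch the direct subclasses of a class to populate
    // the class-navigation tree. Endpoint and property are the standard
    // public ones; everything else here is illustrative.
    const ENDPOINT = 'https://query.wikidata.org/sparql';

    async function getSubclasses(classId) {
      const query = `
        SELECT ?child ?childLabel WHERE {
          ?child wdt:P279 wd:${classId} .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        } LIMIT 500`;
      const url = ENDPOINT + '?format=json&query=' + encodeURIComponent(query);
      const response = await fetch(url);
      const json = await response.json();
      // Each binding's URI looks like http://www.wikidata.org/entity/Q5
      return json.results.bindings.map(b => ({
        id: b.child.value.split('/').pop(),
        label: b.childLabel.value
      }));
    }

    // Example: getSubclasses('Q35120').then(console.log);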
There are a few potential complications with creating a browsing interface for Wikidata, but I believe they can all be overcome. One complication is that there's no easy way to know which properties can be filtered on for any class - for instance, that for pages in the class "country" it makes sense to be able to filter on "population". It's my belief that Wikidata should directly store, and make use of, the expected "domain" and "range" of every property - I've shared this opinion with the Wikidata developers, who have tended to disagree. But what can be done instead of modifying Wikidata - and what I think would have to be done for this project to work - is to create a separate site that scrapes the "domain" data from Wikidata's property talk pages, stores that information in a database, and provides an API that returns, for any class name, the "data structure" for that class - i.e., the set of properties that have that class in their domain.
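To give a sense of what I mean, the response from such an API might look something like this - the endpoint and field names are entirely made up, though the property IDs are real ones that apply to countries:

    // Hypothetical response for GET /api/class-schema?class=Q6256
    // ("country"). Nothing here exists yet; the shape is just a sketch.
    const exampleResponse = {
      class: 'Q6256',
      properties: [
        { id: 'P1082', label: 'population', type: 'Quantity' },
        { id: 'P36',   label: 'capital',    type: 'WikibaseItem' },
        { id: 'P297',  label: 'ISO 3166-1 alpha-2 code', type: 'ExternalId' }
      ]
    };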
(This outside service, once created, could potentially be used for other things - like alternate form-based editing of Wikidata entities in which the form had pre-set fields for each expected property. That's outside the scope of this potential project, though.)
Another big complication is the massive amount of data involved. Wikidata has around 1,000 times the amount of data that the other applications I listed usually handle. But I think it's all doable, using some well-placed logic. See this Cargo drilldown interface, for example:
http://discoursedb.org/wiki/Special:Drilldown/Items
The "Author" field holds too many values to display on the screen, so it's just a text input with autocompletion. As you drill down through the values, though, the set of options gets reduced, and at some point all the options are shown on the screen. That's the sort of interface logic that could be used to keep the Wikidata browsing manageable.
A related complication is the large number of properties that could show up as filters: if all of them were displayed on the screen, they could overwhelm the interface. Miga already handles this problem by calculating the "diffusion" of each property - the number of unique values divided by the total number of values - and then only displaying filters for properties with a small enough diffusion value. I assume that this Wikidata browser could use a similar approach - and also automatically ignore properties of certain types, like "ID", which don't make sense to drill down on.
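Here's a sketch of that diffusion-based filtering; the 0.2 cutoff is just a placeholder for illustration, and "ExternalId" stands in for the ID-type properties to skip:

    // "Diffusion" as described above: unique values / total values.
    function diffusion(values) {
      return new Set(values).size / values.length;
    }

    // Keep only properties worth showing as on-screen filters.
    function selectFilterableProperties(properties, maxDiffusion = 0.2) {
      return properties.filter(p =>
        p.type !== 'ExternalId' &&
        p.values.length > 0 &&
        diffusion(p.values) <= maxDiffusion
      );
    }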
Another complication is that some (or maybe all?) properties can hold values that are time-specific - the "population" property I mentioned before is a perfect example of that, since it can hold a different value for each year. I don't know what the ideal solution for that is, but I think it's fine for now to just always use the most recent value for any such property.
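For what it's worth, "most recent value" is expressible directly in SPARQL, using the full statement model rather than the simplified wdt: form - here with "population" (P1082) and its "point in time" qualifier (P585), both of which are the real property IDs:

    // Latest population per country: keep a statement only if no other
    // population statement on the same item has a later "point in time".
    const latestPopulationQuery = `
      SELECT ?country ?population WHERE {
        ?country wdt:P31 wd:Q6256 ;
                 p:P1082 ?statement .
        ?statement ps:P1082 ?population ;
                   pq:P585 ?date .
        FILTER NOT EXISTS {
          ?country p:P1082/pq:P585 ?laterDate .
          FILTER (?laterDate > ?date)
        }
      }`;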
By the way, I believe it would also be fairly easy to "internationalize" this tool - i.e., let the user select a language, and then show the interface, and as much of the "data structure" (class and property names) and data as possible, in that language.
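The Wikidata Query Service's label service does most of the work there - the user's language code just needs to be passed through to each query, with English as a fallback. A sketch (the function itself is hypothetical; Q515, "city", in the usage note is real):

    // Wrap a query body so labels come back in the user's language,
    // falling back to English where no label exists.
    function withLabels(selectClause, queryBody, userLanguage) {
      return `SELECT ${selectClause} WHERE {
        ${queryBody}
        SERVICE wikibase:label {
          bd:serviceParam wikibase:language "${userLanguage},en".
        }
      }`;
    }

    // Example: withLabels('?city ?cityLabel', '?city wdt:P31 wd:Q515 .', 'nb')
    // produces a query whose ?cityLabel values come back in Norwegian
    // Bokmål where available.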
Why do this whole thing? I can think of a number of important uses this tool could have:
1) A new way to explore all the data on Wikidata - allowing both aggregation and the retrieval of specific results.
2) A way to run specific queries, for those who don't know SPARQL or understand Wikidata's specific data structure. This could open up Wikidata querying to a wide range of people who otherwise would never be able to do it.
3) Tied in with that, an API to create SPARQL queries - I didn't mention this before, but it probably makes sense to add, to any page in the display, a "View SPARQL" link, which retrieves the SPARQL query that was used to get the current set of results. (A rough sketch of how that could work follows this list.)
4) Potentially, a visualization tool - I didn't mention this either, but Miga shows maps and timelines for data that contain coordinate and date information, and it makes sense for this tool to do the same thing, whether that happens in the first version or later.
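On point 3, here's a rough sketch of how the drilldown state - a class plus a set of property/value filters - could be mapped mechanically onto a SPARQL query that a "View SPARQL" link would then expose. All the names here are invented for illustration; P30 ("continent") and Q46 ("Europe") are real IDs:

    // Turn the current drilldown state into a SPARQL query string.
    function buildSparql(classId, filters) {
      // filters, e.g. [{ property: 'P30', value: 'Q46' }] - continent = Europe
      const filterTriples = filters
        .map(f => `?item wdt:${f.property} wd:${f.value} .`)
        .join('\n      ');
      return `SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:${classId} .
        ${filterTriples}
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
      }`;
    }

    // buildSparql('Q6256', [{ property: 'P30', value: 'Q46' }]) yields
    // the query behind "countries in Europe", which the link could show
    // verbatim or open in the Wikidata Query Service.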
So that's my explanation. This is a lot of information to throw out at one time. Ideally, I would create a whole wiki page for this idea, with mockup images and so forth; maybe I'll do that at some point. But for now, I really just wanted to hear people's general views on this sort of thing. And if some people think it's a good idea, I'm also very curious to hear what the best strategy might be to get funding for this. I could try to get a Wikimedia Individual Engagement Grant (IEG) to fund it - that's actually how Miga was funded - but I wonder if another option is to get Wikimedia Deutschland itself, or some other organization, to sponsor it, and perhaps take ownership of the resulting application. But maybe that's getting too far ahead.
-Yaron
Perhaps you could get property data from Sqid? It has property frequencies for each class:
https://tools.wmflabs.org/sqid/#/view?id=Q8502&lang=nb
Dan Michael
Hi Dan,
Oh, very interesting! I never noticed that "Typical Properties" section before (I'm looking at it in English). I still think a separate API service would be necessary, so that it could attach a "property type" and a "diffusion" value to each property, but getting the expected properties for each class seems easier than I thought it would be.
-Yaron