Perhaps you could get properties data from Sqid? It has property
frequencies for each class:
Hi,
You may know me as the author of the reference book "Working with
MediaWiki" (shameless plug -
http://workingwithmediawiki.com). I'm also a MediaWiki extension
developer, who has focused on creating generic interfaces for editing
and viewing structured data. Of these, the best known is the extension
Page Forms, which displays user-editable forms for editing template
calls and sections within pages:
https://www.mediawiki.org/wiki/Extension:Page_Forms
However, I've also created various applications that provide a
"drill-down" interface for browsing data. There is Semantic Drilldown,
which provides such an interface for Semantic MediaWiki's data:
https://www.mediawiki.org/wiki/Extension:Semantic_Drilldown
...Cargo, which provides browsing for its own data:
https://www.mediawiki.org/wiki/Extension:Cargo/Browsing_data
...and Miga, a JavaScript application that is not directly
MediaWiki-related but was nonetheless originally intended to browse
data from MediaWiki instances:
http://migadv.com/
I've been thinking quite a bit recently about creating this kind of
drill-down interface for the entirety of Wikidata's own data.
In terms of the interface, my idea is that it would most closely
resemble Miga: like Miga, it would be an all-JavaScript "single-page
application", and I think it makes sense to copy Miga's general
interface approach. You can see an example of Miga's browsing UI here
- note the green bar at the top, holding the filter options:
http://migadv.com/miga/?fictional#_cat=Fictional%20nonhumans
The Wikidata browser could have a somewhat similar interface, though
it would get its data via SPARQL queries rather than by querying data
stored in the browser, as Miga does. Another difference would be how
people get to "classes" in the first place. I'm envisioning an
interface where people start at the highest-level class ("Entity", I
guess), then click down into child classes until they find the one
they're looking for, then drill down from there. A text search could
help with locating classes as well.
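To make that top-down navigation concrete, here is a minimal sketch (in Python) of the SPARQL that could fetch a class's direct subclasses via "subclass of" (P279). The endpoint, prefixes, and the P279/Q35120 IDs are real Wikidata conventions; the function name is hypothetical.

```python
# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def subclass_query(class_id: str, lang: str = "en") -> str:
    """Build a SPARQL query listing direct subclasses (P279) of class_id,
    with labels in the requested language."""
    return f"""
SELECT ?child ?childLabel WHERE {{
  ?child wdt:P279 wd:{class_id} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT 200
"""

# e.g. subclass_query("Q35120") asks for the children of "entity" (Q35120),
# the root class the browsing would presumably start from.
```

Running each level's query as the user clicks down would give exactly the class-tree navigation described above.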
There are a few potential complications with creating a browsing
interface for Wikidata, but I believe they can all be overcome. One
complication is that there's no easy way to know which properties can
be filtered on for any class - for instance that, for pages in the
class "country", it makes sense to be able to filter on "population".
It's my belief that Wikidata should directly store, and make use of,
the expected "domain" and "range" for every property - I've shared
this opinion with the Wikidata developers, who have tended to
disagree. But what can be done instead of modifying Wikidata - and
what I think would have to be done for this project to work - is to
create a separate site that scrapes the "domain" data from Wikidata's
property talk pages, stores that information in a database, and
creates an API that returns, for any class name, the "data structure"
for that class - i.e., the set of properties that have that class in
their domain.
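The core lookup of that proposed API could be as simple as the following sketch. The domain table here is purely illustrative (a real service would populate it by scraping the property talk pages), though the property and class IDs themselves are real Wikidata ones.

```python
# Hypothetical scraped "domain" table: property ID -> set of class IDs
# in that property's domain. Illustrative data only.
DOMAIN_TABLE = {
    "P1082": {"Q6256"},  # population -> country
    "P36":   {"Q6256"},  # capital -> country
    "P569":  {"Q5"},     # date of birth -> human
}

def properties_for_class(class_id: str) -> list[str]:
    """Return the 'data structure' for a class: all property IDs
    whose domain includes that class."""
    return sorted(p for p, domains in DOMAIN_TABLE.items()
                  if class_id in domains)
```

The API would then serve this list as JSON for any requested class ID.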
(This outside service, once created, could potentially be used for
other things - like alternate form-based editing of Wikidata entities
in which the form had pre-set fields for each expected property.
That's outside the scope of this potential project, though.)
Another big complication is the massive amount of data involved.
Wikidata has around 1,000 times the amount of data that the other
applications I listed usually handle. But I think it's all doable,
using some well-placed logic. See this Cargo drilldown interface, for
example:
http://discoursedb.org/wiki/Special:Drilldown/Items
The "Author" field holds too many values to display on the screen, so
it's just a text input with autocompletion. As you drill down through
the values, though, the set of options gets reduced, and at some point
all the options are shown on the screen. That's the sort of interface
logic that could be used to keep the Wikidata browsing manageable.
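That widget-switching logic can be sketched in a few lines. The cutoff value is an arbitrary placeholder, and the widget names are hypothetical; the rule is the one described above.

```python
# Hypothetical cutoff: how many distinct values fit on screen at once.
MAX_DISPLAYED_VALUES = 30

def filter_widget(values: list) -> str:
    """Choose how to render a filter, given the values still remaining
    after the drill-downs applied so far."""
    if len(set(values)) <= MAX_DISPLAYED_VALUES:
        return "value-list"      # show every option as a clickable link
    return "autocomplete-input"  # too many options; use a text input
```

As drilling down shrinks the remaining value set, the same filter would naturally switch from a text input to a clickable list.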
A related complication is the large number of properties that could
show up as filters: if all of them are displayed on the screen, it
could overwhelm the interface. Miga already handles this problem, by
calculating the "diffusion" of each property - the number of unique
values divided by the number of total values - and then only
displaying filters for properties with a small-enough diffusion value.
I assume that this Wikidata browser could use a similar approach - and
also automatically ignore properties of certain types, like "ID",
which don't make sense to drill down on.
Another complication is that some (or maybe all?) properties can hold
values that are time-specific - the "population" property I mentioned
before is a perfect example of that, since it can hold a different
value for each year. I don't know what an ideal solution for that is, but I
think it's fine for now to just always use the most recent value for
any such property.
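That "most recent value wins" rule is straightforward to implement. In this sketch a statement is modeled as a (value, year) pair; on real Wikidata the time would come from the "point in time" (P585) qualifier, and the function name is hypothetical.

```python
def latest_value(statements):
    """Given (value, year) pairs for one time-qualified property,
    return the value with the most recent time qualifier."""
    if not statements:
        return None
    value, _year = max(statements, key=lambda s: s[1])
    return value

# e.g. three population statements for one country, keyed by year:
# latest_value([(100, 2000), (110, 2010), (125, 2020)]) picks the 2020 value.
```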
By the way, I believe it would also be fairly easy to
"internationalize" this tool - i.e., let the user select a language, and then show
the interface, and as much of the "data structure" (class and property
names) and data as possible, in that language.
Why do this whole thing? I can think of a number of important uses
this tool could have:
1) A new way to explore all the data on Wikidata - allowing both
aggregation and finding specific results.
2) A way to run specific queries, for those who don't know SPARQL or
understand Wikidata's specific data structure. This could open up
Wikidata querying to a wide range of people who otherwise would never
be able to do it.
3) Tied in with that, an API to create SPARQL queries - I didn't
mention this before, but it probably makes sense to add, to any page
in the display, a "View SPARQL" link, which retrieves the SPARQL
query that was used to get the current set of results.
4) Potentially, a visualization tool - I didn't mention this either,
but Miga shows maps and timelines for data that contain coordinate and
date information, and it makes sense for this tool to do the same
thing, whether that happens in the first version or later.
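As a sketch of point 3 above, here is roughly how the query behind a "View SPARQL" link could be rebuilt from the current class and filter state. The function name and the filter representation are hypothetical; wdt:P31 ("instance of") and the wd:/wdt: prefixes are Wikidata's own conventions.

```python
def drilldown_sparql(class_id: str, filters: dict) -> str:
    """Reconstruct the SPARQL query behind the current drill-down view:
    one 'instance of' triple for the class, plus one triple per
    applied filter (property ID -> selected value ID)."""
    triples = [f"?item wdt:P31 wd:{class_id} ."]  # P31 = instance of
    for prop, value in sorted(filters.items()):
        triples.append(f"?item wdt:{prop} wd:{value} .")
    body = "\n  ".join(triples)
    return f"SELECT ?item WHERE {{\n  {body}\n}}"
```

For example, drilling into "country" (Q6256) and filtering "continent" (P30) to "Europe" (Q46) would yield a two-triple query the user could paste straight into the query service.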
So that's my explanation. This is a lot of information to throw out at
one time. Ideally, I would be creating a whole wiki page for this
idea, with mockup images and so forth; and maybe I'll do that at some
point. But for now, I really just wanted to hear people's general
views on this sort of thing. And if some people think it's a good
idea, I'm also very curious to hear what the best strategy might be to
get funding for this. I could try to get a Wikimedia Individual
Engagement Grant (IEG) to fund it - that's actually how Miga was
funded - but I wonder if another option is to get Wikimedia
Deutschland itself, or some other organization, to sponsor it, and
perhaps to take ownership of the resulting application. But maybe
that's getting too far ahead.
-Yaron
--
WikiWorks · MediaWiki Consulting ·
http://wikiworks.com
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata