While running a few properties, one question I had was how the tool handles
the 10,000-item limit. There does not seem to be an explanation of how the
10,000 items are selected, whether they are the first 10,000 items returned
by a Wikidata query or are chosen by some other means of sampling. This is
important to understand in order to consider what bias the selection
introduces into the analysis, since the full item list is not being analysed
(or maybe it is and you're only displaying 10,000); it just isn't clear.
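To illustrate why the selection method matters, here is a minimal Python
sketch using made-up data. The regions, counts, and helper names are
hypothetical and not taken from the tool. Note that a SPARQL LIMIT without an
ORDER BY does not guarantee any particular order. If the returned order
happens to correlate with region, keeping the first 10,000 items skews the
regional mix, while a uniform random sample roughly preserves it.

```python
import random
from collections import Counter


def region_share(items, region_key="region"):
    """Return each region's share of the given items as a fraction."""
    counts = Counter(item[region_key] for item in items)
    total = len(items)
    return {region: n / total for region, n in counts.items()}


def first_n(items, n):
    """Keep only the first n items, as a plain truncation would."""
    return items[:n]


def random_n(items, n, seed=0):
    """Draw a uniform random sample of n items (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(items, n)


# Hypothetical population of 15,000 lakes whose result order happens to
# group them by region (e.g. because of how they were imported).
population = (
    [{"id": f"A{i}", "region": "A"} for i in range(9000)]
    + [{"id": f"B{i}", "region": "B"} for i in range(6000)]
)

print(region_share(population))                  # {'A': 0.6, 'B': 0.4}
print(region_share(first_n(population, 10000)))   # {'A': 0.9, 'B': 0.1}
print(region_share(random_n(population, 10000)))  # close to the 60/40 mix
```

In this toy case the truncated list over-represents region A (90% instead of
60%), which is the kind of bias that would be worth documenting.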
Additionally, adding a "count" column to the property gap would provide a
quick overview of how extensive the bias is per property, similar to how the
count of properties is listed per item.
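As a rough sketch of what such a column could contain, assuming it counts
the items lacking each property, the Python below tallies missing properties
across a set of items. The item keys and the function name are hypothetical;
P17 (country), P625 (coordinate location), and P2046 (area) are used only as
example property IDs.

```python
from collections import Counter


def missing_counts(item_props, properties):
    """For each property, count how many items do not have it."""
    counts = Counter()
    for props in item_props.values():
        for prop in properties:
            if prop not in props:
                counts[prop] += 1
    # Report every requested property, including ones with zero gaps.
    return {prop: counts[prop] for prop in properties}


# Hypothetical items mapped to the property IDs they carry.
item_props = {
    "item1": {"P17", "P625"},
    "item2": {"P17"},
    "item3": {"P625", "P2046"},
}

print(missing_counts(item_props, ["P17", "P625", "P2046"]))
# {'P17': 1, 'P625': 1, 'P2046': 2}
```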
I assume you have more planned for this, given it's in alpha for feedback,
so here are just a few thoughts on what I would like to do but don't
currently see as possible:
1. It would also be useful to provide an export of the list of items, the
properties missing from each, and each item's deviation percentile so the
list can be ordered by deviation. If these were also saved as part of the
tool, the export could become a working list to focus on to counter the bias.
2. Do you have plans for adding filters, or grouped comparisons between
values of properties or subclasses?
Currently, running on lakes (Q23397) returns more than 10,000 items. The tool
also doesn't afford an analysis across administrative regions of the world,
such as between states in the United States of America. This is also what
prompted the suggestion above to clarify how the 10,000 items are handled in
the tool, as I was unsure whether this was a complete analysis or only of
10,000 lakes, which may be located in different geographic regions with
different levels of organization working on these sets. In the USA, New York,
Minnesota, and Michigan seem to have significantly more lakes represented
than other states, so analysis by region, by lake size, or by another
property would make it clearer what type of bias is present in which
dataset. The top property alone may not reveal the bias for particular use
cases where the bias is more nuanced.
University of Maryland College Park, College of Information Studies
University of Gothenburg