Hey Thad,

Thanks for the feedback!

The tables as you have described them can be introduced to the dashboards. We can also add an option to download the data from each visualization like we do in the Wikidata Concepts Monitor.

As for statistics on particular items: there are more than 70 million of them, so I would go for batch-processed, compressed .csv files published as open data sets.
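
As a rough sketch of what such a batch export could look like (the column names, file naming scheme, and batch size below are illustrative assumptions, not the actual pipeline behind the dashboard):

```python
import csv
import gzip
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def export_batches(
    rows: Iterable[Sequence],
    header: Sequence[str],
    out_dir: str,
    batch_size: int = 1_000_000,  # assumed value; tune to the data
) -> List[Path]:
    """Write rows as gzip-compressed CSV files, one file per batch."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(batched(rows, batch_size)):
        # Hypothetical naming scheme: item_stats_00000.csv.gz, ...
        path = target / f"item_stats_{i:05d}.csv.gz"
        with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)  # repeat the header in every file
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

Because `rows` can be any iterable (for example a generator reading from a dump), only one batch is held in memory at a time.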

Now, about the visualizations and hovering over more than 400 data points: there's nothing I can do about that; it is simply the number of Wikidata languages considered. No one ever said that a complex database of this magnitude can be visualized in a simple fashion. So there are things that we can do from the viewpoint of ergonomics, and things that we cannot, simply because of the scale of the problem. I think I used Plotly to produce the charts, so the zoom tool should be available there.

The data sets used to produce this dashboard are *barely* computable with our current stat100* servers plus Apache Spark running on our (not bad at all) cluster.

In any case, if you would like to see any fundamental improvements here, you know the game: open a Phab ticket and ping me and Lydia, who will decide on the priorities for the features suggested.

Once again, thanks a lot.

Best regards,

Goran S. Milovanović,
Data Scientist for Wikidata, WMDE

On Thu, Feb 27, 2020, 15:59 Thad Guidry <thadguidry@gmail.com> wrote:
Oh, and columns for "UNESCO Language Status|Ethnologue Status|" as well !

On Thu, Feb 27, 2020 at 8:53 AM Thad Guidry <thadguidry@gmail.com> wrote:
A lot of the visualizations are attempting to show summary aggregations, if you look closely.
I found myself needing to hover over hundreds of colored dots, just to see what the details were for things along the Tops or Bottoms of various viz's.
If you look at what many of the viz's are trying to show, something about an "... Item ..." , then it's clear that seeing the Items is important for context and analysis!  This was also why I was having to hover over so many hundreds of dots...I needed more context of WHICH Item!

It would be useful to expose Items themselves within summary aggregations.
For that, a simple table will do!
I would say the Ontology tab is actually just barely OK for the information it's trying to present (linkage), but a tabular view that can be sorted by column would work well as an alternative.  It's fine for discovering WHICH Relations, but not so much for summary aggregation.  Good old Tables or Bar charts work well for that also!

An extra tab to display a table view showing "Qxxxx|Relations|Num.labels|Num.sitelinks|% of items reused" that can be sorted on column.

On Thu, Feb 27, 2020 at 7:52 AM Léa Lacroix <lea.lacroix@wikimedia.de> wrote:

Hello all,

A new dashboard got released for Wikidata’s birthday, and I realized that it was not properly announced here - sorry for that, let’s catch up :)

The Wikidata Languages Landscape dashboard provides insights into the ways languages are organized and used in Wikidata and across the Wikimedia projects that reuse Wikidata. It relies on different data sources: the Wikidata dumps, various datasets obtained directly from the Query Service, and datasets on Wikidata entity reuse statistics obtained from the Wikidata Concepts Monitor.

The dashboard provides different features:

  • The ontology tab, visualizing the graph of Wikidata ontology regarding languages
  • The language/class tab, generating graphs of items connected to a language, class or category
  • The label sharing graph, showing how similar the languages are judging from the extent of their overlap in what Wikidata items they have labels for
  • The language status tab, focussing on the UNESCO language endangerment categories and the Ethnologue language status
  • The language use tab, representing various indicators of language usage in Wikidata and across the Wikimedia projects

If you have any questions related to this dashboard, feel free to ask.

See also: full documentation of the dashboard.

Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikidata mailing list