Hello all,
A new dashboard was released for Wikidata’s birthday, and I realized that it was not properly announced here. Sorry for that, let’s catch up :)
The *Wikidata Languages Landscape dashboard* (https://wmdeanalytics.wmflabs.org/WD_LanguagesLandscape/) provides insights into the ways languages are organized and used in Wikidata and across the Wikimedia projects that reuse Wikidata. It relies on different data sources: the Wikidata dumps, various datasets obtained directly from the Query Service, and datasets on Wikidata entity reuse statistics obtained from the Wikidata Concepts Monitor (https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor).
The dashboard provides different features:
- The *ontology* tab, visualizing the graph of the Wikidata ontology regarding languages
- The *language/class* tab, generating graphs of items connected to a language, class or category
- The *label sharing* graph, showing how similar languages are, judging by how much they overlap in which Wikidata items they have labels for
- The *language status* tab, focussing on the UNESCO language endangerment categories and the Ethnologue language status
- The *language use* tab, representing various indicators of language usage in Wikidata and across the Wikimedia projects
If you have any questions related to this dashboard, feel free to ask.
See also: the full documentation of the dashboard (https://meta.wikimedia.org/wiki/Wikidata_Languages_Landscape).
Cheers,
So... a lot of the visualizations are attempting to show summary aggregations, if you look closely. I found myself needing to hover over hundreds of colored dots just to see the details for things along the tops or bottoms of various visualizations. If you look at what many of the visualizations are trying to show, something about an "... Item ...", then it's clear that seeing the Items is important for context and analysis! This was also why I had to hover over so many hundreds of dots... I needed more context about WHICH Item!
So... it would be useful to expose the Items themselves within summary aggregations. For that, a simple table will do! I would say the Ontology tab is actually just barely OK for the information it's trying to present (linkage), but a tabular view that can be sorted by column would work well as an alternative. The graph is fine for discovering WHICH relations exist, but not so much for summary aggregation. Good old tables or bar charts work well for that too!
Proposal: an extra tab displaying a table view with the columns "Qxxxx|Relations|Num.labels|Num.sitelinks|% of items reused", sortable by column.
Thad https://www.linkedin.com/in/thadguidry/
On Thu, Feb 27, 2020 at 7:52 AM Léa Lacroix lea.lacroix@wikimedia.de wrote:
Oh, and columns for "UNESCO Language Status|Ethnologue Status" as well!
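For illustration only, here is a minimal Python sketch of such a sortable table, assuming one row per language item carrying the columns proposed in this thread. The `LanguageRow` class, the `sort_table` and `to_pipe_table` helpers, treating Relations as a simple count, and the `Q_demo_*` rows are all hypothetical: they are not part of the dashboard, and the values are placeholders, not real statistics.

```python
from dataclasses import astuple, dataclass
from operator import attrgetter

# Hypothetical row type: one language item per row, columns as proposed in the thread.
@dataclass
class LanguageRow:
    qid: str                   # "Qxxxx"
    relations: int             # assumption: number of ontology relations, as a count
    num_labels: int            # "Num.labels"
    num_sitelinks: int         # "Num.sitelinks"
    pct_items_reused: float    # "% of items reused"
    unesco_status: str         # "UNESCO Language Status"
    ethnologue_status: str     # "Ethnologue Status"

HEADER = ("Qxxxx|Relations|Num.labels|Num.sitelinks|% of items reused|"
          "UNESCO Language Status|Ethnologue Status")

def sort_table(rows, column, descending=False):
    """Return a new list sorted on one column, like clicking a column header."""
    return sorted(rows, key=attrgetter(column), reverse=descending)

def to_pipe_table(rows):
    """Render rows as pipe-separated text, header line first."""
    lines = [HEADER]
    for row in rows:
        lines.append("|".join(str(value) for value in astuple(row)))
    return "\n".join(lines)

# Placeholder rows: hypothetical IDs and made-up values, not real statistics.
rows = [
    LanguageRow("Q_demo_a", 3, 500, 20, 12.5, "vulnerable", "placeholder"),
    LanguageRow("Q_demo_b", 7, 150, 90, 40.0, "safe", "placeholder"),
    LanguageRow("Q_demo_c", 1, 900, 5, 3.0, "definitely endangered", "placeholder"),
]

# Sort by number of labels, largest first, and print the table.
print(to_pipe_table(sort_table(rows, "num_labels", descending=True)))
```

Sorting returns a new list, so the same underlying rows can be re-sorted on any column without re-querying the data.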
Thad https://www.linkedin.com/in/thadguidry/
On Thu, Feb 27, 2020 at 8:53 AM Thad Guidry thadguidry@gmail.com wrote:
Hey Thad,
thanks for the feedback!
The tables as you have described them can be introduced to the dashboards. We can also add an option to download the data from each visualization like we do in the Wikidata Concepts Monitor.
As for per-item statistics, there are more than 70 million items, so... I would go for batch-processed, compressed .csv files published as open datasets.
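A minimal single-machine sketch of the "batch-processed, compressed .csv" idea, using only Python's standard library. The function name `write_gzip_csv_batches`, the `item_stats_partNNNN.csv.gz` file-name pattern and the default batch size are hypothetical; the datasets behind the dashboard are computed with Apache Spark, and this only illustrates splitting a large stream of per-item rows into numbered, gzip-compressed CSV parts.

```python
import csv
import gzip
from itertools import islice
from pathlib import Path

def write_gzip_csv_batches(rows, out_dir, header, batch_size=1_000_000):
    """Write an iterable of row tuples into numbered, gzip-compressed CSV files.

    Hypothetical helper: returns the list of Path objects it wrote.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    it = iter(rows)
    paths = []
    part = 0
    while True:
        # Pull at most batch_size rows from the stream.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        path = out_dir / f"item_stats_part{part:04d}.csv.gz"
        # "wt" opens the gzip stream in text mode; newline="" is what csv expects.
        with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)   # every part gets its own header row
            writer.writerows(batch)
        paths.append(path)
        part += 1
    return paths
```

Because `islice` pulls at most `batch_size` rows at a time, memory use is bounded by one batch rather than by the total number of items, and each part can be downloaded and decompressed independently.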
Now, about the visualizations and hovering over more than 400 data points: there's nothing I can do about that; that is simply the number of Wikidata languages considered. No one ever said a complex database of the magnitude of this one can be visualized in a simple fashion. So there are things that we can do from the viewpoint of ergonomics, and things that we simply cannot do because of the scale of the problem. I think I used Plotly to produce the charts, so the zoom tool should be available there.
The data sets used to produce this dashboard are *barely* computable with our current stat100* servers + Apache Spark running on our (not bad at all) cluster.
In any case, if you would like to see any fundamental improvements here, you know the game: open a Phab ticket and ping me and Lydia, who will decide on the priorities for the suggested features.
Once again, thanks a lot.
Best regards,
Goran S. Milovanović, Data Scientist for Wikidata, WMDE
On Thu, Feb 27, 2020, 15:59 Thad Guidry thadguidry@gmail.com wrote:
So it sounds like visualizing summary aggregations over the full stats might be more painful than it's worth for the few interested users (like me). I figured as much and knew the problem well. (I've done your job before! But I just semi-retired this week!)
Yeah, in fact, I LOOKED for a little download button on each viz panel. I noticed the download .png button and others... but no "download data table" button.
If a data download button on the panels is easy to add and not much trouble, I can open the Phab ticket, or you can open it on my behalf.
Thad https://www.linkedin.com/in/thadguidry/
On Thu, Feb 27, 2020 at 9:19 AM Goran Milovanovic goran.milovanovic_ext@wikimedia.de wrote: