Hey folks :)
As you might already have seen in the birthday presents list there is another birthday present: the Wikidata Concepts Monitor (WDCM - http://wdcm.wmflabs.org). It is a tool that enables you to browse and build an understanding of the way Wikidata is used across the Wikimedia projects.
Here’s the technical gist behind it: Currently 789 projects have client-side Wikidata usage tracking enabled, which allowed us to build a system that counts the number of pages using a particular Wikidata item per project. The count data were then subjected to statistical modeling by an unsupervised statistical learning algorithm, Latent Dirichlet allocation (https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation), which is typically used in distributional semantics (https://en.wikipedia.org/wiki/Distributional_semantics), to discover the most natural groupings of Wikidata items in 14 semantic categories (see https://en.wikipedia.org/wiki/Topic_model) with respect to the way they are used across the Wikimedia universe by the respective communities.
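To make the counting step concrete, here is a minimal sketch of how "number of pages using a particular Wikidata item per project" could be computed. It is not the WDCM code: the record layout `(project, page, item)`, the function name `count_pages_per_item`, and all sample values are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (project, page, item). All values are invented.
usage_records = [
    ("enwiki", "Berlin", "Q64"),
    ("enwiki", "Germany", "Q64"),
    ("enwiki", "Germany", "Q183"),
    ("dewiki", "Berlin", "Q64"),
    ("enwikivoyage", "Berlin", "Q64"),
    ("enwiki", "Berlin", "Q64"),  # repeated record; the page is still counted once
]


def count_pages_per_item(records):
    """Return {project: {item: number of distinct pages using the item}}."""
    # Collect the distinct pages for every (project, item) pair.
    pages = defaultdict(set)
    for project, page, item in records:
        pages[(project, item)].add(page)

    # Turn the page sets into counts, grouped by project.
    counts = defaultdict(dict)
    for (project, item), page_set in pages.items():
        counts[project][item] = len(page_set)
    return dict(counts)


usage_counts = count_pages_per_item(usage_records)
```

A table of this shape (projects by items, with page counts as values) is the kind of count data that a topic model such as Latent Dirichlet allocation takes as input.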
We hope for the WDCM system to become a tool that helps you discover. Beyond Wikidata’s syntax and semantics we are now beginning to learn about its pragmatics: the way Wikidata items cluster with respect to how they are used is not necessarily the same as the way they go together in the Wikidata formal ontology. WDCM is the first step towards building an understanding of the highly complicated structure of Wikidata usage. This system can help you discover which Wikidata client projects are similar and in what respect, which semantic categories of items are used more or less frequently across the 789 projects, how items connect with respect to how similarly they are used by our communities, what the most popular items per project are, and many more (hopefully) interesting things.
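One way to think about "which projects are similar" is to compare their usage profiles as vectors. The sketch below uses cosine similarity over per-category counts. This is an assumption for illustration only: the message does not say which similarity measure WDCM uses, and the project names, category names, and counts here are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors given as dicts."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical per-project usage counts by semantic category (made-up numbers).
category_usage = {
    "project_a": {"Human": 120, "Geographical Object": 30, "Gene": 0},
    "project_b": {"Human": 60, "Geographical Object": 15, "Gene": 0},
    "project_c": {"Human": 0, "Geographical Object": 5, "Gene": 200},
}

sim_ab = cosine_similarity(category_usage["project_a"], category_usage["project_b"])
sim_ac = cosine_similarity(category_usage["project_a"], category_usage["project_c"])
```

Here project_a and project_b use the categories in the same proportions, so they come out as highly similar even though their absolute counts differ, while project_a and project_c barely overlap.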
Check out the WDCM and don’t forget to let us know what you think on the WDCM Wikidata project discussion page! I'd love to hear about any cool or interesting things you find in the visualizations. https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor
Thanks to Goran who put in a lot of time to get this up and running and everyone who helped him.
Cheers Lydia
Thanks a lot to all who are involved, very interesting.
However, when I look at the statistics of usage,
http://wdcm.wmflabs.org/WDCM_UsageDashboard/
I see that Wikivoyage allegedly uses, in particular, genes, humans (quite a lot, actually), and scientific articles. How could this be? I am pretty sure it does not use any of these.
Cheers Yaroslav
On Mon, Oct 30, 2017 at 6:12 PM, Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On Mon, Oct 30, 2017 at 6:27 PM, Yaroslav Blanter ymbalt@gmail.com wrote:
Thanks a lot all who are involved, very interesting.
However, when I look at the statistics of usage,
http://wdcm.wmflabs.org/WDCM_UsageDashboard/
I see that Wikivoyage allegedly uses, in particular, genes, humans (quite a lot, actually), and scientific articles. How could this be? I am pretty sure it does not use any of these.
That does indeed sound wrong :D Which graph are you talking about? Then Goran can have a look.
Cheers Lydia
The graph's title is
*Wikidata item usage per semantic category in each project type* on the page I linked to
http://wdcm.wmflabs.org/WDCM_UsageDashboard/
Thanks Cheers Yaroslav
On Mon, Oct 30, 2017 at 6:34 PM, Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:
That does indeed sound wrong :D Which graph are you talking about? Then Goran can have a look.
Cheers Lydia