Hi,
I have a couple of questions regarding the Wiki Page ID. Does it always
stay unique for the page, where the page itself is just a placeholder for
any kind of information that might change over time?
Consider the following cases:
1. The first time someone creates page "Moon" it is assigned ID=1. If at
some point the page is renamed to "The_Moon", the ID=1 remains intact. Is
this correct?
2. What if we have page "Moon" with ID=1 and someone creates a second page
"The_Moon" with ID=2? Is it possible for page "Moon" to be turned into a
redirect, so that "Moon" redirects to page "The_Moon"?
3. Is it possible for page "Moon" to become a category "Category:Moon" with
the same ID=1?
Thanks,
Gintas
Hello everyone,
I'd like to ask if Wikidata could offer an HDT [1] dump alongside the already available Turtle dump [2]. HDT is a binary format for storing RDF data. It is quite useful because it can be queried from the command line, it can be used as a Jena/Fuseki source, and it uses orders of magnitude less space to store the same data. The problem is that generating an HDT file is very impractical, because the current implementation needs a lot of RAM to convert a file. For Wikidata it will probably require a machine with 100-200 GB of RAM. That is unfeasible for me because I don't have such a machine, but if you have one to share, I can help set up the rdf2hdt software required to convert the Wikidata Turtle dump to HDT.
Thank you.
[1] http://www.rdfhdt.org/
[2] https://dumps.wikimedia.org/wikidatawiki/entities/
Hi folks!
We at Schema.org like the idea of your new Panorama View property, as
noted here, and it is gaining interest:
https://github.com/schemaorg/schemaorg/issues/1768
We have also thought a bit about this and are considering an additional
property (a subtype for us) to hold a 360 panoramic view (a complete
spherical projection view, also known as photo bubbles, photo spheres, etc.).
Are there any seasoned Wikidata folks who could create a proposal for that
360 panoramic view property for us? :)
Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
Hi!
I would like to initiate a discussion about coordinate precision in
Wikidata and the Query Service. Right now we do not impose any limit on
precision: coordinates are basically doubles, which allows over-precise
coordinates to be specified and makes them harder to compare, both with
each other within Wikidata and with outside services.
From the precision description in [1], we would rarely need more than three
or four digits after the decimal point. However, we have coordinates in the
database like Point(13.366666666 41.766666666), which purports to specify
the location with sub-millimeter accuracy, for an entity that describes a
municipality[2]!
We do have precision on values (e.g. the above has a specified precision
of "arcseconds"), so it may be just a formatting issue, but even an
arcsecond looks somewhat over-precise for a city. It may also be a bit
challenging to convert DMS precision to DD precision.
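For the DMS-to-DD question, a minimal Python sketch: a precision given in arcseconds is divided by 3600 to get degrees, and the number of decimal places is taken as the smallest d for which 10^-d is no coarser than that value. The function names are hypothetical, and the "no coarser than the stated precision" rule is an assumption for illustration, not how Wikibase currently formats values.

```python
import math


def arcseconds_to_degrees(arcseconds: float) -> float:
    """Convert an angle in arcseconds to decimal degrees (1 degree = 3600 arcseconds)."""
    return arcseconds / 3600.0


def decimals_for_precision(precision_deg: float) -> int:
    """Fewest decimal places d such that 10**-d is no coarser than precision_deg.

    Note: floating-point rounding in log10 could shift the result by one
    for inputs that are exact powers of ten.
    """
    if precision_deg <= 0:
        raise ValueError("precision must be positive")
    return max(0, math.ceil(-math.log10(precision_deg)))


# Hypothetical walk-through of three DMS precisions.
for label, arcsec in [("arcsecond", 1), ("arcminute", 60), ("degree", 3600)]:
    deg = arcseconds_to_degrees(arcsec)
    print(f"{label}: {deg:.6f} deg -> {decimals_for_precision(deg)} decimals")
```

Under this rule, arcsecond precision maps to four decimal places, arcminute precision to two, and whole-degree precision to none.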
But the bigger question is whether we should store over-precise
coordinates in the database at all, or whether we should round them on
export or inside the data. The formulae used to calculate distances
have, for obvious reasons, limited precision, and direct comparisons
can't take precision into account, which can make such coordinates
very hard to work with. Should we maybe just put a limit on how precisely
we put coordinates into RDF and into the Query Service? Would four decimals
after the dot be enough? According to [4], this is what commercial GPS
devices can provide. If not, why not, and what accuracy would be appropriate?
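To see what a four-decimal limit would mean for the example above, here is a minimal Python sketch that rounds a WKT `Point(lon lat)` literal and estimates the ground size of the last decimal place. It assumes a spherical-Earth figure of roughly 111,320 m per degree (a degree of longitude at the equator); the regular expression and function names are hypothetical and not part of the Query Service.

```python
import re

# Rough length of one degree of longitude at the equator, in metres
# (assumption: spherical-Earth approximation; real values vary with latitude).
METRES_PER_DEGREE = 111_320.0

# Matches a WKT literal such as "Point(13.366666666 41.766666666)".
POINT_RE = re.compile(r"Point\(\s*([^\s()]+)\s+([^\s()]+)\s*\)")


def round_wkt_point(wkt: str, decimals: int = 4) -> str:
    """Return the WKT point with both coordinates rounded to `decimals` places."""
    match = POINT_RE.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = (float(value) for value in match.groups())
    return f"Point({lon:.{decimals}f} {lat:.{decimals}f})"


def ground_resolution_m(decimals: int) -> float:
    """Approximate ground distance of one unit in the last decimal place."""
    return 10.0 ** -decimals * METRES_PER_DEGREE


print(round_wkt_point("Point(13.366666666 41.766666666)"))  # Point(13.3667 41.7667)
print(round(ground_resolution_m(4), 1))  # 11.1 (metres)
print(ground_resolution_m(9))  # about 0.00011 m, roughly a tenth of a millimetre
```

Four decimals thus corresponds to roughly ten metres on the ground, while the nine decimals in the municipality example correspond to a fraction of a millimetre.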
We do export the precision of the coordinate as wikibase:geoPrecision[3],
and we currently have 258060 distinct values for it. This is very weird.
I am not sure precision is useful in this form. Can anybody tell me a
use case for this number as it is now? If not, maybe we should change how we
represent it. I'm also not sure where these values come from, as we only
have 13 options in the UI. Bots?
[1] https://en.wikipedia.org/wiki/Decimal_degrees
[2] https://www.wikidata.org/wiki/Q116746
[3] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globe_coor…
[4] https://gis.stackexchange.com/questions/8650/measuring-accuracy-of-latitude…
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hi folks,
is anyone using the Wikidata entity dump dcatap.rdf at
https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf?
It is very rarely used and probably imposes an undue maintenance burden
on us, so we plan to remove it.
If anyone is making use of it, please speak up so that we can keep it or
find a viable alternative.
Cheers,
Marius
Hi,
responding to Yaroslav Blanter's following observation on this mailing list:
"However, when I look at the statistics of usage,
http://wdcm.wmflabs.org/WDCM_UsageDashboard/ I see that Wikivoyage
allegedly uses, in particular, genes, humans (quite a lot, actually), and
scientific articles. How could this be? I am pretty sure it does not use
any of these."
Please note that the *Wikidata item usage per semantic category in each
project type* chart that you referred to in a later message has a
logarithmic y-axis (a note immediately below the chart title explains
this). Also, even from that chart you can see that Wikivoyage projects
taken together make no use of the categories Gene and Scientific Article.
The logarithmic y-axis is a necessity there; otherwise we could not offer
a comparison across project types, because the differences in usage
statistics are huge.
Here's my suggestion on how to obtain more readable (and more precise)
information:
- go to the WDCM Usage Dashboard:
http://wdcm.wmflabs.org/WDCM_UsageDashboard/
- Tab: Dashboard, and then Tab: Tabs/Crosstabs
- Enter only: _Wikivoyage in the "Search projects:" field, and select all
semantic categories in the "Search categories:" field
- Click "Apply Selection"
What you should be able to learn from the results is that on all Wikivoyage
projects taken together the total usage of Q5 (Human) is 26, and that no
items from the Gene (Q7187) or Scientific Article (Q13442814) category are
used there at all.
Important reminder: the usage statistic in WDCM has the following semantics:
- pick an item;
- count the number of pages in a particular project on which that item is used;
- sum up the counts to obtain the usage statistic for that particular item
in the particular project.
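The three steps above can be sketched in Python as follows. This is a minimal illustration, not WDCM's actual implementation: the `(project, page_id, item_id)` input tuples, the function names and the sample rows are all hypothetical. Distinct pages are counted per project and item, and the per-project counts are then summed over a group of projects, as in the "all Wikivoyage projects taken together" figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input: one row per (project, page, item) usage record.
Usage = Tuple[str, int, str]


def pages_per_item(usages: Iterable[Usage]) -> Dict[Tuple[str, str], int]:
    """Map (project, item) to the number of distinct pages using that item."""
    pages: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for project, page_id, item_id in usages:
        pages[(project, item_id)].add(page_id)  # a page counts once per item
    return {key: len(ids) for key, ids in pages.items()}


def total_usage(counts: Dict[Tuple[str, str], int], item_id: str,
                projects: Iterable[str]) -> int:
    """Sum the per-project page counts of one item over a group of projects."""
    return sum(counts.get((project, item_id), 0) for project in projects)


# Made-up sample rows, for illustration only.
rows = [
    ("enwikivoyage", 1, "Q5"),
    ("enwikivoyage", 1, "Q5"),  # same page again: not double-counted
    ("enwikivoyage", 2, "Q5"),
    ("dewikivoyage", 7, "Q5"),
]
counts = pages_per_item(rows)
print(total_usage(counts, "Q5", ["enwikivoyage", "dewikivoyage"]))  # 3
print(total_usage(counts, "Q7187", ["enwikivoyage", "dewikivoyage"]))  # 0
```

An item that never appears on any page of a project simply contributes zero, which is why a category such as Gene can show no usage at all for a project type.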
All WDCM Dashboards have a section titled "Description" which provides this
and other similarly important definitions, as well as (hopefully) simple
descriptions of each dashboard's functionality.
Hope this helps.
Best,
Goran
Goran S. Milovanović, PhD
Data Analyst, Software Department
Wikimedia Deutschland
------------------------------------------------
"It's not the size of the dog in the fight,
it's the size of the fight in the dog."
- Mark Twain
------------------------------------------------
Hey folks :)
As you might already have seen in the birthday presents list there is
another birthday present: the Wikidata Concepts Monitor (WDCM -
http://wdcm.wmflabs.org). It is a tool that enables you to browse and
build an understanding of the way Wikidata is used across the
Wikimedia projects.
Here’s the technical gist behind it: currently 789 projects have
client-side Wikidata usage tracking enabled, which allowed us to build
a system that counts the number of pages using a particular Wikidata
item per project. The count data were subjected to statistical
modeling by an unsupervised statistical learning algorithm, Latent
Dirichlet allocation (https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation),
which is typically used in distributional semantics
(https://en.wikipedia.org/wiki/Distributional_semantics), to discover
the most natural groupings of Wikidata items in 14 semantic categories
(see https://en.wikipedia.org/wiki/Topic_model) with respect to the way
they are used across the Wikimedia universe by the respective
communities.
We hope that WDCM will become a tool that helps you discover things.
Beyond Wikidata’s syntax and semantics, we are now beginning to learn
about its pragmatics: the way Wikidata items cluster with respect
to how they are used is not necessarily the same as the way they go
together in the Wikidata formal ontology. WDCM is the first step
towards building an understanding of the highly complicated structure
of Wikidata usage. This system can help you discover which Wikidata
client projects are similar and in what respect, which semantic
categories of items are used more or less frequently across 789
projects, how items connect with respect to how similarly they are
used by our communities, what the most popular items per project are,
and many more (hopefully) interesting things.
Check out the WDCM and don’t forget to let us know what you think on
the WDCM Wikidata project discussion page! I'd love to hear about any
cool or interesting things you find in the visualizations.
https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor
Thanks to Goran who put in a lot of time to get this up and running
and everyone who helped him.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a non-profit organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.