In Freebase, we had bot scripts that went through and removed "Lists of
Things" topic entities, since they are lists of entities and are not useful
when clumped together and normalized in a graph database.
Does Wikidata have something similar, or a user review process for deleting
these?
Ex. List of tallest buildings in Wuhan -
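On the Wikidata side, a bot or review tool could start by flagging candidate list items rather than deleting them outright. A minimal Python sketch, assuming entities in Wikidata's JSON format and assuming list pages are marked by an "instance of" (P31) statement. The default class QID below (Q13406463, believed to be "Wikimedia list article") and the function names are assumptions to check before use:

```python
# Assumed class for list items; verify on Wikidata before relying on it.
LIST_CLASS_QID = "Q13406463"


def instance_of_ids(entity):
    """Return the set of QIDs that an entity's P31 statements point to."""
    ids = set()
    claims = entity.get("claims") or {}  # empty claims may serialize as []
    for statement in claims.get("P31", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" snaks
        value = snak.get("datavalue", {}).get("value")
        if not isinstance(value, dict):
            continue
        if "id" in value:
            ids.add(value["id"])
        elif "numeric-id" in value:  # older serializations lack "id"
            ids.add("Q" + str(value["numeric-id"]))
    return ids


def is_list_item(entity, list_class=LIST_CLASS_QID):
    """Flag an entity for review if it is an instance of list_class."""
    return list_class in instance_of_ids(entity)
```

Flagged items could then go to a human review queue instead of being removed automatically.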
Some keywords are often used interchangeably across different languages
and dialects.
It seems advantageous to somehow tell Wikidata Search that when someone
types "Harvard College", it should also look for "Harvard University",
and vice versa.
An interchange mapping table might suffice for this use case, or perhaps
something else is needed; I'm not sure.
How far in the future is this feature? What are the roadblocks?
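To make the mapping-table idea concrete, here is a minimal Python sketch of query expansion from a synonym table. The table contents and function names are hypothetical illustrations, not an existing Wikidata Search feature:

```python
# Hypothetical synonym groups: every term in a group is interchangeable.
SYNONYM_GROUPS = [
    {"Harvard College", "Harvard University"},
]


def build_index(groups):
    """Map each case-folded term to the full case-folded group it belongs to."""
    index = {}
    for group in groups:
        normalized = {term.casefold() for term in group}
        for term in normalized:
            index[term] = normalized  # a later group overwrites an earlier one
    return index


def expand_query(query, index):
    """Return the query plus its interchangeable terms, sorted for stability."""
    key = query.strip().casefold()
    return sorted(index.get(key, {key}))


INDEX = build_index(SYNONYM_GROUPS)
```

Because every term maps to its whole group, "vice versa" comes for free: `expand_query("Harvard University", INDEX)` yields both names, just as `expand_query("Harvard College", INDEX)` does. A term listed in two groups would need the groups merged first.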
Hi. Currently, the dump service offers two different dumps for Wikidata:
* XML: http://dumps.wikimedia.org/wikidatawiki/latest/
* JSON: http://dumps.wikimedia.org/wikidatawiki/entities/
According to http://www.wikidata.org/wiki/Wikidata:Database_download,
the JSON dump is listed as the recommended dump format. Also, at the
time of writing, the JSON dump has been generated regularly every
week, whereas the XML dump has been delayed for 2+ months.
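Since the JSON dump is the recommended format, it may be worth noting that it can be processed as a stream rather than parsed in one piece. A minimal Python sketch, assuming the layout described on the Wikidata:Database_download page (a single JSON array with one entity per line) and a gzip-compressed file; the function names are hypothetical:

```python
import gzip
import json


def iter_entities(lines):
    """Yield entity dicts from the lines of a Wikidata JSON dump.

    Assumes "[" and "]" sit on their own lines, with one entity per line
    in between, each followed by a comma except the last.
    """
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue
        # Each entity line ends with "}" or "},", so stripping commas is safe.
        yield json.loads(line.rstrip(","))


def iter_dump(path):
    """Stream entities from a gzip-compressed dump without loading it whole."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from iter_entities(fh)
```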
Going forward, will both dumps continue to be supported? Or will the
XML dump be phased out and only the JSON dump remain? Or are these
plans still to be determined based on upcoming changes to the dumping
infrastructure as per https://phabricator.wikimedia.org/T88728?
If the JSON dump is to be the sole data format, is there any way to
address the following omissions?
* '''Non-JSON pages not available''': The JSON dump only provides JSON
content-type pages in the main and property namespaces. Pages in other
namespaces are not available, including the Main Page. For example,
here are the page counts for some of these namespaces from the
2015-03-30 dump:
id    namespace     pages
----  ------------  -----
4     Wikidata      10280
8     MediaWiki      2244
10    Template       4701
12    Help            779
14    Category       3073
828   Module          175
1198  Translations  83524
* '''Page metadata not available''': For the JSON pages, the
page_touched and page_id fields are not available.
* '''Other tables not provided''': Other tables, notably categorylinks
and page_props, are not provided.
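Per-namespace counts like those in the table above can be computed from the XML dump by streaming its <page> elements. A rough Python sketch using the standard library's ElementTree.iterparse; it assumes the standard MediaWiki export layout, where each <page> carries an <ns> child, and the function names are illustrative:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the "{namespace-uri}" prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_namespace(source):
    """Count <page> elements per <ns> value in a MediaWiki XML export.

    `source` is a filename or binary file object. The export schema's XML
    namespace URI differs between versions, so tags are matched by local name.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "ns":
                counts[int(child.text)] += 1
                break
        elem.clear()  # drop the page's children (revision text) to limit memory
    return counts
```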
Thanks in advance for any information.
I recently introduced wikidata to a (very computationally savvy) colleague
by sending him this link:
His response is indicative of an interface problem that I think is actually
"Is there a simple way to get the RDF for a given concept? The page seems
to only present the English names for the concept and its linked concepts."
Leaving aside RDF, it is really not straightforward for newcomers to get
from a concept page like that to the corresponding structured data. This
could be solved with the consistent addition of a simple link like "view
json/xml/rdf" to each of the concept pages on wikidata. They would just be
links to the API calls, e.g.
http://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q423111 in this
case.
As the concept pages themselves get tossed around a lot, such an addition
could be extremely valuable in teaching the uninitiated what it's all about,
and it would come at very little cost. To me, this button is akin to the
'view source' action on web pages, an absolutely fundamental part of how
the web grows, even now.
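Such links are cheap to generate from the item ID alone. A small Python sketch, assuming the wbgetentities API URL quoted above plus Wikibase's Special:EntityData page with a format suffix (.json, .rdf, .ttl); which formats are actually served is worth checking against the live site, and `view_links` is a hypothetical helper name:

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/"


def view_links(qid):
    """Build 'view json/xml/rdf' style links for a single item."""
    api_params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return {
        "api": API + "?" + urlencode(api_params),
        "json": f"{ENTITY_DATA}{qid}.json",
        "rdf": f"{ENTITY_DATA}{qid}.rdf",  # assumed RDF/XML serialization
        "ttl": f"{ENTITY_DATA}{qid}.ttl",  # assumed Turtle serialization
    }
```

A page template could render these as a short row of links next to the item's label.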
I am investigating some concepts in signal processing and relating them
to data manipulation. It is somewhat difficult because the way computer
scientists relate to concepts is very dogmatic: something is either black
or white. However, I have not found much on "things that under certain
circumstances can be considered black-ish, and under another set of
circumstances can be considered white-ish".
In signal processing there is the concept of amplitude, which is simply the
signal strength. For humans, language is like an amplitude communication
process in which the receiver picks up not only the signal but also its
amplitude, depending on context, awareness, previous knowledge, and other
factors, which in turn can be considered waves being processed by the
ontological biological-organizational complex, the body-mind.
It is tough to describe that a certain concept might have one amplitude
in some situations and a different amplitude in others, and perhaps even
harder to build a human interface for it.
Has anyone attempted it in the past? If Q items are not static entities,
what is the best way to convey that they are not? And is it possible or
desirable at all?
Perhaps these questions are more suitable for a Wikidata 2.0, or perhaps it
is already doable, who knows.
After looking at the list of items without any claims, I was wondering if I
could help with the cleanup by checking the categories I am familiar with.
Is there any way to get a breakdown of the number of claims per item for
a list of items taken from, say, a Wikipedia category?
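One possible approach is two API round trips: ask the Wikipedia API for the Wikidata item ID of each category member (the wikibase_item page property), then ask wbgetentities for just the claims of those items and count them. A minimal Python sketch of the URL building and response parsing only; the batch size of 50, the helper names, and the sample usage are assumptions, and continuation of long category listings is not handled:

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def category_items_url(wiki_api, category):
    """URL listing a category's members together with their Wikidata IDs."""
    params = {"action": "query", "generator": "categorymembers",
              "gcmtitle": category, "gcmlimit": "max",
              "prop": "pageprops", "ppprop": "wikibase_item", "format": "json"}
    return wiki_api + "?" + urlencode(params)


def qids_from_category_response(response):
    """Extract the wikibase_item IDs from a category query response."""
    qids = []
    for page in response.get("query", {}).get("pages", {}).values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            qids.append(qid)
    return qids


def chunks(items, size=50):
    """Split IDs into batches; 50 per request is an assumed API limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def claims_request_url(qids):
    """URL asking wbgetentities for only the claims of the given items."""
    params = {"action": "wbgetentities", "ids": "|".join(qids),
              "props": "claims", "format": "json"}
    return WIKIDATA_API + "?" + urlencode(params)


def claim_counts(response):
    """Map each QID in a wbgetentities response to its number of statements."""
    counts = {}
    for qid, entity in response.get("entities", {}).items():
        if "missing" in entity:
            continue  # deleted or nonexistent ID
        claims = entity.get("claims") or {}  # empty claims may come back as []
        counts[qid] = sum(len(statements) for statements in claims.values())
    return counts
```

For example, `category_items_url("https://en.wikipedia.org/w/api.php", "Category:...")` would give the first request; items whose count is 0 are the ones still without claims.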
Thanks in advance,
Looking at more "orphaned items", I found several pairs of items that
look like these two:
They have the same label and description, the same coordinates, and no
Wikipedia articles, and are "identified" by different Historic Scotland IDs.
links, however, you can see that the first of the items has data that
does not match the ID, while the second is correct.
The direct question is: How to fix these errors? There are other cases,
such as Q17572335 and Q17570206. I did not do a systematic study, but
something seems to have gone wrong here in more than one case. I cannot
fix mass edits one by one without having a clue what has happened and why.
The indirect question is: How can I find out who did this and maybe ask
the person to fix it? The history is of no help (Reinheitsgebot/Widar).
Posting every error in Wikidata to this list to ask also seems like a
Finally, the technical question is: Why is this even possible? I thought
that, in each language, label+description are a key (globally unique),
yet here we have many pairs of items with exactly the same label and
description. Or is the problem that no description was entered and so
the system does not apply the key? In any case, a data integration
helper application that looks at equal labels+descriptions would
probably make sense, especially for orphaned items. (As I know Wikidata,
someone might well reply to this email with a link to where this can
already be found ;-).
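A helper of the kind suggested above could start by grouping items on their label and description in one language. A minimal Python sketch over entities in Wikidata's JSON format; it matches case-insensitively and treats a missing description as empty, so pairs that the uniqueness check may not cover (such as items with no description) also show up. The function name is hypothetical:

```python
from collections import defaultdict


def duplicate_label_description(entities, language="en"):
    """Return {(label, description): [ids]} for keys shared by 2+ items."""
    groups = defaultdict(list)
    for entity in entities:
        # Empty term maps may serialize as [] rather than {}, hence "or {}".
        labels = entity.get("labels") or {}
        descriptions = entity.get("descriptions") or {}
        label = labels.get(language, {}).get("value")
        if not label:
            continue  # unlabeled items cannot collide on label
        description = descriptions.get(language, {}).get("value") or ""
        key = (label.casefold(), description.casefold())
        groups[key].append(entity["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Run over the orphaned items, the resulting groups would be candidate pairs to compare against their external IDs by hand.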