I've been quite interested in the idea of compiling a list of companies
into Wikidata. I wanted to see how well this fits with Wikidata's goals.
There are company databases like Hoovers/DNB, Mattermark, LinkedIn,
Crunchbase, DataFox, etc. which are very popular, but not as open as
Wikidata. These sites are hugely valuable to folks like investors, job
candidates, sales organizations, etc. and I'd love to see an open
My company could contribute engineering resources towards compiling this
data in Wikidata, e.g. by building a tool that lets contributors pull facts
from company websites, SEC filings, news articles referencing funding amounts
or acquisitions, etc. and review them before they go into Wikidata. This is something that
we could do on our own behind closed doors, but it seems more interesting
to do it in the open where others can receive value from it and contribute
back such that the dataset grows and becomes more valuable.
Many of these companies may not be notable or interesting on their own.
However, having an entire compiled dataset where you can look up any company
is extremely interesting. I'd love Wikidata to be the de facto go-to for
this type of data instead of the more closed organizations mentioned above.
Is this the type of project that's interesting to Wikidata? If so, I'd be
very interested in learning more about the ways we can engage
with Wikidata and in helping to develop a framework for contributing.
On Fri, Jul 3, 2015 at 5:47 AM, Jane Darnell <jane023(a)gmail.com> wrote:
Well that really depends on the data actually. There are lots of printed
datasets and if someone has those online and can no longer host them, then
we might want to harvest some, if not all of it. I am thinking of datasets
of large collections of <whatever>. I recall not long ago a museum of music
records became defunct and they were looking for a home for their database.
We couldn't do anything for them then but we could put it in Mix-n-Match
today (assuming the data is all published material that is considered a
reliable source yada yada...)
On Fri, Jul 3, 2015 at 2:10 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
On 1 July 2015 at 22:51, Quim Gil wrote:
* Where to publish entire datasets... Something tells me that this is
the most urgent and important problem that we have, but the community
definitely knows better, so correct me if I'm wrong. I think our main
I would agree with this. There has historically been a lot of
vagueness around the word "data", and a lot of vague suggestions in
the early days when Wikidata was still being created... and as a
result people sometimes get the impression that Wikidata intends to be
a kind of generalised data repository. This is a bit like assuming
Wikipedia will take anything that's got words :-)
I wonder if it would be good to identify a couple of good, reliable,
repositories we can encourage people to use for this sort of material?
This means that even if we have to say to a potential partner "sorry,
this isn't what we want", we can still give them advice on how to get
it released and available in the most appropriate way. Better than a
- Andrew Gray
Wikidata mailing list