I've been quite interested in the idea of compiling a list of companies into Wikidata. I wanted to see how well this fits with Wikidata's goals. There are company databases like Hoovers/DNB, Mattermark, LinkedIn, Crunchbase, DataFox, etc. which are very popular, but not as open as Wikidata. These sites are hugely valuable to folks like investors, job candidates, sales organizations, etc. and I'd love to see an open collaborative approach.

My company could contribute engineering resources towards compiling this data in Wikidata, e.g. by building a tool that lets contributors pull and review facts from company websites, SEC filings, news articles referencing funding amounts or acquisitions, etc. and import them into Wikidata. This is something that we could do on our own behind closed doors, but it seems more interesting to do it in the open, where others can receive value from it and contribute back so that the dataset grows and becomes more valuable.

Many of these companies may not be notable or interesting on their own. However, having an entire compiled dataset where you can look up any company is extremely interesting. I'd love Wikidata to be the de facto go-to for this type of data instead of the more closed organizations mentioned above.

Is this the type of project that's interesting to Wikidata? If so, I'd be very interested in learning more about how we can engage with Wikidata, and in helping to develop a framework for contributing.


On Fri, Jul 3, 2015 at 5:47 AM, Jane Darnell <jane023@gmail.com> wrote:
Well, that really depends on the data, actually. There are lots of printed datasets, and if someone has those online and can no longer host them, then we might want to harvest some, if not all, of them. I am thinking of datasets of large collections of <whatever>. I recall not long ago a museum of music records became defunct and they were looking for a home for their database. We couldn't do anything for them then, but we could put it in Mix-n-Match today (assuming the data is all published material that is considered a reliable source yada yada...)

On Fri, Jul 3, 2015 at 2:10 PM, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:
On 1 July 2015 at 22:51, Quim Gil <qgil@wikimedia.org> wrote:

> * Where to publish entire datasets... Something tells me that this is not
> the most urgent and important problem that we have, but the community
> definitely knows better, so correct me if I'm wrong. I think our main use

I would agree with this. There has historically been a lot of
vagueness around the word "data", and a lot of vague suggestions in
the early days when Wikidata was still being created... and as a
result people sometimes get the impression that Wikidata intends to be
a kind of generalised data repository. This is a bit like assuming
Wikipedia will take anything that's got words :-)

I wonder if it would be good to identify a couple of good, reliable,
repositories we can encourage people to use for this sort of material?
This means that even if we have to say to a potential partner "sorry,
this isn't what we want", we can still give them advice on how to get
it released and available in the most appropriate way. Better than a
frustrating back-and-forth...


- Andrew Gray

Wikidata mailing list
