New subject: Kickstartet: Adding 2.2 million German organisations to Wikidata

16 Oct 2017


Hi Sebastian,
This is huge! It will cover almost all currently existing German companies. Many of these will have similar names, so preparing for disambiguation is a concern.
A good approach would be to propose a property for an external identifier, load the data into Mix-n-match, create links for companies already in Wikidata, and add the rest (or perhaps only parts of them - I’m not sure if having all of them in Wikidata makes sense, but that’s another discussion), preferably with location and/or sector of trade in the description field.
I’ve tried to figure out what could be used as the key for an external identifier property. However, it looks like the registry does not offer any (persistent) URL to its entries. So for looking up a company, there are apparently two options:
-          conducting an extended search for the exact string “A&A Dienstleistungsgesellschaft mbH”
-          copying the register number “32853”, selecting the court (Leipzig) from the corresponding dropdown list, and searching for that
Neither way is very intuitive, even if we can provide a link to the search form. This would make for a weak connection to the source of information. More importantly, it makes disambiguation in Mix-n-match difficult. This applies to the preparation of your initial load (you would not want to create duplicates), but even more so to everybody else who wants to match their data later on. Being forced to search for entries manually in a cumbersome way in order to disambiguate a new, possibly large and rich dataset is, in my eyes, not something we want to impose on future contributors. And often, the free information they find in the registry (formal name, register number, legal form, address) will not easily match the information they have (common name, location, perhaps founding date, and most importantly sector of trade), so disambiguation may still be difficult.
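To make the duplicate problem more concrete, here is a minimal sketch in Python of how a composite key could be derived from the court and the register number, and how an initial load could be split into entries that are already known and entries that are new. The function names, the field names and the key format (e.g. "Leipzig_HRB_32853") are my own assumptions for illustration. The registry does not publish such a key, and I have not verified that court, register type and number together are always unique.

```python
import re
import unicodedata


def registry_key(court: str, register_type: str, number: str) -> str:
    """Build a hypothetical composite key, e.g. 'Leipzig_HRB_32853'.

    Assumption: court + register type + number identify one entry.
    """
    court_part = re.sub(r"\s+", "_", court.strip())
    type_part = register_type.strip().upper()
    number_part = re.sub(r"\s+", "", number)
    return f"{court_part}_{type_part}_{number_part}"


def normalise_name(name: str) -> str:
    """Case-fold and collapse whitespace so formal names can be compared."""
    name = unicodedata.normalize("NFKC", name)
    return " ".join(name.casefold().split())


def split_new_and_known(entries, known_keys):
    """Separate entries whose key is already known from new ones.

    `entries` is a list of dicts with the hypothetical fields
    'court', 'register_type' and 'number'.
    """
    new, known = [], []
    for entry in entries:
        key = registry_key(entry["court"], entry["register_type"], entry["number"])
        (known if key in known_keys else new).append(key)
    return new, known


entries = [
    {"court": "Leipzig", "register_type": "HRB", "number": "32853"},
    {"court": "Leipzig", "register_type": "HRB", "number": "1"},
]
new, known = split_new_and_known(entries, {"Leipzig_HRB_32853"})
```

Such a key only helps contributors whose data already contains the register number. Anyone who only has a common name and a location would still have to fall back on name matching, which is the harder case described above.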
Have you checked which parts of the accessible information shown below can legally be crawled and added to external databases such as Wikidata?
Cheers, Joachim
--
Joachim Neubert
ZBW – German National Library of Economics
Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg
Phone +49-42834-462
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On behalf of Sebastian Hellmann
Sent: Sunday, 15 October 2017 09:45
To: wikidata@lists.wikimedia.org
Subject: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata
Hi all,
the German business registry contains roughly 2.2 million organisations. Some information is paid, but some is public, i.e. the info you get by searching at the following URL and clicking on UT (see example below):
https://www.handelsregister.de/rp_web/mask.do?Typ=e
I would like to add this to Wikidata, either by crawling or by raising money to use crowdsourcing platforms like CrowdFlower or Amazon Mechanical Turk.
It should meet notability criterion 2: https://www.wikidata.org/wiki/Wikidata:Notability
2. It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. If there is no item about you yet, you are probably not notable.
The reference is the official German business registry, which is serious and public. Organisations are also, by definition, clearly identifiable legal entities.
How can I get clearance to proceed on this?
All the best,
Sebastian
Entity data
Saxony District court Leipzig HRB 32853 – A&A Dienstleistungsgesellschaft mbH
Legal status: Gesellschaft mit beschränkter Haftung
Capital: 25.000,00 EUR
Date of entry: 29/08/2016
(When entering date of entry, wrong data input can occur due to system failures!)
Date of removal: -
Balance sheet available: -
Address (subject to correction):
A&A Dienstleistungsgesellschaft mbH
Prager Straße 38-40
04317 Leipzig
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata