Dear All,

Although in Italy such data (not even the basic data) are normally not available from the chambers of commerce, there are some open data sets from which we could extract several identifiers. Of course, these are biased toward the suppliers of public administrations (PA), because contracting with a PA is the trigger for being listed in these open data.

In the context of a broader effort to upload this kind of data to Wikidata, like the one that seems to be emerging from this thread, the firm I manage may be willing to contribute about half a million pairs of labels and VAT IDs. It's a relatively thin dataset, in the sense that you just have the name of the firm and the VAT ID, and possibly a link to a portal we're building, where you can gather additional information about the firm's activity with the Italian public administration. But, as I was mentioning, Italian firm data are quite rare (they are not even available on OpenCorporates.com).

By the way, https://www.wikidata.org/wiki/Property:P3608 (EU VAT number) already exists and may provide a sufficient identifier in most cases, since the country ISO code (e.g. IT for Italy) followed by the national VAT ID usually yields the EU VAT number (the actual algorithm may be a bit more complex, but it's documented). (That said, there are also national identifiers for which it may be worth creating properties, such as the registration number at the national chambers of commerce, etc.)
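
To make the "country code + national VAT ID" idea concrete, here is a minimal Python sketch for the Italian case only. It assumes that an Italian national VAT ID is exactly 11 digits and that the EU VAT number is simply "IT" followed by those digits; it does not verify the check digit, and it does not cover other countries, whose formats differ. The function name and the sample VAT ID are made up for illustration.

```python
import re

# Assumption: for Italy, the EU VAT number is "IT" followed by the
# 11-digit national VAT ID. Other countries use different formats
# and are deliberately not handled here.
ITALIAN_VAT_RE = re.compile(r"[0-9]{11}")


def italian_vat_to_eu_vat(national_vat_id: str) -> str:
    """Return the EU VAT number for an Italian national VAT ID (hypothetical helper)."""
    # Remove whitespace and normalise case before checking the format.
    cleaned = re.sub(r"\s+", "", national_vat_id).upper()
    # Tolerate input that already carries the "IT" prefix.
    if cleaned.startswith("IT"):
        cleaned = cleaned[2:]
    # Format check only: the check digit is NOT validated in this sketch.
    if not ITALIAN_VAT_RE.fullmatch(cleaned):
        raise ValueError(f"not an 11-digit Italian VAT ID: {national_vat_id!r}")
    return "IT" + cleaned


if __name__ == "__main__":
    # Made-up VAT ID, for illustration only.
    print(italian_vat_to_eu_vat("012 3456 7890"))
```

A value produced this way could then be used as the string value of a P3608 statement; any real import would of course need proper validation first.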

About the value of these data on Wikidata, starting from our use case, I think that having permanent URIs for all firms on Wikidata would provide, for instance, great value for several anti-corruption projects around the world. (This could also provide a place to trace some international links among companies, which are not always readily available today.) That said, I perfectly understand the concerns of Andra in terms of scalability and maintenance, and this is one of the reasons I did not think of donating these data to Wikidata so far. 

I'll try to follow these discussions, but please - Sebastian or others - feel free to ping me if the project goes on and you want to include these Italian data.

Best,

Federico



On Mon, Oct 16, 2017 at 10:25 AM, Andra Waagmeester <andra@micelio.be> wrote:
There is a similarly sized dataset available on Belgian enterprises. With the same objective of enriching Wikidata with enterprise data, I recently proposed the following property: https://www.wikidata.org/wiki/Wikidata:Property_proposal/NACE_code

However, after some talks with others in the Wikidata community, I have recently had second thoughts on whether a full dump of this type of dataset is a valuable enrichment of Wikidata. Adding 2 million items with additional statements per item would be quite an enlargement of Wikidata. If we were to bot-add all businesses of both Belgium and Germany, we would have 4 million new items, which would currently account for 10% of all of Wikidata. I am not sure what this would mean in terms of scalability, or whether it would cause any problems.

Maybe a use-case-driven approach would be more appropriate here. We could think of a bot that would draw on the trade registers of the different countries whenever a specific use case vouches for the inclusion of trade data.

Just my 2cts

Cheers, 

Andra

On Mon, Oct 16, 2017 at 9:48 AM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

Thanks, done.

https://www.wikidata.org/wiki/Wikidata:Project_chat#Handelsregister


On 15.10.2017 22:10, Yaroslav Blanter wrote:
Hi Sebastian,

I would say the best way is to file a request for the permissions for the bot

https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot

and possibly leave a message on the Project Chat

https://www.wikidata.org/wiki/Wikidata:Project_chat

Cheers
Yaroslav

On Sun, Oct 15, 2017 at 9:44 AM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

Hi all,

the German business registry contains roughly 2.2 million organisations. Some of the information is paid, but some is public, i.e. the info you get by searching at the link below and clicking on UT (see example below):

https://www.handelsregister.de/rp_web/mask.do?Typ=e


I would like to add this to Wikidata, either by crawling or by raising money to use crowdsourcing platforms like CrowdFlower or Amazon Mechanical Turk.


It should meet notability criteria 2: https://www.wikidata.org/wiki/Wikidata:Notability

2. It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. If there is no item about you yet, you are probably not notable.


The reference is the official German business registry, which is serious and public. Orgs are also, by definition, clearly identifiable legal entities.

How can I get clearance to proceed on this?

All the best,
Sebastian



Entity data


Saxony District court Leipzig HRB 32853 – A&A Dienstleistungsgesellschaft mbH
Legal status: Gesellschaft mit beschränkter Haftung  
Capital: 25.000,00 EUR
Date of entry: 29/08/2016
(When entering date of entry, wrong data input can occur due to system failures!)
Date of removal: -
Balance sheet available: -
Address (subject to correction): A&A Dienstleistungsgesellschaft mbH
Prager Straße 38-40
04317 Leipzig


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata








