That is a tough question. We are pretty sure that we technically scale quite well, and there is no reason for the community to restrict itself for technical reasons. If the number of items suddenly increased by one or two orders of magnitude, we would probably meet a few hiccups on the way, but the architecture should be able to deal with that.

What I am much more worried about, though, is the scaling of the community. One of the statements from my Wikidata talks is "we do not want to become the biggest data heap out there, but rather aim for an organic community, that is strong and resilient enough to maintain the data that is being collected." See also Wikidata requirement #6 <http://meta.wikimedia.org/wiki/Wikidata/Notes/Requirements> (a page worth re-reading).

Sometimes it might make sense for Wikidata to bridge and connect to external data sources that have their own way of maintenance and curation. Should the dataset really be merged into Wikidata? Is the data wikilike? Is it used in the Wikimedia projects? Or could it also be provided as a linked open dataset that is referenced from Wikidata?

Just to give an example: sure, one could theoretically start collecting temperature data for a city in hourly measurements*, but it might make more sense to point to an external site that collects this data in a more efficient format, provide the mapping identifiers, and allow a bot to go there and discover the data. Wikidata, in turn, could provide an aggregation of the data, which would indeed be used on e.g. Wikipedia and Wikivoyage, but leave the full dataset on the external site.

(Which, by the way, would also be a viable solution for datasets that have incompatible licenses.)
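
To make the temperature example a bit more concrete, here is a minimal Python sketch of the split I have in mind: the hourly readings stay with the external source, and only a small monthly aggregate plus a mapping identifier would be kept on our side. Everything in it (`HourlyReading`, `monthly_summary`, `item_record`, the `external_id` field and its value) is made up for illustration and is not how Wikidata actually stores or imports data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HourlyReading:
    """One measurement as the (hypothetical) external site might expose it."""
    timestamp: str  # ISO 8601 string, e.g. "2013-03-14T09:00"
    celsius: float


def monthly_summary(readings):
    """Group readings by 'YYYY-MM' and reduce each month to a few numbers."""
    by_month = defaultdict(list)
    for reading in readings:
        # The first seven characters of an ISO 8601 timestamp are 'YYYY-MM'.
        by_month[reading.timestamp[:7]].append(reading.celsius)
    return {
        month: {
            "mean": round(mean(values), 1),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for month, values in sorted(by_month.items())
    }


def item_record(external_id, readings):
    """Keep only the identifier mapping and the aggregate (assumed layout)."""
    return {
        "external_id": external_id,  # points back to the full dataset
        "monthly_temperature": monthly_summary(readings),
    }


readings = [
    HourlyReading("2013-03-14T09:00", 4.0),
    HourlyReading("2013-03-14T10:00", 6.0),
    HourlyReading("2013-04-01T00:00", 10.0),
]
record = item_record("station-0001", readings)
```

The record grows with the number of months rather than the number of hours, which is the point: a bot could recompute it from the external site whenever needed.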

I hope this makes sense, Cheers,

Denny

* Actually, this kind of data would probably kill us faster than creating many items would, as it would make a single item ginormous. We do not scale that well in that direction.

2013/3/14 Benjamin Good <ben.mcgee.good@gmail.com>

I've been struggling to understand what should go into Wikidata and what should not. I see that this is because it hasn't been decided yet ;)
http://www.wikidata.org/wiki/Wikidata_talk:Notability

To help the community make this decision, I think it would be really helpful for the developers to weigh in on the technical capacity of the envisioned/realized Wikidata infrastructure. If we know how big the system could realistically become and still work well technically, it might help discussions about how much and what kind of content we should put into it. If the plan is to cope with only a few tens of millions of subjects, that is quite different from a plan that allows for the potential creation of billions of items (suggesting less inclusive versus more inclusive policies).

-Ben

--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. (Society for the Promotion of Free Knowledge). Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.