I took another look at the output created with the data, and I am at once delighted and astonished by the capability and creativity of the Wikipedia community in solving such tasks with MediaWiki template syntax, and horrified that such a solution was necessary at all.
Adding to my own explanation of how Wikidata would help here: we plan to implement some form of query answering capability in phase III, which would not operate on the full items, as described in my previous mail, but on a smarter, derived representation of the data. Specific queries -- the possible expressivity is not defined yet -- could thus be answered much more efficiently than by evaluating them on the fly over all relevant items. (This is covered by the technical proposal as item P3.2 in http://meta.wikimedia.org/wiki/Wikidata/Technical_proposal#Technical_requirements_and_rationales_3.)
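To make that idea a bit more concrete, here is a rough sketch in Python of what such a derived representation could look like; the item IDs, property IDs and data shapes are made up for illustration and this is of course not the actual phase III design:

# Hypothetical sketch: a derived, per-property index for query answering.
# Item and property IDs are invented; this is not the real Wikidata model.
items = {
    "Q1001": {"P1082": 5341,  "P131": "Q1234"},   # population, located-in
    "Q1002": {"P1082": 120,   "P131": "Q1234"},
    "Q1003": {"P1082": 98000, "P131": "Q5678"},
}

# Derived representation: one table per property, built ahead of time.
index = {}
for item_id, statements in items.items():
    for prop, value in statements.items():
        index.setdefault(prop, {})[item_id] = value

# A query such as "all items with population > 1000" now touches only
# the P1082 table instead of loading every full item.
result = [item for item, pop in index["P1082"].items() if pop > 1000]
print(result)  # ['Q1001', 'Q1003']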
Cheers, Denny
2012/9/21 Denny Vrandečić denny.vrandecic@wikimedia.de:
2012/9/21 Strainu strainu10@gmail.com:
Well, you said something about Wikidata. But even if the client wiki does not need to load the full census, can it be avoided on Wikidata?
Talking about the template that Tim listed: https://fr.wikipedia.org/w/index.php?title=Mod%C3%A8le:Donn%C3%A9es_PyrF1-2009&action=edit
I was trying to understand the template and its usage. As far as I can tell, it maps a ZIP code (or some other identifier) of a commune to a value (maybe a percentage or a population; sorry, there is no documentation and my French is rusty).
So basically it provides all values for a given property. In other words, that wiki page implements a database table with the columns "key" and "value" and holds the whole table. (I think when Ward Cunningham described a wiki as "the simplest online database that could possibly work", this is *not* what he envisioned.)
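Roughly, the template behaves like this hypothetical lookup table (commune codes and values invented for illustration; the real template is written in wikitext, not Python):

# Hypothetical sketch of what the template encodes: a single page holding
# an entire key -> value table, queried by commune code.
census_2009 = {
    "64001": 1234,
    "64002": 567,
    # ... thousands more rows, all on one wiki page ...
}

def lookup(code):
    # Every article using the template effectively performs this lookup,
    # but only after the full table has been loaded.
    return census_2009.get(code)

print(lookup("64001"))  # 1234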
In Wikidata we are not storing the data by property, but per item. Put differently, every row in that template would become one statement on the item identified by its key.
So Wikidata would not load the whole census data for every article, but only the data for the item that is actually requested.
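As a rough sketch (again with invented IDs and a simplified structure, not the real Wikibase data model or API), the same rows stored item by item would look like this, and rendering one article only needs one item:

# Hypothetical sketch: the same census rows stored as one statement per item.
items = {
    "Q64001": {"label": "Commune A", "statements": {"population_2009": 1234}},
    "Q64002": {"label": "Commune B", "statements": {"population_2009": 567}},
}

def render_infobox(item_id):
    # A client wiki only fetches the single item its article is about,
    # not the whole census table.
    item = items[item_id]
    return f"{item['label']}: population {item['statements']['population_2009']}"

print(render_infobox("Q64001"))  # Commune A: population 1234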
On the other hand, we would indeed load the whole data for one item on the repository (not on the Wikipedias), which might lead to problems with very big items at some point. We will run tests to see how this behaves once these features have been developed, and then decide whether we need to do something like partitioning by property groups (similar to what Cassandra does).
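A very rough sketch of that partitioning idea, assuming invented property groups (this is not a committed design), could look like this: a large item's statements are split by property group, and a page loads only the groups it actually needs.

# Hypothetical sketch: splitting one very large item into partitions
# by property group (loosely analogous to how Cassandra partitions wide rows).
from collections import defaultdict

PROPERTY_GROUPS = {
    "population_2009": "demographics",
    "area_km2": "geography",
    "mayor": "politics",
}

def partition(statements):
    parts = defaultdict(dict)
    for prop, value in statements.items():
        group = PROPERTY_GROUPS.get(prop, "other")
        parts[group][prop] = value
    return dict(parts)

big_item = {"population_2009": 1234, "area_km2": 42.0, "mayor": "Q999"}
parts = partition(big_item)

# A page that only shows demographic data loads just that partition.
print(parts["demographics"])  # {'population_2009': 1234}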
I hope that helps, Denny