Hi Magnus, hi Daniel,
I don't think file size should be our primary concern here. What seems big today will be negligible in a few years, and having all the data in one place is simply easier to work with. I am happy to wait another 30 minutes for a download if it saves me from implementing yet another Web service connector in my own code. Compute time is cheap, disk space is cheap, human labour is expensive.
Maybe the whole size discussion is a bit of a red herring here anyway. If we are worried about file size, there might be better ways of reducing it. We could split the contents into several smaller dump files, and not just for descriptions. We already do this when creating RDF dumps, and it would be easy to do the same for JSON; we could set this up immediately if someone needs it (just let me know). However, if we want to provide smaller files, a more effective method would be to split by language rather than by term type: all labels in all languages would still be much bigger than labels+descriptions+aliases in English only, and many applications will not need labels in 300 languages.
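To illustrate the per-language idea, here is a minimal sketch of such a filter, assuming the current JSON dump layout (one entity per line inside one big JSON array). The file names are invented for the example, and a real splitter would of course live in the dump scripts rather than run as post-processing; the output here is simply one entity per line:

    import gzip
    import json

    # File names are invented for this example; the real dumps live under
    # https://dumps.wikimedia.org/wikidatawiki/entities/
    SOURCE = "wikidata-all.json.gz"
    TARGET = "wikidata-terms-en.json.gz"
    KEEP = {"en"}

    def keep_languages(term_map, languages):
        # Keep only the entries whose language code we want.
        return {code: v for code, v in term_map.items() if code in languages}

    with gzip.open(SOURCE, "rt", encoding="utf-8") as src, \
         gzip.open(TARGET, "wt", encoding="utf-8") as dst:
        for line in src:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # the dump wraps one entity per line in a JSON array
            entity = json.loads(line)
            for field in ("labels", "descriptions", "aliases"):
                if field in entity:
                    entity[field] = keep_languages(entity[field], KEEP)
            dst.write(json.dumps(entity, ensure_ascii=False) + "\n")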
Anyway, as I said, I do not mind whether the auto-descriptions are stored like normal descriptions or whether they are added to the dump files "last minute" when generating them. I just need the descriptions in the dumps.
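If the "last minute" route were taken, the merging step could be quite small. A sketch only, where auto_descriptions is a hypothetical lookup from entity id to generated texts, not any existing store:

    def merge_auto_descriptions(entity, auto_descriptions):
        # auto_descriptions is a hypothetical lookup:
        # entity id -> {language code: generated description text}.
        generated = auto_descriptions.get(entity["id"], {})
        descriptions = entity.setdefault("descriptions", {})
        for lang, text in generated.items():
            # Never overwrite a description a human has written.
            descriptions.setdefault(lang, {"language": lang, "value": text})
        return entity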
Cheers,
Markus
On 09.02.2015 12:28, Daniel Kinzler wrote:
> On 09.02.2015 at 12:25, Magnus Manske wrote:
>> But wouldn't it be better to keep the dump as it is, for those who don't
>> want triple the size (just inventing a number here), and to have one
>> separate, or even per-language, dump with just the automated descriptions,
>> for those who want that?
> Possibly. Depends on how much more data this would actually be. Which also
> depends on whether we would omit descriptions in languages that can easily
> be covered by language fallback (e.g. no separate descriptions in de-ch
> and de-at).
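For what it's worth, the fallback Daniel mentions amounts to walking a chain of language codes until a description is found, so a dump could omit entries the chain already covers. A sketch, with invented chains rather than MediaWiki's actual fallback configuration:

    # Invented chains for illustration; not MediaWiki's actual
    # fallback configuration.
    FALLBACK_CHAINS = {
        "de-ch": ["de-ch", "de", "en"],
        "de-at": ["de-at", "de", "en"],
    }

    def resolve_description(descriptions, language):
        # Walk the fallback chain and return the first description found.
        for code in FALLBACK_CHAINS.get(language, [language, "en"]):
            if code in descriptions:
                return descriptions[code]["value"]
        return None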
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/