Hi Magnus, hi Daniel,
I don't think file size should be our primary concern here. What may
seem big today will be negligible in a few years. Having all data in one
place is just easier to work with. I am happy to wait another 30
minutes for a download if it saves me from implementing yet another Web
service connector in my own code. Compute time is cheap, disk space is
cheap,
human labour is expensive.
Maybe the whole size discussion is a bit of a red herring here anyway.
If we are worried about file size, there are better ways of reducing
it. We could split the contents into several smaller dump files, and
not just for descriptions. We already do this when creating RDF
dumps, and it would be easy for us to do the same for JSON. We could do
this immediately if someone needs it (just let me know and we will set
it up for you). However, if we want to provide smaller files, a more
effective method would be to split by language rather than by term type:
all labels in all languages would still be much bigger than
labels+descriptions+aliases in English only, and many applications will
not need labels in 300 languages.
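To illustrate the language-based split, here is a minimal sketch of filtering one entity down to English-only terms. The field names follow the Wikidata JSON entity layout; the function itself is hypothetical and not part of any existing dump tool:

```python
import json

def filter_terms_to_language(entity, lang="en"):
    """Keep only labels, descriptions and aliases in one language.
    `entity` is a dict in (a minimal subset of) the Wikidata JSON
    entity layout; real dump entities carry many more keys."""
    filtered = dict(entity)
    for field in ("labels", "descriptions", "aliases"):
        values = entity.get(field, {})
        filtered[field] = {lang: values[lang]} if lang in values else {}
    return filtered

# Toy entity with terms in several languages.
q42 = {
    "id": "Q42",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"},
               "de": {"language": "de", "value": "Douglas Adams"}},
    "descriptions": {"en": {"language": "en", "value": "English writer"},
                     "fr": {"language": "fr", "value": "écrivain anglais"}},
    "aliases": {"en": [{"language": "en", "value": "Douglas Noel Adams"}]},
}

english_only = filter_terms_to_language(q42)
print(json.dumps(english_only, ensure_ascii=False))
```

Run over a full dump (one entity per line), this would drop every non-English term, which is exactly why an English-only terms file ends up far smaller than an all-languages labels file.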
Anyway, as I said, I do not mind whether the auto-descriptions are
stored like normal descriptions or whether they are added to the dump
files "last minute" when generating them. I just need the descriptions
in the dumps.
Cheers,
Markus
On 09.02.2015 12:28, Daniel Kinzler wrote:
On 09.02.2015 at 12:25, Magnus Manske wrote:
But wouldn't it be better to keep the dump as it is, for those who
don't want triple the size (just inventing a number here), and have one
separate, or even per-language, dump with just the automated
descriptions, for those who want that?
Possibly. It depends on how much more data this would actually be,
which in turn depends on whether we would omit descriptions in
languages that can easily be covered by language fallback (e.g. no
separate descriptions in de-ch and de-at).
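The fallback idea Daniel mentions could be sketched roughly as follows. The chain used here (de-ch falling back to de, then en) is only illustrative; the real MediaWiki fallback graph is larger and maintained elsewhere:

```python
def resolve_with_fallback(descriptions, lang, fallbacks):
    """Return the description for `lang`, walking its fallback chain.
    `fallbacks` maps a language code to an ordered list of codes to
    try next; this is a toy stand-in for the MediaWiki fallback graph."""
    for candidate in [lang] + fallbacks.get(lang, []):
        if candidate in descriptions:
            return descriptions[candidate]
    return None

# Illustrative chains: Swiss and Austrian German fall back to de, then en.
fallbacks = {"de-ch": ["de", "en"], "de-at": ["de", "en"]}
descs = {"de": "deutscher Schriftsteller", "en": "English writer"}
print(resolve_with_fallback(descs, "de-ch", fallbacks))  # falls back to "de"
```

If consumers resolve terms this way, a dump could safely omit any de-ch or de-at description that is identical to the de one, which is where the size saving would come from.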
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/