Oh, a real-life example of "short automatic descriptions" (same code as the API) vs. manual ones: searching for "Peter" on Wikidata, with the autodesc gadget:
https://twitter.com/MagnusManske/status/564782161845551104


On Mon Feb 09 2015 at 13:09:27 Magnus Manske <magnusmanske@googlemail.com> wrote:
On Mon Feb 09 2015 at 13:00:35 Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:

Since wb_terms has one row per term, and a field for the term type, it would be
simple enough to inject "auto-descriptions". The only issue is that wb_terms is
already pretty huge, and adding automatic descriptions in *all* languages would
likely bloat it a lot more. Language variants could be omitted, but still -
that's a lot of data...

It would be a quick'n'dirty solution. But it highlights an issue: We'd have the same problem with manual descriptions, if they were to arrive in large numbers.
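Roughly, the injection would amount to something like this (just a sketch; sqlite3 stands in for the real database, and the column names and the extra 'autodescription' term type are assumptions for illustration, not the actual schema):

    # Minimal sketch: an auto-description stored as just another term row.
    # sqlite3 is a stand-in; column names and the 'autodescription' term
    # type are assumed, not the actual Wikibase schema.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE wb_terms (
        term_entity_id   INTEGER,
        term_entity_type TEXT,
        term_language    TEXT,
        term_type        TEXT,   -- 'label', 'description', 'alias', ...
        term_text        TEXT)""")

    def store_auto_description(entity_id, language, text):
        """Store one generated description as a term row of its own type."""
        db.execute(
            "INSERT INTO wb_terms (term_entity_id, term_entity_type, "
            "term_language, term_type, term_text) "
            "VALUES (?, 'item', ?, 'autodescription', ?)",
            (entity_id, language, text),
        )

    # One row per item per language: multiply the number of items by a few
    # hundred languages and the bloat problem is obvious.
    store_auto_description(42, "en", "example auto-generated description")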

There's always Yet Another Table. Maybe a description would be generated on the fly only when a Wikidata page is actually viewed in a language, and removed after ~1 month of "non-viewing"? That should keep the table short enough, but it would require extra effort for API calls and dumps, assuming those are supposed to show descriptions for /all/ languages.
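Something like this, say (again just a sketch: generate_description() is a placeholder for the autodesc code, and a dict stands in for the actual table):

    # Sketch of "generate on view, purge after ~1 month of non-viewing".
    # generate_description() is a stand-in for the real autodesc logic,
    # and a dict stands in for the actual storage table.
    import time

    MONTH = 30 * 24 * 3600
    cache = {}  # (entity_id, language) -> (description, last_viewed)

    def generate_description(entity_id, language):
        # Placeholder for the autodesc code ("same code as the API").
        return "auto description for Q%d (%s)" % (entity_id, language)

    def describe(entity_id, language):
        """Return a description, generating and caching it on first view."""
        key = (entity_id, language)
        entry = cache.get(key)
        text = entry[0] if entry else generate_description(entity_id, language)
        cache[key] = (text, time.time())  # refresh the "last viewed" time
        return text

    def purge(now=None):
        """Drop descriptions nobody has viewed for about a month."""
        now = now if now is not None else time.time()
        stale = [k for k, (_, seen) in cache.items() if now - seen > MONTH]
        for key in stale:
            del cache[key]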

Then again there's the Labs Hadoop cluster, used for Analytics IIRC. That sounds like a way to process and store vast amounts of small, self-contained datasets (description strings). It would tie the solution to Wikimedia, though, and require a lot of engineering effort to get started.