Captioning Wikidata items? - Wikimedia-l

26 Sep 2018

Just a weird idea.

It is very interesting how neural nets can caption images. It is done by
building a state-model of the image, which is fed into a kind of neural
net (an RNN), and that net (a black box) will transform the state-model
into running text. In some cases the neural net is steered. That is called
an attention mechanism, and it creates relationships between parts of the
image.

Swap out the image with an item, and a virtually identical setup can
generate captions for items. The caption for an item is what's called the
description in Wikidata. It is also the first sentence, with a lead-in, in
Wikipedia articles. It is possible to steer the attention, that is, to tell
the network which items should be used, and thus the later sentences will
be meaningful.
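
To make the steering a bit more concrete, here is a rough sketch of plain
dot-product attention over the statements of an item. Everything in it is
an assumption for illustration: the three statement vectors, the decoder
state and the function names are made up, and a real captioning net would
learn these vectors instead of having them written by hand.

```python
import math

def attention(query, keys):
    """Softmax over the dot products between the query and each key."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context(weights, values):
    """Weighted sum of the value vectors, one number per dimension."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical toy vectors for three statements of one item.
statements = {
    "instance of: human": [1.0, 0.0, 0.0],
    "occupation: painter": [0.0, 1.0, 0.0],
    "country: Norway": [0.0, 0.0, 1.0],
}

# Hypothetical decoder state that leans towards the occupation statement.
decoder_state = [0.2, 2.0, 0.1]

vectors = list(statements.values())
weights = attention(decoder_state, vectors)
summary = context(weights, vectors)
```

The weights say how much each statement contributes to the next word, so
steering the attention amounts to nudging those weights towards the
statements we want mentioned.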

What that means is that we could create meaningful stub entries for the
article placeholder, that is the "AboutTopic" special page. We can't
automate this for very small projects, but somewhere between small and
mid-sized languages it will start to make sense.

To make this work we need some very special knowledge, which we probably
don't have, like how to turn an item into a state-model by using the highly
specialized rdf2vec algorithm (hello Copenhagen) and verifying the stateful
language model (hello Helsinki and Tromsø).
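
For those who have not seen rdf2vec: the first half of it is just random
walks over the graph, which are then treated as sentences and fed to a
word2vec-style model. Below is a minimal sketch of the walk part only. The
toy graph, the made-up identifiers and the function names are all
assumptions for illustration, and the embedding step is left out.

```python
import random

# Hypothetical toy graph: subject -> list of (property, object) pairs.
# The identifiers are invented and are not real Wikidata ids.
GRAPH = {
    "Q_some_person": [("occupation", "Q_painter"), ("country", "Q_norway")],
    "Q_painter": [("subclass_of", "Q_artist")],
    "Q_norway": [("continent", "Q_europe")],
}

def random_walk(graph, start, depth, rng):
    """Follow up to `depth` randomly chosen outgoing edges from `start`."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break  # dead end, stop the walk early
        prop, obj = rng.choice(edges)
        walk.extend([prop, obj])
        node = obj
    return walk

def walks_for(graph, start, count, depth, seed=0):
    """Generate `count` walks from `start`, seeded for repeatability."""
    rng = random.Random(seed)
    return [random_walk(graph, start, depth, rng) for _ in range(count)]

walks = walks_for(GRAPH, "Q_some_person", count=5, depth=2)
```

Each walk is a list like item, property, item, property, item, and the
vector learned for the starting item would be the state-model that goes
into the caption generator.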

I wonder if the only real problems are what the community wants, and what
the acceptable error limit is.

John Erling Blad
/jeblad