Thanks, Gerard.
I've tried to get some idea of what these items are, but my SPARQL queries just time out. It's not clear to me, you see, how many of these items are not already, in practice, "knowable in any language". If they represent things with names or titles, and they are not notable enough for a Wikipedia article in any language, perhaps their name or title is all anyone needs. Or perhaps a transliteration into their language's script. Some of the items must actually represent translations or derivative works of some other item.
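For concreteness, here is a sketch of the kind of query I mean; the wikibase:sitelinks predicate is my assumption about how to express "no Wikipedia article in any language", not the query I actually ran:

```python
# Sketch only: the Wikidata Query Service exposes a per-item sitelink count
# as wikibase:sitelinks. Aggregating over every matching item tends to hit
# the 60-second timeout, while a bounded sample returns quickly.

TIMES_OUT = """
SELECT (COUNT(?item) AS ?n) WHERE {
  ?item wikibase:sitelinks 0 .      # items with no Wikipedia article
}
"""

RETURNS_QUICKLY = """
SELECT ?item WHERE {
  ?item wikibase:sitelinks 0 .
}
LIMIT 100
"""
```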
Anyway, to keep it simple, I'd agree that automatically generating a language-neutral description for a Wikidata item could be a high priority. But the higher priority, it seems to me, is identifying which items would most benefit from such a description. Some of those could be fixed right now! In my mind, though, I'm already setting aside people and their works, provided they have at least one external identifier. Then I'm thinking places just need a decent geotag... Maybe what we really need is to derive a high-quality search string, so that anyone can make use of their search engine of choice.
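To illustrate what an automatically generated description might look like, here is a minimal sketch; the claim names and values are illustrative, assuming the item's claims have already been fetched and their labels resolved:

```python
def describe(claims):
    """Build a short disambiguating description from structured claims."""
    parts = []
    nationality = claims.get("country of citizenship")
    occupation = claims.get("occupation")
    if nationality and occupation:
        parts.append(f"{nationality} {occupation}")
    elif occupation:
        parts.append(occupation)
    born = claims.get("date of birth")
    died = claims.get("date of death")
    if born or died:
        parts.append(f"({born or '?'}\u2013{died or '?'})")
    return " ".join(parts)

# Illustrative claims for a hypothetical person item:
print(describe({
    "country of citizenship": "Dutch",
    "occupation": "painter",
    "date of birth": "1820",
    "date of death": "1890",
}))  # -> Dutch painter (1820–1890)
```

Because only the claim values vary per item, translating the value labels would yield the same description in any language.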
For Wikipedia items, again, I agree that exploring all the bi-directional links should be fruitful. As a start, we could extract the structural context of each link: that it is a link from an identifiable section of a Wikipedia page and, where applicable, that it links to an identifiable section of another Wikipedia page. We could enrich this data with the textual context of the link: the sentence it occurs in and the adjacent sentences, with their links. (At this stage, I'd be slightly concerned about copyright implications, so we might leave the textual context in its original Wikipedia, where it could already enhance the backlinks ("What links here") with meaningful context.) Interpreting contextual links into a language-neutral form would then allow equivalent links to be identified in other Wikipedias. If we establish that several Wikipedias have semantically equivalent links, I would think it reasonable to quote the textual context from just one of them as a reference in support of the generalized claim, which itself lives in the copyright-free domain (where only the idea is expressed and no person is the author of its language-neutral expression).
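The extraction step could start as simply as the sketch below, which pairs each wikilink with its enclosing section and sentence. The regexes and sentence splitting are deliberately naive; a real implementation would use a proper wikitext parser.

```python
import re

# Naive sketch: extract the structural and textual context of each wikilink
# from raw wikitext.

LINK = re.compile(r"\[\[([^|\]#]+)(?:#([^|\]]+))?(?:\|[^\]]*)?\]\]")
HEADING = re.compile(r"^==+\s*(.*?)\s*==+\s*$")

def link_contexts(wikitext):
    contexts = []
    section = "(lead)"
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        if m:
            section = m.group(1)
            continue
        # Crude sentence split; good enough for a sketch.
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            for target, anchor in LINK.findall(sentence):
                contexts.append({
                    "target": target,                   # linked page
                    "target_section": anchor or None,   # section on target page, if any
                    "source_section": section,          # section the link appears in
                    "sentence": sentence,               # textual context
                })
    return contexts

sample = (
    "== Early life ==\n"
    "She studied in [[Leiden]]. Her teacher was [[Jan de Vries (linguist)|Jan de Vries]]."
)
for ctx in link_contexts(sample):
    print(ctx["source_section"], "->", ctx["target"])
```

The same records, once interpreted into a language-neutral form, are what would let us look for equivalent links in other Wikipedias.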
I don't know anything about LSJBOT so, as always, a link would be helpful.
Best regards,
Al.
On Sunday, 9 August 2020, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
I am amazed by all the competing ideas and notions I have read on the mailing list so far. It is bewildering and does not give me a clear picture of what is to be done.
I have thought about it and for me it is simple. For every Wikipedia article there are two Wikidata items that have no Wikipedia article. It follows that the first order of business is to make these knowable in any language. The best way to do this is by providing automated descriptions that aid in disambiguation.
When a Wikipedia article exists, it links to many articles. All of them have their own Wikidata item and all can be described either in a Wikidata triple or in structured text.
When sufficient data is available, a text can be generated. This has been demonstrated by LSJBOT, and it is why the Cebuano Wikipedia has so many articles. A template as used by LSJBOT can be adapted for every language.
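Such template-driven generation might be sketched like this; the template strings and the example record are only illustrative, not LSJBOT's actual templates:

```python
# Illustrative sketch: one data record, one template per language.
# Real templates would need grammatical agreement handled per language.

TEMPLATES = {
    "en": "{name} is a species of {group} in the family {family}.",
    "sv": "{name} \u00e4r en art av {group} i familjen {family}.",  # assumed phrasing
}

def generate(lang, record):
    return TEMPLATES[lang].format(**record)

record = {"name": "Aus bus", "group": "beetle", "family": "Carabidae"}
print(generate("en", record))  # -> Aus bus is a species of beetle in the family Carabidae.
```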
My point is that all the research in the world makes no difference when we do not apply what we know.
Thanks,
GerardM