According to https://www.youtube.com/watch?v=TLuM4E6IE5U : "Semantic
annotation is the process of attaching additional information to various
concepts (e.g. people, things, places, organizations etc) in a given
text or any other content. Unlike classic text annotations for reader's
reference, semantic annotations are used by machines to refer to."
On Wikipedia a red link is a link to an article that hasn't been created
(yet) in that language. Often another language does have an article
about the subject or at least we have a Wikidata item about the subject.
Take for example
https://nl.wikipedia.org/w/index.php?title=Friedrich_Ris . It has over
250 incoming links, but the person doesn't have an article in Dutch. We
have a Wikidata item with links to 7 Wikipedias at
https://www.wikidata.org/wiki/Q116510 , but no way to relate the red
link on Wikipedia to that item.
Wouldn't it be nice to be able to make a connection between the red link
on Wikipedia and the Wikidata item?
Let's assume we have this list somewhere. We would be able to offer all
sorts of nice features to our users like:
* Hover over the link to get a hovercard in your favorite fallback language
* Generate an article placeholder for the user with basic information in
the local language
* Pre-populate the translate extension so you can translate the article
from another language
(probably plenty of other good uses)
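As a rough sketch of the first step any of these features would need -
checking whether an item already has an article in a given language - a
tool could look at the item's sitelinks. This is a minimal illustration
assuming the standard wbgetentities JSON shape; the entity data is
synthetic and the helper name is made up:

```python
def has_sitelink(entity, lang):
    """True if the Wikidata entity JSON already has a sitelink to the
    given language's Wikipedia (e.g. 'nl' -> 'nlwiki'), i.e. the link
    would NOT be red there."""
    return f"{lang}wiki" in entity.get("sitelinks", {})

# Synthetic entity modelled on the Friedrich Ris case: articles exist
# in some languages, but not in Dutch.
entity = {
    "id": "Q116510",
    "sitelinks": {
        "enwiki": {"title": "Friedrich Ris"},
        "dewiki": {"title": "Friedrich Ris"},
    },
}

print(has_sitelink(entity, "de"))  # True
print(has_sitelink(entity, "nl"))  # False -> candidate for a red-link hint
```

A real tool would fetch the entity JSON from the wbgetentities API
instead of hard-coding it.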
Where to store this link? I'm not sure about that. On some Wikipedias,
people have experimented with local templates around the red links. That
approach isn't structured data, it clutters up the wikitext, it doesn't
scale, and local communities generally don't seem to like it, so that's
not the way to go. Maybe a better option would be to create a new property
on Wikidata to store the name of the future article, something like
Q116510: Pxxx -> (nl)"Friedrich Ris". That would be easiest because the
infrastructure is already there and you can just build tools on top of it,
but I'm afraid it would cause a lot of noise on items. A couple of
suggestions wouldn't be a problem, but what is keeping people from adding
suggestions in 100 languages? Or maybe restrict usage so that a red link
must have at least 1 (or n) incoming links on that Wikipedia before people
are allowed to add the suggestion?
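The incoming-link threshold could be enforced by tooling before a
suggestion is stored. A minimal sketch of that rule, assuming the counts
have already been fetched from the local wiki's backlink API (the
function name and data here are made up):

```python
def allowed_suggestions(suggestions, backlink_counts, min_links=1):
    """Keep only red-link targets that have at least min_links
    incoming links on the local Wikipedia."""
    return [title for title in suggestions
            if backlink_counts.get(title, 0) >= min_links]

# Hypothetical counts from the local wiki's backlink API
counts = {"Friedrich Ris": 250, "Barely Linked Person": 0}
print(allowed_suggestions(["Friedrich Ris", "Barely Linked Person"], counts))
# ['Friedrich Ris']
```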
We could also create a new project on Wikimedia Cloud to store the links,
but that would be quite an extra time investment to set up.
What do you think?
I'm looking for Wikidata bots that perform accuracy audits: for example,
comparing a person's birth date on Wikidata with the date given in
databases linked to the item by an external-id.
I do not even know if they exist. Bots are often poorly documented, so I
appeal to the community for some examples.
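For what it's worth, the core of such an audit bot would be a
precision-aware date comparison, since Wikidata's P569 values carry a
precision (9 = year, 10 = month, 11 = day) while external databases often
store a plain ISO date. A minimal sketch of that comparison - not an
existing bot, just an illustration:

```python
def dates_match(wikidata_value, precision, external_iso):
    """Compare a Wikidata time string like '+1867-01-08T00:00:00Z'
    against an external 'YYYY-MM-DD' date, honouring the Wikidata
    precision (9 = year, 10 = month, 11 = day)."""
    wd_parts = wikidata_value.lstrip("+")[:10].split("-")
    ext_parts = external_iso.split("-")
    # Compare only as many fields as the Wikidata precision supports
    fields = {9: 1, 10: 2, 11: 3}[precision]
    return wd_parts[:fields] == ext_parts[:fields]

# Wikidata knows only the year; the external database has the full date:
print(dates_match("+1867-00-00T00:00:00Z", 9, "1867-05-02"))   # True
# Day-precision values must match exactly:
print(dates_match("+1867-01-08T00:00:00Z", 11, "1867-01-09"))  # False
```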
I don't have enough knowledge about neural nets to evaluate the email
below, but I'm forwarding it in case it's of interest to others on these
two lists.
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: John Erling Blad <jeblad(a)gmail.com>
Date: Wed, Sep 26, 2018 at 6:23 PM
Subject: [Wikimedia-l] Captioning Wikidata items?
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Just a weird idea.
It is very interesting how neural nets can caption images. It is done by
building a state-model of the image, which is fed into a kind of neural
net (an RNN), and that net (a black box) transforms the state-model into
running text. In some cases the neural net is steered; that is called
attention control, and it creates relationships between parts of the
image.
Swap out the image with an item, and a virtually identical setup can
generate captions for items. The caption for an item is what's called the
description in Wikidata. It is also the first sentence of the lead-in in
Wikipedia articles. It is possible to steer the attention, that is, to
tell the network which items should be used, and thus to shape what the
later sentences will be about.
What that means is that we could create meaningful stub entries for the
article placeholder, that is the "AboutTopic" special page. We can't
automate this for very small projects, but somewhere between small and
mid-sized languages it will start to make sense.
To make this work we need some very special knowledge, which we probably
don't have, like how to turn an item into a state-model by using the highly
specialized rdf2vec algorithm (hello Copenhagen) and verifying the stateful
language model (hello Helsinki and Tromsø).
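The pipeline sketched above (item -> state-model -> attention-steered
decoder -> text) can be shown as a data-flow skeleton. This sketch
replaces both the rdf2vec embedding and the RNN decoder with trivial
stand-ins, so it only illustrates the stages and the role of attention,
not any learning:

```python
def item_to_state(item):
    """Stand-in for the rdf2vec step: reduce an item to an ordered
    list of (property, value) facts."""
    return list(item.items())

def decode(state, attention):
    """Stand-in for the RNN decoder: verbalize only the facts that
    the attention indices point at, in that order."""
    return ", ".join(value for _, value in (state[i] for i in attention))

item = {"occupation": "entomologist", "country of citizenship": "Swiss"}
state = item_to_state(item)
# Steering the attention decides which facts end up in the caption:
print(decode(state, [1, 0]))  # "Swiss, entomologist"
```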
I wonder if the only real problems are what do the community want, and what
is the acceptable error limit.
John Erling Blad
Do you people think that the regex https?://(\S+\.)+\S+(/\S*)? (
https://regex101.com/?regex=https?://(\S+\.)+\S+(/\S*)? ) can be
improved? I think it can.
For instance, why require the protocol spec? For most use cases (just
clicking it), it's not useful. Arguably, it's not part of the website.
Also, the regex is lax. That's not a big issue, but a stricter one could
prevent duplicate URLs from being added by bots and data imports. Aside
from the http/s discussion, constraints like these:
- warn against URLs ending in /
- warn against uppercase
would help data imports from external databases or general batch edits
with bots avoid adding duplicate values.
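To illustrate, here is what the current pattern plus the two suggested
warnings could look like in Python; the pattern is the one under
discussion, while the warning function is just a sketch of what a
constraint checker might flag:

```python
import re

# The current (lax) pattern from the discussion
URL_RE = re.compile(r"https?://(\S+\.)+\S+(/\S*)?")

def url_warnings(url):
    """Warnings that would help bots and batch imports avoid adding
    near-duplicate URL values for the same site."""
    warnings = []
    if not URL_RE.fullmatch(url):
        warnings.append("does not match the URL pattern")
    if url.endswith("/"):
        warnings.append("ends in /")
    if any(c.isupper() for c in url):
        warnings.append("contains uppercase")
    return warnings

print(url_warnings("https://example.org/page"))  # []
print(url_warnings("https://Example.org/"))      # ['ends in /', 'contains uppercase']
```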
Hi Team!
+Dan Brickley <danbri(a)google.com> +Lydia Pintscher
Schema.org mapping is progressing with every new Weekly Summary's "Newest
properties" section. That's great! And thanks to Léa and the team for
providing the new properties.
What's not great is that many times we cannot apply a "broader external
class" to map to a Schema.org type. This is because "broader concept" (
https://www.wikidata.org/wiki/Property:P4900 ) is constrained to
"qualifiers only and not for use on statements".
We are able to use the existing "narrower external class" (
https://www.wikidata.org/wiki/Property:P3950 ), for example on this
topic: https://www.wikidata.org/wiki/Q7406919 . But from what we can see,
there is no "broader external class" property in Wikidata yet.
It would be *awesome* if someone could advocate for that new property to
help map Wikidata to external vocabularies that have broader concepts quite
often, such as Schema.org.
The next IRC office hour with the Wikidata team will take place on
September 25th, from 18:00 to 19:00 (UTC+2, Berlin time) on the channel
#wikimedia-office.
As usual, we will present some news from the development team and the
projects to come, and collect your feedback.
If you have any special topic you'd like to see as a focus, please share
it with us.
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.