Hoi, Jura1 created a wonderful list of people who died in Brazil in 2015 [1]. It is a page that may update regularly from Wikidata thanks to the ListeriaBot. Obviously, there may be a few more because I am falling ever more behind with my quest for registering deaths in 2015.
I have copied his work and created a page for people who died in the Netherlands in 2015 [2]. It is trivially easy to do, the result looks great, and it can be used for any country in any Wikipedia.
The Dutch Wikipedia indicated that they nowadays maintain important metadata at Wikidata. I am really happy that we can showcase their work. It is important work because, as someone reminded me at some stage, this is part of what amounts to the policy on living people...
Thanks, GerardM
[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil
[2] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
Added an automatic "find images" link to the template; that Brazil list looks awfully sparse in terms of pictures :-(
On 3 June 2015 at 06:16, Gerard Meijssen gerard.meijssen@gmail.com wrote:
list of people who died in Brazil in 2015 [1].
page for people who died in the Netherlands in 2015 [2]. It is trivially easy to do this and, the result is great. The result looks great, it can be used for any country in any Wikipedia
[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil [2] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
Those are great - but why are the dates written into the page source, rather than called from Wikidata?
Before they are used in Wikipedia (at least, en.WP, though the same could apply elsewhere), I'd like to see two changes:
* Add a column for "Wikidata ID"
* Use a template for each row.
You can see an example of the latter (albeit not drawn from Wikidata) on, for example,
https://en.wikipedia.org/wiki/List_of_public_art_in_the_City_of_Westminster
That said, it would be even better if we could just place a template like {{Wikidata list}} on a page, and have each row added by that. Soon, no doubt...
On Wed, Jun 3, 2015 at 11:02 AM Andy Mabbett andy@pigsonthewing.org.uk wrote:
Those are great - but why are the dates written into the page source, rather than called from Wikidata?
nl.wp already has "remote" property access, and there I use Wikidata calls; everywhere else should be switched over by the end of this month, so I'll switch to Wikidata calls once that's done. Not sure if Wikidata itself has it?
Before they are used in Wikipedia (at least, en.WP, though the same could apply elsewhere), I'd like to see two changes:
* Add a column for "Wikidata ID"
Can do.
* Use a template for each row.
You can see an example of the latter (albeit not drawn from Wikidata) on, for example,
https://en.wikipedia.org/wiki/List_of_public_art_in_the_City_of_Westminster
That said, it would be even better if we could just place a template like {{Wikidata list}} on a page, and have each row added by that. Soon, no doubt...
{{public art row}} is ... very topic-specific. Listeria allows for all kinds of lists. Species? Award winners? Artwork in a museum? Philosophical concepts?
One-row-template-fits-all would have to be very generic, to the point where the difference between table row and template parameters would be negligible IMO. That {{sort}} thing looks interesting though.
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 3 June 2015 at 11:20, Magnus Manske magnusmanske@googlemail.com wrote:
- Use a template for each row.
You can see an example of the latter (albeit not drawn from Wikidata) on, for example,
https://en.wikipedia.org/wiki/List_of_public_art_in_the_City_of_Westminster
{{public art row}} is ... very topic-specific. Listeria allows for all kinds of lists. Species? Award winners? Artwork in a museum? Philosophical concepts?
If the template takes a Wikidata value and calls the relevant properties from Wikidata, then all Listeria would need to know would be the name of the template and - as now - the criteria for inclusion.
One-row-template-fits-all would have to be very generic, to the point where the difference between table row and template parameters would be negligible IMO.
I wouldn't recommend that.
That {{sort}} thing looks interesting though.
Date formatting/ sorting templates would also be useful.
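To illustrate the row-template idea above: a minimal sketch, not Listeria's actual code, of a generator that only knows a template name and a list of item IDs and emits one template call per row. The function name `render_rows` and the template name `person row` are hypothetical; the template itself would be responsible for fetching the relevant properties from Wikidata.

```python
def render_rows(template_name, item_ids):
    """Return wikitext with one template call per Wikidata item ID.

    Hypothetical helper: the row template (not this function) decides
    which properties to pull from Wikidata for each item.
    """
    rows = []
    for qid in item_ids:
        # Accept only well-formed item IDs such as "Q42".
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        rows.append("{{" + template_name + "|" + qid + "}}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_rows("person row", ["Q42", "Q1339"]))
```

With this split, switching a list from people to artworks would mean changing the template name and the inclusion criteria, not the generator.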
On Wed, Jun 3, 2015 at 1:12 PM Andy Mabbett andy@pigsonthewing.org.uk wrote:
If the template takes a Wikidata value and calls the relevant properties from Wikidata, then all Listeria would need to know would be the name of the template and - as now - the criteria for inclusion.
Ah, you mean there should be a row template for each type of list (otherwise, how to specify "relevant"?)!
I could add an option to use a specific row template instead.
As I suggested before, it is probably better to use the list item as a reference. In this case Andy, your query for Listeria should be this: claim[361:6635653]
This will give you 92 items according to Autolist1, and I think this corresponds to your enwiki list. That list has lots of hand-formatting, but it is trivial to make a transform from the Listeria live data to your manual, pretty, for-the-readers list.
On 3 June 2015 at 14:44, Magnus Manske magnusmanske@googlemail.com wrote:
Ah, you mean there should be a row template for each type of list (otherwise, how to specify "relevant"?)!
Yes.
I have an idea for a prototype, but no time right now. Watch this space...
I could add an option to use a specific row template instead.
Thank you.
Example for row template use: https://en.wikipedia.org/wiki/User:Magnus_Manske/listeria_test
Hoi, The Dutch indicated their willingness to add the dead to Wikidata ... I add quite a few dead from other countries and because of Jura1 Brazilians who died in 2015 have an added significance.
Given that we CAN produce lists like this, it makes sense to reconsider the offer by the fine people from DBpedia and have the information they harvest from Wikipedia added automatically to Wikidata. I pointed out one reason in my recent blog post. Thanks, GerardM
http://ultimategerardm.blogspot.nl/2015/06/wikidata-jurandyr-noronha-died-in...
On 03.06.2015 22:44, Gerard Meijssen wrote:
Hoi, The Dutch indicated their willingness to add the dead to Wikidata ... I add quite a few dead from other countries and because of Jura1 Brazilians who died in 2015 have an added significance.
Given that we CAN produce lists like this, it makes sense to reconsider the offer by the fine people from DBpedia and have the information they harvest from Wikipedia added automatically to Wikidata.. One reason I pointed out on my recent blogpost..
DBpedia is getting this information from the contents of the template Persondata as used on Wikipedia [1]. The enwiki community just recently decided to maintain this data on Wikidata instead. I guess this means that (English) DBpedia will not contain this data in the future, unless they import it from Wikidata (they are tracking the issue at [2]).
So you see, times are changing quickly ... but overall I hope that this is still solving the problem you identified, in fact in a much more direct way than one might have hoped for :-).
DBpedia may still play a role. I don't know how exactly the enwiki community is planning to implement the move from Persondata to Wikidata. It could be that DBpedia is the only project extracting this data. So in a way, your suggestion might be a great idea, though not as a long-term data maintenance plan but as a one-time help for migration.
To support data maintenance further, it would make sense to use bots for synching with authority files. These files also contain death dates and they can even be used as a valid reference.
Regards,
Markus
[1] https://en.wikipedia.org/wiki/Template:Persondata [2] https://github.com/dbpedia/extraction-framework/issues/397
Hoi, I care for the English Wikipedia data but the DBpedia for the English Wikipedia is NOT the same as the one for Russian, Dutch, French or German. They are distinct.
I welcome the move by the English Wikipedia, but it covers only a small subset of the people who die. With the inclusion of all the DBpedia data, i.e. for all the Wikipedias it harvests, we cover substantially more than en.wp alone.
The one thing I hate about not cooperating is that we fall behind as a result and end up relearning lessons that have already been learned. Thanks, GerardM
On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
DBpedia is getting this information from the contents of the template Persondata as used on Wikipedia [1]. The enwiki community just recently decided to maintain this data on Wikidata instead. I guess this means that (English) DBpedia will not contain this data in the future, unless they import it from Wikidata (they are tracking the issue at [2]).
Note that DBpedia gets person data both from the Persondata template and from the infobox templates, using the mappings wiki. We also noted that the data in the two is often out of sync (and usually the Persondata is stale/wrong because people don't know of its existence).
E.g. we have 28K items with two different birth dates, one from the infobox and another from Persondata:
select count(*) where {?s dbpedia-owl:birthDate ?b1 ; dbpedia-owl:birthDate ?b2 . filter (?b1 != ?b2 && ?b1 < ?b2)}
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&que...
The Persondata template is used in the German Wikipedia as well. The following release has ~2.2M triples coming from the German Persondata template (which, iirc, has the same problems as the English one).
Best, Dimitris
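The out-of-sync check described above can also be sketched offline. This is only an illustration under assumptions: the `(subject, source, date)` tuple layout, the function name `conflicting_birth_dates`, and the sample rows are all made up for the example and are not DBpedia data or DBpedia code.

```python
from collections import defaultdict


def conflicting_birth_dates(triples):
    """Return subjects whose sources report more than one distinct birth date.

    `triples` is an iterable of (subject, source, date) tuples; this layout
    is an assumption made for the sketch.
    """
    dates = defaultdict(set)
    for subject, _source, date in triples:
        dates[subject].add(date)
    # Keep only subjects with at least two different values.
    return {s: sorted(d) for s, d in dates.items() if len(d) > 1}


# Invented sample rows: Person_A disagrees across sources, Person_B agrees.
sample = [
    ("Person_A", "infobox", "1929-06-07"),
    ("Person_A", "persondata", "1929-07-06"),
    ("Person_B", "infobox", "1950-01-01"),
    ("Person_B", "persondata", "1950-01-01"),
]

if __name__ == "__main__":
    print(conflicting_birth_dates(sample))
```

Like the query's `?b1 < ?b2` filter, collecting values into a set per subject means each conflicting subject is reported once rather than once per ordering of the pair.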
Hi Dimitris,
Interesting situation. If you have contradictory data from several templates, then the challenge will be to find out which information is correct for importing it to Wikidata. Could your dataset maybe become an input to the primary sources tool [1]? Then Wikidata users could help to clean the dataset and try to find references (as you know, references are quite important for Wikidata, but it would really be asking too much of DBpedia to provide these).
This could be a viable strategy to merge DBpedia data into Wikidata. This email was only about person-related data, but one could do this for any kind of dataset where the information in DBpedia is of relatively high quality. I don't know exactly what the primary sources tool needs as input (it is still beta), but I think it mainly requires that a decent quality set of candidate statements is extracted and provided in some suitable format.
As a first step, it might make sense to do a scan to see how many date-of-death (or whatever) statements in DBpedia are not yet found in Wikidata. If it is a small dataset (e.g., only a subset of the people who have died in the last year), then maybe one could also add and verify it in another way, not going through primary sources. But especially for recent deaths, there might be a great variety of sources (esp. newspaper articles) that are not easy to find without user support.
Regards,
Markus
[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
Hoi, Markus with all due respect, we have a LOT of data in Wikidata that is plain wrong. When we add the missing data from DBpedia it is of a higher quality than what we have. Insisting that it first needs to be validated is foolish. It is not done for any of the work we do. All our bots make use of Wikipedia and in this DBpedia is no different.
I do agree that it makes sense to verify the data that is different. But even so, when Wikidata says 1929 and DBpedia says 7-June-1929, our practice has been to replace the 1929 with the more precise date.
Let us be pragmatic and improve our data and start with what is missing. Thanks, GerardM
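The "1929 versus 7-June-1929" rule above could be sketched roughly as follows. This is an assumption-laden illustration, not how any existing bot works: `PartialDate`, `precision`, `compatible`, and `merge` are hypothetical names, and returning `None` for a conflict is just one way to flag a value for manual verification.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date known to year, month, or day precision (hypothetical type)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def precision(d):
    """1 = year only, 2 = year and month, 3 = full date."""
    if d.month is None:
        return 1
    return 2 if d.day is None else 3


def compatible(a, b):
    """True if the two dates agree on every field that both specify."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def merge(existing, candidate):
    """Prefer the more precise of two compatible dates.

    Returns None when the dates contradict each other, which signals
    that the pair needs manual verification.
    """
    if not compatible(existing, candidate):
        return None
    return candidate if precision(candidate) > precision(existing) else existing


if __name__ == "__main__":
    print(merge(PartialDate(1929), PartialDate(1929, 6, 7)))
```

Under this sketch a year-only value is upgraded only when the more precise value falls within that year; anything else is left for a human to check.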
On 4 June 2015 at 10:31, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Hi Dmitris,
Interesting situation. If you have contradictory data from several templates, then the challenge will be to find out which information is correct for importing it to Wikidata. Could your dataset maybe become an input to the primary sources tool [1]? Then Wikidata users could help to clean the dataset and try to find references (as you know, references are quite important for Wikidata, but it would really be asking too much of DBpedia to provide these).
This could be a viable strategy to merge DBpedia data into Wikidata. This email was only about person-related data, but one could do this for any kind of dataset where the information in DBpedia is of relatively high quality. I don't know exactly what the primary sources tool needs as input (it is still beta), but I think it mainly requires that a decent quality set of candidate statements is extracted and provided in some suitable format.
As a first step, it might make sense to do a scan to see how many date-of-death (or whatever) statements in DBpedia are not yet found in Wikidata. If it is a small dataset (e.g., only a subset of the people who have died in the last year), then maybe one could also add and verify it in another way, not going through primary sources. But especially for recent deaths, there might be a great variety of sources (esp. newspaper articles) that are not easy to find without user support.
Regards,
Markus
[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
On 04.06.2015 09:56, Dimitris Kontokostas wrote:
On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch <markus@semantic-mediawiki.org mailto:markus@semantic-mediawiki.org>
wrote:
On 03.06.2015 22:44, Gerard Meijssen wrote: Hoi, The Dutch indicated their willingness to add the dead to Wikidata ... I add quite a few dead from other countries and because of Jura1 Brazilians who died in 2015 have an added significance. Given that we CAN produce lists like this, it makes sense to reconsider the offer by the fine people from DBpedia and have the information they harvest from Wikipedia added automatically to Wikidata.. One reason I pointed out on my recent blogpost.. DBpedia is getting this information from the contents of the template Persondata as used on Wikipedia [1]. The enwiki community just recently decided to maintain this data on Wikidata instead. I guess this means that (English) DBpedia will not contain this data in the future, unless they import it from Wikidata (they are tracking the issue at [2]).
Note that DBpedia gets person data information both from the persondata template and from the infobox templates using the mappings wiki. We also noted that the data between the two is many times out of sync (and usually the person data is stalled/wrong because people don't know it's existence).
e.g. we have 28K items with double birth dates one from the infobox and another from persondata.
select count(*) where {?s dbpedia-owl:birthDate ?b1 ; dbpedia-owl:birthDate ?b2 . filter (?b1 != ?b2 && ?b1 < ?b2)}
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&que...
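The same check can be sketched outside the SPARQL endpoint. The example below assumes the birth-date statements have been exported as plain `(subject, ISO date string)` pairs; the sample data and the function name `conflicting_pairs` are made up for illustration. Like the query, it reports one row per subject and per pair of distinct dates with `b1 < b2`.

```python
from collections import defaultdict
from itertools import combinations


def conflicting_pairs(statements):
    """Group birth dates by subject and return (subject, b1, b2) for every
    pair of distinct dates with b1 < b2, mirroring the SPARQL filter."""
    by_subject = defaultdict(set)
    for subject, birth_date in statements:
        by_subject[subject].add(birth_date)

    pairs = []
    for subject, dates in by_subject.items():
        # ISO 8601 date strings sort chronologically, so sorted() gives b1 < b2.
        for b1, b2 in combinations(sorted(dates), 2):
            pairs.append((subject, b1, b2))
    return pairs


# Hypothetical sample: A has two different dates, B the same date twice.
statements = [
    ("A", "1929-06-07"),
    ("A", "1929-01-01"),
    ("B", "1950-03-03"),
    ("B", "1950-03-03"),
    ("C", "1900-01-01"),
]
conflicts = conflicting_pairs(statements)
print(len(conflicts), "conflicting birth-date pairs")
```

Note that a subject with three different dates contributes three pairs, so the count of pairs can be higher than the count of affected items.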
The persondata template is used in German Wikipedia as well. The following release has ~ 2.2M triples coming from the German persondata template (which iirc has the same problems as the English one)
Best, Dimitris
So you see, times are changing quickly ... but overall I hope that this is still solving the problem you identified, in fact in a much more direct way than one might have hoped for :-). DBpedia may still play a role. I don't know how exactly the enwiki community is planning to implement the move from Persondata to Wikidata. It could be that DBpedia is the only project extracting this data. So in a way, your suggestion might be a great idea, though not as a long-term data maintenance plan but as a one-time help for migration. To support data maintenance further, it would make sense to use bots for synching with authority files. These files also contain death dates and they can even be used as a valid reference.
Regards, Markus
[1] https://en.wikipedia.org/wiki/Template:Persondata
[2] https://github.com/dbpedia/extraction-framework/issues/397
Thanks, GerardM
http://ultimategerardm.blogspot.nl/2015/06/wikidata-jurandyr-noronha-died-in...
-- Kontokostas Dimitris
On 04.06.2015 10:49, Gerard Meijssen wrote:
Hoi, Markus with all due respect, we have a LOT of data in Wikidata that is plain wrong. When we add the missing data from DBpedia it is of a higher quality than what we have. Insisting that it first needs to be validated is foolish. It is not done for any of the work we do. All our bots make use of Wikipedia and in this DBpedia is no different.
I do agree that it makes sense to verify the data that is different. But even so. When Wikidata says 1929 and DBpedia says 7-June-1929 our practise has been to remove the 1929 for the more precise data.
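The precision rule described here can be written down as a small check. This is only a sketch under assumptions: dates are modelled as year/month/day with `None` for unknown parts, and the names `PartialDate`, `precision` and `refines` are hypothetical, not taken from any Wikidata bot.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date whose month and day may be unknown (None)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def precision(d):
    """Number of known components: 1 = year, 2 = month, 3 = day."""
    return sum(part is not None for part in d)


def refines(candidate, existing):
    """True if candidate is strictly more precise than existing and
    agrees with every component that existing already states."""
    if precision(candidate) <= precision(existing):
        return False
    return all(e is None or e == c for c, e in zip(candidate, existing))


wikidata_value = PartialDate(1929)
dbpedia_value = PartialDate(1929, 6, 7)
print(refines(dbpedia_value, wikidata_value))
```

Only when `refines` holds is the DBpedia value a pure refinement; a value such as 7 June 1930 against 1929 is a real conflict and belongs in the set that needs verification.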
Let us be pragmatic and improve our data and start with what is missing.
That's exactly what I am saying. I think your misconception is that what you suggest does not happen because of some opposition from the Wikidata community. In reality, it simply does not happen because nobody did it yet, neither from the DBpedia nor from the Wikidata community. It does not help very much to post arguments of how useful this would be. At least you don't need to convince me. What is needed is deed, not talk.
The folks working on the primary sources tool are trying to provide a standard process for almost arbitrary data imports. It was just my first thought for turning your complaint into something that could work as a solution -- if you have a better idea which tool to use, feel free to post it.
Regards,
Markus
Thanks, GerardM
On Thu, Jun 4, 2015 at 11:02 AM, Markus Krötzsch markus@semantic-mediawiki.org wrote:
That's exactly what I am saying. I think your misconception is that what you suggest does not happen because of some opposition from the Wikidata community. In reality, it simply does not happen because nobody did it yet, neither from the DBpedia nor from the Wikidata community. It does not help very much to post arguments of how useful this would be. At least you don't need to convince me. What is needed is deed, not talk.
Wikipedians have objections because importing from Wikipedia is already bad enough in their eyes. Importing from an extraction is even worse in their eyes. We should have convincing numbers about the quality of extraction. This is a reputation problem for Wikidata. One we're spending a lot of time improving and will need to spend a lot more on.
The folks working on the primary sources tool are trying to provide a standard process for almost arbitrary data imports. It was just my first thought for turning your complaint into something that could work as a solution -- if you have a better idea which tool to use, feel free to post it.
Big +1.
Cheers Lydia
Hoi, If we want to make visible that our data is good, you have to approach it from another angle. Adding sources to all statements is not going to happen. Comparing data that exists in multiple sources is comparatively easy. When data at Wikidata is the same as a source like VIAF or IMDB, we do not have to give attention to those statements. When we put our effort in the differences we make a difference. Both our quality and their quality improves.
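As an illustration of this approach, the sketch below sorts statements into those that agree with an external source, those that differ, and those that only the external source has. It assumes both sides have been reduced to a mapping from item identifier to value; the sample data and the function name `triage` are hypothetical.

```python
def triage(wikidata, external):
    """Split the external source's statements into three buckets so that
    attention can go to the differences rather than to what already matches."""
    report = {"agree": [], "differ": [], "only_external": []}
    for key, ext_value in sorted(external.items()):
        if key not in wikidata:
            report["only_external"].append(key)
        elif wikidata[key] == ext_value:
            report["agree"].append(key)
        else:
            report["differ"].append(key)
    return report


# Hypothetical birth years from Wikidata and from an external authority file.
wikidata_values = {"Q1": 1929, "Q2": 1950, "Q3": 1901}
external_values = {"Q1": 1929, "Q2": 1951, "Q4": 1970}

report = triage(wikidata_values, external_values)
print(report)
```

The `differ` bucket is the work list; the `agree` bucket could be marked as checked against the external source.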
The notion that fear among Wikipedians prevents their uptake of our data does not mean that we have to adopt their practices that do not scale. Thanks, GerardM
On 4 June 2015 at 12:25, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg district court (Amtsgericht Berlin-Charlottenburg) under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
On 4 June 2015 at 11:32, Gerard Meijssen gerard.meijssen@gmail.com wrote:
If we want to make visible that our data is good, you have to approach it from another angle. Adding sources to all statements is not going to happen. Comparing data that exists in multiple sources is comparatively easy. When data at Wikidata is the same as a source like VIAF or IMDB, we do not have to give attention to those statements. When we put our effort in the differences we make a difference. Both our quality and their quality improves.
If we're validating data against VIAF or IMDb (or wherever) then we can note that we did so by recording them as sources.
We are currently working on something that could be extended to be used as a source of finding data conflicts / import. I have to check if this can be integrated with the primary sources tool. I hope we have something ready in the next couple of weeks and I'll get back at this thread.
Best, Dimitris
On 04.06.2015 11:05, Dimitris Kontokostas wrote:
We are currently working on something that could be extended to be used as a source of finding data conflicts / import. I have to check if this can be integrated with the primary sources tool. I hope we have something ready in the next couple of weeks and I'll get back at this thread.
Great, this sounds like a plan. The work on the primary sources tool will take a few more months before it will really be ready for prime time. If this is too long, there might be more short-term solutions (such as a Wikidata game), but you'd have to ask the people running this in each case.
Another question: can DBpedia extract references from Wikipedia articles too? If this would be possible, it might be feasible to guess and suggest a reference (or a list of references). Especially with things like date of death, one would expect that references have a publication date very close to (but strictly after) the event, which could narrow down the choices very much.
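A rough sketch of this heuristic, assuming the references of an article are available as `(url, publication date)` pairs. The 30-day window, the example URLs and the function name `likely_references` are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def likely_references(refs, event_date, window_days=30):
    """Return the URLs of references published strictly after the event and
    within window_days of it, earliest publication first."""
    latest = event_date + timedelta(days=window_days)
    hits = [(published, url) for url, published in refs
            if event_date < published <= latest]
    return [url for published, url in sorted(hits)]


# Hypothetical references extracted from one article.
refs = [
    ("https://example.org/obituary", date(2015, 6, 3)),
    ("https://example.org/profile-2009", date(2009, 2, 11)),
    ("https://example.org/retrospective", date(2015, 12, 1)),
    ("https://example.org/news", date(2015, 6, 2)),
]
candidates = likely_references(refs, event_date=date(2015, 6, 1))
print(candidates)
```

The result is only a list of suggestions for a human to confirm; a reference published close to the date need not actually state the date of death.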
Cheers,
Markus
Best, Dimitris
On Thu, Jun 4, 2015 at 12:56 PM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
On 04.06.2015 11:05, Dimitris Kontokostas wrote:
We are currently working on something that could be extended to be used as a source of finding data conflicts / import. I have to check if this can be integrated with the primary sources tool. I hope we have something ready in the next couple of weeks and I'll get back at this thread.
Great, this sounds like a plan. The work on the primary sources tool will take a few more months before it will really be ready for prime time. If this is too long, there might be more short-term solutions (such as a Wikidata game), but you'd have to ask the people running this in each case.
I'll show you what we can provide and you can suggest any options
Another question: can DBpedia extract references from Wikipedia articles too? If this would be possible, it might be feasible to guess and suggest a reference (or a list of references). Especially with things like date of death, one would expect that references have a publication date very close to (but strictly after) the event, which could narrow down the choices very much.
We don't extract them for now, although I think we could relatively easily. The problem in this case would be that we cannot associate references with facts. The DBpedia Information Extraction Framework is quite modular and can be easily extended with new extractors, but it is hard to make these extractors "talk to each other". So we could easily get something like the following:

    dbp:A dbo:birthDate "..."
    dbp:A dbo:deathDate "..."
    dbp:A dbo:reference dbp:r1  # and maybe "dbp:r1 ....something else" depending on the modeling
    dbp:A dbo:reference dbp:r2

but I am not sure if this solves your problem.
Cheers, Dimitris
Cheers,
Markus
Best, Dimitris
On Thu, Jun 4, 2015 at 11:49 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi, Markus with all due respect, we have a LOT of data in Wikidata that is plain wrong. When we add the missing data from DBpedia it is of a higher quality than what we have. Insisting that it first needs to be validated is foolish. It is not done for any of the work we do. All our bots make use of Wikipedia and in this DBpedia is no different.

I do agree that it makes sense to verify the data that is different. But even so. When Wikidata says 1929 and DBpedia says 7-June-1929 our practice has been to remove the 1929 for the more precise data.

Let us be pragmatic and improve our data and start with what is missing. Thanks, GerardM
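To make the 1929 versus 7-June-1929 case concrete, here is a minimal Python sketch of such a merge rule. The PartialDate class, the merge function and the action labels are hypothetical names introduced only for illustration; neither Wikidata nor DBpedia provides them, and a real bot would have to work with the precision stored in each statement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialDate:
    """A date known to year, month or day precision (hypothetical helper)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def precision(self) -> int:
        # 1 = year only, 2 = year and month, 3 = full date
        if self.day is not None:
            return 3
        if self.month is not None:
            return 2
        return 1

    def refines(self, other: "PartialDate") -> bool:
        """True if self agrees with every field that `other` specifies."""
        if self.year != other.year:
            return False
        if other.month is not None and self.month != other.month:
            return False
        if other.day is not None and self.day != other.day:
            return False
        return True


def merge(existing: Optional[PartialDate],
          candidate: PartialDate) -> Tuple[PartialDate, str]:
    """Decide what to do with a candidate date from an external source."""
    if existing is None:
        return candidate, "add"        # missing data: simply add it
    if candidate.refines(existing) and candidate.precision() > existing.precision():
        return candidate, "refine"     # same date, but more precise
    if existing.refines(candidate):
        return existing, "keep"        # nothing new in the candidate
    return existing, "conflict"        # the dates disagree: needs a human


print(merge(PartialDate(1929), PartialDate(1929, 6, 7)))
```

Only the "conflict" outcome would need manual verification under this rule; "add" and "refine" correspond to the missing and more precise cases described above.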
On 4 June 2015 at 10:31, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:

Hi Dmitris,

Interesting situation. If you have contradictory data from several templates, then the challenge will be to find out which information is correct for importing it to Wikidata. Could your dataset maybe become an input to the primary sources tool [1]? Then Wikidata users could help to clean the dataset and try to find references (as you know, references are quite important for Wikidata, but it would really be asking too much of DBpedia to provide these).

This could be a viable strategy to merge DBpedia data into Wikidata. This email was only about person-related data, but one could do this for any kind of dataset where the information in DBpedia is of relatively high quality. I don't know exactly what the primary sources tool needs as input (it is still beta), but I think it mainly requires that a decent quality set of candidate statements is extracted and provided in some suitable format.

As a first step, it might make sense to do a scan to see how many date-of-death (or whatever) statements in DBpedia are not yet found in Wikidata. If it is a small dataset (e.g., only a subset of the people who have died in the last year), then maybe one could also add and verify it in another way, not going through primary sources. But especially for recent deaths, there might be a great variety of sources (esp. newspaper articles) that are not easy to find without user support.
Regards, Markus

[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool

On 04.06.2015 09:56, Dimitris Kontokostas wrote:

On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:

On 03.06.2015 22:44, Gerard Meijssen wrote:

Hoi, The Dutch indicated their willingness to add the dead to Wikidata ... I add quite a few dead from other countries and because of Jura1 Brazilians who died in 2015 have an added significance.

Given that we CAN produce lists like this, it makes sense to reconsider the offer by the fine people from DBpedia and have the information they harvest from Wikipedia added automatically to Wikidata.. One reason I pointed out on my recent blogpost..
DBpedia is getting this information from the contents of the template Persondata as used on Wikipedia [1]. The enwiki community just recently decided to maintain this data on Wikidata instead. I guess this means that (English) DBpedia will not contain this data in the future, unless they import it from Wikidata (they are tracking the issue at [2]).

Note that DBpedia gets person data information both from the persondata template and from the infobox templates using the mappings wiki. We also noted that the data between the two is often out of sync (and usually the persondata is stale/wrong because people don't know of its existence).

E.g. we have 28K items with double birth dates, one from the infobox and another from persondata:

    select count(*) where {?s dbpedia-owl:birthDate ?b1 ; dbpedia-owl:birthDate ?b2 . filter (?b1 != ?b2 && ?b1 < ?b2)}
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&que...
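For those who prefer to run the same check offline on a dump, a rough Python sketch of the idea follows. It assumes the triples have already been parsed into (subject, predicate, object) tuples; the function name and the sample resources are made up for illustration and are not part of any DBpedia tooling.

```python
from collections import defaultdict


def conflicting_values(triples, predicate):
    """Return {subject: sorted distinct values} for subjects that carry
    more than one distinct value for the given predicate."""
    values = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            values[subject].add(obj)
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


# Hypothetical sample data: one person with two different birth dates
# (e.g. infobox vs. persondata) and one person with a consistent date.
sample = [
    ("dbp:Person_A", "dbo:birthDate", "1929-06-07"),
    ("dbp:Person_A", "dbo:birthDate", "1929-07-06"),
    ("dbp:Person_B", "dbo:birthDate", "1950-01-01"),
    ("dbp:Person_B", "dbo:birthDate", "1950-01-01"),
    ("dbp:Person_B", "dbo:deathDate", "2015-03-02"),
]

conflicts = conflicting_values(sample, "dbo:birthDate")
print(len(conflicts), conflicts)
```

Note that this counts subjects, whereas the query above counts ordered pairs of differing values, so the two numbers differ whenever a subject has three or more distinct dates.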
The persondata template is used in German Wikipedia as well.

The following release has ~2.2M triples coming from the German persondata template (which iirc has the same problems as the English one).

Best, Dimitris

So you see, times are changing quickly ... but overall I hope that this is still solving the problem you identified, in fact in a much more direct way than one might have hoped for :-).

DBpedia may still play a role. I don't know how exactly the enwiki community is planning to implement the move from Persondata to Wikidata. It could be that DBpedia is the only project extracting this data. So in a way, your suggestion might be a great idea, though not as a long-term data maintenance plan but as a one-time help for migration.

To support data maintenance further, it would make sense to use bots for synching with authority files. These files also contain death dates and they can even be used as a valid reference.

Regards, Markus

[1] https://en.wikipedia.org/wiki/Template:Persondata
[2] https://github.com/dbpedia/extraction-framework/issues/397

Thanks, GerardM
http://ultimategerardm.blogspot.nl/2015/06/wikidata-jurandyr-noronha-died-in...
On 3 June 2015 at 07:16, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:

Hoi, Jura1 created a wonderful list of people who died in Brazil in 2015 [1]. It is a page that may update regularly from Wikidata thanks to the ListeriaBot. Obviously, there may be a few more because I am falling ever more behind with my quest for registering deaths in 2015.

I have copied his work and created a page for people who died in the Netherlands in 2015 [2]. It is trivially easy to do this and, the result is great. The result looks great, it can be used for any country in any Wikipedia

The Dutch Wikipedia indicated that they nowadays maintain important metadata at Wikidata. I am really happy that we can showcase their work. It is important work because as someone reminded me at some stage, this is part of what amounts to the policy of living people...

Thanks, GerardM

[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil
[2] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
-- Kontokostas Dimitris
On 04.06.2015 12:17, Dimitris Kontokostas wrote: ...
Another question: can DBpedia extract references from Wikipedia articles too? If this would be possible, it might be feasible to guess and suggest a reference (or a list of references). Especially with things like date of death, one would expect that references have a publication date very close to (but strictly after) the event, which could narrow down the choices very much.
We don't extract them for now, although I think we could relatively easily. The problem in this case would be that we cannot associate references with facts. The DBpedia Information Extraction Framework is quite modular and can be easily extended with new extractors, but it is hard to make these extractors "talk to each other". So we could easily get something like the following:

    dbp:A dbo:birthDate "..."
    dbp:A dbo:deathDate "..."
    dbp:A dbo:reference dbp:r1  # and maybe "dbp:r1 ....something else" depending on the modeling
    dbp:A dbo:reference dbp:r2

but I am not sure if this solves your problem.
Yes, I understand that you can hardly get the association between extracted facts and references. My suggestion was to extract both independently and then to query for references that have a publication date close to a person's death so as to suggest them to users as a possible reference for the death-date fact. This would still require a manual check, since we cannot know if the guessed reference belongs to the date of death, but if it has a high precision it would be a worthwhile way of spending volunteer time to obtain confirmed references.
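A minimal Python sketch of this matching step, assuming the death date and the publication dates of an article's references have already been extracted independently. The function name, the 14-day window and the example.org URLs are assumptions made for illustration; publication on the day of death itself is allowed here, which is a choice rather than a given.

```python
from datetime import date, timedelta


def suggest_references(death_date, references, window_days=14):
    """Rank references published on or shortly after the death date.

    `references` is an iterable of (url, publication_date) pairs.
    The result is only a list of candidates for a human to confirm.
    """
    window = timedelta(days=window_days)
    candidates = []
    for url, published in references:
        lag = published - death_date
        if timedelta(0) <= lag <= window:
            candidates.append((lag, url))
    # Closest publication date first; ties are broken by URL.
    return [url for lag, url in sorted(candidates)]


# Hypothetical references extracted from one article.
refs = [
    ("https://example.org/biography-2001", date(2001, 1, 1)),
    ("https://example.org/obituary", date(2013, 5, 27)),
    ("https://example.org/tribute", date(2013, 6, 1)),
    ("https://example.org/retrospective", date(2014, 1, 1)),
]

suggestions = suggest_references(date(2013, 5, 27), refs)
print(suggestions)
```

The window size trades recall for precision: a narrow window suggests fewer references, but each suggestion is more likely to be about the death.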
At the same time, it might be one of the fastest ways to get sourced date of death into Wikidata, since news articles will usually appear before the major authority files are updated (so even if we get donations from them, some lag would remain). With such an extraction framework, one could establish a pipeline from Wikipedia to Wikidata.
In the long run, references from authority files will become more valuable than news articles, because they are more long-lived.
Best wishes,
Markus
On 04.06.2015 at 14:00, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Yes, I understand that you can hardly get the association between extracted facts and references. My suggestion was to extract both independently and then to query for references that have a publication date close to a person's death so as to suggest them to users as a possible reference for the death-date fact. This would still require a manual check, since we cannot know if the guessed reference belongs to the date of death, but if it has a high precision it would be a worthwhile way of spending volunteer time to obtain confirmed references.
The DBpedia Events Dataset [http://events.dbpedia.org/] contains people who died recently. Admittedly, this is extracted from DBpedia Live, which is in turn extracted from Wikipedia articles. But it usually picks up people's deaths by the end of the day, which is often before they appear in the (German) news:
http://events.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fevents.dbped...
Coming back to an old thread: we now extract references from Wikipedia, and they are available in the 2015-10 beta release:

citation_data_en.ttl.bz2 http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2
citation_links_en.ttl.bz2 http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2

Any feedback is more than welcome.
Best,
Dimitris
Based on the other open related thread [1], there are references for the deathDate of 1950 people [2]. I manually checked a random 5 pages and all had a reference "imported from Wikipedia", so maybe this is a good start.
(cc'ing wiki-cite after Dario's suggestion on the other thread)
Best, Dimitris
[1] https://lists.wikimedia.org/pipermail/wikidata/2016-August/009447.html [2] curl http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.... | bzcat | grep "deathDate"
Hoi, Did they have a date of death in Wikidata as well? Thanks, GerardM
On 31 August 2016 at 11:53, Dimitris Kontokostas jimkont@gmail.com wrote:
Based on the other open related thread [1] there are references for the deathDate of 1950 people [2] I manually checked a random 5 pages and all had a reference "imported from Wikipedia" so maybe this is a good start
(cc'ing wiki-cite after Dario's suggestion on the other thread)
Best, Dimitris
[1] https://lists.wikimedia.org/pipermail/wikidata/2016-August/009447.html
[2] curl http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2 | bzcat | grep "deathDate"
Yes, taking the 1st entry for example:

<http://dbpedia.org/resource/Abdülaziz_of_the_Ottoman_Empire> <http://dbpedia.org/ontology/deathDate> "1876-06-04"^^<http://www.w3.org/2001/XMLSchema#date> <http://books.google.com/books?vid=ISBN978-1-59339-837-8> .

is about https://www.wikidata.org/wiki/Q151500
I am pasting the first few items for reference for those who cannot filter the dump:

<http://dbpedia.org/resource/Abdülaziz_of_the_Ottoman_Empire> <http://dbpedia.org/ontology/deathDate> "1876-06-04"^^<http://www.w3.org/2001/XMLSchema#date> <http://books.google.com/books?vid=ISBN978-1-59339-837-8> .
<http://dbpedia.org/resource/Jonah> <http://dbpedia.org/property/deathDate> "8"^^<http://www.w3.org/2001/XMLSchema#integer> <http://citation.dbpedia.org/hash/e5ccc3add53d0f5356a2391e0466d47fa5c6c8fe223...> .
<http://dbpedia.org/resource/John_the_Evangelist> <http://dbpedia.org/property/deathDate> "c. AD 100"@en <http://books.google.com/books?vid=ISBN1-889814-09-1> .
<http://dbpedia.org/resource/Sigismund_Báthory> <http://dbpedia.org/ontology/deathDate> "1613-03-27"^^<http://www.w3.org/2001/XMLSchema#date> <http://www.antikvarium.hu/ant/book.php?konyv-cim=baranyai-decsi-janos-magyar...> .
<http://dbpedia.org/resource/Thomas_the_Apostle> <http://dbpedia.org/ontology/deathDate> "1972-12-21"^^<http://www.w3.org/2001/XMLSchema#date> <https://web.archive.org/web/20120606161624/http://cs.nyu.edu/kandathi/thomas...> .
<http://dbpedia.org/resource/Julius_Plücker> <http://dbpedia.org/ontology/deathDate> "1868-05-22"^^<http://www.w3.org/2001/XMLSchema#date> <http://citation.dbpedia.org/hash/048618db3e3be53963dc8294015e890b789d385cf0e...> .
<http://dbpedia.org/resource/Uthman> <http://dbpedia.org/ontology/deathDate> "0656-06-17"^^<http://www.w3.org/2001/XMLSchema#date> <http://citation.dbpedia.org/hash/f50b296af8a6217d44dd0ebb6cf5f4c7c2d2db72f11...> .
<http://dbpedia.org/resource/Toussaint_Louverture> <http://dbpedia.org/ontology/deathDate> "1803-04-07"^^<http://www.w3.org/2001/XMLSchema#date> <https://books.google.com.sa/books?id=xA0FAAAAYAAJ> .
<http://dbpedia.org/resource/Bob_Monkhouse> <http://dbpedia.org/ontology/deathDate> "2003-12-29"^^<http://www.w3.org/2001/XMLSchema#date> <http://www.guardian.co.uk/news/2003/dec/30/guardianobituaries.artsobituaries...> .
<http://dbpedia.org/resource/Bill_Pertwee> <http://dbpedia.org/ontology/deathDate> "2013-05-27"^^<http://www.w3.org/2001/XMLSchema#date> <http://www.guardian.co.uk/uk/2013/may/27/dads-army-star-bill-pertwee-dies> .
<http://dbpedia.org/resource/Hyder_Ali> <http://dbpedia.org/ontology/deathDate> "1782-12-07"^^<http://www.w3.org/2001/XMLSchema#date> <http://books.google.com/books?vid=ISBN8187879572> .
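For anyone who wants to go beyond grep, here is a small Python sketch that splits such lines into subject, value and citation. It assumes each line of the dump is an N-Quads-style statement with a literal object and the citation URL in the fourth position, as the pasted items suggest once their angle brackets are intact. The regular expression and function names are my own and only cover this simple shape, not the full N-Quads grammar.

```python
import re

# subject, predicate, literal (optional datatype or language tag), citation
QUAD = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'"(?P<value>(?:[^"\\]|\\.)*)"'
    r'(?:\^\^<(?P<dtype>[^>]+)>|@(?P<lang>[A-Za-z-]+))?\s+'
    r'<(?P<ref>[^>]+)>\s*\.\s*$'
)


def parse_cited_fact(line):
    """Return a dict with s, p, value, dtype, lang and ref, or None."""
    match = QUAD.match(line.strip())
    return match.groupdict() if match else None


def death_dates_with_refs(lines):
    """Yield (subject, value, citation) for every deathDate statement."""
    for line in lines:
        fact = parse_cited_fact(line)
        if fact and fact["p"].endswith("/deathDate"):
            yield fact["s"], fact["value"], fact["ref"]


sample = [
    '<http://dbpedia.org/resource/Bill_Pertwee> '
    '<http://dbpedia.org/ontology/deathDate> '
    '"2013-05-27"^^<http://www.w3.org/2001/XMLSchema#date> '
    '<http://www.guardian.co.uk/uk/2013/may/27/dads-army-star-bill-pertwee-dies> .',
    '<http://dbpedia.org/resource/John_the_Evangelist> '
    '<http://dbpedia.org/property/deathDate> "c. AD 100"@en '
    '<http://books.google.com/books?vid=ISBN1-889814-09-1> .',
]

for subject, value, ref in death_dates_with_refs(sample):
    print(subject, value, ref)
```

Values such as "c. AD 100" or "8" show that the dbpedia.org/property statements are raw infobox values, so a consumer would probably want to keep only the statements typed as xsd:date.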
-- Kontokostas Dimitris
On 4 June 2015 at 10:56, Markus Krötzsch markus@semantic-mediawiki.org wrote:
the primary sources tool
New to me - where can we read more about this, please?
Especially with things like date of death, one would expect that references have a publication date very close to (but strictly after) the event
Often the same day; and - depending on timezones* - potentially the day before.
* For example, a high-profile 1am death in the UK on 5 June could be reported an hour later on a US webpage timestamped 8pm on 4 June.
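The time-zone effect in this example can be checked with a small sketch. The message does not name a US time zone; `America/Chicago` is assumed here because it reproduces the 8pm timestamp, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def publisher_timestamp(death_local, delay, publisher_tz):
    """Return the publisher's local timestamp and its day offset from the death date."""
    # Do the arithmetic in UTC, then convert to the publisher's zone.
    published = (death_local.astimezone(timezone.utc) + delay).astimezone(
        ZoneInfo(publisher_tz)
    )
    day_offset = (published.date() - death_local.date()).days
    return published, day_offset


# 1am in the UK on 5 June 2015, reported one hour later (assumed zone: US Central).
death = datetime(2015, 6, 5, 1, 0, tzinfo=ZoneInfo("Europe/London"))
published, day_offset = publisher_timestamp(death, timedelta(hours=1), "America/Chicago")
print(published.isoformat(), day_offset)  # 2015-06-04T20:00:00-05:00 -1
```

So a rule requiring the publication date to be strictly after the date of death would wrongly reject this report; allowing one day of slack avoids that.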
Good day, In my opinion it's a fine service (especially for me). I agree with Gerard concerning the use of DBpedia.
best regards, Eric van Balkum
p.s. We can add one to the list nl.WIKIPEDIA.org/WIKI/ALBERT_WEST (or the bot will do so)
Gerard Meijssen wrote on 2015-06-04 10:49:
Hoi, Markus with all due respect, we have a LOT of data in Wikidata that is plain wrong. When we add the missing data from DBpedia it is of a higher quality than what we have. Insisting that it first needs to be validated is foolish. It is not done for any of the work we do. All our bots make use of Wikipedia and in this DBpedia is no different.
I do agree that it makes sense to verify the data that is different. But even so: when Wikidata says 1929 and DBpedia says 7-June-1929, our practice has been to replace the 1929 with the more precise date.
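A minimal sketch of this rule, assuming dates are held as hypothetical `(year, month, day)` tuples with `None` for unknown parts (this is not the Wikidata time datatype): a more precise value replaces a compatible coarser one, and values that really differ are flagged for verification instead of being overwritten.

```python
from typing import Optional, Tuple

# Hypothetical representation: (year, month, day); None marks an unknown part.
PartialDate = Tuple[int, Optional[int], Optional[int]]


def precision(d: PartialDate) -> int:
    """Number of known parts: 1 = year only, 3 = full date."""
    return sum(part is not None for part in d)


def is_compatible(a: PartialDate, b: PartialDate) -> bool:
    """True if the two dates agree on every part that both of them specify."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def merge(existing: Optional[PartialDate], candidate: PartialDate):
    """Return (value_to_keep, needs_review)."""
    if existing is None:
        return candidate, False          # missing data: simply add it
    if not is_compatible(existing, candidate):
        return existing, True            # conflicting data: verify by hand
    if precision(candidate) > precision(existing):
        return candidate, False          # same date, more precise: replace
    return existing, False


print(merge((1929, None, None), (1929, 6, 7)))
print(merge((1930, None, None), (1929, 6, 7)))
```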
Let us be pragmatic and improve our data and start with what is missing. Thanks, GerardM
On 4 June 2015 at 10:31, Markus Krötzsch markus@semantic-mediawiki.org wrote: Hi Dimitris,
Interesting situation. If you have contradictory data from several templates, then the challenge will be to find out which information is correct for importing it to Wikidata. Could your dataset maybe become an input to the primary sources tool [1]? Then Wikidata users could help to clean the dataset and try to find references (as you know, references are quite important for Wikidata, but it would really be asking too much of DBpedia to provide these).
This could be a viable strategy to merge DBpedia data into Wikidata. This email was only about person-related data, but one could do this for any kind of dataset where the information in DBpedia is of relatively high quality. I don't know exactly what the primary sources tool needs as input (it is still beta), but I think it mainly requires that a decent quality set of candidate statements is extracted and provided in some suitable format.
As a first step, it might make sense to do a scan to see how many date-of-death (or whatever) statements in DBpedia are not yet found in Wikidata. If it is a small dataset (e.g., only a subset of the people who have died in the last year), then maybe one could also add and verify it in another way, not going through primary sources. But especially for recent deaths, there might be a great variety of sources (esp. newspaper articles) that are not easy to find without user support.
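Such a scan might look like the sketch below. It assumes both datasets have already been loaded into dictionaries keyed by a shared person identifier; the identifiers, the function name, and the `since` cut-off are made up for illustration.

```python
from datetime import date


def missing_death_dates(dbpedia, wikidata, since=None):
    """Return DBpedia death dates for people with no death date in Wikidata.

    dbpedia and wikidata map a shared (hypothetical) person identifier to a
    date of death; since optionally restricts the scan to recent deaths.
    """
    missing = {}
    for person, died in dbpedia.items():
        if person in wikidata:
            continue  # Wikidata already has a value; comparing values is a separate step
        if since is not None and died < since:
            continue
        missing[person] = died
    return missing


dbpedia = {
    "person-a": date(2015, 5, 30),
    "person-b": date(2014, 1, 2),
    "person-c": date(2015, 6, 1),
}
wikidata = {"person-c": date(2015, 6, 1)}
recent = missing_death_dates(dbpedia, wikidata, since=date(2015, 1, 1))
print(len(recent), sorted(recent))
```

The size of the result would indicate whether the dataset is small enough to verify by hand or should go through the primary sources tool.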
Regards,
Markus
[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool [1]
On 04.06.2015 09:56, Dimitris Kontokostas wrote:
On Thu, Jun 4, 2015 at 1:18 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 03.06.2015 22:44, Gerard Meijssen wrote:
Hoi, The Dutch indicated their willingness to add the dead to Wikidata ... I add quite a few dead from other countries, and because of Jura1, Brazilians who died in 2015 have an added significance.
Given that we CAN produce lists like this, it makes sense to reconsider the offer by the fine people from DBpedia and have the information they harvest from Wikipedia added automatically to Wikidata. I pointed out one reason in my recent blog post.
DBpedia is getting this information from the contents of the template Persondata as used on Wikipedia [1]. The enwiki community just recently decided to maintain this data on Wikidata instead. I guess this means that (English) DBpedia will not contain this data in the future, unless they import it from Wikidata (they are tracking the issue at [2]).
Note that DBpedia gets person data both from the persondata template and from the infobox templates, using the mappings wiki. We also noted that the data in the two is often out of sync (and usually it is the persondata that is stale or wrong, because people don't know of its existence).
E.g. we have 28K items with two different birth dates, one from the infobox and another from persondata.
select count(*) where {?s dbpedia-owl:birthDate ?b1 ; dbpedia-owl:birthDate ?b2 . filter (?b1 != ?b2 && ?b1 < ?b2)}
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&que... [4]
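For readers who prefer to run the same kind of check on an extracted dump rather than on the SPARQL endpoint, here is a rough offline equivalent. It assumes the triples are available as plain `(subject, predicate, object)` tuples; the sample data and the function name are made up. Unlike the query, which counts pairs of differing values, it counts subjects.

```python
from collections import defaultdict


def conflicting_values(triples, predicate):
    """Map each subject to its distinct values when it has more than one."""
    values = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            values[subject].add(obj)
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


# Made-up sample: person-a has one date from the infobox and another from persondata.
triples = [
    ("person-a", "dbpedia-owl:birthDate", "1929-06-07"),
    ("person-a", "dbpedia-owl:birthDate", "1929-07-06"),
    ("person-b", "dbpedia-owl:birthDate", "1950-01-01"),
    ("person-b", "dbpedia-owl:birthDate", "1950-01-01"),
    ("person-b", "dbpedia-owl:deathDate", "2015-06-01"),
]
conflicts = conflicting_values(triples, "dbpedia-owl:birthDate")
print(len(conflicts), conflicts)
```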
The persondata template is used in German Wikipedia as well. The following release has ~2.2M triples coming from the German persondata template (which, iirc, has the same problems as the English one).
Best, Dimitris
So you see, times are changing quickly ... but overall I hope that this is still solving the problem you identified, in fact in a much more direct way than one might have hoped for :-).
DBpedia may still play a role. I don't know how exactly the enwiki community is planning to implement the move from Persondata to Wikidata. It could be that DBpedia is the only project extracting this data. So in a way, your suggestion might be a great idea, though not as a long-term data maintenance plan but as a one-time help for migration.
To support data maintenance further, it would make sense to use bots for synching with authority files. These files also contain death dates and they can even be used as a valid reference.
Regards,
Markus
[1] https://en.wikipedia.org/wiki/Template:Persondata [5] [2] https://github.com/dbpedia/extraction-framework/issues/397 [6]
Thanks, GerardM
http://ultimategerardm.blogspot.nl/2015/06/wikidata-jurandyr-noronha-died-in... [7]
On 3 June 2015 at 07:16, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi, Jura1 created a wonderful list of people who died in Brazil in 2015 [1]. It is a page that may update regularly from Wikidata thanks to the ListeriaBot. Obviously, there may be a few more because I am falling ever more behind with my quest for registering deaths in 2015.
I have copied his work and created a page for people who died in the Netherlands in 2015 [2]. It is trivially easy to do this and, the result is great. The result looks great, it can be used for any country in any Wikipedia
The Dutch Wikipedia indicated that they nowadays maintain important metadata at Wikidata. I am really happy that we can showcase their work. It is important work because as someone reminded me at some stage, this is part of what amounts to the policy of living people...
Thanks, GerardM
[1] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil [8] [2] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands [9]
-- Kontokostas Dimitris
Links:
------
[1] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
[2] tel:04.06.2015%2009
[3] tel:03.06.2015%2022
[4] http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&que...
[5] https://en.wikipedia.org/wiki/Template:Persondata
[6] https://github.com/dbpedia/extraction-framework/issues/397
[7] http://ultimategerardm.blogspot.nl/2015/06/wikidata-jurandyr-noronha-died-in...
[8] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_Brazil
[9] https://www.wikidata.org/wiki/User:Jura1/Recent_deaths_in_the_Netherlands
[10] https://lists.wikimedia.org/mailman/listinfo/wikidata
In some countries there are birth and death registers that can be queried, often only to validate a date, but it usually comes with access limitations and a cost. Any idea how we could automate a check against those?
John
On Thu, Jun 4, 2015 at 12:27 PM, John Erling Blad jeblad@gmail.com wrote:
In some countries there are birth and death registers that can be queried, often only to validate a date, but it usually comes with access limitations and a cost. Any idea how we could automate a check against those?
That is https://www.wikidata.org/wiki/Wikidata:Development_plan#Consistency_checks_a...
Cheers Lydia
Hoi, This is the all singing all dancing solution. When can we have something now .. It does not need to be integrated. It only needs to work. Thanks, GerardM
Why wait for tomorrow when it can be done now ?
On 4 June 2015 at 12:54, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
On Thu, Jun 4, 2015 at 12:27 PM, John Erling Blad jeblad@gmail.com wrote:
In some countries there are birth and death registers that can be queried, often only to validate a date, but it usually comes with access limitations and a cost. Any idea how we could automate a check against those?
That is https://www.wikidata.org/wiki/Wikidata:Development_plan#Consistency_checks_a...
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Interesting... https://www.mediawiki.org/wiki/Wikibase_Quality_Extensions
On 4 June 2015 at 11:27, John Erling Blad jeblad@gmail.com wrote:
In some countries there are birth and death registers that can be queried, often only to validate a date, but it usually comes with access limitations and a cost. Any idea how we could automate a check against those?
We could work with our colleagues at the Wikipedia Library[1] to negotiate the donation of free access.
[1] https://en.wikipedia.org/wiki/Wikipedia:The_Wikipedia_Library