Hello everyone,
Until now I've been a lurker on this list, so let me introduce myself: I'm a sociologist studying digital technologies, an activist (I run Creative Commons Poland), and I also run a digital think tank / NGO in Poland.
I'm hoping someone on this list can help me. I'm involved in the celebrations of Public Domain Day: on 1 January each year, the works of authors who died 70 years earlier pass into the public domain (at least in Poland and in most other countries, though the term may differ in some jurisdictions).
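The rule above is simple date arithmetic. A minimal sketch, assuming a life + 70 years term counted to the end of the calendar year (as in Poland and the EU; other jurisdictions may count differently). `public_domain_date` is a hypothetical helper, not part of any library:

```python
from datetime import date


def public_domain_date(death_year: int, term_years: int = 70) -> date:
    """First day a work is in the public domain under a life + term_years rule.

    Assumes the term runs to the end of the calendar year in which it expires,
    as in Poland and the EU; other jurisdictions may count differently.
    """
    return date(death_year + term_years + 1, 1, 1)


print(public_domain_date(1941))  # 2012-01-01
```

So the authors who matter for Public Domain Day 2012 are those who died in 1941.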
I'm looking for a good way to determine who died in 1941, and I thought Wikipedia would be a good place to find this out. I know there are lists of people who died in a given year, but they are not complete. Is there any way to query Wikipedia automatically for such information? I know it is structured to some extent, since this information is provided in templates for biographical articles, but I don't know whether there is any mechanism for querying it.
Any advice will be much appreciated.
All the best,
Alek
There's a category for that sort of thing:
http://en.wikipedia.org/wiki/Category:1941_deaths
Other wikis might have similar categories.
FT2
On Fri, Dec 23, 2011 at 12:35 PM, Alek Tarkowski <atarkowski@centrumcyfrowe.pl> wrote:
-- director, Centrum Cyfrowe Projekt: Polska
www: http://centrumcyfrowe.pl
identi.ca / twitter: @centrumcyfrowe
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi Alek,
It's likely that most of those people are in a category¹. Since no category is ever complete, combining its contents with the equivalent categories on the biggest Wikipedias (German, French…²) could help. Finding the equivalent categories is easy: start at ¹, then follow the interwiki links under “Languages” in the left column. Also, don't forget Commons³.
Best regards,
¹ https://en.wikipedia.org/wiki/Category:1941_deaths
² https://www.wikipedia.org/
³ https://commons.wikimedia.org/wiki/Category:1941_deaths
Hi Alek,
Not every language version of Wikipedia has such categories, but at least 80 do. You can find a list of those eighty or so deaths categories at http://meta.wikimedia.org/wiki/Death_anomalies_table; the 1941 deaths category will be a subcategory of each of them. Someone with toolserver access could probably extract for you a list of all the people we have, with duplicates across languages removed. You could file a request for such a report at http://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports
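Removing "duplicates across languages" amounts to a union keyed on something that identifies the same person in every language version. A minimal sketch, assuming you already have each wiki's 1941 deaths members paired with such a key (for example the English title reached through interlanguage links); the key, the function name and the sample data are all hypothetical:

```python
from typing import Dict, Iterable, Tuple

# Each entry is (article_title, person_key). person_key is an assumption: any
# identifier shared by all language versions of the same biography, e.g. the
# English title reached via interlanguage links.
Entry = Tuple[str, str]


def merge_across_languages(
    lists: Dict[str, Iterable[Entry]],
) -> Dict[str, Dict[str, str]]:
    """Union per-language category lists, keeping one record per person.

    Returns {person_key: {language_code: article_title}}, so you can also see
    which wikis cover each person.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for lang, entries in lists.items():
        for title, key in entries:
            merged.setdefault(key, {})[lang] = title
    return merged


# Made-up data for illustration only.
sample = {
    "en": [("Person A", "K1"), ("Person B", "K2")],
    "pl": [("Osoba A", "K1"), ("Osoba C", "K3")],
}
people = merge_across_languages(sample)
print(len(people))  # 3
```

Keeping the per-language titles, rather than just counting keys, also shows which people only one wiki knows about.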
But this will only give you notable people who died in a particular year. If you want a list of everyone who died in a particular year, you are better off looking at genealogy sites and, for 1941, military war grave sites. Tens of millions of people died that year, and I doubt that we have even 0.1% of them.
Hope that helps
WereSpielChequers
2011/12/23 Jérémie Roquet arkanosis@gmail.com
-- Jérémie
I think he's after writers and artists, and wants to identify people whose copyrights may have passed into the public domain because they died that year.
Better-known authors and artists are the most relevant here, and for them this approach should work well. It won't give a complete list of everyone who created something copyrighted and died that year, but the better known their works (and hence the more useful or interesting it is to know they are PD), the more likely we are to have coverage.
FT2
On Fri, Dec 23, 2011 at 2:57 PM, WereSpielChequers <werespielchequers@gmail.com> wrote:
Jeremie, FT2,
thank you very much for your advice.
Do you have any idea how complete these lists are? Are they done by hand, or is there a bot compiling these lists? And in any case, is there any way to estimate how completely they cover a given category?
All the best,
Alek
Categories are added by hand. At most, one could write a bot that looked for an infobox or introduction text containing the date of birth/death and automatically added the category if it was missing. But as a rule, if someone has died, the date of death is usually there, and so are the categories you'd need.
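A bot like that could start from a check such as the one below. A minimal sketch, assuming the death year sits in an infobox `death_date` parameter (real articles vary, and some give the date only in the lead or in other templates); the function names and the sample wikitext are hypothetical:

```python
import re
from typing import Optional

# Assumed infobox convention: a "death_date" parameter holding the year, e.g.
# "| death_date = {{death date and age|1941|6|17|1862|5|18}}".
DEATH_DATE_PARAM = re.compile(r"\|\s*death_date\s*=\s*([^\n]*)", re.IGNORECASE)
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def death_year(wikitext: str) -> Optional[int]:
    """First four-digit year in the infobox death_date value, if any."""
    param = DEATH_DATE_PARAM.search(wikitext)
    if param is None:
        return None
    year = YEAR.search(param.group(1))
    return int(year.group(1)) if year else None


def missing_death_category(wikitext: str) -> Optional[str]:
    """Suggest "[[Category:YYYY deaths]]" when the infobox gives a death year
    but the page lacks that category (with or without a sort key)."""
    year = death_year(wikitext)
    if year is None:
        return None
    has_category = re.search(
        rf"\[\[\s*Category\s*:\s*{year} deaths\s*(\|[^\]]*)?\]\]",
        wikitext,
        re.IGNORECASE,
    )
    return None if has_category else f"[[Category:{year} deaths]]"


sample = """{{Infobox person
| name = Example Person
| death_date = {{death date and age|1941|6|17|1862|5|18}}
}}
'''Example Person''' was a writer."""
print(missing_death_category(sample))  # [[Category:1941 deaths]]
```

A real bot would only report such pages for a human to review, since the first year in the field is not always the year of death.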
The easiest and most exact way would be a database query that looks for *"born * died * 1941"* or just *"died * 1941"* in the first paragraph, and also checks that at least one word like *wrote / author / poet / painter* or *{{infobox person}}* appears in the text, or that a word like *"novelists | writers | painters | authors..."* appears in at least one category. That should do exactly what you need, but you'll need to find someone to set up and run the query for you.
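The per-article test in such a query can be sketched as below. This is a minimal sketch, assuming you already have each article's first paragraph, full text and category names (for instance from a dump); the 40-character window standing in for the `*` wildcard, the function name and the sample text are all hypothetical choices. "died * 1941" already covers "born * died * 1941":

```python
import re
from typing import Iterable

# Rough stand-in for the wildcard pattern "died * 1941": "died", then at most
# 40 characters (an arbitrary window), then the year 1941.
DIED_1941 = re.compile(r"\bdied\b.{0,40}?\b1941\b", re.IGNORECASE | re.DOTALL)
CREATOR_WORDS = ("wrote", "author", "poet", "painter")
CREATOR_CATEGORY_WORDS = ("novelists", "writers", "painters", "authors")


def looks_like_1941_creator(
    first_paragraph: str, full_text: str, categories: Iterable[str]
) -> bool:
    """Death in 1941 mentioned in the lead, plus one of the extra signals:
    a creator word, an {{infobox person}}, or a creator-type category."""
    if not DIED_1941.search(first_paragraph):
        return False
    text = full_text.lower()
    # Crude substring tests: "author" also matches "authority", for example.
    if any(word in text for word in CREATOR_WORDS) or "{{infobox person" in text:
        return True
    return any(
        word in category.lower()
        for category in categories
        for word in CREATOR_CATEGORY_WORDS
    )


lead = "Example Person (born 18 May 1862, died 17 June 1941, Sweden) was a poet."
print(looks_like_1941_creator(lead, lead, []))  # True
```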
If not, these other options might help somewhat:
(1) Biographies will often start like this: *NAME (born 18 May 1862, died 17 June 1941, Sweden) was a.....*
So you could search for articles with the words *died 1941* in them. The trouble is that there are many reasons an article could contain those words, so limiting the search to biographical articles might help. Some search engines let you search for pages where specific words appear close together, but Wikipedia's search doesn't have that feature (or not yet). Even so, this search does turn up useful results, especially combined with the *incategory:* operator. You can also narrow it down by adding words that creators of copyrighted works are likely to have in their articles, such as "author", "playwright", "poet", "artist", etc. Try these searches:
died 1941: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=died+1941&fulltext=Search&ns0=1&redirs=1&profile=advanced
born died 1941: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=born+died+1941+&fulltext=Search&ns0=1&profile=advanced (biographies with "died" will usually also have "born"; use this to narrow down)
died 1941 author: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=died+1941+author&fulltext=Search&ns0=1&profile=advanced (not so helpful)
born died 1941 wrote: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=born+died+1941+wrote&fulltext=Search&ns0=1&profile=advanced (adding one "copyright-creator" word seems to work, just; adding more seems to confuse things)
died 1941 incategory:"Polish writers": http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=died+1941+incategory%3A%22Polish+writers%22&fulltext=Search&ns0=1&profile=advanced (but this doesn't pick up articles nested in subcategories)
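All of these search links share one URL shape, so variants (other occupation words, other categories) are easy to generate. A small sketch that reproduces the parameters visible in those URLs; the function name is hypothetical, and the parameter set simply mirrors the 2011 links rather than any documented interface:

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://en.wikipedia.org/w/index.php"


def special_search_url(query: str, redirects: bool = False) -> str:
    """Build an advanced Special:Search URL limited to articles (ns0=1)."""
    params = {
        "title": "Special:Search",
        "search": query,
        "fulltext": "Search",
        "ns0": "1",
    }
    if redirects:
        params["redirs"] = "1"
    params["profile"] = "advanced"
    # urlencode quotes ':' and '"' and turns spaces into '+'.
    return SEARCH_BASE + "?" + urlencode(params)


for occupation in ("author", "poet", "playwright"):
    print(special_search_url(f"born died 1941 {occupation}"))
```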
(2) Google supports wildcard searches for words appearing near each other, and it can be told to list content from just one site. Wikipedia articles are indexed by Google. But it's very limited in what it will show you, and it can't detect the other things needed to narrow the results down. Try this in Google search:
born * died * 1941 site:en.wikipedia.org
https://www.google.com/search?num=100&hl=en&ie=UTF-8&oe=UTF-8&q=born+*+died+*+1941+site:en.wikipedia.org
(3) A third option, which will pick up article names (but no further details), is this category search tool, which lets you enter a category and search several layers deep: http://toolserver.org/~magnus/catscan_rewrite.php
Nested categories will therefore show up too. Try entering "Novelists" under "categories" and "3" under "depth".
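Searching "several layers deep" is a depth-limited walk over the category graph. A minimal breadth-first sketch, assuming a hypothetical `get_members` function that returns a category's subcategories and pages (in practice it could wrap the MediaWiki API's `list=categorymembers` or a database query); depth here counts subcategory levels below the starting category, and CatScan's own counting may differ:

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# get_members(category) -> (subcategories, pages). A stand-in so the traversal
# can run on its own; a real version would query the wiki.
GetMembers = Callable[[str], Tuple[List[str], List[str]]]


def pages_in_category_tree(root: str, depth: int, get_members: GetMembers) -> Set[str]:
    """Pages in `root` and its subcategories, at most `depth` levels below it."""
    pages: Set[str] = set()
    seen = {root}  # categories can form cycles, so never revisit one
    queue = deque([(root, 0)])
    while queue:
        category, level = queue.popleft()
        subcategories, members = get_members(category)
        pages.update(members)
        if level < depth:
            for sub in subcategories:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, level + 1))
    return pages


# Made-up category tree for illustration only.
TREE: Dict[str, Tuple[List[str], List[str]]] = {
    "Novelists": (["Polish novelists"], ["Novelist A"]),
    "Polish novelists": (["Polish women novelists"], ["Novelist B"]),
    "Polish women novelists": ([], ["Novelist C"]),
}


def toy_members(category: str) -> Tuple[List[str], List[str]]:
    return TREE.get(category, ([], []))


print(sorted(pages_in_category_tree("Novelists", 1, toy_members)))
# ['Novelist A', 'Novelist B']
```

The `seen` set matters on a real wiki, where category loops do occur.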
There may be other ways, such as common terms that only appear in biographies. Perhaps someone else will have ideas.
FT2
On Fri, Dec 23, 2011 at 5:15 PM, Alek Tarkowski <atarkowski@centrumcyfrowe.pl> wrote:
On Fri, Dec 23, 2011 at 06:02:37PM +0000, FT2 wrote:
Categories are done by hand, at most one could write a bot that looked for infobox or introduction text containing date of birth/death and automatically add the category if it didn't exist, but as a rule it seems that if someone's died then a date of death is usually there and usually so are the categories you'd need.
This is why we need to roll out some sort of semantic engine. I understand that rolling out the existing SMW (Semantic MediaWiki) "will never happen" on Wikipedia :-(, but IIRC people were working on more lightweight systems?
sincerely, Kim Bruning
On 23 December 2011 19:59, Kim Bruning kim@bruning.xs4all.nl wrote:
This is why we need rollout of some sort of semantic engine. I understand that rolling out the existing SMW "will never happen"
Why d'you say that so categorically?
Theoretically, I don't see "why this shouldn't happen at some point".
on wikipedia :-(, but IIRC people were working on more lightweight systems?
On 12/23/2011 7:02 PM, FT2 wrote:
(3) A third option which will pick up names of articles (but no further details) is this category search tool: http://toolserver.org/~magnus/catscan_rewrite.php which lets you enter a category and search several layers deep. So nested categories will show up. Try entering "Novelists" under "categories" and "3" under "depth".
The new CatScan is (IMHO) very unfriendly; you may find the old one more usable. Main page for all versions: http://en.wikipedia.org/wiki/Wikipedia:CatScan
Either is a nice tool to get a list of content creators who died in a given year.
I wonder if there would be a way to set up a ping that would let you know whenever a biography is written or has a year of death added. For new articles, you could consider some form of new article report. This is currently done by TedderBot (http://en.wikipedia.org/wiki/User:TedderBot/NewPageSearch/Poland/archive); see how it works in practice at http://en.wikipedia.org/wiki/Wikipedia:POLAND#New_articles_announcements
I am not very familiar with the semantic side of things; it would be nice, and perhaps http://en.wikipedia.org/wiki/Template:Persondata could be of use.
Categorisation is done manually, and completeness varies from project to project and by topic area within a project. On the English-language Wikipedia we probably do have most of our novelists categorised as such. Deaths are a very different matter, as many of our articles never pick up on the subject's death. However, the anomalies that I've found there tend to be among people whose notable careers started and ended in their youth, sportspeople particularly. So I would suggest that a query for "1941 deaths" and "writers" might well give you what you want.
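Such a query boils down to intersecting two lists of titles. A minimal sketch, assuming you already have both lists (for instance from CatScan or a database report); the function names and titles are hypothetical. Titles are normalised first because MediaWiki's database stores them with underscores, while the web interface and API show spaces:

```python
from typing import Iterable, List


def normalize(title: str) -> str:
    """Map database-style titles (underscores) to display-style titles (spaces)."""
    return title.replace("_", " ").strip()


def intersect_categories(deaths: Iterable[str], occupation: Iterable[str]) -> List[str]:
    """Titles present in both lists, e.g. '1941 deaths' and an expanded 'Writers' tree."""
    return sorted({normalize(t) for t in deaths} & {normalize(t) for t in occupation})


# Made-up titles for illustration only.
deaths_1941 = ["Writer_A", "General B", "Painter C"]
writers = ["Writer A", "Writer D"]
print(intersect_categories(deaths_1941, writers))  # ['Writer A']
```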
WSC
Dear Alek, dear list, DBpedia (http://dbpedia.org) was created for exactly this use case: it lets you query Wikipedia like a database. DBpedia already provides such a "semantic engine", which you can query. Below I drafted some queries. The first gives you all the persons in Wikipedia that have a "deathDate"; in total there are 187,739, which should be the most complete list you will find. The query is then refined to all persons who died in 1941 (yielding 1,318 persons), then to all artists who died in 1941, and finally to those artists and their works!
Note that there is a static database, which uses the latest dump:
http://dbpedia.org/sparql
http://dbpedia.org/snorql
as well as a live version, which is synchronized directly (each edit is loaded into the engine):
http://live.dbpedia.org/sparql
Also, for some of the other Wikipedias besides the English one, language-specific versions exist:
Polish: http://pl.dbpedia.org/
German: http://de.dbpedia.org
Greek: http://el.dbpedia.org
DBpedia has quite a large community: I would estimate that over 1000 volunteers from computer science and the Semantic Web have worked on or with it since 2006 (not counting industry partners or the like).
@Alek: I drafted some queries for you. There are a total of 5 result formats to choose from; json, plain or html may be the ones you are looking for. Here are links to some user interfaces: http://wiki.dbpedia.org/OnlineAccess http://wiki.dbpedia.org/Applications
Feel free to improve the data directly in Wikipedia (and use the live endpoint 5 minutes later), or tailor the data how you like at the mappings wiki (http://mappings.dbpedia.org). Actually, the information contained there could help to clean up the infoboxes, which would also help the start of Wikidata. There is one catch, though: the more precise the queries get, the worse the recall will be, as minor errors in the data add up with each constraint.
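The point about recall can be made concrete with a simple model. Assuming, purely for illustration, that each of k constraints independently keeps a fraction r_i of the people who truly match (real data errors are usually correlated, so this is only a rough guide), the recall of the combined query is the product:

```latex
R_{\text{query}} = \prod_{i=1}^{k} r_i ,
\qquad \text{e.g. } r_i = 0.95,\ k = 4: \quad R_{\text{query}} = 0.95^{4} \approx 0.81
```

So four constraints that each miss only 5% of the true matches could together miss almost a fifth of them.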
Hope I could help, Sebastian
Queries below:
*******************
A count of all persons that have a deathDate [1]:

SELECT (COUNT(DISTINCT ?person) AS ?persons) WHERE {
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  ?person <http://xmlns.com/foaf/0.1/page> ?page .
}
All persons who died in 1941 [2]. Note that http://dbpedia.org/sparql returns at most 1000 results per query, so you need to page through them with OFFSET:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  ?person <http://xmlns.com/foaf/0.1/page> ?page .
  FILTER(?deathDate >= "1941-01-01"^^xsd:date && ?deathDate < "1942-01-01"^^xsd:date)
}
ORDER BY ?deathDate LIMIT 1000 OFFSET 0

For the next 1000 results, run the same query with the last line changed to:

ORDER BY ?deathDate LIMIT 1000 OFFSET 1000
All artists who died in 1941 [3]:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  ?person <http://xmlns.com/foaf/0.1/page> ?page .
  ?person a <http://dbpedia.org/ontology/Artist> .
  FILTER(?deathDate >= "1941-01-01"^^xsd:date && ?deathDate < "1942-01-01"^^xsd:date)
}
All artists who died in 1941 and their works [4]:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  ?person a <http://dbpedia.org/ontology/Artist> .
  ?person <http://xmlns.com/foaf/0.1/page> ?page .
  OPTIONAL {
    ?person ?works ?work .
    FILTER(?works IN (<http://dbpedia.org/property/works>,
                      <http://dbpedia.org/property/notableworks>,
                      <http://dbpedia.org/property/writer>))
  }
  FILTER(?deathDate >= "1941-01-01"^^xsd:date && ?deathDate < "1942-01-01"^^xsd:date)
}
[1] http://dbpedia.org/snorql/?query=SELECT+count+%28*%29+WHERE+%7B%0D%0A%3Fpers...
[2] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A%3Fperson+%3Chttp%3...
[3] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A%3Fperson+%3Chttp%3...
[4] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A%3Fperson+%3Chttp%3...
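These queries can also be run from a script, which makes the OFFSET paging automatic. A minimal sketch using only Python's standard library, assuming the endpoint answers standard SPARQL protocol GET requests and returns the W3C SPARQL JSON results format when asked via the Accept header. The helper names are hypothetical, and the query is a trimmed variant of the 1941 query, using an exclusive upper bound so that deaths on 1 January 1942 are left out:

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"
PAGE_SIZE = 1000  # the per-query result limit mentioned above

# Ordering by ?deathDate and ?person keeps pages stable when many people share a date.
QUERY_TEMPLATE = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?deathDate WHERE {{
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  FILTER(?deathDate >= "1941-01-01"^^xsd:date && ?deathDate < "1942-01-01"^^xsd:date)
}}
ORDER BY ?deathDate ?person LIMIT {limit} OFFSET {offset}"""


def request_url(offset: int) -> str:
    """GET URL for one page of results."""
    query = QUERY_TEMPLATE.format(limit=PAGE_SIZE, offset=offset)
    return ENDPOINT + "?" + urlencode({"query": query})


def fetch_json_http(url: str) -> Dict:
    """Fetch one page, asking for the SPARQL JSON results format."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows(results: Dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def all_rows(fetch_json: Callable[[str], Dict] = fetch_json_http) -> Iterator[Dict[str, str]]:
    """Page through the results with OFFSET until a short page comes back."""
    offset = 0
    while True:
        page = rows(fetch_json(request_url(offset)))
        yield from page
        if len(page) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
```

Calling `list(all_rows())` performs real HTTP requests against the endpoint. Passing a different `fetch_json` (for example one that reads cached files) keeps the paging logic usable offline.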