Geonames [1] is a database that holds around 9 million entries for geographical features from all over the world.
Lsjbot is now generating articles from a subset of it, after several months of extensive research on its quality, its relations to Wikidata, and notability issues. While the quality in some regions is substandard (and these will not be generated), it was judged to be very good in most areas. In the discussion I was intrigued to learn that identical Arabic names should be transcribed differently depending on their geographic location. And I was fascinated by the question of the notability of wells in the Bahrain desert (which in the end were excluded, mostly because we knew too little about that reality).
In this run Lsjbot has extended its functionality even further than when it generated articles for species. It looks for relevant geographical items close to the one being described: a nearby lake, a mountain, where the nearest major town is, etc.
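As an illustration only (this is not Lsjbot's actual code, which is not shown in this thread), a nearby-feature lookup of the kind described above could be sketched as follows. The record layout, the `kind` labels, the function names and the sample coordinates are all hypothetical; the distance is computed with the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_kind(place, features, kinds=("lake", "mountain", "town")):
    """For each kind, return (name, distance_km) of the closest feature, or None."""
    result = {}
    for kind in kinds:
        best = None
        for f in features:
            if f["kind"] != kind or f is place:
                continue
            d = haversine_km(place["lat"], place["lon"], f["lat"], f["lon"])
            if best is None or d < best[1]:
                best = (f["name"], d)
        result[kind] = best
    return result


# Hypothetical sample records; names and coordinates are made up.
place = {"name": "Village A", "kind": "village", "lat": 41.0, "lon": 22.0}
features = [
    {"name": "Lake Near", "kind": "lake", "lat": 41.05, "lon": 22.0},
    {"name": "Lake Far", "kind": "lake", "lat": 42.0, "lon": 22.0},
    {"name": "Mountain X", "kind": "mountain", "lat": 41.0, "lon": 22.1},
    {"name": "Town Y", "kind": "town", "lat": 40.9, "lon": 22.0},
]
nearest = nearest_by_kind(place, features)
```

A linear scan like this is only workable for small inputs; for millions of entries some spatial index would presumably be needed, but that is outside what this thread describes.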
Macedonia can be taken as one example. Lsjbot generated over 10,000 articles (and 5,000 disambiguation pages), an order of magnitude more than what exists in enwp. Also, for a well-defined type like villages, almost 50% more have been generated than exist in enwp. One example is [2], where you can see what has been generated (note the reuse of a relevant figure that exists in frwp). Please compare the corresponding articles in other languages in this case; many have less information than the bot-generated one.
The generation is still at an early stage [3] but has already brought the article count for svwp past 2 million today. It will take many more months to complete, and perhaps more million marks will be passed before it is through. If you want to give feedback, you are welcome to enter it at [4].
Anders (with all credit for Lsjbot going to Sverker, its owner; I am just one of the many supporters of him and his bot on svwp)
[1] http://www.geonames.org/about.html
[2] https://sv.wikipedia.org/wiki/Polaki_%28ort_i_Makedonien%29
[3] https://sv.wikipedia.org/wiki/Kategori:Robotskapade_geografiartiklar
[4] https://sv.wikipedia.org/wiki/Anv%C3%A4ndardiskussion:Lsjbot/Projekt_alla_pl...
Proper data-based stubs are being worked on: https://phabricator.wikimedia.org/project/profile/1416/ Lsjbot, you have no chance to survive make your time.
On 06/09/2015 at 02:40, Anders Wennersten wrote:
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Hoi, PLEASE reconsider. A Wikidata-based solution is not superior just because it started from Wikidata.
PLEASE consider collaboration. It will be so much more powerful when LSJBOT and people at Wikidata collaborate. It will get things right the first time. It does not have to be perfect from the start as long as it gets better over time, as long as we always work on improving the data.
PLEASE consider text generation based on Wikidata. They are the scripts LSJBOT uses; they can help us improve the text when more or better information becomes available. Thanks, GerardM
On 6 September 2015 at 08:25, Ricordisamoa ricordisamoa@openmailbox.org wrote:
At svwp we work closely with Wikidata and see it as the natural base for the substance of our articles. We follow the Phabricator project closely and are eager to implement it as soon as that is feasible. Lsjbot in no way works against this. It will be easy to replace Lsjbot articles with Phabricator-generated ones when the time is right.
But I believe you miss the point of what Lsjbot is doing now. The extensive research done on the data in Geonames is one of the crucial efforts. In a way, this whole generation project is research into the viability of using this data in full in all language versions. If it is still seen as viable, we could extend our article coverage of geographical entities by a factor of 10 in all versions. And this research is a must regardless of which technique is used to generate the articles.
The other crucial effort is the extended intelligence built into the generation of facts in the articles. Finding nearby physical objects with clever algorithms is an intellectual effort of the highest order. When bot generation was first introduced, it was more or less a mapping of items in the input to items in the output (the articles). We now see how more information is created from information that exists only implicitly in the input, and where it is combined with external (map) data.
I cannot stress enough how impressed I am by Sverker's outstanding intellectual effort and his creativity in implementing and running software that is of great help in reaching our common vision, "free knowledge for all".
Anders
On 2015-09-06 at 08:50, Gerard Meijssen wrote:
Hoi, wouldn't it have been a better use of Sverker's brainpower to implement a long-term solution that doesn't require saving articles to the wiki?
On 06/09/2015 at 11:23, Anders Wennersten wrote:
Hoi, As always, I have been a big fan of the wonderful work that has been done. My reaction was very much to what I perceived as a negative reaction from Ricordisamoa. Telling you to stop and become part of Wikidata is a bit off. Asking for collaboration and work towards a common goal, a goal that you very much want to share as I perceive it from your reply, is most wonderful and most welcome.
When your data is at a quality level where you create stubs, it is very much at the level where we should have it in Wikidata. Obviously it is for the Swedish community to decide whether to have the stubs or to experiment with cached articles based on Wikidata data. Obviously, we are at a point where we can create the stubs and where caching concepts is technically feasible, but it is not something we have done so far.
What does it take to have such an experiment? Thanks, GerardM
On 6 September 2015 at 11:23, Anders Wennersten mail@anderswennersten.se wrote:
Hoi,
"Article Placeholders are automatically generated content pages in Wikipedia or other mediawiki projects displaying data from Wikidata." Seriously? RobotWiki? Do we really want this? Quality, not quantity.
From: gerard.meijssen@gmail.com Date: Sun, 6 Sep 2015 11:35:31 +0200 To: wikimedia-l@lists.wikimedia.org Subject: Re: [Wikimedia-l] LsJbot and geonames
Hoi, As always I have been a big fan of the wonderful work that has been done. My reaction was very much for what I perceived as a negative reaction from Ricordisamoa. Telling you to stop and become part of Wikidata is a bit off. Asking for collaboration and work towards a common goal, a goal that you very much want to share as I perceive it in your reply is most wonderful and most welcome.
When your data is at a quality level where you create stubs, it is very much at the level where we should have it in Wikidata. Obviously it is for the Swedish community to have the stubs or experiment with cached articles based on Wikidata data. Obviously, we are at a point where we can create the stubs and where caching concepts is technically feasible but not something we have done so far.
What does it take to have such an experiment? Thanks, GerardM
On 6 September 2015 at 11:23, Anders Wennersten mail@anderswennersten.se wrote:
At svwp we work closely with Wikidata and see it as the natural base for our article substance. And we follow closely Phabricator and are eager to implement it as soon as it will be feasible to implement. And Lsjbot is in no way counteractive to these. It will be easy to exchange Lsjbot article with Phabricator generated ones when time is right.
But I believe you miss the point with what Lsjbot is doing now. The extensive research etc done on data in Geonames is one of the crucial efforts. And in a way all this generation project is a research on the viability to use this data for full in all language versions. If it still is seen as viable we could extend our article coverage for geographical entities with a factor 10 in all versions. And this research is a must even independently of which technique is used to generate the articles.
The other crucial effort is the extended intelligence built into the generation of facts in the articles. To find out close by physical object by clever algorithms is a intellectual effort of highest dignity. First when bot generating was introduced, it was more or less a mapping of items from input to items in output (in articles). We now see how more info is created by info only implicit existing in input and where it is combined with external (map) data
I can not enough press on how much I am impressed by Sverkers outstanding intellectual effort and his creativity in implementing and running software that is of great help reaching our common vision "free knowledge for all".
Anders
Den 2015-09-06 kl. 08:50, skrev Gerard Meijssen:
Hoi, PLEASE reconsider. A Wikidata based solution is not superior because it started from Wikidata.
PLEASE consider collaboration. It will be so much more powerful when LSJBOT and people at Wikidata collaborate. It will get things right the first time. It does not have to be perfect from the start as long as it gets better over time. As long as we always work on improving the data.
PLEASE consider text generation based on Wikidata. They are the scripts LSJBOT uses, they can help us improve the text when more or better information becomes available. Thanks, GerardM
On 6 September 2015 at 08:25, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Proper data-based stubs are being worked on:
https://phabricator.wikimedia.org/project/profile/1416/ Lsjbot, you have no chance to survive make your time.
Il 06/09/2015 02:40, Anders Wennersten ha scritto:
Geonames [1] is a database which holds around 9 M entries of geographical
related items from all over the world.
Lsjbot is now generating articles from a subset of it, after several months of extensive research on its quality, Wikidata relations and notability issues. While the quality in some regions is substandard (and these will not be generated) it was seen as very good in most areas. In the discussion I was intrigued to learn that identical Arabic names should be transcribed differently depending on its geographic location. And I was fascinated of the question of notability of wells in the Bahrain desert (which in the end was excluded, mostly because we knew too little of that reality)
In this run Lsjbot has extended its functionality even further then when it generated articles for species. It looks for relevant geographical items close to the actual one: a lake close by, a mountain and where is the nearest major town etc.
Macedonia can be taken as one example. Lsjbot generated over 10000 articles (and 5000 disambiguous pages) making it a magnitude more than what exist in enwp. Also for a well defined type like villages, almost 50% as many has been generated than existing in enwp. One example [2] where you can see what has been generated (and note the reuse of a relevant figure existing in frwp). Please compare the corresponding articles on other languages in this case, many having less information than the bot generated one.
The generation is still in early stage [3) but has already got the article count for svwp to pass 2 M today. But it will take many months more before completed and perhaps more M marks will be passed before it is through. If you want to give feedback you are welcome to enter it at [4]
Anders (with all credits for the Lsjbot to be given to Sverker, its owner, I am just one of the many supporters of him and his bot on svwp)
[1] http://www.geonames.org/about.html
[2] https://sv.wikipedia.org/wiki/Polaki_%28ort_i_Makedonien%29
[3] https://sv.wikipedia.org/wiki/Kategori:Robotskapade_geografiartiklar
[4]
https://sv.wikipedia.org/wiki/Anv%C3%A4ndardiskussion:Lsjbot/Projekt_alla_pl...
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
2015-09-06 13:22 GMT+02:00 Steinsplitter Wiki steinsplitter-wiki@live.com:
Hoi,
"Article Placeholders are automatically generated content pages in Wikipedia or other mediawiki projects displaying data from Wikidata." Seriously? RobotWiki? Do we really want this? Quality, not quantity.
Yeah. I REALLY want this.
From: gerard.meijssen@gmail.com Date: Sun, 6 Sep 2015 11:35:31 +0200 To: wikimedia-l@lists.wikimedia.org Subject: Re: [Wikimedia-l] LsJbot and geonames
Hoi, As always, I am a big fan of the wonderful work that has been done. My reaction was very much to what I perceived as a negative reaction from Ricordisamoa. Telling you to stop and become part of Wikidata is a bit off.
Asking for collaboration and work towards a common goal, a goal that you very much want to share as I perceive it in your reply, is most wonderful and most welcome.
When your data is at a quality level where you create stubs, it is very much at the level where we should have it in Wikidata. Obviously it is for the Swedish community to decide whether to have the stubs or to experiment with cached articles based on Wikidata data. Obviously, we are at a point where we can create the stubs and where caching concepts is technically feasible, but it is not something we have done so far.
What does it take to have such an experiment? Thanks, GerardM
On 6 September 2015 at 11:23, Anders Wennersten <mail@anderswennersten.se> wrote:
At svwp we work closely with Wikidata and see it as the natural base for our article substance. We closely follow the work on Phabricator and are eager to implement it as soon as that is feasible. Lsjbot in no way works against this. It will be easy to replace Lsjbot articles with Phabricator-generated ones when the time is right.
But I believe you miss the point of what Lsjbot is doing now. The extensive research done on the data in Geonames is one of the crucial efforts. In a way, this whole generation project is research into the viability of using this data in full in all language versions. If it is still seen as viable, we could extend our article coverage of geographical entities by a factor of 10 in all versions. And this research is a must, independently of which technique is used to generate the articles.
The other crucial effort is the extended intelligence built into the generation of facts in the articles. Finding nearby physical objects with clever algorithms is an intellectual effort of the highest dignity. When bot generation was first introduced, it was more or less a mapping of items in the input to items in the output (the articles). We now see how more information is created from information that exists only implicitly in the input, combined with external (map) data.
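To make the idea of "finding nearby physical objects" concrete, here is a minimal sketch of one way it could be done. It is not Lsjbot's actual code: the record layout (name, kind, lat, lon), the 50 km cutoff and the sample coordinates are all assumptions made up for illustration. It simply picks, for each kind of feature, the closest one by great-circle (haversine) distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_kind(place, candidates, max_km=50.0):
    """Return {kind: (name, distance_km)} with the closest candidate of each kind.

    The cutoff max_km is a hypothetical parameter: anything farther away
    is treated as not relevant to the article about `place`.
    """
    best = {}
    for cand in candidates:
        if cand is place:
            continue  # never report the place as its own neighbour
        dist = haversine_km(place["lat"], place["lon"], cand["lat"], cand["lon"])
        if dist > max_km:
            continue
        kind = cand["kind"]
        if kind not in best or dist < best[kind][1]:
            best[kind] = (cand["name"], dist)
    return best


# Made-up sample records, not real Geonames entries.
village = {"name": "Village X", "kind": "village", "lat": 41.0, "lon": 21.0}
features = [
    {"name": "Lake A", "kind": "lake", "lat": 41.05, "lon": 21.0},
    {"name": "Lake B", "kind": "lake", "lat": 41.4, "lon": 21.0},
    {"name": "Town C", "kind": "town", "lat": 41.0, "lon": 21.2},
    {"name": "Mountain D", "kind": "mountain", "lat": 43.0, "lon": 21.0},
]
result = nearest_by_kind(village, features)
```

With these sample records, `result` holds the closer of the two lakes and the town, while the mountain is dropped because it lies beyond the cutoff. A real run over millions of entries would need a spatial index rather than this linear scan.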
I cannot stress enough how impressed I am by Sverker's outstanding intellectual effort and his creativity in implementing and running software that is of great help in reaching our common vision, "free knowledge for all".
Anders
On 2015-09-06 at 08:50, Gerard Meijssen wrote:
Hoi, PLEASE reconsider. A Wikidata based solution is not superior because it started from Wikidata.
PLEASE consider collaboration. It will be so much more powerful when LSJBOT and people at Wikidata collaborate. It will get things right the first time. It does not have to be perfect from the start as long as it gets better over time. As long as we always work on improving the data.
PLEASE consider text generation based on Wikidata. They are the scripts LSJBOT uses, they can help us improve the text when more or better information becomes available. Thanks, GerardM
On 6 September 2015 at 08:25, Ricordisamoa <ricordisamoa@openmailbox.org> wrote:
Proper data-based stubs are being worked on: https://phabricator.wikimedia.org/project/profile/1416/
Lsjbot, you have no chance to survive make your time.
Congratulations on the stub creation; they are good (and better than those handmade stubs in other languages).
About the Wikidata placeholder project, it sounds very interesting.