Nemo has found this wiki, which I find very interesting [1]. It contains 1.68 million articles and seems to be a copy of the articles from Lithuanian Wikipedia plus some 1.5 million bot-generated articles, with a focus on species (I know from Lsjbot that at least some 1.3 million articles on species can be sourced from reliable databases).
The effort seems to be the work of just a few Lithuanian Wikipedians with the right technical skills and insight into Wikipedia; they are probably also active on ltwp [2].
For me it is a reminder of what will happen if we continue to be sceptical of bot generation of articles with correct information from verified sources. Creative people will do it anyway, outside Wikipedia, which could make Wikipedia redundant in the same way that Wikipedia has made the old paper-based encyclopedias redundant. The online encyclopedia that offers the most knowledge to its readers will survive, and a bot-generated, verified article contains more knowledge than no article on the subject. Also note that the most active right now are languages like Vietnamese and Lithuanian, with small communities that are all aware it would take eons if these articles had to be created manually.
I would like the movement and the upcoming strategy to take a proactive stand on semi-automated articles.
On sv:wp we have had this focus since last August, including upload to Wikidata as part of the article generation. We have found the inclusion of Wikidata much more complex than we anticipated. We thought half a year would be enough to "get a set of items with proper 100% quality data into Wikidata", but we now think it will take something like two years for just a small set of 10,000 articles :( This has not changed our belief in this approach, but we would certainly appreciate it if there were other entities doing the same, with whom we could exchange experience (or a central initiative).
Anders
[1] Start page: http://lietuvai.lt/wiki/Pagrindinis_puslapis
Latest changes: http://lietuvai.lt/wiki/Specialus:Naujausi_puslapiai
For a random article, press "Atsitiktinis puslapis": http://lietuvai.lt/wiki/Specialus:Atsitiktinis_puslapis/Straipsnis
[2] ltwp: https://lt.wikipedia.org/wiki/Pagrindinis_puslapis
Bot-generated articles have been important throughout the history of the wiki projects. They are essential to our future. They have also always been controversial with some editors.
Agreed that not showing them, or remaining skeptical rather than learning to use them better, will be a proviso and may lead to forks. I am sad when I see very active bot and script users blocked on larger wikis (Rich Farmbrough comes to mind from enwp), and perhaps we can find ways to recognize the best bots just as we do articles.
On Feb 4, 2014 3:31 AM, "Anders Wennersten" mail@anderswennersten.se wrote:
Nemo has found this wiki which I find very interesting [1]. [...]
Sam, I am quite concerned that you would use a public mailing list to express your displeasure about a specific individual's block on a particular project, without ensuring that you had your facts straight. It is unfair not only to the project involved, but to the person who is blocked: nobody needs to have a board trustee shining a bright light on their removal from the project. In fact, your using a specific editor as your poster boy for bot editing without knowing why his restrictions are in place is rather inconsiderate to the editor, the project, and the other people who think you're giving wise counsel.
Before you do that in the future, perhaps it would be a good idea to understand why a project had to, after years of trying to work with a valued editor and to mitigate the problems caused, finally remove him from the project.
Risker
On 4 February 2014 07:05, Samuel Klein meta.sj@gmail.com wrote:
Bot generated articles have been important throughout the history of the wiki Projects. [...]
Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
On 4 February 2014 12:40, Risker risker.wp@gmail.com wrote:
Before you do that in the future, perhaps it would be a good idea to understand why a project had to, after years of trying to work with a valued editor and to mitigate the problems caused, finally remove him from the project.
Because hitting Control-V was deemed to constitute "automation", wasn't it?
- d.
Risker, 04/02/2014 13:40:
Sam, I am quite concerned that you would use a public mailing list to express your displeasure about a specific individual's block [...]
You're putting words in his mouth. Saying, for instance, how sad it is that about 1 % of the USA population is in jail doesn't equal saying that all people in jail should be immediately liberated; similarly, I'm always sad when I block a user, because it's a failure, but that doesn't mean I won't do what's needed.
Nemo
On 4 February 2014 08:55, Federico Leva (Nemo) nemowiki@gmail.com wrote:
You're putting words in his mouth. [...]
Nemo, he named a specific user. I don't think I'm putting words in his mouth.
The vast majority of users who do a lot of bot edits are still merrily working away on English Wikipedia.
Risker
On 4 February 2014 14:03, Risker risker.wp@gmail.com wrote: ..
The vast majority of users who do a lot of bot edits are still merrily working away on English Wikipedia.
As someone who has made around 3 million automated edits on Commons and uploaded over 200,000 valuable educational images there, I would love to do similar work to benefit the English Wikipedia. I do not feel in the least bit encouraged to even try to set up a content creation project, or even an uncontroversial en.wp house-keeping project, in 2014, considering how much of my volunteer time would be lost in the debate any proposal there is likely to create, compared to the simplicity of other Wikimedia projects.
Knowing what happens to anyone who becomes "of interest" and has a large number of edits, along with the associated endless repeated attempts to find any single problematic edit out of hundreds of thousands of perfectly good content creations, I find the word "merrily" a poor choice. The extraordinary case that Sam mentioned has been a widely discussed lesson to all bot-writers; many of us carefully do our work in a way that avoids ever putting our heads above the parapet and risking becoming targets of depressing, damaging witch-hunts, reputation-ruining bad-faith allegations and extreme, effectively *years*-long sanctions from those with big hammers. So rather than "merrily" one might better choose from "cautiously", "covertly" or even "fearfully" and "not".
Risker, out of interest, considering my long track record of useful bot-work on Commons, would you support my proposal to let Faebot do some sensible non-controversial work on en.wp or do you think I am a danger to Wikimedia?
Fae
On 4 February 2014 10:30, Fæ faewik@gmail.com wrote:
Risker, out of interest, considering my long track record of useful bot-work on Commons, would you support my proposal to let Faebot do some sensible non-controversial work on en.wp or do you think I am a danger to Wikimedia?
I'd defer to the opinion of the Bot Approval Group, Fae. Bots have done (and continue to do) extremely useful work on English Wikipedia. They've also been involved with some difficult-to-fix harm (usually unintentional, by poor programming or without understanding of underlying content issues), and unfortunately there has been a pattern of a handful of bot owners not cleaning up those sorts of problems. This has resulted in the bar being raised for everyone.
The issue of bot article creation is one that will vary widely from project to project depending on the culture and philosophy of the community. If we think a bit, we're all likely to come up with a project or two that expanded rapidly with the use of bots, only to find that the content added had to be removed because it didn't meet copyright requirements or was of very poor quality. On the other hand, we've also seen brilliant successes. And yes, there was some fairly significant early expansion of English Wikipedia through bot article creation. Some of those articles have barely been touched since - except by other bots.
Risker
On 4 February 2014 15:54, Risker risker.wp@gmail.com wrote:
I'd defer to the opinion of the Bot Approval Group, Fae. [...]
I take that as a no.
Fae
On 4 February 2014 11:21, Fæ faewik@gmail.com wrote:
I take that as a no.
That's unfortunate, Fae. It's meant to say "I don't have the knowledge to analyse whether or not your bot works, so I would defer to those who do." I don't think I'm qualified to figure out whether or not your bots, or anyone else's bots, should be operating on Wikipedia.
I'd have the same answer to a developer who wanted me to review code, or an engineer who wanted me to look at his designs for an internal combustion engine. It's just knowledge outside of my scope.
Risker
Perhaps it would be a good idea to understand how badly ArbCom managed the Rich Farmbrough case, putting him through a slow death that would ultimately end in a year-long ban handled by a single administrator.
On 4 February 2014 16:42, Harold Hidalgo hahc21@gmail.com wrote:
Perhaps it would be a good idea to understand how bad ArbCom managed the Rich Farmbrough case by putting him against a slow death that would ultimately end in a year-long ban handled by a single administrator.
Risker has not noted her personal involvement in such. She's not defending the treatment of Rich Farmbrough as any sort of uninvolved commentator.
- d.
On 4 February 2014 11:45, David Gerard dgerard@gmail.com wrote:
Risker has not noted her personal involvement in such. She's not defending the treatment of Rich Farmbrough as any sort of uninvolved commentator.
I'm not defending the treatment of any individual editor, David. I'm saying that it is wrong, just plain wrong, to try to leverage a situation involving any individual editor by name when making what is an otherwise valid point, particularly when unfamiliar with the entire background. Rich doesn't deserve to have his case reheard on this mailing list, when there's not a darn thing that's going to change as a result of it. He is a decent person and a dedicated Wikimedian, and people shouldn't be using his name to make political points.
I do try to stand up to that principle; there've been numerous opportunities for me over the years to point to the behaviour of specific individuals and try to make hay out of them. I may not always succeed, but I really do try, especially on this global mailing list.
Risker
Risker, 04/02/2014 17:59:
doesn't deserve to have his case reheard on this mailing list
Then it would have been useful if you had refrained from issuing a motion of order against a simple, incidental seven-word mention, turning this (otherwise quiet) thread into a television legal drama with the continuous scenes of "objection!" and the judge telling the court to ignore the rampant attorney's harangue.
Nemo
On 4 February 2014 12:27, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Then it would have been useful if you had refrained from issuing a motion of order against a simple, incidental 7-words mention, making this (otherwise quiet) thread into a television legal drama with the continuous scenes of "objection!" and the judge telling the court to ignore the rampant attorney's harangue.
I'm not sure I entirely understand your point here, Nemo, but nonetheless, since it seems to be the opinion of several people in this thread that I was personally responsible for this whole mess, I'll simply suggest that people read the actual case[1] where the Arbitration Committee upheld not one but two *community* restrictions on the user in question, and took steps to ensure that the community's decision was enforced.
Of course, if the Arbitration Committee had overturned the community restrictions, then it would be pilloried for blatantly ignoring a decision that the community had every right to make without Arbcom's involvement.
So meanwhile, I look at my watchlist and note that about 15% of the edits on it were made by bots - and as far as I can see, none of them are problematic. Some of the bots on English Wikipedia have been editing longer than I have, and more are created all the time. There are a lot of really excellent bots around, and a lot of bots that might cause problems are weeded out or improved when they get to the Bot Approvals Group. Bots aren't the problem.
Risker
[1] https://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/Case/Rich_Farmb...
On 4 February 2014 17:48, Risker risker.wp@gmail.com wrote:
doesn't deserve to have his case reheard on this mailing list
Risker, here's a great tip: If you *really* do not want the case reheard, then why not just stop emailing and publicising the case, repeating the name of the accused and providing links in an attempt to prove some point or other. Try spending your volunteer time welcoming a few new Wikipedia editors instead of banning contributors and making life ghastly for those who are under your hammer.
I welcomed and helped many thousands of newer en.wp contributors during the time I was an admin. Even after the machinations of a few individuals shot my reputation to hell and got me banned from what was my home project for many years, I still helped Wikipedians with tricky problems behind the scenes; in fact my highly publicised case made me someone that those upset and having difficulties with our arcane processes could turn to in confidence, in a way that most trusted users do not have the real life experience and grey hairs to offer. Having authority is not all about dishing it out, or even having the badges to prove you must be respected.
Fae
A great tip would be to avoid changing this thread into a personal attack on Risker or anybody else. Thank you. Thyge/Sir48
2014-02-04 Fæ faewik@gmail.com:
Risker, here's a great tip: If you *really* do not want the case reheard, then why not just stop emailing and publicising the case, repeating the name of the accused and providing links in an attempt to prove some point or other. [...]
On 4 February 2014 20:03, Thyge ltl.privat@gmail.com wrote:
A great tip would be to avoid changing this thread into a personal attack on Risker or anybody else. Thank you. Thyge/Sir48
Er, that was the point of my tip to Risker.
Fae
@Anders: I seem to have unintentionally derailed your excellent thread. My apologies; I've taken responses to that subthread offline. To return to your main point: we do need 'A strategy for semi-automated article generation; and inclusion of Wikidata'.
Anders Wennersten writes:
< [we] will not be able to achieve our goal without... technical expertise
< (like knowledge in Lua, how to write datainterface to external dataproviders)
And it is important to attract and expand this sort of expertise. Not only through local chapter support but through collaboration across different project-communities, as you say.
@Gerard: I second your vision for Wikidata. It is a natural place to cultivate tools for large-scale creation and enhancement of information. And for now it seems open to experimentation, being bold, trying and reverting things.
Wikidata is a wiki. You indicate that the official sources need work. Wikidata is a good place to work on this.
+1 !
Sam.
Thanks Sam, your answer warms my soul!
And you summarize my key points excellently (and more clearly than I managed myself).
@Gerard: Our visions are very close, and I support yours in general. On a more concrete level it seems we have some different views; it could be misunderstandings on my side, it could be that we are thinking of different article subject segments, or even that we have different perspectives on what is feasible at different points in time. My strong belief (and life experience) is that it is in the meeting of different perspectives, like ours in this case, that really bright concepts and solutions turn up! Unfortunately, a mailing list does not really work for an exchange of ideas and concepts, so I wonder about the possibility of an IRL gathering some time to really discuss these issues through and reach new insights. I am open to any time and any place; Wikimania could be one opportunity, if that does not put this a bit too far off. Or could we create a special subtrack at Wikimania for this?
Anders
Samuel Klein wrote 2014-02-06 21:29:
@Anders: I seem to have unintentionally derailed your excellent thread. [...]
On 4 February 2014 16:45, David Gerard dgerard@gmail.com wrote:
On 4 February 2014 16:42, Harold Hidalgo hahc21@gmail.com wrote:
Perhaps it would be a good idea to understand how bad ArbCom managed the ... case by putting him against a slow death that would ultimately end in a year-long ban handled by a single administrator.
Risker has not noted her personal involvement in such. She's not defending the treatment of ... as any sort of uninvolved commentator.
Equally odd is deciding spontaneously to opine on the topic of using bots, with a track record of being a Wikimedia expert and authority in the case mentioned, while also being someone who would rather "defer to the opinion of the Bot Approval Group" when asked directly for an opinion on whether I, as a highly active and successful Commons bot writer, am a menace to Wikimedia - but as someone who has also been subject to years of depressing ridicule, after being subject to the devastating effect of Risker's personal intervention.
It would be great if the English Wikipedia were becoming a more open and welcoming environment, including positive encouragement for bot writers. I just don't see it being led in that direction; instead, over the last few years I see it being looked at by other Wikimedia projects as a lesson in how to avoid pointless bureaucracy and hostility to new users or those with minority viewpoints.
Fae
Hoi,
At Wikidata the number of items and the associated data is growing steadily. We are dealing with the aftermath of some bots and to be honest, that is also very much the name of the game.
An example: many species have been added to the ceb, nl and sv Wikipedias, and it would be wonderful if the "parent taxon" were included [1] for all of them. This is now happening in a "one at a time" fashion.
What is also happening is that new information is being added to Wikidata from external sources. I blogged about this [2], and in my opinion this is fabulous. What is so great is that any Wikipedia that includes "Wikidata search" in its extended search already benefits. Every community can choose to add stub articles based on the information in Wikidata.
In my opinion, data that has some relevance can be included in Wikidata, particularly when it is rich in statements and references to external sources. With great information in Wikidata, it becomes possible to use it to build even more extensive stub articles. Such things are starting to happen.
Bot-created information is controversial in many Wikipedias. It is not in Wikidata. Very welcome is all the data that enriches the items we already know. Very welcome is the data on the things we do not yet know but appreciate as relevant.
Thanks,
GerardM
[1] http://ultimategerardm.blogspot.nl/2014/01/taxonomy-where-there-is-nothing.h...
[2] http://ultimategerardm.blogspot.nl/2014/02/wikidata-ntf4-human-gene.html
On 4 February 2014 09:31, Anders Wennersten mail@anderswennersten.se wrote:
Nemo has found this wiki which I find very interesting [1]. it contains 1,68 million articles and seems to be a copy of articles from Lithunian Wikipedia + some 1,5 million botgenerated articles, with focus on species (i know from Lsjbot that there are at least some 1,3 M articles of species to be found from reliable databases)
The effort seems to be done by just a few lithuanians wikipedians with the right technical skill and insight on wikipedia, they are probably active also on ltwp[2].
For me it is a reminder what will happen if we continue to be sceptical of botgerneration of articles with correct info with verfied sources. Creative people will do it anyway and then outside Wikpedia, which could make Wikipedia redundant in the same way Wikipedia has made the old paperbased encyclopedias redundant. The online encyclopedia with most knowledge to the readers will survive, and botgenerated verified articles contains more knowledge then no article on the subject. Also note that the most active now are languages like Vietnamese and Lithunian, with small communities all aware it will take eons of time if to expected these will be created manually
I do would like the movement and upcoming strategy to make a proactive stand re semiautomted articles
On sv:wp we have had this focus, since last august with including upload on wikidata as part of the articlegeneration. We have found the inclusion of Wikidata much more complex then we anticipated. We thought half a year would be enough to "get a set of items with proper 100% quality data into Wikidata", but we now think it will take something like two years for just a small set of 10000 articles :( This have not changed our belief in this approach, but we would certainly appreciate it there were other entities doing the same and with whom we could exchange experience (or a central initiative)
Anders
[1] Start page http://lietuvai.lt/wiki/Pagrindinis_puslapis Latest changes http://lietuvai.lt/wiki/Specialus:Naujausi_puslapiai For random article press Atsitiktinis puslapis http://lietuvai.lt/wiki/ Specialus:Atsitiktinis_puslapis/Straipsnis [2] ltwp https://lt.wikipedia.org/wiki/Pagrindinis_puslapis _______________________________________________ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Thanks for your input!
I agree that with Wikidata we can generate article content semi-automatically without the controversy we have seen so far.
But our experience is that it takes much more time than expected to get Wikidata operational for the data we want to put into it.
For the data we are working with right now (Swedish entities such as administrative units, towns, parishes and lakes), we have found:
1. Before we load Wikidata, the identities must be correct on svwp: name, official "entity number" and coordinates. We thought this would be simple, but it takes much longer than anticipated, because we want this data to be 99.8% correct, not the 98% we are used to (if we do not load Wikidata with top quality, it cannot be recommended as a general source of information for all language versions). We also find loads of problems, errors and ambiguous data in the sources we use from our authorities (besides typing errors in wp). No one has ever scrutinized this official data the way we Wikipedians are doing now. For just the basic 10000 entities, it will take our group of five or six up to a year to get this right.
2. Before we then load data into Wikidata, we must have identified the correct properties and, in many cases, get new ones in place. It is only a few weeks since it became possible to enter populations, and an important property like geoshape is still far away. For the new properties unique to our project we have to work through the Wikidata property definition process, which can easily take 6-9 months for a single property. All of this must of course be ready before we start the actual loading.
3. The actual load into Wikidata is then quite straightforward (by bot). But to make use of data in Wikidata we need new templates in place in our language version, and for many data items we find we need modules written in Lua so that the template can handle the data and present it correctly in the articles.
4. After loading Wikidata we need to work through our articles so that they base their data on Wikidata. Some of this is done without problems by using templates and going through the articles with bots. But we also expect that several articles will need manual adjustment to get everything correct (such as facts residing in the text portion that should now be taken from Wikidata).
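To make the difference between the two accuracy targets in step 1 concrete, here is a small sketch of the arithmetic together with the kind of pre-load check the step implies. The record fields, the rough bounding box for Sweden, the sample values and the function names are all assumptions for illustration, not the svwp group's actual tooling.

```python
from typing import Dict, List

# Rough bounding box for Sweden in decimal degrees (an approximation).
LAT_RANGE = (55.0, 69.5)
LON_RANGE = (10.5, 24.5)


def expected_errors(entity_count: int, accuracy: float) -> int:
    """Number of entities expected to be wrong at a given accuracy level."""
    return round(entity_count * (1.0 - accuracy))


def find_problems(record: Dict[str, object]) -> List[str]:
    """Return the reasons a record is not ready to be loaded into Wikidata."""
    problems = []
    if not str(record.get("name") or "").strip():
        problems.append("missing name")
    if not str(record.get("entity_code") or "").strip():
        problems.append("missing entity code")
    lat, lon = record.get("lat"), record.get("lon")
    if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
        problems.append("missing coordinates")
    elif not (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
              and LON_RANGE[0] <= lon <= LON_RANGE[1]):
        problems.append("coordinates outside Sweden")
    return problems


if __name__ == "__main__":
    print(expected_errors(10000, 0.98))   # 200
    print(expected_errors(10000, 0.998))  # 20
    # Illustrative record; the field names and values are hypothetical.
    sample = {"name": "Uppsala", "entity_code": "0380", "lat": 59.86, "lon": 17.64}
    print(find_problems(sample))
```

For 10000 entities, 98% accuracy leaves about 200 faulty entries while 99.8% leaves about 20, which is why the extra checking takes so much longer.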
But if we get this far, our articles will be perfect, and we can produce a set of software, such as templates and modules, that makes implementing these data in other language versions very easy.
Another lesson is that we will not be able to achieve our goal without the support of technical expertise (such as knowledge of Lua, or of how to write data interfaces to external data providers). Right now we are discussing with our local chapter whether they can provide technical expertise when ours is not enough; we are, after all, Wikipedians, not tech wizards.
And we miss having colleagues on other language versions to discuss this with; it is very complex.
Anders
Hoi, Anders, I am afraid that the approach you describe is one where the perfect is the enemy of the good.
Wikidata is full of imperfections. It is incomplete and often plain wrong. How about prime ministers of the United Kingdom who have been dead for centuries featuring as actors in several movies?
When the data that are to be used in stubs or articles are uploaded to Wikidata, you can as happily improve them in Wikidata as anywhere else. Tools on Wikidata such as Reasonator can help you get to grips with the consistency.
When you state that it takes months to get the properties your project needs into Wikidata, I see that as a big problem at Wikidata and definitely something that needs attention. Once you know which set of properties you need, you can propose them as a batch and make it obvious that they have to be considered together. As you know, properties of the quantity type have been released, so this is a good time to make your proposals.
Most of all, Wikidata is a wiki. You indicate that the official sources need work. Wikidata is a good place to work on this. If it is a year's work for you to beaver away at, you may find more people to work with you on solutions at Wikidata.
Thanks, GerardM