As the author of the paper in question, I thought I'd chime in with my two cents here...
PLoS Biology is a recognized journal for biology research, but not for wiki research. Their statements about the usefulness of bot-generated stubs in wikis are not backed up by verifiable evidence.
Agreed, our intention was to create a resource for biologists, not to make any broader statements about wikis as a whole. Apologies if anyone felt we were overinterpreting our observations, but I felt all of our conclusions were supported by our analyses. As for the statements not being backed up by verifiable evidence, I (obviously) disagree. All of our figures and conclusions were derived from publicly-available sources (edit histories, page sources, etc.), and anyone who is interested would be able to reproduce our results.
Their statistic that 50% of edits landed in new articles doesn't
indicate quality or usefulness. It only says that carpet bombing
might sometimes hit a target.
Perhaps there is some misunderstanding here in what the article said? The 50% of edits refers to edits *subsequent* to our bot effort, not the bot effort itself. If there is still confusion, I'm happy to clarify in more detail.
Their work is interesting biology. But for wiki research, this
paper is merely of anecdotal interest. Maybe they are writing a
separate article focused on wikis? Are the authors coming to
Wikimania?
Great, then we succeeded in our goal of doing interesting biology. No, we have no plans to attend Wikimania or do another article on "wiki research", but that's mostly because it's not our field. If anyone has suggestions on how we might use our effort to comment on wiki research and would like to collaborate, we're certainly open to hearing more.
... and in response to a comment on another thread, it is a bit unfortunate that some headlines seem to indicate that this was a foundation-sponsored activity. But, alas, we don't write the headlines... (The title of the article we wrote is "A Gene Wiki for Community Annotation of Gene Function".)
Regards,
Andrew
Andrew Su wrote:
Their statistic that 50% of edits landed in new articles doesn't indicate quality or usefulness. It only says that carpet bombing might sometimes hit a target.
Perhaps there is some misunderstanding here in what the article said? The 50% of edits refers to edits *subsequent* to our bot effort, not the bot effort itself. If there is still confusion, I'm happy to clarify in more detail.
Yes, I understand this is about the subsequent manual edits. My analogy with carpet bombing needs to be clarified. Suppose we have a country with some strategic targets that we want to hit. If we carpet bomb everything, we will hit those targets, but many bombs will also be dropped outside of the targets.
Now, in a growing wiki the country (the whole) is the knowledge that readers have, and which they could potentially write about. The strategic targets are the actual edits they will contribute, which is a lot smaller than the whole country. Planting a lot of stubs is carpet bombing: dropping stubs on various topics, hoping to find the topics of those future manual edits. The 50% number in your report means that 50% of those future edits (targets) were hit by the stub carpet bombing. But that number doesn't say anything about the precision of the carpet bombing. How many of the planted stubs failed to attract any manual edits?
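To make the two numbers concrete, here is a small sketch in Python (all figures are made up for illustration; none are from the paper). It separates the statistic that was reported, the share of subsequent manual edits landing on bot-created stubs, from the one I am asking about, the share of planted stubs that ever attracted a manual edit:

# Hypothetical numbers, for illustration only -- not taken from the paper.
stubs_created = 9000            # pages planted by the bot
stubs_edited = 700              # of those, pages that later got at least one manual edit
manual_edits_total = 2000       # all manual edits in the topic area after the bot run
manual_edits_on_stubs = 1000    # manual edits that landed on bot-created pages

# The reported figure: how many of the later edits were "hits" on stubs.
hit_share = manual_edits_on_stubs / manual_edits_total    # 0.50 in this toy example

# The precision of the carpet bombing: how many planted stubs found a target.
stub_share = stubs_edited / stubs_created                 # ~0.08 in this toy example

print("share of manual edits landing on stubs: %.0f%%" % (100 * hit_share))
print("share of stubs attracting any manual edit: %.0f%%" % (100 * stub_share))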
That would be an interesting study, especially if you could repeat it with different size and quality of the stubs.
Lars,
Perhaps you and I are the only ones on this list who are interested, but since I'm enjoying the discussion...
[snip]
Now, in a growing wiki the country (the whole) is the knowledge that readers have, and which they could potentially write about. The strategic targets are the actual edits they will contribute, which is a lot smaller than the whole country. Planting a lot of stubs is carpet bombing: dropping stubs on various topics, hoping to find the topics of those future manual edits. The 50% number in your report means that 50% of those future edits (targets) were hit by the stub carpet bombing. But that number doesn't say anything about the precision of the carpet bombing. How many of the planted stubs failed to attract any manual edits?
As of this moment, you're absolutely right, the vast majority of gene stubs have gone unedited. But I'm sure everyone here recognizes that there's a "critical mass" aspect to growth, and the recent publication is only the first step in this process. Also, it's worth noting that there are tens (hundreds?) of thousands of graduate students worldwide whose mission is to discover the function of a human gene or genes. I'm quite confident that over time, >95% of those gene pages will be edited. Unfortunately, there's no good way to predict precisely which gene pages will take off first, so we limited ourselves to the top ~9000 genes (ranked by # of citations). Ultimately, this was a threshold that EN WP was comfortable with, and I think a good tradeoff between short-term stagnant articles and long-term growth.
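(Purely to illustrate the cutoff idea, not our actual pipeline: given any table of citation counts per gene, the selection is just a sort and a slice. The gene symbols and counts below are made up.)

# Hypothetical citation counts per gene symbol; illustration only.
citation_counts = {
    "GENE_A": 45000,
    "GENE_B": 8000,
    "GENE_C": 1200,
    "GENE_D": 3,
}

TOP_N = 2  # stand-in for the ~9000 cutoff used on the English Wikipedia

ranked = sorted(citation_counts, key=citation_counts.get, reverse=True)
genes_to_stub = ranked[:TOP_N]
print(genes_to_stub)   # ['GENE_A', 'GENE_B']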
That would be an interesting study, especially if you could repeat it with different size and quality of the stubs.
While the scientist in me says that this type of controlled experiment would be interesting, practically speaking I don't think this is a realistic project. Why would any wiki community support purposely creating suboptimal stubs? The goal of this Gene Wiki effort (and I have to assume all bot-creation efforts) is to create the best stubs possible. And then it's up to the community to decide whether the benefit of creating those stubs (factoring in the potential for downstream manual edits) is worthwhile.
Cheers, -andrew
Perhaps you and I are the only ones on this list who are interested, but since I'm enjoying the discussion...
I have followed this discussion from the beginning. My opinion on bot-created articles is that if the articles thus created have content, then we should welcome them. I don't see the necessity for a human being to enter all the information if a machine can do it instead. If the information is there, people can use it for free; whether it gets edited later or not is not that interesting. Our goal is not to have people editing articles, but to provide free information.
Ting.
Hoi, You may want to consider the scale of things ... when you are talking about chemicals and proteins, a number like 240 million articles can be expected. With such numbers you have to wonder to what extent Wikipedia can cope. Thanks, GerardM
Sure, and not to mention the billions of galaxies out there that are slowly being cataloged by current and future telescopes. Nonetheless, they are information. Naturally, we can consider whether it is better to leave most of them in the professional databases and only list the few that have made it into the press in recent years. On the other hand, if someone is willing to write a bot to transfer this information into Wikipedia, I don't see a reason to oppose the effort.
Ting
On Thu, Jul 24, 2008 at 8:04 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, You may want to consider the scale of things ... when you are talking about chemicals and proteins, a number like 240 million articles can be expected. With such numbers you have to wonder to what extent Wikipedia can cope.
IMHO the point is database vs. free-style text annotation.
It is reasonable to expect that every human gene, in the not-so-long run, will have loads of text annotation that doesn't fit well in a classic database; in fact, it will have a few data points and a lot of text. Remember, we're talking about fewer than 25,000 genes in humans. This is what a wiki is best at, and pre-creating articles for them that contain the bare facts is perfectly valid.
OTOH, millions of real/predicted/hypothetical molecules that, for the most part, have nothing but a few numbers attached to them would fit better in a "normal" database. That doesn't exclude the possibility of writing about some of these molecules on Wikipedia when there's something to write about.
Magnus
On the issue of scientific data knowledge in Wikipedia versus "specialist databases"... Magnus emphasizes a great point that the unstructured nature of Wikipedia (free-text, figures, diagrams, etc.) is very complementary to the structured databases of existing resources. Consider these two links:
http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&... ToSearch=5649
http://en.wikipedia.org/wiki/Reelin
Clearly they have different goals in what information to present and how to present it. I'd also say that Wikipedia has an advantage of presenting data to a more diverse audience, having information of interest to both lay-people and scientists.
-andrew
On Thu, Jul 24, 2008 at 3:04 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, You may want to consider the scale of things ... when you are talking about chemicals and proteins, a number like 240 million articles can be expected. With such numbers you have to wonder to what extent Wikipedia can cope. Thanks, GerardM
I couldn't agree more. My major complaint about mass creation of articles by bots is the simple problem of maintainability.
Assuming the English Wikipedia has (more or less) a few thousand dedicated contributors (let's say 3500), that approximates to about 705 articles per person. Now, balloon that number up to 4 million articles, and you now have 1142 articles per person.
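Spelling the arithmetic out (a rough sketch; the roughly 2.5 million current articles is my own assumption to make the 705 figure come out, it isn't stated above):

# Back-of-envelope numbers from the paragraph above. The current article
# total is an assumption (about 2.47 million), not something stated here.
dedicated_contributors = 3500
articles_now = 2470000
articles_later = 4000000

print(articles_now / dedicated_contributors)    # ~705 articles per person
print(articles_later / dedicated_contributors)  # ~1142 articles per person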
Now granted, not every article is being updated and maintained on a daily (or even weekly or monthly) basis. However, those articles _still_ need a helpful eye kept on them. Vandalism and libel are still very much a part of the projects, and without someone to keep an eye on things, it degenerates rather quickly. Antivandalism bots can only help so much.
Personally, I don't have the time in the day to sit there and revert vandalism on 240 million articles, nor do many others, I would gather.
-Chad
I couldn't agree more. My major complaint about mass creation of articles by bots is the simple problem of maintainability.
I personally don't think that most articles need to be intensively maintained. Vandals tend to attack articles that are already hot, and those articles are already watched carefully. Vandals of the other sort, who launch mass attacks indiscriminately, are also relatively easy to detect and handle.
I believe most of the, say, bot-created gene or molecule articles would simply stay there; many of them would probably never be touched again. They don't need maintenance.
But if they have content and information, I feel comfortable having them there. It often happens that I read about something in a magazine like Scientific American, then look it up in Wikipedia to get more information, and often I then follow the links there to read more.
Assuming the English Wikipedia has (more or less) a few thousand dedicated contributors (let's say 3500), that approximates to about 705 articles per person. Now, balloon that number up to 4 million articles, and you now have 1142 articles per person.
If we reverse this logic, we must put up a policy that if our number of dedicated contributors doesn't increase, we must at some point stop allowing people to create new articles, because we cannot monitor them all. Personally, I dislike the idea that my job on Wikipedia is to monitor articles.
Ting
I never said to stop creating articles if our userbase doesn't increase. The idea is that the userbase (hopefully) increases proportionally to the number of articles. With normal human creation, this more or less happens. When you add a bunch of artificially created articles, that fails to happen.
And to say that vandalism doesn't happen on low-viewed articles is patently wrong. While "intense maintenance" may be a bit extreme, they at _least_ need someone to look over them once in a while to make sure that someone hasn't screwed with them.
-Chad
On Thursday, July 24, 2008 at 6:30 AM, Chad wrote:
[snip]
I never said to stop creating articles if our userbase doesn't increase. The idea is that the userbase (hopefully) increases proportionally to the number of articles. With normal human creation, this more or less happens. When you add a bunch of artificially created articles, that fails to happen.
Chad, I'd be interested in hearing more about this. Has there been a systematic study of this? If this is a conjecture, then I'd suggest that it's possible that if you create really *useful* stubs, then that will actually increase your user base. The way I think of it, there are thousands of active biologists who had no interest in Wikipedia before because there wasn't anything there that they were passionate about contributing to. Now there is, and we hope that's reason for them to become contributors. Of course, that's conjecture too, but we can (will) do a retrospective analysis later to try to quantify that effect.
And to say that vandalism doesn't happen on low-viewed articles is patently wrong. While "intense maintenance" may be a bit extreme, they at _least_ need someone to look over them once in a while to make sure that someone hasn't screwed with them.
As of now, I manage to keep an eye on all these articles without too much difficulty. For example, this link makes it quite simple:
http://en.wikipedia.org/w/index.php?title=Special:RecentChangesLinked&hideminor=0&target=Template%3AGNF+Protein+box&showlinkedto=1
Right now, there are a few dozen edits a day, most of them by established contributors I've come to recognize. It's actually quite easy to do a spot-check of new/anonymous edits. Of course, I hope that the number of edits quickly grows beyond what I as one person can "oversee". In that case, though, it will likely mean that the WP:MCB community has also grown, and there will effectively be a "Gene Wiki new page patrol" that can share this responsibility.
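For anyone who would rather script that spot-check than click through the web interface, here is a rough sketch against the MediaWiki API (illustrative only, not the workflow described above; it assumes the standard api.php endpoint and the Python requests library, and simply prints the latest editor of a batch of pages that transclude the protein infobox):

import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "gene-wiki-spot-check/0.1 (example script)"}

def pages_using_template(template="Template:GNF Protein box", limit=50):
    # List article-namespace pages that transclude the given template.
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": 0,
        "eilimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, headers=HEADERS).json()
    return [page["title"] for page in data["query"]["embeddedin"]]

def print_latest_edits(titles):
    # Print the most recent revision (user, timestamp, edit summary) for each page.
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "user|timestamp|comment",
        "titles": "|".join(titles),
        "format": "json",
    }
    data = requests.get(API, params=params, headers=HEADERS).json()
    for page in data["query"]["pages"].values():
        rev = page.get("revisions", [{}])[0]
        print(page["title"], "--", rev.get("user"), rev.get("timestamp"), rev.get("comment", ""))

if __name__ == "__main__":
    # Spot-check a batch of 20 pages; the API caps "titles" at 50 per request.
    print_latest_edits(pages_using_template(limit=20))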
-andrew
Hoi, I am involved in a project where protein and chemical information is of relevance. I disagree strongly that information on proteins is static. I also doubt that we have the capacity to ensure that the information is correct. If you are going to allow this information, how will you ensure that there will be no original research in there?
I wish I felt as comfortable about these types of information in Wikipedia. When people believe this information to be true and base their actions on it, what does this do to our liability? In the presentation of Professor Bill Wedemeyer at Wikimania (http://wm08reg.wikimedia.org/schedule/speakers/51.en.html), this idea was considered and rejected. He shares my opinion that this is best left to specialist databases and that only a specific subset of proteins may be of interest. I hope his presentation will be online soon. Thanks, GerardM
On Thu, Jul 24, 2008 at 10:22 PM, Chad innocentkiller@gmail.com wrote:
On Thu, Jul 24, 2008 at 3:04 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, You may want to consider the scale of things ... when you are talking about chemicals and proteins, a number like 240 million articles can be expected. With such numbers you have to wonder to what extent Wikipedia can cope. Thanks, GerardM
I couldn't agree more. My major complaint about mass creation of articles by bots is the simple problem of maintainability.
Assuming the English Wikipedia has (more or less) a few thousand dedicated contributors (let's say 3500), that approximates to about 705 articles per person. Now, balloon that number up to 4 million articles, and you now have 1142 articles per person.
Now granted, not every article is being updated and maintained on a daily (or even weekly or monthly) basis. However, those articles _still_ need a helpful eye kept on them. Vandalism and libel are still very much a part of the projects, and without someone to keep an eye on things, it degenerates rather quickly. Antivandalism bots can only help so much.
Personally, I don't have the time in the day to sit there and revert vandalism on 240 million articles, nor do many others, I would gather.
More articles do not make the vandals more prolific, or more adept and agile. There will still be the same percentage of the little buggers, and their methods will not alter much.
If anything, more accessible knowledge will mean there are fewer people who become vandals.
-- John
On Thursday, July 24, 2008 at 5:23 AM, Chad wrote:
[snip]
Assuming the English Wikipedia has (more or less) a few thousand dedicated contributors (let's say 3500), that approximates to about 705 articles per person. Now, balloon that number up to 4 million articles, and you now have 1142 articles per person.
Last point I wanted to bring up. Yes, the few thousand "dedicated contributors" are very important to article growth. But so are the hundreds of thousands (millions?) of infrequent contributors, the people who make individually small but collectively large contributions. From our article (http://dx.doi.org/10.1371/journal.pbio.0060175):
"A recent study found that the number of contributions from new editors (less than 100 total edits) in total equals the number of contributions from the most established editors (greater than 10,000 edits) [7], illustrating the collective importance of the Long Tail."
Of course, this doesn't argue that we should maintain a page on every chemical compound (which by definition is infinite). But I think it suggests that bot article creation on the scale of a few thousand will not substantially increase maintenance burden or decrease quality.
-andrew
[7] Kittur A, Chi EH, Pendleton BA, Suh B, Mytkowicz T (2007) Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. 25th Annual ACM Conference on Human Factors in Computing Systems (CHI 2007). 28 April-3 May 2007; San Jose, California, United States.
Hoi, A few thousand articles is perfectly OK and will create no problems. But what will the boundaries be? How do you restrict it to which few thousand articles? Once bots start creating articles, it makes no difference whether you create 2,000 or 20,000 or 200,000 or 2,000,000 or 20,000,000 articles... The difference in the impact on the Wikipedia community is, however, profound.
Without some clear idea of what we are talking about and what the criteria for inclusion will be, I would advise the English Wikipedia to think really hard about whether this is what they want and what they can absorb. Thanks. GerardM
Absolutely agreed, the real numbers do make a difference. Ultimately though I'm not sure how hard and fast you could make the acceptance criteria. I think it would be a complex weighting between the number of articles, the content in each article, the size of the existing user community, the size of the community of new editors which you hope to attract, etc. Ultimately, I believe that weighting should be done by humans (rather than by comparing to some rigid rule set), and that it's up to each Wikipedia's governing bodies to decide what is right for them.
Speaking as someone who has gone through the bot approval process at the English Wikipedia, I was quite happy with how it turned out. We got some great suggestions from experienced users, we reached consensus on what the appropriate trial run and full run would look like, and ultimately I think everyone was satisfied with the process and the result. For context, here is the archived discussion:
http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/ProteinBoxBot
-andrew
I just received this press release from NASA. Since NASA images are mostly PD to my knowledge, we missed an opportunity here:
July 24, 2008
David E. Steitz Headquarters, Washington 202-358-1730 david.steitz@nasa.gov
Paul Hickman Internet Archive 415-462-1509, 415-561-6767 paul@archive.org
RELEASE: 08-173
NASA AND INTERNET ARCHIVE LAUNCH CENTRALIZED RESOURCE FOR IMAGES
WASHINGTON -- NASA and Internet Archive, a non-profit digital library based in San Francisco, made available the most comprehensive compilation ever of NASA's vast collection of photographs, historic film and video Thursday. Located at www.nasaimages.org, the Internet site combines for the first time 21 major NASA imagery collections into a single, searchable online resource. A link to the Web site will appear on the http://www.nasa.gov home page.
The Web site launch is the first step in a five-year partnership that will add millions of images and thousands of hours of video and audio content, with enhanced search and viewing capabilities, and new user features on a continuing basis. Over time, integration of www.nasaimages.org with http://www.nasa.gov will become more seamless and comprehensive.
"This partnership with Internet Archive enables NASA to provide the American public with access to its vast collection of imagery from one searchable source, unlocking a new treasure trove of discoveries for students, historians, enthusiasts and researchers," said NASA Deputy Administrator Shana Dale. "This new resource also will enable the agency to digitize and preserve historical content now not available on the Internet for future generations."
Through a competitive process, NASA selected Internet Archive to manage the NASA Images Web site under a non-exclusive Space Act agreement, signed in July 2007. The five-year project is at no cost to the taxpayer and the images are free to the public.
"NASA's media is an incredibly important and valuable national asset. It is a tremendous honor for the Internet Archive to be NASA's partner in this project," says Brewster Kahle, founder of Internet Archive. "We are excited to mark this first step in a long-term collaboration to create a rich and growing public resource."
The content of the Web site covers all the diverse activities of America's space program, including imagery from the Apollo moon missions, Hubble Space Telescope views of the universe and experimental aircraft past and present. Keyword searching is available with easy-to-use resources for teachers and students.
Internet Archive is developing the NASA Images project using software donated by Luna Imaging Inc. of Los Angeles and with the generous support of the Kahle-Austin Foundation of San Francisco.
For more information about NASA and agency programs, visit:
Waerth
2008/7/25 Waerth waerth@asianet.co.th:
film and video Thursday. Located at www.nasaimages.org, the Internet site combines for the first time 21 major NASA imagery collections into a single, searchable online resource. A link to the Web site will appear on the http://www.nasa.gov home page.
I may be a bit confused, but I thought this was a niche that was already filled by the (sadly under-advertised) NIX:
Since it says there is no cost to taxpayers, I'm guessing that IA probably is footing all costs and did most of the heavy lifting here. Were we even in the running for this on the Foundation side?
Joe
On Fri, Jul 25, 2008 at 10:11 PM, Waerth waerth@asianet.co.th wrote:
I just received this press release from NASA. Since NASA images are mostly PD to my knowledge, we missed an opportunity here:
Since the images are most likely PD, this is not a missed opportunity but an opportunity: Transfer everything with a clear license status to Commons.
Mathias
We wouldn't know where to put them, and possibly couldn't even upload some of the content with our 20 MB upload limit. IA, on the other hand, will barely notice this additional stream of PD content.
-- John
2008/7/24 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, You may want to consider the scale of things ... when you are talking about chemicals and proteins, a number like 240 million articles can be expected. With such numbers you have to wonder to what extent Wikipedia can cope. Thanks, GerardM
Which is why, for the most part, you don't have articles on individual chemicals but instead on chemical families. For proteins, rather than one article per protein, you stick all the various species analogues in one article.
Hoi, Possibly, but we are talking about bot generation, right? Thanks, GerardM
Shouldn't be that much of a challenge for bot generation. There are a number of chemical databases around where you can draw the carbon backbone and ask for derivatives.