The William and Flora Hewlett Foundation has decided to generously
support us with $500,000 in operational funding. More information in
the press release:
http://wikimediafoundation.org/wiki/Press_releases/Hewlwett_Fdn_grant_Augus…
This is not a project-based grant like the Stanton and Ford
initiatives, but designed to advance the mission of the Wikimedia
Foundation as a whole. We're very grateful to Hewlett for their
support! :-) It comes out of Hewlett's Open Educational Resources
initiative, which is an acknowledgment that Wikimedia is an important
part of the OER movement.
Big thanks to Sara Crouse for her work on the proposal that led to this grant.
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
Wikimedians--
Sadly, due to some technical and planning issues, we are postponing the
implementation and testing of the re-designed donation button until
after Wikimania. While we had hoped to conduct the test starting Monday,
we're just not perfectly happy with the implementation and testing
process yet.
The new timeline can be seen here:
http://meta.wikimedia.org/wiki/Fundraising_2009/Donation_buttons_upgrade
Thanks.
-Rand
--
Rand Montoya
Head of Community Giving
Wikimedia Foundation
www.wikimedia.org
Email: rand(a)wikimedia.org
Phone: 415.839.6885 x615
Fax: 415.882.0495
Cell: 510.685.7030
“At some future time, I hope to have something witty,
intelligent, or funny in this space.”
While the time and effort that went into Robert Rohde's analysis are
certainly extensive, the outcomes rest on so many flawed assumptions
about the nature of vandalism and vandalism reversion that one
publicizes its key "finding" of a 0.4% vandalism rate at one's peril.
http://en.wikipedia.org/w/index.php?title=John_McCain&diff=169808394&oldid=…
11 hours
Reverted with no tags.
http://en.wikipedia.org/w/index.php?title=Maria_Cantwell&diff=prev&oldid=16…
46 days
Reverted with note: "Undid revision 160400298 by 75.133.82.218"
By the way, there was a two-minute vandalism incident in the interim, so in
many cases, just because an analyst finds a "recent and short" incident, he
or she may be completely missing a longer-term one.
http://en.wikipedia.org/w/index.php?title=Ted_Stevens&diff=prev&oldid=17085…
There goes your "rvv" theory. In this case, "rvv" was a flag for even more
preposterous vandalism.
The notion that these are lightly-watched or lightly-edited articles is a
bit difficult to swallow, since they are the biographical articles about
three United States senators. These articles were analyzed by an
independent team of volunteers, and we found that the 100 senatorial
articles were in deliberate disrepair about 6.8% of the time, which would
vastly differ from Rohde's analysis. Certainly, one could argue that
articles about political figures may be vandalized more often, but one might
also counter that argument with the assumption that "more eyes" ought to be
watching these articles and repairing them. More detail here:
http://www.mywikibiz.com/Wikipedia_Vandalism_Study
Admittedly, there were some minor flaws with our study's methodology, too.
These are reviewed on the Discussion page. But, as with Rohde's assessment,
if anything, we may have underrepresented the problem at 6.8%.
I remain unimpressed with Wikipedia's accuracy rate, and I am bewildered why
"flagged revisions" have not been implemented yet.
Greg
---------- Forwarded message ----------
From: Reid Priedhorsky <reid(a)umn.edu>
Date: Thu, Aug 20, 2009 at 9:58 AM
Subject: Re: [Wiki-research-l] [Foundation-l] How much of Wikipedia
is vandalized? 0.4% of Articles
To: wiki-research-l(a)lists.wikimedia.org
On 08/20/2009 11:34 AM, Gregory Maxwell wrote:
> On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde<rarohde(a)gmail.com> wrote:
> [snip]
>> When one downloads a dump file, what percentage of the pages are
>> actually in a vandalized state?
>
> Although you don't actually answer that question, you answer a
> different question:
>
> [snip]
>> approximations: I considered that "vandalism" is that thing which
>> gets reverted, and that "reverts" are those edits tagged with "revert,
>> rv, undo, undid, etc." in the edit summary line. Obviously, not all
>> vandalism is cleanly reverted, and not all reverts are cleanly tagged.
>
> Which is interesting too, but part of the problem with calling this a
> measure of vandalism is that it isn't really, and we don't really have
> a good handle on how solid an approximation it is beyond gut feelings
> and arm-waving.
We looked into this a couple of years ago and came up with a similar
number (though I won't quote it because I don't quite remember what it
was), though we estimated the probability that a viewer would encounter
a damaged article rather than how many articles were currently damaged.
We used the term "damaged" instead of "vandalized" for essentially the
reasons you mention (though I confess I didn't fully read your whole
letter).
Priedhorsky et al., GROUP 2007.
Reid
-----
I'm forwarding Reid's message from Wikiresearch-l to Foundation-l
because, for those interested, it's worth noting that their group
looked into this question & published a paper in 2007. Here's the
link:
http://www.grouplens.org/node/113
-- phoebe
I am supposed to be taking a wiki-vacation to finish my PhD thesis and
find a job for next year. However, this afternoon I decided to take a
break and consider an interesting question recently suggested to me by
someone else:
When one downloads a dump file, what percentage of the pages are
actually in a vandalized state?
This is equivalent to asking, if one chooses a random page from
Wikipedia right now, what is the probability of receiving a vandalized
revision?
Understanding what fraction of Wikipedia is vandalized at any given
instant is obviously of both practical and public relations interest.
In addition it bears on the motivation for certain development
projects like flagged revisions. So, I decided to generate a rough
estimate.
For the purposes of making an estimate I used the main namespace of
the English Wikipedia and adopted the following operational
approximations: I considered that "vandalism" is that thing which
gets reverted, and that "reverts" are those edits tagged with "revert,
rv, undo, undid, etc." in the edit summary line. Obviously, not all
vandalism is cleanly reverted, and not all reverts are cleanly tagged.
In addition, some things flagged as reverts aren't really addressing
what we would conventionally consider to be vandalism. Such caveats
notwithstanding, I have had some reasonable success with using a
revert heuristic in the past. With the right keywords one can easily
catch the standardized comments created by admin rollback, the undo
function, the revert bots, various editing tools, and commonly used
phrases like "rv", "rvv", etc. It won't be perfect, but it is a quick
way of getting an automated estimate. I would usually expect the
answer I get in this way to be correct within an order of magnitude,
and perhaps within a factor of a few, though it is still just a crude
estimate.
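As an illustration, a keyword heuristic of the kind described here might look like the following Python sketch. The keyword list is an assumption on my part; the exact patterns used in the analysis are not given in the message.

```python
import re

# Assumed keyword list covering the summaries the message mentions:
# admin rollback, undo/undid, revert bots, and shorthand like "rv"/"rvv".
REVERT_RE = re.compile(
    r'\b(revert(ed|ing)?|rv[vg]?|undo|undid|rollback)\b',
    re.IGNORECASE,
)

def looks_like_revert(edit_summary: str) -> bool:
    """Return True if an edit summary appears to tag a revert."""
    return bool(REVERT_RE.search(edit_summary))
```

Such a classifier only needs one pass over the edit summaries in a dump, which is what makes the approach quick, if approximate.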
I analyzed the edit history up to the mid-June dump for a sample
29,999 main namespace pages (sampling from everything in main
including redirects). This included 1,333,829 edits, from which I
identified 102,926 episodes of reverted "vandalism". As a further
approximation, I assumed that whenever a revert occurred, it applied
to the immediately preceding edit and any additional consecutive
changes by the same editor (this is how admin rollback operates, but
is not necessarily true of tools like undo).
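The rollback-style attribution rule described above could be sketched as follows. This is a hypothetical illustration: `history` is an assumed representation of a page's edits as (editor, timestamp) pairs in chronological order, not a structure from the analysis itself.

```python
def vandalism_window(history, revert_index):
    """Given a page history as a chronological list of
    (editor, timestamp) pairs and the index of a detected revert,
    return (start, end) timestamps of the presumed vandalism episode:
    the edit immediately before the revert plus any consecutive
    earlier edits by the same editor, as admin rollback would undo."""
    i = revert_index - 1
    culprit = history[i][0]
    # Walk back over consecutive edits by the same editor.
    while i > 0 and history[i - 1][0] == culprit:
        i -= 1
    start = history[i][1]           # first edit of the episode
    end = history[revert_index][1]  # the revert ends the episode
    return start, end
```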
With those assumptions, I then used the timestamps on my identified
intervals of vandalism to figure out how much time each page had spent
in a vandalized state. Over the entire history of Wikipedia, this
sample of pages was vandalized during 0.28% of its existence. Or,
more relevantly, focusing on just this year, vandalism was present
0.21% of the time, which suggests that one should expect 0.21% of
mainspace pages in any recent enwiki dump will be in a vandalized
state (i.e. 1 in 480).
(Note that since redirects represent 55% of the main namespace and are
rarely vandalized, one could argue that 0.37% [1 in 270] would be a
better estimate for the portion of actual articles that are in a
vandalized condition at any given moment.)
I also took a look at the time distribution of vandalism. Not
surprisingly, it has a very long tail. The median time to revert over
the entire history is 6.7 minutes, but the mean time to revert is 18.2
hours, and my sample included one revert going back 45 months (though
examples of such very long lags also imply the page had gone years
without any edits, which would imply an obscure topic that was also
almost never visited). In the recent period these figures become 5.2
minutes and 14.4 hours for the median and mean, respectively. The
observation that nearly 50% of reverts are occurring in 5 minutes or
less is a testament to the efficient work of recent changes reviewers
and watchlists.
Unfortunately the 5% of vandalism that persists longer than 35 hours
is responsible for 90% of the actual vandalism a visitor is likely to
encounter at random. Hence, as one might guess, it is the vandalism
that slips through and persists the longest that has the largest
practical effect.
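This follows because a random visit samples vandalized time in proportion to episode duration, so the exposure share of long-lived episodes is simply their share of total vandalized time. A minimal sketch (the function name is hypothetical):

```python
def exposure_share(durations, threshold):
    """Share of total vandalized time -- i.e. of random-visit
    exposure -- contributed by episodes lasting longer than
    `threshold` (same time units as `durations`)."""
    total = sum(durations)
    long_lived = sum(d for d in durations if d > threshold)
    return long_lived / total
```

With a heavy-tailed duration distribution, a small fraction of episodes can easily dominate this ratio, which is the 5%-causes-90% effect described above.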
It is also worth noting that the prevalence figures for February-May
of this year are slightly lower than at any time since 2006. There is
also a drop in the mean duration of vandalism coupled with a slight
increase in the median duration. However, these effects mostly
disappear if we limit our considerations to only vandalism events
lasting 1 month or shorter. Hence those changes may be in significant
part linked to cut-off biasing from longer-term vandalism events that
have yet to be identified. The ambiguity in the change from earlier
in the year is somewhat surprising as the AbuseFilter was launched in
March and was intended to decrease the burden of vandalism. One might
speculate that the simple vandalism amenable to the AbuseFilter was
already being addressed quickly in nearly all cases and hence its
impact on the persistence of vandalism may already have been fairly
limited.
I've posted some summary data on the wiki at:
http://en.wikipedia.org/wiki/Wikipedia:Vandalism_statistics
Given the nature of the approximations I made in doing this analysis I
suspect it is more likely that I have somewhat underestimated the
vandalism problem rather than overestimated it, but as I said in the
beginning I'd like to believe I am in the right ballpark. If that's
true, I personally think that having less than 0.5% of Wikipedia be
vandalized at any given instant is actually rather comforting. It's
not a perfect number, but it would suggest that nearly everyone still
gets to see Wikipedia as intended rather than in a vandalized state.
(Though to be fair I didn't try to figure out if the vandalism
occurred in more frequently visited parts or not.)
Unfortunately, that's it for now as I need to get back to my thesis /
job search.
-Robert Rohde
Hoi,
For me, what people like Ed H. Chi write about Wikipedia, while
interesting, is hardly new. They do not write about Wikipedia; they write
about the English-language Wikipedia. Invariably, news written about
Wikipedia concentrates on just one of over 260 projects. It diminishes
what Wikipedia is about and ignores important things that are happening.
I would be interested in more studies looking at the "other" Wikipedias.
This is where all kinds of other phenomena exist.
Yesterday Siebrand observed that there is a group of languages that have
solid localisations, and the current localisation rally makes this group
stand out even more. We have the impression that this coincides with the
vitality of projects: German, French, and Dutch are top performers in
localisation, they have healthy communities, and they provide great
Wikipedias. For languages like Spanish, Turkish, Swedish, and Italian it
is still possible for people to take part in the translatewiki.net
localisation rally. People who work on languages like Estonian and Khmer
find that they have to concentrate on the most-used and MediaWiki core
messages first (our rationale being that our Wikipedia readers are best
served in this way).
With a sample size of 260, it becomes possible to do research into the
effect of localisation on the performance of a project. As
LocalisationUpdate is being tested for use in the WMF, timely delivery of
localisations becomes a reality once it is implemented. This will relate
the localisation numbers and project performance much more directly. The
question is whether anyone is interested in the numbers such research
would provide.
It is known that for languages like Bangla, Wikipedia is the biggest
resource in that language; I can imagine that this is true for other
languages as well. When a Wikipedia has such a status, it changes the
relevance of that Wikipedia for scientists who study the language. It is
interesting to learn what the effects are on the people who use the
internet in these languages. With Wikipedia being the biggest resource,
does this populate the Google search results, and does this make the
Internet more of a worthwhile experience?
We know that things like sources, NPOV, BLP are particularly relevant on our
biggest projects. On our smaller projects these things do not get the same
attention. Here it is more important to have articles in the first place.
The make-up of these communities is likely to be utterly different as well.
Would it not be nice to understand how our projects are populated and study
how it evolves over time? At what stage all kinds of policies start to kick
in?
Research and the numbers it provides are important on many levels. They
indicate issues, and they indicate where we want to put our resources. The
lack of research on the other Wikipedias makes those Wikipedias invisible;
issues particular to other languages do not get attention, and
consequently the resources needed to address them are not made available.
My argument is that there is a lack of research on Wikipedia, and
Wikipedia as a whole would benefit from research. Indeed, where the
English Wikipedia's growth is slowing down, there is plenty of room for
growth of standard encyclopaedic information in the other projects. This
in turn will bring up many subjects that en.wp does not cover. The
existence of articles on subjects not covered in en.wp is indicative of a
bias, and once en.wp starts to cover those subjects it will improve its
neutral point of view.
Consequently ALL our Wikipedias including en.wp will benefit from research
on the "other" Wikipedias.
Thanks,
GerardM
> Also a say 30% share of bot edits on some Wikipedia does not mean 30% of
> articles have been created by bots. My guess is that share is higher.
That was too rash. I simply don't know the actual amount, but there is no
linear relation for sure.
Let me rephrase that more safely:
If, say, only 1% of bot edits is the creation of an article, it might
still mean 10% or 20% of articles have been created by bots.
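The point that edit share and article-creation share need not be close can be made concrete with toy numbers. All figures below are hypothetical, chosen only to illustrate the nonlinearity:

```python
# Hypothetical figures: bots make 30% of all edits, 1% of bot edits
# create an article, and 1.7% of human edits create an article.
total_edits = 1_000_000
bot_edits = 0.30 * total_edits
human_edits = total_edits - bot_edits

bot_created = 0.01 * bot_edits       # 3,000 bot-created articles
human_created = 0.017 * human_edits  # 11,900 human-created articles

bot_share = bot_created / (bot_created + human_created)
# ~20% of articles are bot-created even though bots make 30% of edits:
# the two shares are linked only through per-group creation rates.
```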
Erik Zachte
There is another way to detect 100% reverts. It won't catch manual reverts
that are not 100% accurate, but most vandal patrollers will use undo and
the like.
For every revision, calculate the md5 checksum of its content. Then you
can easily look back, say, 100 revisions to see whether this checksum
occurred earlier. It is efficient and unambiguous.
This will work for any Wikipedia for which a full archive dump is available.
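A minimal sketch of this checksum approach (hypothetical code, not from the thread; revision texts are assumed to be in chronological order):

```python
import hashlib

def find_identity_reverts(revision_texts, window=100):
    """Return indices of revisions whose content is byte-identical
    to an earlier revision within `window` steps -- i.e. exact
    reverts, however they were made or labelled."""
    digests = [hashlib.md5(t.encode('utf-8')).hexdigest()
               for t in revision_texts]
    reverts = []
    for i, d in enumerate(digests):
        lo = max(0, i - window)
        if d in digests[lo:i]:  # same content seen recently?
            reverts.append(i)
    return reverts
```

Because it compares content rather than edit summaries, this catches untagged reverts that the keyword heuristic misses, at the cost of missing partial (non-identical) reverts.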
Erik Zachte
This seems like an amazing chance for WikiProjects in almost any area.
Describe how your work supports open education, set a project with
milestones and metrics for success, and submit a grant request:
http://blogs.talis.com/education/incubator/guidelines/
SJ
---------- Forwarded message ----------
From: Brianna Laugher <brianna.laugher(a)gmail.com>
Date: Wed, Aug 19, 2009 at 10:08 PM
Subject: [Internal-l] Talis Incubator for Open Education funding available
To: "Local Chapters, board and officers coordination (closed
subscription)" <internal-l(a)lists.wikimedia.org>, chapters(a)wikimedia.ch
Via the Creative Commons blog - http://creativecommons.org/weblog/entry/17005
Talis Incubator for Open Education
For the latest news follow us on twitter: @talisincubator
Talis understands the growing importance of the Open Education
movement and its potential impact on how education is accessed,
assessed and certified.
Aimed at individuals or small groups, the Talis Incubator for Open
Education provides angel funding and other forms of assistance for
ideas and projects that have the potential to further the cause of
Open Education through the use of technology. All we ask in return is
that you donate or ‘open source’ the intellectual property generated
back to the communities that could benefit most from your work.
The brief
1. Write a proposal outlining your Open Education related project
or idea, making a bid for funding of between £1,000 and £15,000.
2. After reviewing and making sure your proposal meets the
guidelines, submit it to incubator(a)talis.com.
3. A proposal review board made up of independent thought leaders
and Talis representatives decide which projects get funding.
4. For successful bids, Talis awards you the funds and organises
any other help you have asked for.
5. Complete the project according to the schedule outlined in your proposal.
6. Talis helps you to make sure your work is disseminated amongst
the community.
from http://blogs.talis.com/education/incubator/
they also note "We also welcome applications from outside the UK,
however we regret that we can only consider and award amounts in GBP
(£), so if you are from outside the UK please account for exchange
rate fluctuation, and make sure you can receive funds paid in GBP."
Looks a bit interesting!
cheers
Brianna
--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/
_______________________________________________
Internal-l mailing list
Internal-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/internal-l