Shilad -- I was on the verge of parsing a recent dump using my Exploratory Parsing tools.
I'd be interested in duplicating your work to see if I could get the same numbers. Are
you willing to share the criteria you have for citations, even if messy? Thanks and best
regards. -- Ward
On Apr 22, 2012, at 7:15 PM, Shilad Sen wrote:
Greetings!
I'm a CS Professor at Macalester College in St. Paul and I'm on research
sabbatical at GroupLens this year. I've been working with Heather Ford and Dave
Musicant to explore several research questions related to citation use on Wikipedia.
We're still in the middle of analyzing data, and working through parsing lots of
messy forms of citation references. However, I'll summarize our findings as they
stand.
As of Jan 1, 2011 there are 6384425 total citations in the main namespace for English
Wikipedia.
Our top-line research questions focus on citations containing URLs, so we broke down our
results into citations with a URL (78%) and those without (22%).
The top 5 domains in citations with a URL are:
1. books.google.com (73777 - 1.48%)
2. news.bbc.co.uk (52347 - 1.05%)
3. www.stat.gov.pl (51598 - 1.03%)
4. www.nytimes.com (39454 - 0.79%)
5. www.imdb.com (24993 - 0.50%)
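For anyone wanting to attempt a similar per-domain tally, here is a minimal sketch in Python. It is not the criteria used for the numbers above, which the thread does not describe. The URL pattern, the function name citation_domains, and the sample citation strings are all assumptions made for illustration; real citation markup is far messier.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed, simplified URL pattern: stops at whitespace and common wikitext delimiters.
URL_RE = re.compile(r'https?://[^\s|\]<>}"]+')

def citation_domains(citations):
    """Count the host of the first URL found in each citation string.

    Citations without a URL are skipped. `citations` is a hypothetical
    iterable of raw citation snippets (e.g. <ref> bodies or templates).
    """
    counts = Counter()
    for text in citations:
        match = URL_RE.search(text)
        if match is None:
            continue  # citation without a URL
        host = urlparse(match.group(0)).netloc.lower()
        if host:
            counts[host] += 1
    return counts

# Made-up sample input, not taken from any dump.
sample = [
    "{{cite web|url=http://news.bbc.co.uk/2/hi/123.stm|title=X}}",
    "{{cite book|title=Y}}",
    "[http://www.imdb.com/title/tt1/ IMDb]",
    "<ref>http://news.bbc.co.uk/a</ref>",
]
domain_counts = citation_domains(sample)
```

Calling domain_counts.most_common(5) on the result gives a ranked list in the same shape as the one above.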
The top 5 types of citations without a URL are:
1. cite book (190090 - 13.65%)
2. citation needed (148339 - 10.65%)
3. cite journal (63722 - 4.58%)
4. cite news (25052 - 1.80%)
5. citation (22773 - 1.64%)
We have also looked at the *inequality* in citation domains. In other words, what share
of citations do the most popular domains receive? Citation inequality has been steadily
growing; the Gini coefficient grew from 0.63 in Jan 2007 to 0.81 in Nov 2011.
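For readers unfamiliar with the measure, the following Python sketch shows one standard way to compute a Gini coefficient from per-domain citation counts. The thread does not say which formulation was used for the figures above, so treat this as an assumption; the function name gini and the toy inputs are illustrative only.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal shares, near 1 = concentrated).

    Uses the sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(c for c in counts if c >= 0)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Toy inputs (not real data): four domains with equal shares,
# then four domains where one receives every citation.
equal_shares = gini([1, 1, 1, 1])
one_dominant = gini([0, 0, 0, 10])
```

With only n domains the maximum value is (n - 1) / n, so a coefficient computed over a handful of top domains is not comparable to one computed over every cited domain.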
We hope to write up our results to share them formally in the not-too-distant future.
Until then, I'm happy to answer questions!
-Shilad
On Sun, Apr 22, 2012 at 3:30 PM, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:
Joe: that's the same question the alt-metrics people were getting at
in the paper I posted earlier... does being cited in WP give you a
measurable citations boost? Does the same boost carry over even if the
work is only in print or behind a paywall vs open access? Or, does it
have an effect at all on citations (versus *viewings*, where the
additional exposure in WP must make a difference) since most published
scholarship depends on larger lit reviews than are typically done in
WP -- and there is a larger filter effect at work among the body of
already published literature that may have a stronger effect on what
gets cited and what doesn't?
From the "building an encyclopedia" perspective, it's in Wikipedia's
interest to cite general works, famous works, and review papers/works,
which already are more likely to be more highly cited than average
research papers. So one possibility is any citation boost from
citations in WP just reinforces existing citation trends.
-- phoebe
On Sat, Apr 21, 2012 at 11:01 AM, Joe Corneli <holtzermann17(a)gmail.com> wrote:
Another interesting question (that would take a broader scope) would
look at the frequency of *downstream* citations to works that are
cited in Wikipedia versus all other citations.
Per usual, correlation would definitely not be causation, BUT my usual
practice is
1. Google
2. Wikipedia
3. Read some of the papers cited
4. (A miracle occurs)
5. Write my own paper
Interesting to wonder how many of the papers read at step 3 survive
the semantic leap to step 5.
On Sat, Apr 21, 2012 at 5:52 PM, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:
Thank you everyone! Grouplens folks, if you could send a link to your
work too, that would be awesome.
What I'm curious about: if we can give (within an order of magnitude,
say) an approximation of how many sources are cited within Wikipedia
-- then maybe broken out into references to printed works and
references to online-only, etc. What does our project look like viewed
as an ad-hoc catalog of scholarship? How does that compare to the
major databases? (It's going to be a tiny, tiny percentage of the
total scholarship in the world -- Pubmed has 21M records, Worldcat
around 246M -- but how tiny?) This may only be answerable if someone
creates a wikicite project :)
thanks,
phoebe
On Sat, Apr 21, 2012 at 7:52 AM, Paolo Massa <paolo(a)gnuband.org> wrote:
I know of this paper
"Scientific citations in Wikipedia" by Finn Årup Nielsen
First Monday, volume 12, number 8 (August 2007),
URL: http://firstmonday.org/issues/issue12_8/nielsen/index.html
but, as the title says, it took into account only citations to
scientific journals.
On Fri, Apr 20, 2012 at 7:31 PM, phoebe ayers <phoebe.wiki(a)gmail.com> wrote:
> Hi all,
>
> Has there been any research done into: the number of citations (e.g.
> to books, journal articles, online sources, everything together) on
> Wikipedia (any language, or all)? The distribution of citations over
> different kinds or qualities of articles? # of uses of citation
> templates? Anything like this?
>
> I realize this is hard to count, averages are meaningless in this
> context, and any number will no doubt be imprecise! But anything would
> be helpful. I have vague memories of seeing some citation studies like
> this but don't remember the details.
>
> Thanks,
> -- phoebe
>
> --
> * I use this address for lists; send personal messages to phoebe.ayers
> <at> gmail.com *
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Shilad W. Sen
Assistant Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ssen(a)macalester.edu
651-696-6273