Hi all,
Has there been any research done into: the number of citations (e.g. to books, journal articles, online sources, everything together) on Wikipedia (any language, or all)? The distribution of citations over different kinds or qualities of articles? # of uses of citation templates? Anything like this?
I realize this is hard to count, averages are meaningless in this context, and any number will no doubt be imprecise! But anything would be helpful. I have vague memories of seeing some citation studies like this but don't remember the details.
Thanks, -- phoebe
I talked with Liam about something like this that GLAM was trying to develop and use to inform its work.
Liam - sorry to call you out like this, but any thoughts on Phoebe's question?
On Fri, Apr 20, 2012 at 5:31 PM, phoebe ayers phoebe.wiki@gmail.com wrote:
Hi all,
Has there been any research done into: the number of citations (e.g. to books, journal articles, online sources, everything together) on Wikipedia (any language, or all)? The distribution of citations over different kinds or qualities of articles? # of uses of citation templates? Anything like this?
I realize this is hard to count, averages are meaningless in this context, and any number will no doubt be imprecise! But anything would be helpful. I have vague memories of seeing some citation studies like this but don't remember the details.
Thanks, -- phoebe
--
* I use this address for lists; send personal messages to phoebe.ayers <at> gmail.com *
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
I've been collaborating with the GroupLens folks on citations. They've done basic statistics on the sources of cites, etc., across .en. Will ask them about sending.
On Apr 20, 2012, at 11:16 AM, Jessie Wild jwild@wikimedia.org wrote:
I talked with Liam about something like this that GLAM was trying to develop and use to inform its work.
Liam - sorry to call you out like this, but any thoughts on Phoebe's question?
-- Jessie Wild Global Development, Manager Wikimedia Foundation
I did a search on "citation*" on our Wikipedia lit review site. I'm not sure any of the articles we've identified so far answers your question, but the search returns a list that could contain some interesting articles: http://wikilit.referata.com/w/index.php?title=Special:Search&redirs=1&am....
~ Chitu
Heather Ford wrote:
I've been collaborating with the GroupLens folks on citations. They've done basic statistics on the sources of cites, etc., across .en. Will ask them about sending.
I know of this paper "Scientific citations in Wikipedia" by Finn Årup Nielsen First Monday, volume 12, number 8 (August 2007), URL: http://firstmonday.org/issues/issue12_8/nielsen/index.html
but, as the title says, it took into account only citations to scientific journals.
Thank you everyone! GroupLens folks, if you could send a link to your work too, that would be awesome.
What I'm curious about: whether we can give (within an order of magnitude, say) an approximation of how many sources are cited within Wikipedia -- then maybe broken out into references to printed works and references to online-only, etc. What does our project look like viewed as an ad-hoc catalog of scholarship? How does that compare to the major databases? (It's going to be a tiny, tiny percentage of the total scholarship in the world -- PubMed has 21M records, WorldCat around 246M -- but how tiny?) This may only be answerable if someone creates a wikicite project :)
thanks, phoebe
On Sat, Apr 21, 2012 at 7:52 AM, Paolo Massa paolo@gnuband.org wrote:
I know of this paper "Scientific citations in Wikipedia" by Finn Årup Nielsen First Monday, volume 12, number 8 (August 2007), URL: http://firstmonday.org/issues/issue12_8/nielsen/index.html
but, as the title says, it took into account only citations to scientific journals.
Another interesting question (that would take a broader scope) would look at the frequency of *downstream* citations to works that are cited in Wikipedia versus all other citations.
Per usual, correlation would definitely not be causation, BUT my usual practice is
1. Google
2. Wikipedia
3. Read some of the papers cited
4. (A miracle occurs)
5. Write my own paper
Interesting to wonder how many of the papers read at step 3 survive the semantic leap to step 5.
On Sat, Apr 21, 2012 at 5:52 PM, phoebe ayers phoebe.wiki@gmail.com wrote:
Thank you everyone! GroupLens folks, if you could send a link to your work too, that would be awesome.
What I'm curious about: whether we can give (within an order of magnitude, say) an approximation of how many sources are cited within Wikipedia -- then maybe broken out into references to printed works and references to online-only, etc. What does our project look like viewed as an ad-hoc catalog of scholarship? How does that compare to the major databases? (It's going to be a tiny, tiny percentage of the total scholarship in the world -- PubMed has 21M records, WorldCat around 246M -- but how tiny?) This may only be answerable if someone creates a wikicite project :)
thanks, phoebe
Joe: that's the same question the alt-metrics people were getting at in the paper I posted earlier... does being cited in WP give you a measurable citations boost? Does the same boost carry over even if the work is only in print or behind a paywall vs open access? Or, does it have an effect at all on citations (versus *viewings*, where the additional exposure in WP must make a difference) since most published scholarship depends on larger lit reviews than are typically done in WP -- and there is a larger filter effect at work among the body of already published literature that may have a stronger effect on what gets cited and what doesn't?
From the "building an encyclopedia" perspective, it's in Wikipedia's interest to cite general works, famous works, and review papers/works, which already are more likely to be more highly cited than average research papers. So one possibility is any citation boost from citations in WP just reinforces existing citation trends.
-- phoebe
On Sat, Apr 21, 2012 at 11:01 AM, Joe Corneli holtzermann17@gmail.com wrote:
Another interesting question (that would take a broader scope) would look at the frequency of *downstream* citations to works that are cited in Wikipedia versus all other citations.
Per usual, correlation would definitely not be causation, BUT my usual practice is
1. Google
2. Wikipedia
3. Read some of the papers cited
4. (A miracle occurs)
5. Write my own paper
Interesting to wonder how many of the papers read at step 3 survive the semantic leap to step 5.
Greetings!
I'm a CS Professor at Macalester College in St. Paul and I'm on research sabbatical at GroupLens this year. I've been working with Heather Ford and Dave Musicant to explore several research questions related to citation use on Wikipedia.
We're still in the middle of analyzing data, and working through parsing lots of messy forms of citation references. However, I'll summarize our findings as they stand.
As of Jan 1, 2011 there are 6384425 total citations in the main namespace for English Wikipedia.
Our top-line research questions focus on citations containing URLs, so we broke down our results into citations with a URL (78%) and those without (22%).
The top 5 domains in citations with a URL are:
1. books.google.com (73777 - 1.48%)
2. news.bbc.co.uk (52347 - 1.05%)
3. www.stat.gov.pl (51598 - 1.03%)
4. www.nytimes.com (39454 - 0.79%)
5. www.imdb.com (24993 - 0.50%)
The top 5 types of citations without a URL are:
1. cite book (190090 - 13.65%)
2. citation needed (148339 - 10.65%)
3. cite journal (63722 - 4.58%)
4. cite news (25052 - 1.80%)
5. citation (22773 - 1.64%)
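As a quick sanity check on the figures above, the sketch below recomputes the group sizes implied by the top entry of each list. It assumes (the message does not say so explicitly) that each percentage is relative to its own group, i.e. citations with a URL or citations without one, rather than to all citations. The variable and function names are made up for illustration.

```python
TOTAL_CITATIONS = 6384425  # total reported for the main namespace

# Approximate group sizes from the reported 78% / 22% split.
with_url = 0.78 * TOTAL_CITATIONS
without_url = 0.22 * TOTAL_CITATIONS


def implied_group_size(count, percent):
    """Group size implied by one entry's count and its reported percentage."""
    return count / (percent / 100.0)


# Top entry of each list: books.google.com and "cite book".
implied_url_group = implied_group_size(73777, 1.48)
implied_no_url_group = implied_group_size(190090, 13.65)

# Print each split-based estimate next to the size implied by the list entry.
print(round(with_url), round(implied_url_group))
print(round(without_url), round(implied_no_url_group))
```

Under that assumption the two estimates in each pair agree to within about 1%, which is consistent with the rounded 78% / 22% split.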
We have also looked at the *inequality* in citation domains. In other words, what share of citations do the most popular domains receive? Citation inequality has been steadily growing; the Gini coefficient grew from 0.63 in Jan 2007 to 0.81 in Nov 2011.
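For readers who want to try a similar measurement, here is a minimal sketch of the standard Gini coefficient applied to per-domain citation counts. The message does not say which formula or which definition of "domain" was used, so this is an assumption; the domain names and counts below are toy data, not results.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every domain receives the same number of citations;
    values near 1 mean a few domains receive almost all of them.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical per-domain citation counts, for illustration only.
domain_counts = {
    "a.example": 1,
    "b.example": 1,
    "c.example": 2,
    "d.example": 16,
}
print(gini(domain_counts.values()))
```

Running this once per dump date on real per-domain counts would give a time series comparable in spirit to the Jan 2007 and Nov 2011 figures quoted above.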
We hope to write up our results to share them formally in the not-too-distant future. Until then, I'm happy to answer questions!
-Shilad
On Sun, Apr 22, 2012 at 3:30 PM, phoebe ayers phoebe.wiki@gmail.com wrote:
Joe: that's the same question the alt-metrics people were getting at in the paper I posted earlier... does being cited in WP give you a measurable citations boost? Does the same boost carry over even if the work is only in print or behind a paywall vs open access? Or, does it have an effect at all on citations (versus *viewings*, where the additional exposure in WP must make a difference) since most published scholarship depends on larger lit reviews than are typically done in WP -- and there is a larger filter effect at work among the body of already published literature that may have a stronger effect on what gets cited and what doesn't?
From the "building an encyclopedia" perspective, it's in Wikipedia's interest to cite general works, famous works, and review papers/works, which already are more likely to be more highly cited than average research papers. So one possibility is any citation boost from citations in WP just reinforces existing citation trends.
-- phoebe
Shilad -- I was on the verge of parsing a recent dump using my Exploratory Parsing tools. I'd be interested in duplicating your work to see if I could get the same numbers. Are you willing to share the criteria you have for citations, even if messy? Thanks and best regards. -- Ward
Sure! At a high level, our process for matching citations is:
1. Look for any template whose name begins with "cite" or "citation". We search through at most one nested level of templates.
2. For each "<ref>" tag:
   * For each wiki link and URL inside the ref tag:
      * Add any links that haven't already been added as part of a citation template.
We spot-checked the procedure on a sample of randomly selected pages, and it does miss some things that, to humans, appear to be citations. For example, a wiki list of URLs under the "References" section that is not in <ref> tags. But citations are messy, and we decided to draw the line at these few simple steps for now. If you have any suggestions for extra rules to add to the above procedure, I'd love to include them!
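The two steps above can be sketched roughly as follows. This is not the project's actual code (that is at the link below); it is a simplified illustration using regular expressions. It ignores templates nested inside a citation template, and the function name, result layout, and "ref link" label are made up for this example.

```python
import re

# Step 1 pattern: templates whose name begins with "cite" or "citation".
# Simplification: templates nested inside the citation are not handled.
CITE_TEMPLATE = re.compile(
    r"\{\{\s*((?:cite|citation)[^|{}]*)(\|[^{}]*)?\}\}", re.IGNORECASE
)
# Step 2 patterns: <ref>...</ref> bodies, bare URLs, and [[wiki links]].
REF_TAG = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.IGNORECASE | re.DOTALL)
URL = re.compile(r"https?://[^\s|\]}<]+")
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def extract_citations(wikitext):
    """Return a list of {'type': ..., 'links': [...]} dicts."""
    citations = []
    seen_links = set()

    # Step 1: citation templates, remembering the links they contain.
    for match in CITE_TEMPLATE.finditer(wikitext):
        name = match.group(1).strip().lower()
        params = match.group(2) or ""
        links = URL.findall(params) + WIKI_LINK.findall(params)
        seen_links.update(links)
        citations.append({"type": name, "links": links})

    # Step 2: links inside <ref> tags that no template already covered.
    for ref in REF_TAG.finditer(wikitext):
        body = ref.group(1)
        for link in URL.findall(body) + WIKI_LINK.findall(body):
            if link not in seen_links:
                seen_links.add(link)
                citations.append({"type": "ref link", "links": [link]})
    return citations


sample = (
    "Fact one.<ref>{{cite book |title=Foo |url=http://example.org/a}}</ref> "
    "Fact two.<ref>[http://example.org/b Bar] and [[Some article]]</ref> "
    "Fact three.{{citation needed}}"
)
for citation in extract_citations(sample):
    print(citation["type"], citation["links"])
```

Note that, as in the counts reported earlier in the thread, a rule based on the template name prefix also picks up "citation needed".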
FYI, you can browse the current code at: http://code.google.com/p/wikipedia-map-reduce/source/browse/trunk/wikipedia-...
On Sun, Apr 22, 2012 at 10:04 PM, Ward Cunningham ward@c2.com wrote:
Shilad -- I was on the verge of parsing a recent dump using my Exploratory Parsing tools. I'd be interested in duplicating your work to see if I could get the same numbers. Are you willing to share the criteria you have for citations, even if messy? Thanks and best regards. -- Ward
On Apr 22, 2012, at 7:15 PM, Shilad Sen wrote:
Greetings!
I'm a CS Professor at Macalester College in St. Paul and I'm on research sabbatical at GroupLens this year. I've been working with Heather Ford and Dave Musicant to explore several research questions related to citation use on Wikipedia.
We're still in the middle of analyzing data, and working through parsing lots of messy forms of citation references. However, I'll summarize our findings as they stand.
As of Jan 1, 2011 there are 6384425 total citations in the main namespace for English Wikipedia.
Our top-line research questions focus on citations containing URLs, so we broke down our results into citations with a URL (78%) and those without (22%).
The top 5 domains in citations with a URL are:
- books.google.com (73777 - 1.48%)
- news.bbc.co.uk (52347 - 1.05%)
- www.stat.gov.pl (51598 - 1.03%)
- www.nytimes.com (39454 - 0.79%)
- www.imdb.com (24993 - 0.50%)
The top 5 types of citations without a URL are:
- cite book (190090 - 13.65%)
- citation needed (148339 - 10.65%)
- cite journal (63722 - 4.58%)
- cite news (25052 - 1.80%)
- citation (22773 - 1.64%)
We have also looked at the *inequality* in citation domains. In other words, what share of citations do the most popular domains receive? Citation inequality has been steadily growing; the Gini coefficient grew from 0.63 in Jan 2007 to 0.81 in Nov 2011.
We hope to write up our results to share them formally in the not-too-distant future. Until then, I'm happy to answer questions!
-Shilad
On Sun, Apr 22, 2012 at 3:30 PM, phoebe ayers phoebe.wiki@gmail.comwrote:
Joe: that's the same question the alt-metrics people were getting at in the paper I posted earlier... does being cited in WP give you a measurable citations boost? Does the same boost carry over even if the work is only in print or behind a paywall vs open access? Or, does it have an effect at all on citations (versus *viewings*, where the additional exposure in WP must make a difference) since most published scholarship depends on larger lit reviews than are typically done in WP -- and there is a larger filter effect at work among the body of already published literature that may have a stronger effect on what gets cited and what doesn't?
From the "building an encyclopedia" perspective, it's in Wikipedia's
interest to cite general works, famous works, and review papers/works, which already are more likely to be more highly cited than average research papers. So one possibility is any citation boost from citations in WP just reinforces existing citation trends.
-- phoebe
On Sat, Apr 21, 2012 at 11:01 AM, Joe Corneli holtzermann17@gmail.com wrote:
Another interesting question (that would take a broader scope) would look at the frequency of *downstream* citations to works that are cited in Wikipedia versus all other citations.
Per usual, correlation would definitely not be causation, BUT my usual practice is
- Wikipedia
- Read some of the papers cited
- (A miracle occurs)
- Write my own paper
Interesting to wonder how many of the papers read at step 3 survive the semantic leap to step 5.
On Sat, Apr 21, 2012 at 5:52 PM, phoebe ayers phoebe.wiki@gmail.com
wrote:
Thank you everyone! Grouplens folks, if you could send a link to your work too, that would be awesome.
What I'm curious about: if we can give (within an order of magnitude, say) an approximation of how many sources are cited within Wikipedia -- then maybe broken out into references to printed works and references to online-only, etc. What does our project look like viewed as an ad-hoc catalog of scholarship? How does that compare to the major databases? (It's going to be a tiny, tiny percentage of the total scholarship in the world -- Pubmed has 21M records, Worldcat around 246M -- but how tiny?) This may only be answerable if someone creates a wikicite project :)
thanks, phoebe
On Sat, Apr 21, 2012 at 7:52 AM, Paolo Massa paolo@gnuband.org
wrote:
I know of this paper "Scientific citations in Wikipedia" by Finn Årup Nielsen First Monday, volume 12, number 8 (August 2007), URL: http://firstmonday.org/issues/issue12_8/nielsen/index.html
but, as the title says, it took into account only citations to scientific journals.
--
Shilad W. Sen
Assistant Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ssen@macalester.edu 651-696-6273
Shilad,
Very cool! Thanks for sharing. I do have a couple of questions...
On Sun, Apr 22, 2012 at 7:15 PM, Shilad Sen ssen@macalester.edu wrote:
Greetings!
I'm a CS Professor at Macalester College in St. Paul and I'm on research sabbatical at GroupLens this year. I've been working with Heather Ford and Dave Musicant to explore several research questions related to citation use on Wikipedia.
We're still in the middle of analyzing data, and working through parsing lots of messy forms of citation references. However, I'll summarize our findings as they stand.
As of Jan 1, 2011 there are 6384425 total citations in the main namespace for English Wikipedia.
Does this count both templated and non-templated citations? Do you count citations appearing in any area of the article (e.g. inline footnotes, the "references" section, the "further reading" or "bibliography" section, and "external links")? Or is anything left out?
Our top-line research questions focus on citations containing URLs, so we broke down our results into citations with a URL (78%) and those without (22%).
The top 5 domains in citations with a URL are:
- books.google.com (73777 - 1.48%)
- news.bbc.co.uk (52347 - 1.05%)
- www.stat.gov.pl (51598 - 1.03%)
- www.nytimes.com (39454 - 0.79%)
- www.imdb.com (24993 - 0.50%)
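A tally like the one above can be sketched in a few lines of Python. This is only an illustration, not the GroupLens pipeline: the function name `top_domains` and the sample URLs are made up, and it assumes the citation URLs have already been extracted from the wikitext.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(urls, n=5):
    """Return the n most common hostnames as (host, count, percent) tuples.

    Hypothetical helper: assumes `urls` is an iterable of citation URLs
    that were already pulled out of article text by some other step.
    """
    hosts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased hostname, or None
        if host:  # skip strings that do not parse to a hostname
            hosts[host] += 1
    total = sum(hosts.values())
    return [(host, count, 100.0 * count / total)
            for host, count in hosts.most_common(n)]


# Made-up sample input, for illustration only.
sample = [
    "http://books.google.com/books?id=abc",
    "http://news.bbc.co.uk/2/hi/1.stm",
    "http://books.google.com/books?id=def",
    "no-url-here",
]
for host, count, pct in top_domains(sample, n=2):
    print(f"{host} ({count} - {pct:.2f}%)")
```

Note that percentages here are relative to citations with a parseable URL, which matches how the shares above appear to be computed (against the 78% of citations that have a URL).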
This will probably be part of your published results, but it would be very interesting to see a long-tail list of these domains, and maybe try and break them out into types -- that would start to get at questions like how many paywalled journals are cited, etc.
The top 5 types of citations without a URL are:
- cite book (190090 - 13.65%)
- citation needed (148339 - 10.65%)
- cite journal (63722 - 4.58%)
- cite news (25052 - 1.80%)
- citation (22773 - 1.64%)
"Citation needed" is really the absence of a citation, not an actual citation, right? :) The others look like the standard reference templates, so my question above about templates applies.
We have also looked at the *inequality* in citation domains. In other words, what share of citations do the most popular domains receive? Citation inequality has been steadily growing; the Gini coefficient grew from 0.63 in Jan 2007 to 0.81 in Nov 2011.
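For readers who want to try this on their own domain counts, here is a minimal sketch of a Gini coefficient in Python. The thread does not say exactly how the figures above were computed, so this uses one common formula for sorted data as an assumption; the function name `gini` and the sample counts are illustrative only.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every domain receives the same number of citations;
    values near 1 mean a few domains receive almost all of them.
    Assumption: the common sorted-data formula
        G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n,  i = 1..n
    with x sorted in ascending order.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no data, or no citations at all
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Made-up citation counts per domain, for illustration only.
print(gini([5, 5, 5, 5]))    # perfectly equal shares
print(gini([0, 0, 0, 10]))   # one domain receives everything
```

With this formula the maximum for n domains is (n - 1) / n, so values are only strictly comparable across snapshots when the number of domains is large in each.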
Interesting! Thanks so much for sharing!
-- phoebe
Dear Phoebe,
On 20-04-2012 19:31, phoebe ayers wrote:
Has there been any research done into: the number of citations (e.g. to books, journal articles, online sources, everything together) on Wikipedia (any language, or all)? The distribution of citations over different kinds or qualities of articles? # of uses of citation templates? Anything like this?
I realize this is hard to count, averages are meaningless in this context, and any number will no doubt be imprecise! But anything would be helpful. I have vague memories of seeing some citation studies like this but don't remember the details.
Thanks to Paolo Massa for mentioning the following First Monday paper in his answer to Phoebe: my "Scientific citations in Wikipedia" http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5271/pdf/imm5271.pdf
I presented a follow-up on that at Wikimania, clustering the citations and reporting a bit on the longitudinal development: "Clustering of scientific citations in Wikipedia" http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5666/pdf/imm5666.pdf
The results are available on this page:
http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5666/pdf/imm5666.pdf
(Note that the server where the web server is running is presently halfway crashed, so the results might not be accessible, although they presently are for me. I will look into that in the coming days.)
If anyone wants to look more closely into which journals are cited, I have an XML file with journal names and variations: http://neuro.imm.dtu.dk/services/brededatabase/wojous.xml
Also, there are further journals and naming variations in the Brede Wiki.
The Wikimania slides are available from: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5677/pdf/imm5677.pdf
What I noted (as seen on the slides) was that the count of 'cite journal' citations depended on the ref-tag. The GeneWiki bot added a lot of citations around 2008/2009 without the ref-tag and the number of citations was very much increased by that. I wonder if other forms of citations depend very much on bot-additions.
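To make the ref-tag effect concrete, here is a rough Python sketch that counts 'cite journal' templates inside and outside `<ref>...</ref>` pairs. It is not the code behind the slides: the regular expressions are a crude assumption (no handling of nested templates, comments, or unusual tag attributes), and the function name and sample wikitext are invented.

```python
import re

# Matches the opening of a {{cite journal ...}} template, any letter case.
CITE_JOURNAL = re.compile(r"\{\{\s*cite journal\b", re.IGNORECASE)
# Matches <ref ...>...</ref> pairs; self-closing <ref ... /> tags are skipped.
REF_BLOCK = re.compile(r"<ref\b[^>/]*>.*?</ref>", re.IGNORECASE | re.DOTALL)


def count_cite_journal(wikitext):
    """Return (inside_ref, outside_ref) counts of 'cite journal' templates.

    Hypothetical helper: a crude regex pass, not a real wikitext parser.
    """
    total = len(CITE_JOURNAL.findall(wikitext))
    inside = sum(len(CITE_JOURNAL.findall(block))
                 for block in REF_BLOCK.findall(wikitext))
    return inside, total - inside


# Made-up wikitext, for illustration only.
sample = (
    "Claim one.<ref>{{cite journal |title=X}}</ref> "
    "Bot-added entry: {{Cite journal |title=Y}} "
    '<ref name="n" /> {{cite book |title=Z}}'
)
inside, outside = count_cite_journal(sample)
print("inside ref tags:", inside, "outside ref tags:", outside)
```

Running both counts over successive dumps would show whether a jump in the total comes from templates placed outside ref tags, as with the bot additions described above.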
Furthermore:
There have been a few blog posts on citations. Ed Summers did one:
http://inkdroid.org/journal/2010/08/25/top-hosts-referenced-in-wikipedia-par...
and I also did one looking at the 'cite news' template:
http://fnielsen.posterous.com/top-news-cites-referenced-from-wikipedia
Finn Årup Nielsen http://www.imm.dtu.dk/~fn/
Phoebe,
Stats about {{cite journal .. }} citations can be found at
I don't know if the parser/bot is 'free'. The bot approval is
https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/JL-Bot_7
On this topic, will Wikipedia ever implement a 'citation tool'? I'm thinking something like Mendeley or EndNote for wiki text? It would be great to have citations more formally implemented within MW (via an extension).
Cheers, Dan.