Hi everyone,
We are a research group conducting a systematic literature review on Wikipedia-related peer-reviewed academic studies published in the English language. (Although there are many excellent studies in other languages, we unfortunately do not have the resources to systematically review these at any kind of acceptable scholarly level. Also, our study is about Wikipedia only, not about other Wikimedia Foundation projects. However, we do include studies about other language Wikipedias, as long as the studies are published in English.) We have completed a search using many major databases of scholarly research. In a separate thread, we will also talk about research questions related to our review.
As of the end of November 2010, when we stopped searching, we had identified over 2,100 peer-reviewed studies that have "wikipedia", "wikipedian", or "wikipedians" in their title, abstract or keywords. As this number of studies is far too large for conducting a review synthesis, we have decided to focus only on peer-reviewed journal publications and doctoral theses; we identified 625 such studies. In addition, we identified around 1,500 peer-reviewed conference articles; we will discuss these in a separate thread.
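As a rough illustration of the title/abstract/keyword criterion described above, the match could be sketched in Python as follows. This is only a sketch under stated assumptions: the record fields (`title`, `abstract`, `keywords`) and the sample records are hypothetical, and the actual searches were run through the scholarly databases' own interfaces, not through code like this.

```python
import re

# Matches "wikipedia", "wikipedian", or "wikipedians" as whole words, ignoring case.
TERM = re.compile(r"\bwikipedia(?:n|ns)?\b", re.IGNORECASE)

def mentions_wikipedia(record):
    """Return True if any search term appears in the title, abstract, or keywords."""
    # Field names are assumptions made for this sketch.
    fields = [record.get("title", ""), record.get("abstract", "")]
    fields.extend(record.get("keywords", []))
    return any(TERM.search(text) for text in fields)

# Hypothetical records, for illustration only.
records = [
    {"title": "Trust in Wikipedia", "abstract": "", "keywords": []},
    {"title": "Online communities", "abstract": "", "keywords": ["Wikipedians"]},
    {"title": "Wikis in education", "abstract": "", "keywords": ["wiki"]},
]
matches = [r["title"] for r in records if mentions_wikipedia(r)]
```

A match of this kind only shows that a term appears in the record; it cannot tell whether a study is really about Wikipedia, which still has to be checked by reading it.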
In addition to the scholarly databases that we searched, we have very carefully compared the lists of studies from the following Wikimedia pages to verify what we may have missed:
- http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia
- http://meta.wikimedia.org/wiki/Wiki_Research_Bibliography
- http://en.wikipedia.org/wiki/Academic_studies_about_Wikipedia
- http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_in_research
- http://meta.wikimedia.org/wiki/Research
From these pages, we identified an additional 13 journal articles and 3 doctoral theses that we had not previously identified. These were articles published after November 2010, articles in journals indexed in very few scholarly databases, articles in a few European journals, and doctoral theses from outside North America. After adding these, we have identified a total of 638 publications, of which 610 are journal articles and 28 are doctoral theses. (However, as we begin to read these, we will remove some from our lists if we find that they are not really about Wikipedia.)
We have now updated the following page with the peer-reviewed journal articles and doctoral theses we have identified: http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia. Please note that we have only updated the sections on peer-reviewed journal articles and on theses; we have not updated other sections with newly identified studies, except for correcting some misclassified items.
To help us in identifying all eligible studies, we would really appreciate it if you could look at the sections on peer-reviewed journal articles and theses in http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia, and send us any citations (by yourself or others) that you know are missing. In particular, please inform us of:
- Doctoral theses conducted outside North America
- Peer-reviewed articles in journals not well indexed by North American databases
- Peer-reviewed journal articles and doctoral theses published or accepted and forthcoming after November 2010.
Thanks for your help.
Chitu Okoli, Concordia University, Montreal, Canada (http://chitu.okoli.org/professional/open-content/wikipedia-and-open-content....)
Arto Lanamäki, University of Agder, Kristiansand, Norway
Mohamad Mehdi, Concordia University, Montreal, Canada
Mostafa Mesgari, Concordia University, Montreal, Canada
Hi there,
Great project; massive but will be much appreciated. We did something similar for empirical studies of Open Source, recently accepted at ACM Computing Surveys (PDF pre-print available here [1], article not in print until 2012 (!! that's another email entirely, bah))
I recognize the need to cut down the number of articles for review; we reviewed around 600 and that was a multi-year effort. We did that mainly by excluding conceptual articles (hence "empirical") and passing-reference articles (i.e., we did a two-step filter on many more articles), but were forced to cover only journal articles for updates during the (long) revision process. I regret that necessity; it decreases the utility of the work.
Given the publication venues of choice for many academics in this community, I do wonder if you aren't shooting yourself in the foot by excluding peer-reviewed conferences and restricting to journals. Personally, I'd rather read a review that included the top journals and top conferences than one that included all journals. I'd even rather read a review over a shorter time period, or on more specific topics, that included publications across journals and conferences. The interesting question is "what do we know about Wikipedia", not "what did we publish in journals about Wikipedia". In particular, you will find you have systematically excluded the contribution of HCI authors.
Given the commendable and massive effort you are providing (and your approach to coverage below is really interesting), getting that wrong at the outset seems a shame.
Best regards, James Howison
[1] Crowston, K., Wei, K., Howison, J., and Wiggins, A. (2012). Free (libre) open source software development: What we know and what we do not know. ACM Computing Surveys, 44(2): http://floss.syr.edu/content/freelibre-open-source-software-development-what...
On Mar 14, 2011, at 13:58, Chitu Okoli wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
As an HCI/CS researcher who has published at top peer-reviewed conferences about Wikipedia, but not journals, I'd like to echo James' statements. Journals are not the norm in CS/HCI research. Knowledge is shared through conferences, not journals.
On 3/14/11 11:32 AM, James Howison wrote:
James and Travis, you bring up a point that we have struggled back and forth with for several months. We really, really would like to include conference articles, but we just can't see how we could handle many more articles than what we've got now. We've been working on and off on this project for over two years now. (You can find works in progress at the link at the bottom to my website.) We'd like to get it done eventually, and we can only handle so many articles.
We considered including top-tier conferences, but the question is, what is a "top conference"? In trying to answer this, we looked at a couple of sources:
- Top Tier and 2nd tier conferences from http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html
- A-ranked conferences in Information and Computing Sciences from http://lamp.infosys.deakin.edu.au/era/?page=cforsel10
- We also considered including all WikiSym articles on Wikipedia
We identified which of the 1,500 conference papers from http://en.wikipedia.org/wiki/User:Moudy83/conference_papers were from "top conferences" by those definitions, and we found over 400. On top of our 600 journal articles and doctoral theses, we think 1,000 papers is just too many for us to handle.
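The venue-based filtering described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the `title`/`venue` fields, the venue names other than WikiSym, and the toy data are all hypothetical, and real venue strings would need far more careful normalization than lowercasing and trimming.

```python
def select_top_conference_papers(papers, top_venues):
    """Keep only the papers whose venue appears in the set of top venues."""
    # Compare venue names case-insensitively and ignore surrounding whitespace.
    normalized = {venue.strip().lower() for venue in top_venues}
    return [p for p in papers if p["venue"].strip().lower() in normalized]

# Hypothetical toy data; the venue names stand in for entries on the ranking lists.
papers = [
    {"title": "Paper 1", "venue": "WikiSym"},
    {"title": "Paper 2", "venue": "Conference A "},
    {"title": "Paper 3", "venue": "Workshop B"},
]
top_venues = {"wikisym", "Conference A"}

selected = select_top_conference_papers(papers, top_venues)
selected_titles = [p["title"] for p in selected]
```

A filter like this only counts papers per venue list; it does not say which of the selected papers contribute most to scholarly knowledge.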
If we could somehow narrow it down to 100 relevant conference papers, we could add that in, but no more. However, how do we select which conferences are "must includes" while unfortunately leaving out the rest? We just don't know how to do this in a non-arbitrary, objective manner that would truly identify the top 100 conference papers on Wikipedia that contribute to scholarly knowledge.
Any ideas on how to do this would be very much appreciated.
Regards, Chitu
-------- Original message -------- Subject: Re: [Wiki-research-l] Request to verify articles for Wikipedia literature review From: Travis Kriplean travis@cs.washington.edu To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Date: 14/03/2011 3:46 PM
I am a little sheepish; clearly you've really struggled with this, and it's certainly a huge number of papers.
I'm tempted to ask what happens if you cut by publication date, but I suspect that doesn't help much because of the accelerating rate of publication. In any case, I'm not entirely sure of the justification for not including older things; it's not as though one stops knowing them :)
Ah, I know: randomize ;) Ok, that's not really in the spirit of a review article.
Have you considered cutting by some quick first-pass characteristics, such as topic (using some framework relevant to your interests; we used Input-Process-Output for organizing studies of FLOSS), empirical vs. conceptual, or perhaps even quantitative vs. qualitative? That is, of course, a lot of work in itself, but it seems to deal with the selection bias the best. It would also help give a conceptual focus to the review article.
To avoid the full selection bias of excluding conferences, perhaps you could include only those conference papers that are cited in your journal articles? (Hmmm, there are issues there, but it's perhaps worth thinking about; could one seek out some variant of "the connected set" of articles, with some cutting factor on the strength of linkage to bring the number down to something manageable?)
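One way to picture the citation-based cut suggested above is the sketch below. It assumes, hypothetically, that each journal article's reference list is available as a set of paper identifiers; the function name, the identifiers, and the `min_citations` threshold (standing in for the "cutting factor on the strength of linkage") are all made up for illustration.

```python
from collections import Counter

def cited_conference_papers(journal_refs, conference_ids, min_citations=1):
    """Return (paper, count) pairs for conference papers cited by enough journal articles."""
    counts = Counter()
    for cited in journal_refs.values():
        # Count each journal article at most once per conference paper it cites.
        for paper in set(cited) & conference_ids:
            counts[paper] += 1
    # Sort by descending citation count, then by identifier for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(paper, n) for paper, n in ranked if n >= min_citations]

# Hypothetical toy data: J* are journal articles, C* are conference papers.
journal_refs = {
    "J1": {"C1", "C2", "J2"},
    "J2": {"C1"},
    "J3": {"C1", "C3"},
}
conference_ids = {"C1", "C2", "C3", "C4"}

selected = cited_conference_papers(journal_refs, conference_ids, min_citations=2)
```

Raising `min_citations` shrinks the selected set, which is the lever for bringing the number of conference papers down; the trade-off is that recent, not-yet-cited papers drop out first.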
Adding people to your review team is another option; I'm sure you've thought about that. The difficulties there are obvious: a good review goes beyond 'tagging' articles and builds a cross-cutting, conceptually organized perspective, which is hard to coordinate or build through disconnected work.
Best wishes for the work, James
On Mar 15, 2011, at 14:56, Chitu Okoli wrote:
Sorry for the late responses; with classes, meetings, office hours, baby, and so on, I can't respond as fast as I'd like, but I'm really grateful for all the great responses.
Thanks, James, for the ideas you've suggested; I summarize them thus:
- Publication date cut-off: We'll play with these and see how many we're left with.
- Randomize: ha ha ha
- Topic/empirical vs. conceptual/quantitative vs. qualitative: Actually, one of the features of our review is that we explicitly want to include non-computer science works in our review, many of which are conceptual, qualitative, and covering unusual topics (e.g. music). Any of these criteria would systematically exclude these articles. Unfortunately, we see that our journal vs. conference cut-off systematically excludes many computer science articles :-(
- Cited articles: We hadn't thought about this; I'll talk more about it in responding to Travis' thread.
- Adding more reviewers: I'll follow up on this in responding to Reid's thread.
Thanks a lot.
~ Chitu
-------- Original message -------- Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review) From: James Howison james@howison.name To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Date: 15/03/2011 4:57 PM
yay!
On Mar 16, 2011, at 1:08 PM, Chitu Okoli wrote:
Sorry for the late responses; with classes, meetings, office hours, baby, and so on, I can't respond as fast as I'd like, but I'm really grateful for all the great responses.
Thanks, James for the ideas you've suggested; I summarize them thus: * Publication date cut-off: We'll play with these and see how many we're left with. * Randomize: ha ha ha * Topic/empirical vs. conceptual/quantitative vs. qualitative: Actually, one of the features of our review is that we explicitly want to include non-computer science works in our review, many of which are conceptual, qualitative, and covering unusual topics (e.g. music). Any of these criteria would systematically exclude these articles. Unfortunately, we see that our journal vs. conference cut-off systematically excludes many computer science articles :-( * Cited articles: We hadn't thought about this; I'll talk more about it in responding to Travis' thread. * Adding more reviewers: I'll follow up on this in responding to Reid's thread.
Thanks a lot.
~ Chitu
-------- Message original -------- Sujet: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review) De : James Howison james@howison.namemailto:james@howison.name Pour : Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.orgmailto:wiki-research-l@lists.wikimedia.org Date : 15/03/2011 4:57 PM
I am a little sheepish; clearly you've really struggled with this, and it's certainly a huge number of papers.
I'm tempted to ask what happens if you cut by publication date, but I suspect that that doesn't help much because of the accelerating rate of publication. In any case not entirely sure of the justification for not including older things, it's not as though one stops knowing them :)
Ah, I know: randomize ;) Ok, that's not really in the spirit of a review article.
Have you considered cutting by some quick first-pass characteristics, such as topic (using some framework relevant to your interests; we used Input-Process-Output for organizing studies of FLOSS), empirical vs. conceptual, or perhaps even quantitative vs. qualitative? That is, of course, a lot of work in itself, but it seems to deal with the selection bias the best. That would also help give a conceptual focus to the review article.
To avoid the full selection bias of excluding conferences, perhaps you could include only those that are cited in your journal articles? (Hmmm, issues there, but perhaps worth thinking about; could one seek out some variant of "the connected set" of articles, with some cutting factor on the strength of linkage to bring the number down to something manageable?)
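A minimal sketch of the "cited by the journal set" idea, assuming a hypothetical mapping from each journal article to the IDs of the works it cites. The function name, the IDs, and the `min_links` cut-off are all illustrative assumptions, not anything the review team has built.

```python
from collections import Counter


def cited_conference_papers(journal_citations, conference_ids, min_links=1):
    """Return conference papers cited by at least `min_links` journal articles.

    journal_citations: dict mapping a journal article ID to the set of IDs it cites
    conference_ids: set of IDs of candidate conference papers
    min_links: cut-off on linkage strength (hypothetical parameter)
    """
    link_counts = Counter()
    for cited in journal_citations.values():
        # Each journal article counts at most once per conference paper.
        for paper_id in set(cited) & conference_ids:
            link_counts[paper_id] += 1
    return sorted(p for p, n in link_counts.items() if n >= min_links)


# Toy data, invented purely for illustration.
journal_citations = {
    "J1": {"C1", "C2", "X9"},
    "J2": {"C1"},
    "J3": {"C3", "C1"},
}
conference_ids = {"C1", "C2", "C3", "C4"}

print(cited_conference_papers(journal_citations, conference_ids, min_links=2))
```

Raising `min_links` shrinks the set, which is one way to read the "cutting factor on the strength of linkage"; it would still inherit whatever citation habits the journal authors have.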
Adding people to your review team is another option, I'm sure you've thought about that. Difficulties there are obvious (a good review goes beyond 'tagging' articles and conducts cross-cutting conceptually organized perspective, hard to coordinate or build through disconnected work).
Best wishes for the work, James
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Wow my mailer must have tricked me into responding to the wrong piece of email. Sorry for the unnecessary noise, folks.
-E
On Mar 16, 2011, at 1:34 PM, Eric Bloch wrote:
yay!
When you consider a "top tier conference", how do you know you are not excluding contributions that might be not just novel but also truly important?
It seems that page rank plays the role of beauty contest in the sense that top-ranked pages are those already in the view of others. I have seen comments that this filters against novelty, possibly crucial novelty.
Jack
On Tue, Mar 15, 2011 at 11:56 AM, Chitu Okoli <Chitu.Okoli@concordia.ca> wrote:
James and Travis, you bring up a point that we have struggled back and forth with for several months. We really, really would like to include conference articles, but we just can't see how we could handle many more articles than what we've got now. We've been working on and off on this project for over two years now. (You can find works in progress at the link at the bottom to my website.) We'd like to get it done eventually, and we can only handle so many articles.
We considered including top-tier conferences, but the question is, what is a "top conference"? In trying to answer this, we looked at a couple of sources:
- Top Tier and 2nd tier conferences from http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html
- A-ranked conferences in Information and Computing Sciences from http://lamp.infosys.deakin.edu.au/era/?page=cforsel10
- We also considered including all WikiSym articles on Wikipedia
We identified which of the 1,500 conference papers from http://en.wikipedia.org/wiki/User:Moudy83/conference_papers were from "top conferences" by those definitions, and we found over 400. On top of our 600 journal articles and doctoral theses, we think 1,000 papers is just too many for us to handle.
If we could somehow narrow it down to 100 relevant conference papers, we could add that in, but no more. However, how do we select which conferences are "must includes" while unfortunately leaving out the rest? We just don't know how to do this in a non-arbitrary, objective manner that would truly identify the top 100 conference papers on Wikipedia that contribute to scholarly knowledge.
Any ideas on how to do this would be very much appreciated.
Regards, Chitu
-------- Original Message --------
Subject: Re: [Wiki-research-l] Request to verify articles for Wikipedia literature review
From: Travis Kriplean <travis@cs.washington.edu>
To: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
Date: 14/03/2011 3:46 PM
As an HCI/CS researcher who has published at top peer-reviewed conferences about Wikipedia, but not journals, I'd like to echo James' statements. Journals are not the norm in CS/HCI research. Knowledge is shared through conferences, not journals.
On 3/14/11 11:32 AM, James Howison wrote:
Hi there,
Great project; massive, but it will be much appreciated. We did something similar for empirical studies of Open Source, recently accepted at ACM Computing Surveys (PDF pre-print available here [1]; the article will not be in print until 2012 (!! that's another email entirely, bah)).
I recognize the need to cut down the number of articles for review; we reviewed around 600 and that was a multi-year effort. We did that mainly by excluding conceptual (hence empirical) or passing-reference articles (i.e. we did a two-step filter on many more articles), but were forced to only do journal articles for updates during the (long) revision process. I regret that necessity; it decreases the utility of the work.
Given the publication venues of choice for many academics in this community, I do wonder if you aren't shooting yourself in the foot by excluding peer-reviewed conferences and restricting to journals. Personally I'd rather read a review that included the top journals and top conferences than one that included all journals. Or I'd even rather read a review over a shorter time period that included publications across journals and conferences, or on more specific topics. The interesting question is "what do we know about Wikipedia", not "what did we publish in journals about Wikipedia". In particular, you will find you have systematically excluded the contribution of HCI authors.
Given the commendable and massive effort you are providing (and your approach to coverage below is really interesting), getting that wrong at the outset seems a shame.
Best regards, James Howison
[1] Crowston, K., Wei, K., Howison, J., and Wiggins, A. (2012). Free (libre) open source software development: What we know and what we do not know. ACM Computing Surveys, 44(2): http://floss.syr.edu/content/freelibre-open-source-software-development-what...
On Mar 14, 2011, at 13:58, Chitu Okoli wrote:
Hi everyone,
We are a research group conducting a systematic literature review on Wikipedia-related peer-reviewed academic studies published in the English language. (Although there are many excellent studies in other languages, we unfortunately do not have the resources to systematically review these at any kind of acceptable scholarly level. Also, our study is about Wikipedia only, not about other Wikimedia Foundation projects. However, we do include studies about other language Wikipedias, as long as the studies are published in English.) We have completed a search using many major databases of scholarly research. In a separate thread, we will also talk about research questions related to our review.
Thanks for your help.
Chitu Okoli, Concordia University, Montreal, Canada (http://chitu.okoli.org/professional/open-content/wikipedia-and-open-content....)
Arto Lanamäki, University of Agder, Kristiansand, Norway
Mohamad Mehdi, Concordia University, Montreal, Canada
Mostafa Mesgari, Concordia University, Montreal, Canada
Hi Jack,
Actually, the reason we're talking about top-tier is the same reason we talk about peer-reviewed versus non-peer-reviewed. No one can deny that non-peer-reviewed work (such as working papers) often has completely novel ideas. The problem is that someone has to wade through tens of thousands of works of hugely varying quality to find a few pearls. The peer-review process does this wading; while it might miss a few novel items, it would probably get most of the high-quality ones. Similarly, there are at least 2,000 Wikipedia studies. Since we can't go through all of them, we hope that most of the high-quality novel ideas do appear in publication outlets that are universally recognized to be of higher quality than average.
Thanks, Chitu
-------- Original Message --------
Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review)
From: Jack Park <jackpark@gmail.com>
To: Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
Date: 15/03/2011 5:26 PM
When you consider a "top tier conference", how do you know you are not excluding contributions that might be not just novel but also truly important?
It seems that page rank plays the role of beauty contest in the sense that top-ranked pages are those already in the view of others. I have seen comments that this filters against novelty, possibly crucial novelty.
Jack
Hey there,
I sympathize with your dilemma... and I think we might have actually talked about this at Wikimania 2009. Unfortunately, while you may be satisfied that 600 journal articles + theses is enough (I certainly would be too), you should recognize that if you keep it that way you are systematically excluding large, significant bodies of research deriving from computer science and HCI. As you make this choice, read through one or two of these conference papers and measure them against the quality of a randomly selected set of journal articles in your set:
- http://dub.washington.edu/djangosite/media/papers/tmpZ77p1r.pdf
- http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1485/1841
- http://www.cs.cornell.edu/~danco/research/papers/suggestbot-iui07.pdf
- http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf
- http://portal.acm.org/citation.cfm?id=1518928
I bet that these conference papers are, on balance, of higher quality than a random journal article in your set.
Unfortunately, there isn't a good answer for the best methods to follow. Everyone in my field (HCI) pretty much knows what the first-tier conferences are where Wikipedia research is published: CHI, CSCW, and UIST, with GROUP in the second tier. These are all under the ACM SIGCHI banner (http://www.sigchi.org/). Another way to put this is that there are no objective measures; it's a question of what the researchers themselves see as high quality. Ultimately, this is the same as with journals, although they tend to have impact factors. If I were to estimate how many high-quality conference papers from the HCI angle there are, I would put it at about 20-30.
Of course, this is only for HCI research, not all CS research. Conferences such as WWW have published excellent research on Wikipedia, such as the initial paper out of the WikiTrust group, which, if you've been around the wiki community, you know has had a big impact. WWW is considered to be a high-quality CS conference. Likewise, there has been wiki research published at database and AI conferences, for example the Intelligence in Wikipedia project (summarized here http://portal.acm.org/citation.cfm?id=1620344).
Unfortunately, your two links to top conferences give pretty inaccurate pictures of the CS conference field (for example, the Deakin link puts GECCO as the top conference in one of the major categories, which is basically laughable). And while we might all love WikiSym, from an academic standpoint it is definitely not a tier-one venue.
I cringe to suggest this, but one possible methodology you might follow is to do citation count filtering, using, e.g., Google Scholar. Citations give you an indicator of whether other researchers have found a paper useful to draw on. Look at the average citation count of the journal papers, then filter your list of 1,500 conference papers down to those papers that have, say, at least twice as many citations as the average journal article.
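A rough sketch of that filter, assuming the citation counts have already been collected by hand into plain dictionaries. The function name, the `multiplier` parameter, and the toy numbers are invented for illustration; nothing here queries Google Scholar.

```python
from statistics import mean


def filter_by_citations(journal_counts, conference_counts, multiplier=2.0):
    """Keep conference papers cited at least `multiplier` times the journal average.

    journal_counts / conference_counts: dicts mapping paper ID to citation count.
    Returns the threshold used and the surviving conference papers.
    """
    threshold = multiplier * mean(journal_counts.values())
    kept = {p: c for p, c in conference_counts.items() if c >= threshold}
    return threshold, kept


# Toy counts, invented purely for illustration.
journal_counts = {"J1": 10, "J2": 4, "J3": 1}
conference_counts = {"C1": 40, "C2": 10, "C3": 9, "C4": 0}

threshold, kept = filter_by_citations(journal_counts, conference_counts)
print(threshold, sorted(kept))
```

One caveat with a plain mean: a handful of very highly cited journal articles can pull the threshold up sharply, so a median might be worth comparing before settling on a cut-off.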
Honestly though, your best methodology would be to have a small group of HCI researchers, a small group of AI researchers, and a small group of database researchers who have worked on Wikipedia compile a list of the conference papers that they believe are most representative of the research that their community has done on Wikipedia.
Hope that helps, and sorry to hear you're still struggling with this issue.
Best, Travis
Chitu and others,
I too see great need for a comprehensive survey paper in this field. My own personal interest is in one that covers wiki research in general, not just research of Wikipedia; this of course makes the intractable number of papers even more intractable.
In fact, I am involved with a team of researchers with the same goal as you, though we are just getting started.
It seems to me that you are in a very difficult position. As others have noted, the scoping filter you propose is not a good one, but the number of papers is simply intractable without a very aggressive filter that excludes 2/3 or more of the known papers. (To further complicate the issue, I am skeptical of machine filtering, period, fearing that any useful filter would necessarily be complex and difficult to justify in a writeup.)
However, I believe that there is a solution, and that is to dramatically increase the team size by doing the analysis wiki style. Rather than a small team creating the review, do it in public with an open set of contributors. Specifically, I propose:
1. Create a public Mediawiki instance.
2. Decide on a relatively standardized format of reviewing each paper (metadata formats, an infobox, how to write reviews of each, etc.)
3. Upload your existing Zotero database into this new wiki (I would be happy to write a script to do this).
4. Proceed with paper readings, with the goal that every single paper is looked at by human eyes.
5. Use this content to produce one or more review articles.
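As a rough illustration of step 3, here is a sketch that turns one bibliographic record into the wikitext for an infobox-style template call. It assumes the records have already been exported from Zotero into plain Python dictionaries; the template name `Paper` and the field names are hypothetical, and the actual upload to the wiki is left out.

```python
def record_to_wikitext(record, template="Paper"):
    """Render one bibliographic record as a MediaWiki template call.

    `template` and the field names are hypothetical; a real wiki would
    define its own infobox template and parameters.
    """
    lines = ["{{" + template]
    for field, value in record.items():
        # A bare pipe would end the template parameter, so escape it.
        safe = str(value).replace("|", "{{!}}")
        lines.append(f"| {field} = {safe}")
    lines.append("}}")
    return "\n".join(lines)


# Toy record, invented purely for illustration.
record = {"title": "An example study", "year": 2010, "venue": "Example Journal"}
print(record_to_wikitext(record))
```

Keeping the metadata in template parameters rather than free text is what would later allow the complete set of papers to be described statistically.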
The goals of the effort would be threefold.
* Create an annotated bibliography of wiki research that is easy to keep up to date.
* Identify the N most important papers for more focused study and synthesis (perhaps leading towards more than one survey article).
* Provide metadata on the complete set of papers so that it can be described statistically.
Simply put, I believe that we as modern researchers need to be able to build survey articles which analyze 2,000-5,000 or more papers, and maybe this is a way to do that.
I and the other members of my team have already planned significant time towards this effort and would be very excited to join forces to lead such a mass collaboration.
Why use Mediawiki rather than Zotero or some other bibliography manager? First, it would be easy for anyone to participate because there is no software to install, no database to import, etc. Second, I personally have found Zotero, CiteULike, and every other bibliography manager I've tried to be clunky and tedious to use and not flexible enough for my needs (for example, three-state tags that let us say a paper has, does not have, or we do not know if it has, a certain property could be useful). We can always export the data into whatever bibliography software is preferred by particular authors.
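The three-state tags mentioned above could be modelled roughly as follows. This is only a sketch; the class and tag names are made up for illustration and do not come from any existing bibliography tool.

```python
from enum import Enum


class TagState(Enum):
    """The three possible states of a property on a paper."""
    YES = "yes"          # the paper has the property
    NO = "no"            # the paper does not have the property
    UNKNOWN = "unknown"  # nobody has checked yet


class PaperTags:
    """Tags for one paper; any tag never set is reported as UNKNOWN."""

    def __init__(self):
        self._states = {}

    def set(self, tag, state):
        if not isinstance(state, TagState):
            raise TypeError("state must be a TagState")
        self._states[tag] = state

    def get(self, tag):
        return self._states.get(tag, TagState.UNKNOWN)


tags = PaperTags()
tags.set("empirical", TagState.YES)
tags.set("peer-reviewed", TagState.NO)
print(tags.get("empirical").value, tags.get("peer-reviewed").value, tags.get("qualitative").value)
```

The point of the third state is that "not tagged" and "tagged as not having it" stay distinguishable, which matters when many reviewers have each only looked at part of the collection.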
Authorship is of course an issue, and one that should be worked out before people start contributing IMO, but not an intractable one, and there is precedent for scientific papers to have hundreds of authors (and it would certainly be in the wiki spirit). I myself would love to have a prominent place in the author list, but having the survey article written at all is a much higher priority.
Finally, one of my dreams has been to create a more or less complete database of *all* scientific publications, with reviews, a citation graph, private notes, and a robust data model (e.g., one that can tell two John Smiths apart and know when J. Smith is the same as John Smith). Maybe this is the first step along that path. (I did work a bit on data models for citation databases about five years ago and still use the software I created - Yabman, http://yabman.sf.net/.)
Thoughts?
Reid
p.s. Chitu, do you subscribe to this list? If so, we'll stop CC'ing you; if not, I encourage you to do so - it's pretty low traffic and certainly relevant to your work.
I like this idea.
I see this as a topic map problem, keeping track of provenance and topics covered. Wikipedia is run very much like a topic map; there are ways to disambiguate name collisions, and each article is restricted to one and only one topic, with occasional merge suggestions.
As a side note, consider the addition of available open source software for "reading" each and every document that is readable (not all PDFs, for instance, can be read by machine). Doing so, you can create a searchable index into each document, perform varieties of studies, e.g. wordgrams, topic models (Latent Dirichlet), and even clustering, as a background task that provides a larger context for all reviews.
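A small sketch of the searchable index and word n-gram counts, using only the Python standard library and assuming the text of each document has already been extracted. Document IDs and texts are invented for illustration; topic models and clustering would need a third-party library and are not shown.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def ngram_counts(text, n=2):
    """Count word n-grams (as tuples of n consecutive words) in a text."""
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Toy documents, invented purely for illustration.
documents = {
    "p1": "Vandalism detection in Wikipedia",
    "p2": "Wikipedia as a teaching tool",
    "p3": "Open source software development",
}
index = build_index(documents)
print(sorted(search(index, "wikipedia vandalism")))
print(ngram_counts("wiki research on wiki research").most_common(1))
```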
Decision making on which documents rate highest is complex, possibly wicked, and perhaps should relate more to particular goals; which papers contribute how much information to which topics, etc. What animates that thought is the "evidence profiles" used by Watson in the Jeopardy competition. Watson was nowhere near competitive until they created and refined evidence profiles.
Jack
1. Create a public Mediawiki instance.
2. Decide on a relatively standardized format of reviewing each paper (metadata formats, an infobox, how to write reviews of each, etc.)
3. Upload your existing Zotero database into this new wiki (I would be happy to write a script to do this).
4. Proceed with paper readings, with the goal that every single paper is looked at by human eyes.
5. Use this content to produce one or more review articles.
There has been some talk of a wiki for papers - also on this list as far as I remember. There is Bibdex (http://www.bibdex.com/), AcaWiki (http://acawiki.org) and I have the "Brede Wiki" (http://neuro.imm.dtu.dk/wiki/). AcaWiki uses Semantic Mediawiki (AFAIK) and I use MediaWiki templates. You can see an example here:
http://neuro.imm.dtu.dk/wiki/Putting_Wikipedia_to_the_test:_a_case_study
There is an infobox with citation information and sections on "related studies" and "critique".
It is a question, though, whether such more generally targeted wikis are appropriate for composing a collaborative paper.
I have also begun a small Wikipedia review that I uploaded to our server yesterday:
http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6012/pdf/imm6012.pdf
I think I will never be able to do an exhaustive review of all papers, but my idea was to give an overview of as many aspects as possible. I think that some research published outside journals and conferences is interesting, e.g., surveys and some of the statistics produced by Erik Zachte. I don't think that Pew's survey has been peer-reviewed, so "just" including journal and conference papers is in my opinion not quite enough to give a complete picture.
/Finn
___________________________________________________________________
Finn Aarup Nielsen, DTU Informatics, Denmark Lundbeck Foundation Center for Integrated Molecular Brain Imaging http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/ ___________________________________________________________________
Hi Travis,
I thought that was you when I read your post; yes, we did indeed talk. Actually, it was after our talk that I went through extensive searching to find what is considered top-tier in computer science. Here are brief comments I should have included earlier explaining how I came up with the three sources of computer science "high quality" conferences:
* Top Tier and 2nd tier conferences from http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html: In extensive searching for computer science conference rankings, this is the absolute best I could find, and most other rankings I found have either referred to or copied from this list.
* A-ranked conferences in Information and Computing Sciences from http://lamp.infosys.deakin.edu.au/era/?page=cforsel10: This is the most exhaustive journal ranking exercise I have ever found anywhere. Unfortunately, I like you have serious questions about the face validity of these rankings; I think they heavily overrate many conferences in my own field of information systems; I assume the same is true with other fields that I don't know so well. (My primary reservation with conference or journal rankings by professors is that I strongly suspect that one of the main criteria for their rankings is whether or not they have published in that outlet before.) Unfortunately, I don't know of anything that approaches this ranking in comprehensiveness.
* We also considered including all WikiSym articles on Wikipedia: This is not because of any statement of WikiSym's quality, but simply because WikiSym is probably the closest thing that exists to an academic conference specifically for Wikipedia-related research.
Is there no widely-accepted listing of computer science conference rankings? You say, "Everyone in my field (HCI) pretty much knows what the first tier conferences are where wikipedia research is published." The problem is that I could say the same thing about my field, but another researcher would have a different list. There is generally consensus about the top two or three in any field, but the huge grey zone comes when you try to draw a line. Even your idea of getting small groups of experts to validate a number of conferences is pretty shaky, since another small group of experts would almost definitely give different results.
Citation counts are always a sticky issue; they depend mainly on indexing by citation count databases and recency of articles. However, I do consider them one of the most objective (not necessarily one of the best, but one of the most objective) criteria for paper quality. Based on your suggestion, I just now discovered that ACM Digital Library includes citation counts for conference papers. By way of brainstorming, I'm thinking of this possible inclusion rule:
* Calculate (a) the average citation count for Wikipedia articles (either only journal average, only conference average, or average of both), and (b) average citation count for each journal and/or conference that publishes Wikipedia research. (b) is basically (a) grouped by journal/conference.
* Rather than doing raw citation counts, we could try to calculate citations per year or some other weighting that recognizes that more recent articles would have fewer citations than older ones.
* Include all conference papers greater than the average (whichever average we choose) and/or include conference papers from all conferences greater than the average. Or we could include all conference papers whose average citations per year are greater than the average for journal articles. Or just include the top 100 ranked conference papers, or however many we can handle.
Although still somewhat artificial, this could possibly give us a somewhat objective basis to filter out the "higher quality" conference papers based on citation analysis.
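As a rough illustration of one variant of this rule (keep conference papers whose citations per year exceed the journal average, optionally capped at the top N), here is a minimal Python sketch. The `Study` record, the function names, and the choice to count the publication year as a full year are all assumptions made for this example; the citation counts would have to come from whichever database we settle on.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Study:
    title: str
    kind: str        # "journal" or "conference"
    year: int        # publication year
    citations: int   # raw citation count from whichever database is used


def citations_per_year(study, as_of_year):
    # Assumption: count the publication year itself as one year, so that
    # very recent papers never divide by zero.
    age = max(as_of_year - study.year + 1, 1)
    return study.citations / age


def select_conference_papers(studies, as_of_year, top_n=None):
    """Keep conference papers whose citations per year exceed the journal average."""
    journal_rates = [citations_per_year(s, as_of_year)
                     for s in studies if s.kind == "journal"]
    # mean() raises StatisticsError if there are no journal articles at all.
    threshold = mean(journal_rates)
    kept = [s for s in studies
            if s.kind == "conference"
            and citations_per_year(s, as_of_year) > threshold]
    # Highest citations per year first, so top_n keeps the most cited papers.
    kept.sort(key=lambda s: citations_per_year(s, as_of_year), reverse=True)
    return kept if top_n is None else kept[:top_n]
```

Swapping in raw counts, a per-venue average, or a different threshold would each be a change of a line or two, which is partly why the choice among these variants is the hard part rather than the computation.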
I don't know if I'm trying to go too far with this citation count possibility, but what do you all think?
Thanks again, Chitu
-------- Original message -------- Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review) From: Travis Kriplean travis@cs.washington.edu To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Cc: Chitu Okoli Chitu.Okoli@concordia.ca Date: 15/03/2011 5:26 PM
Hey there,
I sympathize with your dilemma...and I think we might have actually talked about this at Wikimania 2009. Unfortunately, while you may be satisfied that 600 journal articles + theses is enough (I certainly would be too), you should be equipped to recognize that if you keep it that way you are systematically excluding large, significant bodies of research deriving from computer science and HCI. As you make this choice, read through one or two of these conference papers and measure them against the quality of a randomly selected set of journal articles in your set:
- http://dub.washington.edu/djangosite/media/papers/tmpZ77p1r.pdf
- http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1485/1841
- http://www.cs.cornell.edu/~danco/research/papers/suggestbot-iui07.pdf
- http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf
- http://portal.acm.org/citation.cfm?id=1518928
I bet that these conference papers are, on balance, of higher quality than a random journal article in your set.
Unfortunately, there isn't a good answer for the best methods to follow. Everyone in my field (HCI) pretty much knows what the first tier conferences are where Wikipedia research is published: CHI, CSCW, and UIST; and second tier at GROUP. These are all under the ACM SIGCHI banner (http://www.sigchi.org/). Another way to put this is that there are no objective measures; it's a question of what the researchers themselves see as high quality. Ultimately, this is the same as with journals, although they tend to have impact factors. If I were to estimate how many high quality conference papers from the HCI angle there are, I would put it at about 20-30.
Of course, this is only for HCI research, not all CS research. Conferences such as WWW have published excellent research on Wikipedia, such as the initial paper out of the WikiTrust group, a group that, as anyone who has been around the wiki community knows, has had a big impact. WWW is considered to be a high quality CS conference. Likewise, there has been wiki research published at database and AI conferences. For example, the Intelligence in Wikipedia project (summarized here http://portal.acm.org/citation.cfm?id=1620344).
Unfortunately, your two links to top conferences are pretty much inaccurate pictures of the CS conference field (for example, the Deakin link puts GECCO as the top conference in one of the major categories, which is basically laughable). And while we might all love WikiSym, from an academic standpoint it is definitely not a tier one venue.
I cringe to suggest this, but one possible methodology you might follow is to do citation count filtering, using, e.g., Google Scholar. Citations give you an indicator of whether other researchers have found a paper useful to draw on. Look at the average citation count of the journal papers, then filter your list of 1500 conference papers down to those papers that have, say, twice the average citation count of a journal article.
Honestly though, your best methodology would be to have a small group of HCI researchers, a small group of AI researchers, and a small group of database researchers who have worked on wikipedia compile a list of the conference papers that they believe are best representative of the research that that community has done on wikipedia.
Hope that helps, and sorry to hear you are still struggling with this issue.
Best, Travis
On 3/15/11 11:56 AM, Chitu Okoli wrote:
James and Travis, you bring up a point that we have struggled back and forth with for several months. We really, really would like to include conference articles, but we just can't see how we could handle many more articles than what we've got now. We've been working on and off on this project for over two years now. (You can find works in progress at the link at the bottom to my website.) We'd like to get it done eventually, and we can only handle so many articles.
On Thu, Mar 17, 2011 at 8:41 AM, Chitu Okoli Chitu.Okoli@concordia.ca wrote:
...
- A-ranked conferences in Information and Computing Sciences from
http://lamp.infosys.deakin.edu.au/era/?page=cforsel10: This is the most exhaustive journal ranking exercise I have ever found anywhere.
With regard to John Lamp's journal list, it is a copy of the *first* ERA journal list.
http://en.wikipedia.org/wiki/Excellence_in_Research_for_Australia
There is a second ERA journal list being compiled for 2012. Submissions closed yesterday, and review of ranking is now underway.
The journal list can be browsed via the website.
However there is no publicly download-able dataset available yet.
If anyone wants a copy of the second ERA journal list in xml or csv, I can provide it offlist.
Public consultation about the ranking is open until April 4.
Unfortunately, I like you have serious questions about the face validity of these rankings; I think they heavily overrate many conferences in my own field of information systems; I assume the same is true with other fields that I don't know so well. (My primary reservation with conference or journal rankings by professors is that I strongly suspect that one of the main criteria for their rankings is whether or not they have published in that outlet before.) Unfortunately, I don't know of anything that approaches this ranking in comprehensiveness.
One important point to note in regards to conferences in that journal list is that conferences are only ranked for the disciplines of
* 08 Information and computer science http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/4C3249439D3285D6CA257...
* 09 Engineering http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/050A7395E86A9719CA257...
* 12 Built environment and design http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/B20002D4CAD6966DCA257...
IMO the ranked conference list was useless in the 2010 ERA process and results. I've yet to see any improvement in this area for the 2012 ERA.
-- John Vandenberg
Thanks, John. It's great to get an Australian perspective on the ERA ranking. You've definitely confirmed that it's not very useful for our purposes here. So, I'll pass on the CSV copy of the 2012 version :-(
~ Chitu
-------- Original message -------- Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles From: John Vandenberg jayvdb@gmail.com To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Date: Mon. 21 March 2011 23:06:06 EST
2011/3/14 Chitu Okoli Chitu.Okoli@concordia.ca:
We have now updated the following page with the peer-reviewed journal articles and doctoral theses we have identified: http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia. Please note that we have only updated the sections on peer-reviewed journal articles and on theses; we have not updated other sections with newly identified studies, except for correcting some misclassified items.
This is incredibly useful work, thanks for publishing to the wiki. Wonderful.
Chitu Okoli wrote:
We have now updated the following page with the peer-reviewed journal articles and doctoral theses we have identified: http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia. Please note that we have only updated the sections on peer-reviewed journal articles and on theses; we have not updated other sections with newly identified studies, except for correcting some misclassified items.
I am glad somebody found time and will to update this page again. I stopped doing so two or three years back, sadly.
To help us in identifying all eligible studies, we would really appreciate it if you could look at the sections on peer-reviewed journal articles and theses in http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia, and send us any citations (by yourself or others) that you know are missing. In particular, please inform us of:
I see you found three out of my four papers, I will add the missing one :>
A while ago I started a draft of a paper in which I attempted to review the Wikipedia research up to 2007. I never finished it, but perhaps you'd find it of use. Let me know if you'd like a copy.
Hi Piotr,
Thanks for adding the article. I see that Interface is a relatively new journal, and open access as of November 2010. Unfortunately, open access journals are often poorly indexed in major commercial research databases, so this is exactly the kind of paper we're looking for by making this request. Thanks!
Yes, please do send me your 2007 working paper; I'd certainly like to take a look at it. Please e-mail it directly to me.
Regards, Chitu
-------- Original message -------- Subject: Re: [Wiki-research-l] Request to verify articles for Wikipedia literature review From: Piotr Konieczny piokon@post.pl To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Date: 19/03/2011 10:16 PM