Hi Travis,

I thought that was you when I read your post; yes, we did indeed talk. Actually, it was after our talk that I searched extensively to find what is considered top-tier in computer science. Here are the brief comments I should have included earlier, explaining how I came up with the three sources of "high quality" computer science conferences:

* Top-tier and second-tier conferences from
http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html: In my extensive search for computer science conference rankings, this was the best list I could find, and most other rankings I came across either refer to it or are copied from it.
* A-ranked conferences in Information and Computing Sciences from
http://lamp.infosys.deakin.edu.au/era/?page=cforsel10: This is the most exhaustive conference and journal ranking exercise I have ever found anywhere. Unfortunately, I, like you, have serious questions about the face validity of these rankings; I think they heavily overrate many conferences in my own field of information systems, and I assume the same is true of other fields that I don't know so well. (My primary reservation with conference or journal rankings by professors is that I strongly suspect one of their main ranking criteria is whether or not they themselves have published in that outlet.) Even so, I don't know of anything that approaches this ranking in comprehensiveness.
* We also considered including all WikiSym articles on Wikipedia: this is not a statement about WikiSym's quality, but simply a recognition that WikiSym is probably the closest thing that exists to an academic conference specifically for Wikipedia-related research.

Is there no widely accepted listing of computer science conference rankings? You say, "Everyone in my field (HCI) pretty much knows what the first tier conferences are where wikipedia research is published." The problem is that I could say the same thing about my field, yet another researcher would come up with a different list. There is general consensus about the top two or three venues in any field, but the huge grey zone appears as soon as you try to draw a line below them. Even your idea of having small groups of experts validate a number of conferences is pretty shaky, since another small group of experts would almost certainly give different results.

Citation counts are always a sticky issue; they depend heavily on which citation databases index an article and on how recently it was published. However, I do consider them one of the most objective (not necessarily one of the best, but one of the most objective) criteria for paper quality. Based on your suggestion, I just discovered that the ACM Digital Library includes citation counts for conference papers. By way of brainstorming, I'm thinking of this possible inclusion rule (a rough code sketch follows the list):

* Calculate (a) the average citation count for articles about Wikipedia (either the journal-only average, the conference-only average, or the average of both), and (b) the average citation count for each journal and/or conference that publishes Wikipedia research. (b) is basically (a) grouped by journal/conference.
* Rather than using raw citation counts, we could calculate citations per year or some other weighting that recognizes that more recent articles have had less time to accumulate citations than older ones.
* Include all conference papers whose citation count is greater than the average (whichever average we choose), and/or include papers from all conferences whose average is greater than the overall average. Or we could include all conference papers whose citations per year are greater than the average for journal articles. Or just include the top 100 most-cited conference papers, or however many we can handle.
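
To make the third bullet concrete, here is a rough Python sketch of the kind of filtering I have in mind. The Paper record, the field names, and the particular threshold (the journal articles' average citations per year) are just illustrative assumptions on my part; the citation counts themselves would come from the ACM Digital Library or Google Scholar.

from dataclasses import dataclass

CURRENT_YEAR = 2011  # used to compute a paper's age in years

@dataclass
class Paper:
    title: str
    venue: str          # e.g. "CHI", "CSCW", "MIS Quarterly"
    venue_type: str     # "journal" or "conference"
    year: int
    citations: int      # count from the ACM Digital Library or Google Scholar

def citations_per_year(paper):
    """Weight raw counts by age so that recent papers are not penalized."""
    age = max(CURRENT_YEAR - paper.year, 1)
    return paper.citations / age

def select_conference_papers(papers):
    """Keep the conference papers whose citations per year exceed the
    average citations per year of the journal articles in the set."""
    journal_rates = [citations_per_year(p) for p in papers
                     if p.venue_type == "journal"]
    threshold = sum(journal_rates) / len(journal_rates)
    return [p for p in papers
            if p.venue_type == "conference"
            and citations_per_year(p) > threshold]

We could just as easily swap the threshold for the "top 100 most-cited" option; the overall structure would stay the same.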

Although still somewhat artificial, this could possibly give us a somewhat objective basis for surfacing the "higher quality" conference papers through citation analysis.

I don't know if I'm trying to go too far with this citation count possibility, but what do you all think?

Thanks again,
Chitu


-------- Original Message --------
Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review)
From: Travis Kriplean <travis@cs.washington.edu>
To: Research into Wikimedia content and communities
    <wiki-research-l@lists.wikimedia.org>
Cc: Chitu Okoli <Chitu.Okoli@concordia.ca>
Date: 15/03/2011 5:26 PM
Hey there,

I sympathize with your dilemma...and I think we might have actually talked about this at Wikimania 2009. Unfortunately, while you may be satisfied that 600 journal articles + theses is enough (I certainly would be too), you should recognize that if you keep it that way you are systematically excluding large, significant bodies of research from computer science and HCI. As you make this choice, read through one or two of these conference papers and measure them against the quality of a randomly selected set of journal articles in your set:
   - http://dub.washington.edu/djangosite/media/papers/tmpZ77p1r.pdf
   - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1485/1841
   - http://www.cs.cornell.edu/~danco/research/papers/suggestbot-iui07.pdf
   - http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf
   - http://portal.acm.org/citation.cfm?id=18928

I bet that these conference papers are, on balance, of higher quality than a random journal article in your set.

Unfortunately, there isn't a good answer for the best methods to follow. Everyone in my field (HCI) pretty much knows what the first tier conferences are where wikipedia research is published: CHI, CSCW, and UIST, with GROUP as second tier. These are all under the ACM SIGCHI banner (http://www.sigchi.org/). Another way to put this is that there are no objective measures; it's a question of what the researchers themselves see as high quality. Ultimately, this is the same as with journals, although journals tend to have impact factors. If I were to estimate how many high quality conference papers there are from the HCI angle, I would put it at about 20-30.

Of course, this is only for HCI research, not all CS research. Conferences such as WWW have published excellent research on Wikipedia, such as the initial paper out of the WikiTrust group, which, if you've been around the wiki community, you know has had a big impact. WWW is considered to be a high quality CS conference. Likewise, there has been Wiki research published at database and AI conferences, for example the Intelligence in Wikipedia project (summarized here: http://portal.acm.org/citation.cfm?id=20344).

Unfortunately, your two links to top conferences paint pretty inaccurate pictures of the CS conference field (for example, the Deakin link puts GECCO as the top conference in one of the major categories, which is basically laughable). And while we might all love WikiSym, from an academic standpoint it is definitely not a tier-one venue.

I cringe to suggest this, but one possible methodology you might follow is citation count filtering, using, e.g., Google Scholar. Citations give you an indicator of whether other researchers have found a paper useful to draw on. Look at the average citation count of the journal papers, then filter your list of 1500 conference papers down to those papers that have, say, twice the citations of the average journal article.

Honestly though, your best methodology would be to have a small group of HCI researchers, a small group of AI researchers, and a small group of database researchers who have worked on Wikipedia compile a list of the conference papers that they believe best represent the research their community has done on Wikipedia.

Hope that helps, and sorry to hear that you're still struggling with this issue.

Best,
Travis




On 3/15/11 11:56 AM, Chitu Okoli wrote:
James and Travis, you bring up a point that we have struggled back and
forth with for several months. We really, really would like to include
conference articles, but we just can't see how we could handle many more
articles than what we've got now. We've been working on and off on this
project for over two years now. (You can find works in progress via the
link to my website at the bottom.) We'd like to get it done eventually,
and we can only handle so many articles.