Hi Travis,
I thought that was you when I read your post; yes, we did indeed
talk. Actually, it was after our talk that I went through
extensive searching to find what is considered top-tier in
computer science. Here are brief comments I should have included
earlier, explaining how I came up with the three sources of "high
quality" computer science conferences:
* Top Tier and 2nd tier conferences from
http://webdocs.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html:
In extensive searching for computer science conference rankings,
this is the absolute best I could find, and most other rankings I
found have either referred to or copied from this list.
* A-ranked conferences in Information and Computing Sciences from
http://lamp.infosys.deakin.edu.au/era/?page=cforsel10:
This is the most exhaustive journal ranking exercise I have ever
found anywhere. Unfortunately, I, like you, have serious questions
about the face validity of these rankings; I think they heavily
overrate many conferences in my own field of information systems, and I
assume the same is true of other fields that I don't know so well.
(My primary reservation with conference or journal rankings by
professors is that I strongly suspect that one of the main criteria
for their rankings is whether or not they have published in that
outlet before.) Unfortunately, I don't know of anything that
approaches this ranking in comprehensiveness.
* We also considered including all WikiSym articles on Wikipedia:
This is not because of any statement of WikiSym's quality, but
simply because WikiSym is probably the closest thing that exists to
an academic conference specifically for Wikipedia-related research.
Is there no widely-accepted listing of computer science conference
rankings? You say, "Everyone in my field (HCI) pretty much knows
what the first tier conferences are where wikipedia research is
published." The problem is that I could say the same thing about my
field, but another researcher would have a different list. There is
generally consensus about the top two or three in any field, but the
huge grey zone comes when you try to draw a line. Even your idea of
getting small groups of experts to validate a number of conferences
is pretty shaky, since another small group of experts would almost
certainly give different results.
Citation counts are always a sticky issue; they depend mainly on
indexing by citation count databases and recency of articles.
However, I do consider them one of the most objective (not
necessarily one of the best, but one of the most objective) criteria
for paper quality. Based on your suggestion, I just now discovered
that ACM Digital Library includes citation counts for conference
papers. By way of brainstorming, I'm thinking of this possible
inclusion rule:
* Calculate (a) the average citation count for Wikipedia articles
(either only journal average, only conference average, or average of
both), and (b) average citation count for each journal and/or
conference that publishes Wikipedia research. (b) is basically (a)
grouped by journal/conference.
* Rather than doing raw citation counts, we could try to calculate
citations per year or some other weighting that recognizes that more
recent articles would have fewer citations than older ones.
* Include all conference papers whose citation counts exceed the
average (whichever average we choose), and/or include papers from all
conferences whose average exceeds it. Or we could include all
conference papers whose citations per year are greater than the
average for journal articles. Or just include the top 100 ranked
conference papers, or however many we can handle.
Although still somewhat artificial, this could possibly give us an
objective basis for filtering out the "higher quality" conference
papers based on citation analysis.
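To make the brainstormed rule concrete, here is a minimal sketch in
Python. The paper list and all citation counts below are made up for
illustration; a real run would pull counts from the ACM Digital Library
or Google Scholar:

```python
# Hypothetical sample of papers: (title, venue, year, citation_count).
# In practice these counts would come from the ACM Digital Library or
# Google Scholar; the numbers here are invented for illustration.
papers = [
    ("Paper A", "CHI",     2007, 120),
    ("Paper B", "CSCW",    2009,  30),
    ("Paper C", "WikiSym", 2008,   8),
    ("Paper D", "WWW",     2007,  95),
]

CURRENT_YEAR = 2011  # the year of this discussion

def citations_per_year(year, count):
    """Weight raw counts by article age so recent papers aren't penalized."""
    age = max(CURRENT_YEAR - year, 1)
    return count / age

# Rule (a): the overall average citations-per-year.
rates = [citations_per_year(y, c) for _, _, y, c in papers]
overall_avg = sum(rates) / len(rates)

# Rule (b): the same average, grouped by venue.
by_venue = {}
for title, venue, year, count in papers:
    by_venue.setdefault(venue, []).append(citations_per_year(year, count))
venue_avg = {v: sum(r) / len(r) for v, r in by_venue.items()}

# Inclusion rule: keep papers whose citations-per-year beat the overall average.
included = [t for (t, v, y, c) in papers
            if citations_per_year(y, c) > overall_avg]
```

The same `included` list could just as easily be computed against
`venue_avg` or against the journal-article average, depending on which
of the candidate rules we settle on.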
I don't know how far I want to go with this citation count
possibility, but what do you all think?
Thanks again,
Chitu
Hey there,
I sympathize with your dilemma...and I think we might have
actually talked about this at Wikimania 2009. Unfortunately, while
you may be satisfied that 600 journal articles + theses is enough
(I certainly would be too), you should be equipped to recognize
that if you keep it that way you are systematically excluding
large, significant bodies of research deriving from computer
science and HCI. As you make this choice, read through one or two
of these conference papers and measure them against the quality of a
randomly selected set of journal articles in your set:
- http://dub.washington.edu/djangosite/media/papers/tmpZ77p1r.pdf
- http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1485/1841
- http://www.cs.cornell.edu/~danco/research/papers/suggestbot-iui07.pdf
- http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf
- http://portal.acm.org/citation.cfm?id18928
I bet that these conference papers are, on balance, of higher
quality than a random journal article in your set.
Unfortunately, there isn't a good answer for the best methods to
follow. Everyone in my field (HCI) pretty much knows what the
first tier conferences are where wikipedia research is published:
CHI, CSCW, and UIST, with GROUP at the second tier. These are all under
the ACM SIGCHI banner (http://www.sigchi.org/). Another way to put
this is that there are no objective measures; it's a question of
what the researchers themselves see as high quality. Ultimately,
this is the same as with journals, although they tend to have
impact factors. If I were to estimate how many high quality
conference papers from the HCI angle there are, I would put it at
about 20-30.
Of course, this is only for HCI research, not all CS research.
Conferences such as WWW have published excellent research on
Wikipedia, such as the initial paper out of the WikiTrust group,
which, as anyone who has been around the wiki community knows, has
had a big impact. WWW is considered to be a high quality CS
conference. Likewise, there has been Wiki research published at
database and AI conferences. For example, the Intelligence in
Wikipedia project (summarized here
http://portal.acm.org/citation.cfm?id20344).
Unfortunately, your two links to top conferences are pretty much
inaccurate pictures of the CS conference field (for example, the
Deakin link puts GECCO as the top conference in one of the major
categories, which is basically laughable). And while we might all
love WikiSym, from an academic standpoint it is definitely not
a tier-one venue.
I cringe to suggest this, but one possible methodology you might
follow is to do citation count filtering, using, e.g., Google
Scholar. Citations give you an indicator of whether other
researchers have found it useful to draw on. Look at the average
citation count of the journal papers, then filter your list of
1500 conference papers down to those papers that have, say, twice
the average citation count of a journal article.
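In sketch form, that filter looks something like this (all titles and
counts below are invented; real numbers would be scraped from Google
Scholar):

```python
# Hypothetical citation counts; in practice pulled from Google Scholar.
journal_citations = [40, 10, 25, 5]   # counts for the journal articles
conference_papers = {                 # title -> citation count
    "Conf paper 1": 90,
    "Conf paper 2": 12,
    "Conf paper 3": 55,
}

journal_avg = sum(journal_citations) / len(journal_citations)
threshold = 2 * journal_avg  # "twice the average journal article"

# Keep only conference papers cited at least twice as often as the
# average journal article.
kept = [title for title, count in conference_papers.items()
        if count >= threshold]
```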
Honestly though, your best methodology would be to have a small
group of HCI researchers, a small group of AI researchers, and a
small group of database researchers who have worked on wikipedia
compile a list of the conference papers that they believe are best
representative of the research that that community has done on
wikipedia.
Hope that helps, and sorry to hear you're still struggling with this
issue.
Best,
Travis
On 3/15/11 11:56 AM, Chitu Okoli wrote:
James and Travis, you bring up a point that we have struggled back
and forth with for several months. We really, really would like to
include conference articles, but we just can't see how we could
handle many more articles than what we've got now. We've been
working on and off on this project for over two years now. (You can
find works in progress at the link at the bottom to my website.)
We'd like to get it done eventually, and we can only handle so many
articles.