On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus Brian.Mingus@colorado.edu wrote:
I like your suggestion that the abc disambiguator be chosen based on the first date of publication, and I also like the prospect of using slashes since they can't be contained in names. Using the full year is a good idea too. We can combine these to come up with a key that, in principle, is guaranteed to be unique. This key would contain:
- The first three author names separated by slashes
Why not separate by pluses? They don't form part of names either, and they don't cause problems with wiki page titles.
- If there are more than three authors, an EtAl
I don't think that's necessary if we get the abc part right.
- Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar.
I don't think it is a good idea to change one key as a function of updates on another, except for a generic disambiguation tag.
If there are multiple publications on the same day, we can resort to abc. Redirects and disambiguation pages can be set up when a key changes.
As Jodi pointed out already, the exact date is often not clearly identifiable, so I would go simply for the year. Instead of an alphabetic abc, one could use some function of the article title (e.g. the first three words thereof, or the initials of the first three words), always in lower case.
An even less ambiguous abc would be starting page (for printed stuff) or article number (for online only) but this brings us back to the 7523225 problem you mentioned above.
Since the slashes are somewhat cumbersome, perhaps we need not make them mandatory, but similarly use them only when they are necessary in order to "escape" a name. In the case that none of the authors has a slash in their name - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in or Kang+Hsu+Krajbich+2009+twi
Also note that the CamelCase key yields no results in a Google search, whereas the first plus-separated variant brings up the right work, and the plus-separated one with the initialed title tends to bring up at least something written by or cited from these authors.
Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article or Author1+Author2+Author-Three+2009+jat
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
Daniel