On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus
I like your suggestion that the abc disambiguator be chosen based on the
first date of publication, and I also like the prospect of using slashes
since they can't be contained in names. Using the full year is a good idea
too. We can combine these to come up with a key that, in principle, is
guaranteed to be unique. This key would contain:
1) The first three author names separated by slashes
Why not separate by pluses? They don't form part of names either, and
they don't cause problems with wiki page titles.
2) If there are more than three authors, an EtAl
I don't think that's necessary if we get the abc part right.
3) Some or all of the date. For instance, if there is only one source by
this set of authors that year, we can just use YYYY. However, once another
source by that set of authors is added, the key should change to MMDDYYYY
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.
If there are multiple publications on the same day, we resort to abc.
Redirects and disambiguation pages can be set up when a key
As Jodi pointed out already, the exact date is often not clearly
identifiable, so I would go simply for the year.
Instead of an alphabetic abc, one could use some function of the
article title (e.g. the first three words thereof, or the initials of
the first three words), always in lower case.
An even less ambiguous abc would be starting page (for printed stuff)
or article number (for online only) but this brings us back to the
7523225 problem you mentioned above.
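
To make the plus-separated variant concrete, here is a rough Python sketch. The function name make_key, its parameters, the sample authors and title, and the exact order of the parts (authors, then year, then title tag) are my assumptions for illustration only; nothing here is an agreed format.

```python
import re


def make_key(authors, year, title, max_authors=3, max_words=3, initials=False):
    """Build a plus-separated key: authors, year, then a title-based tag.

    Hypothetical helper: the part order and defaults are assumptions.
    """
    # Keep at most the first few author names, without any EtAl marker.
    names = [a.strip() for a in authors[:max_authors]]

    # The disambiguator comes from the first words of the title, lower-cased.
    words = re.findall(r"\w+", title.lower())[:max_words]
    if initials:
        tag = "".join(w[0] for w in words)  # initials of the first words
    else:
        tag = "+".join(words)               # the first words themselves

    parts = names + [str(year)]
    if tag:
        parts.append(tag)
    return "+".join(parts)


# Made-up authors and title, purely for illustration.
full = make_key(["Smith", "Jones", "Lee", "Kim"], 2010,
                "A Study of Wiki Citation Keys")
short = make_key(["Smith", "Jones", "Lee", "Kim"], 2010,
                 "A Study of Wiki Citation Keys", initials=True)
```

With these made-up inputs, full is "Smith+Jones+Lee+2010+a+study+of" and short is "Smith+Jones+Lee+2010+aso". Because the key depends only on the work itself, adding another source by the same authors never forces an existing key to change.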
Since the slashes are somewhat cumbersome, perhaps we need not make them
mandatory, but similarly use them only when they are necessary in order to
"escape" a name. In the case that none of the authors has a slash
in their name - the dominant case - we can stick to the easily legible and
nicely compact CamelCase format.
Example keys generated by this algorithm:
Also note that the CamelCase key does not yield results in a Google
search, whereas the first plussed variant brings up the right work,
while the plussed one with initialed title tends to bring up at
least something written by or cited from these authors.
Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng