On Wed, Jul 21, 2010 at 5:49 AM, Finn Aarup Nielsen <fn@imm.dtu.dk> wrote:


On Wed, 21 Jul 2010, Jodi Schneider wrote:

On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in

This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a convention of ignoring any
leading "the" and "a", which gives a more distinctive three-word suffix.

While that's a good idea, we would then have to know all the non-distinctive words in every language (Die, Der, La, L', ...).
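To make the proposal concrete, here is a minimal Python sketch of the "Author+Author+Author+Year+word+word+word" key, with optional skipping of leading articles. The per-language article sets are hypothetical and deliberately tiny, which is exactly the problem raised above. Elided forms such as L' would need extra handling.

```python
# Hypothetical, deliberately incomplete sets of leading articles per language.
LEADING_ARTICLES = {
    "en": {"the", "a", "an"},
    "de": {"der", "die", "das"},
    "fr": {"le", "la", "les"},
}


def title_words(title, lang="en", n=3, skip_articles=True):
    """Return the first n lowercased title words, optionally skipping leading articles."""
    words = title.lower().split()
    if skip_articles:
        articles = LEADING_ARTICLES.get(lang, set())
        # Only articles at the very start are dropped; later ones are kept.
        while words and words[0] in articles:
            words.pop(0)
    return words[:n]


def make_key(surnames, year, title, lang="en", skip_articles=True):
    """Join up to three surnames, the year and three title words with '+'."""
    parts = list(surnames[:3]) + [str(year)]
    parts += title_words(title, lang, skip_articles=skip_articles)
    return "+".join(parts)


# The key quoted in the thread, and the variant with the leading "the" skipped.
print(make_key(["Kang", "Hsu", "Krajbich"], 2009,
               "The wick in the candle of learning", skip_articles=False))
print(make_key(["Kang", "Hsu", "Krajbich"], 2009,
               "The wick in the candle of learning"))
```

In this example, skipping the leading "the" just pulls a later "the" into the suffix ("wick+in+the"). Skipping articles alone therefore does not guarantee a more distinctive key.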

There are still going to be duplicates, alas...


Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.

It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss", "Vosz" or even "VoB",
etc. In the case of Chinese names, it is often not easy to decide which part is
the last name.
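A small Python illustration of why transliteration breaks key stability. It applies two plausible folding rules: Unicode decomposition with the accents dropped, and a German-style table. Both are assumptions, not an agreed convention, and they give different ASCII surnames for the same person.

```python
import unicodedata

# Hypothetical German-style transliteration table.
GERMAN_STYLE = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                              "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def strip_accents(name):
    """Decompose (NFKD) and drop combining marks: 'ö' -> 'o'.

    Letters without a decomposition, such as 'ß', are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def german_style(name):
    """Apply the German convention: 'ö' -> 'oe', 'ß' -> 'ss'."""
    return name.translate(GERMAN_STYLE)


for surname in ["Möller", "Voß"]:
    print(surname, strip_accents(surname), german_style(surname))
```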

I have a large BibTeX file where I (mostly) use surname + one initial + year + first important word as the key (http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)

So for example: AaltoS2002Neuroanatomical
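Here is a minimal Python sketch of that basic scheme. It assumes the name and title are already plain ASCII and uses a small stopword list. The actual list behind lyngby.bib is not stated; "on" is deliberately left out, matching the CurtissJ1941On example.

```python
# Hypothetical stopword list; 'on' is intentionally absent.
STOPWORDS = {"a", "an", "the"}


def basic_key(surname, given, year, title):
    """Surname + first initial + year + first title word that is not a stopword."""
    important = [w for w in title.split() if w.lower() not in STOPWORDS]
    first_word = important[0] if important else ""
    return f"{surname}{given[:1].upper()}{year}{first_word}"


print(basic_key("Curtiss", "J.", 1941,
                "On the Distribution of the Quotient of two chance variables"))
```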

There are lots of special cases:

"M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å).

"Anissa Abi-Dargham" becomes AbiDarghamA2000Measurement (discard the dash).

The ACM Computing Classification System becomes ACM1998Computing (an organization as an author: do you use 'Association' or 'ACM'?).

"A Content-Driven Reputation System for the {Wikipedia}" ->
AdlerB2007ContentDriven (dropping the leading "A", discarding the dash in the title and camelcasing).

"$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" -> StrotherS1996Owater (here we have square brackets, which will be a problem in wiki text; I suppose chemistry is even worse).

"On the Distribution of the Quotient of two chance variables" becomes CurtissJ1941On (as 'On' is not regarded as a stopword here).

"Modelling the fMRI response using smooth FIR filters" -> NielsenF2001ModelingfMRI (an extra word was added because of a collision with "Modeling of locations in the {BrainMap} database: Detection of outliers").
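The special cases and the collision rule above could be sketched in Python roughly as follows. The LaTeX macro table, the stopword list and the rule "append the next important word until the key is free" are assumptions about how such a scheme might work. They are not a description of how lyngby.bib was actually built. The example titles use the spelling "Modeling", as in the resulting key.

```python
import re
import unicodedata

# Hypothetical, non-exhaustive table of BibTeX letter macros.
LATEX_LETTERS = {r"{\AA}": "A", r"{\aa}": "a", r"{\O}": "O", r"{\o}": "o"}
# Hypothetical stopword list; 'on' is intentionally absent.
STOPWORDS = {"a", "an", "the", "of", "in", "for"}


def ascii_only(text):
    """Fold accents (Å -> A) and drop everything but ASCII letters and digits."""
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    return re.sub(r"[^A-Za-z0-9]", "", folded)


def clean_surname(raw):
    """'{\\AA}berg' -> 'Aberg', 'Abi-Dargham' -> 'AbiDargham'."""
    for macro, letter in LATEX_LETTERS.items():
        raw = raw.replace(macro, letter)
    return ascii_only(raw)


def clean_word(raw):
    """Drop ^{...} superscripts, camelcase hyphenated parts, strip markup."""
    raw = re.sub(r"\^\{[^}]*\}", "", raw)
    parts = raw.split("-")
    camel = parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])
    return ascii_only(camel)


def important_words(title):
    words = [clean_word(w) for w in title.split() if w.lower() not in STOPWORDS]
    return [w for w in words if w]


def assign_key(prefix, title, taken):
    """Append important title words until the key is not already taken."""
    words = important_words(title)
    if not words:
        raise ValueError("no usable title word")
    key = prefix + words[0]
    for extra in words[1:]:
        if key not in taken:
            break
        key += extra
    if key in taken:
        raise ValueError(f"cannot disambiguate {key!r}")
    taken.add(key)
    return key


taken = set()
print(clean_surname(r"{\AA}berg"), clean_surname("Abi-Dargham"))
print(assign_key("AdlerB2007", "A Content-Driven Reputation System for the {Wikipedia}", taken))
print(assign_key("StrotherS1996", "$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?", taken))
print(assign_key("NielsenF2001", "Modeling of locations in the {BrainMap} database", taken))
print(assign_key("NielsenF2001", "Modeling the fMRI response using smooth FIR filters", taken))
```

With this rule, the key a paper gets depends on which colliding paper was entered first. That may or may not be acceptable for a shared wiki.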

With three authors + year + title words you sometimes run into collisions:

 author =       {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
 title =        {Separating Processes within a Trial in Event-Related
                 Functional {MRI}. {II}. Analysis},

 author =       {J. M. Ollinger and Gordon L. Shulman and M. Corbetta},
 title =        {Separating Processes within a Trial in Event-Related
                 Functional {MRI}. {I}. The Method},
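A quick Python check of this collision. The year is a placeholder because it is not given here (the collision presupposes that both papers share it). The surname is assumed to be the last whitespace-separated token of each author name.

```python
YEAR = "YYYY"  # placeholder: the year is not given in the thread

AUTHORS = ["J. M. Ollinger", "Gordon L. Shulman", "M. Corbetta"]
PART_II = ("Separating Processes within a Trial in Event-Related "
           "Functional {MRI}. {II}. Analysis")
PART_I = ("Separating Processes within a Trial in Event-Related "
          "Functional {MRI}. {I}. The Method")


def key(authors, year, title, n_words=3):
    """Surnames of up to three authors + year + first n_words title words."""
    surnames = [name.split()[-1] for name in authors[:3]]
    return "+".join(surnames + [year] + title.split()[:n_words])


def first_difference(a, b):
    """Index of the first differing title word, or None if one is a prefix of the other."""
    for i, (x, y) in enumerate(zip(a.split(), b.split())):
        if x != y:
            return i
    return None


print(key(AUTHORS, YEAR, PART_II) == key(AUTHORS, YEAR, PART_I))  # collision
print(first_difference(PART_II, PART_I))
```

The two titles only diverge at the tenth word (index 9, the part number), so a fixed three-word suffix cannot separate them. Some extra rule would be needed.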


When dealing with scientific articles, it is not always possible to use the full given name, since sometimes you only know the initial.

I know someone called Vibe Frøkjær. Presumably because she is afraid that PubMed and others will not be able to handle the Nordic letters, she writes her name as Vibe G. Frokjaer in scientific contexts. Other authors may write it as Vibe G. Frøkjær.
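Here is a Python sketch of one way a citation wiki might tolerate both problems when matching authors. It folds the Nordic letters ø and æ, which have no Unicode decomposition and so need an explicit table; the o/ae mapping is an assumption. It then treats given names as compatible if their initials agree as far as both go. This is deliberately loose and will happily merge two different people who share a surname and initial, which is the John Smith problem again.

```python
# Hypothetical folding for letters that NFKD does not decompose.
NORDIC = str.maketrans({"ø": "o", "Ø": "O", "æ": "ae", "Æ": "Ae"})


def fold(text):
    return text.translate(NORDIC).lower()


def initials(given_names):
    return [g[0].upper() for g in given_names]


def same_author(a, b):
    """Surnames equal after folding, and initials agree as far as both go."""
    *given_a, last_a = a.split()
    *given_b, last_b = b.split()
    if fold(last_a) != fold(last_b):
        return False
    # zip() stops at the shorter list, so a missing middle initial is tolerated.
    return all(x == y for x, y in zip(initials(given_a), initials(given_b)))


print(same_author("Vibe Frøkjær", "Vibe G. Frokjaer"))
print(same_author("Vibe G. Frøkjær", "Vibe G. Frokjaer"))
```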


Articles usually have only one edition, although sometimes you find reprinted versions here and there. For books there might be different versions, and you need to decide whether you want the key to refer to the 'Work', 'Expression', 'Manifestation' or 'Item', to use the wording from

http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records

The French Wikipedia has a page for each book title (the 'work', regardless of language and edition). Editions are listed with multiple infoboxes on the page, so there is no one-to-one correspondence between wiki page and, say, ISBN. It seems best to me to have one page per 'work' where you collect comments. However, for citations with page numbers you need the 'expression', because page breaks differ between versions.
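Here is a minimal Python data-model sketch of "one page per work, several editions on it". It assumes only the Work and Expression levels from FRBR are modelled (Manifestation and Item are left out) and that a citation with page numbers must name a specific expression. The class names, field names and edition label are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expression:
    """One concrete version of a work; page numbers only make sense here."""
    label: str
    isbn: Optional[str] = None


@dataclass
class Work:
    """The wiki page: one per work, collecting comments and all expressions."""
    title: str
    expressions: List[Expression] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


@dataclass
class Citation:
    work: Work
    expression: Optional[Expression] = None
    pages: Optional[str] = None

    def __post_init__(self):
        # Page breaks differ between versions, so pages need an expression.
        if self.pages is not None and self.expression is None:
            raise ValueError("a citation with pages must name an expression")
        if self.expression is not None and self.expression not in self.work.expressions:
            raise ValueError("expression does not belong to this work")


weaving = Work("Weaving the Web")
edition = Expression("hypothetical edition A")
weaving.expressions.append(edition)
Citation(weaving)                          # cites the work as a whole
Citation(weaving, edition, pages="12-13")  # pages tied to one expression
```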

I like the French way, except that each book has two pages: one under the 'Reference' namespace and another under the 'Template' namespace.

The French tend to use "Title (authors)" as the key in the Reference namespace, mostly with full names:

http://fr.wikipedia.org/wiki/Référence:Weaving_the_Web_(Tim_Berners-Lee)

But sometimes they diverge a bit:

http://fr.wikipedia.org/wiki/Référence:Theory_of_numbers_(HardyWright)

The associated template has a somewhat unpredictable name, e.g.,

http://fr.wikipedia.org/wiki/Modèle:HardyWright

They put the links in the template instantiations, e.g., "auteurs=[[Tim Berners-Lee]], Mark Fischetti", which I still don't like. I would instead suggest

 author1=Tim Berners-Lee | author2=Mark Fischetti

with [[{{{author1}}}]], [[{{{author2}}}]] in the template, or perhaps better for disambiguation

 [[{{{authorlink1|{{{author1}}}}}}|{{{author1}}}]], [[{{{authorlink2|{{{author2}}}}}}|{{{author2}}}]]

This way you allow for easier extraction, and you do not need SMW array processing to distinguish the names.
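Here is a rough Python comparison of how much effort it takes to extract author names from each style. The template name "Reference" is hypothetical. The parser below only handles flat parameter lists: a piped link such as [[A|B]] inside a value would break the split on "|". It illustrates the extraction argument and is not a wikitext parser.

```python
import re


def template_params(call):
    """Split a flat {{Name|key=value|...}} call into a dict of parameters."""
    inner = call.strip()[2:-2]
    _name, *pairs = inner.split("|")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip()
    return params


def numbered_authors(params):
    """author1, author2, ... can be read back without any guessing."""
    authors, i = [], 1
    while f"author{i}" in params:
        authors.append(params[f"author{i}"])
        i += 1
    return authors


LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def authors_from_linked_list(value):
    """'[[Tim Berners-Lee]], Mark Fischetti' needs ad-hoc splitting and unlinking.

    Splitting on commas fails for names written 'Surname, Given'.
    """
    names = []
    for part in value.split(","):
        part = part.strip()
        match = LINK.fullmatch(part)
        names.append(match.group(1) if match else part)
    return names


params = template_params("{{Reference|author1=Tim Berners-Lee|author2=Mark Fischetti}}")
print(numbered_authors(params))
print(authors_from_linked_list("[[Tim Berners-Lee]], Mark Fischetti"))
```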

It seems to me that the French have come a long way. I am surprised that only John Vandenberg has pointed to the French efforts; I was not aware of them before.

Does anyone know anything about the French discussions on the introduction of the 'Reference' namespace? Should we just implement the French system on the English Wikipedia and be done?

/Finn

 Finn,

I'm not a fan of including a portion of the title, for three reasons. First, it's not required to make the key unique. Second, it makes the key longer than necessary. Third, the first word or words of a title are not guaranteed to convey any meaning.
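To illustrate the first point: a surname + year key with a letter suffix is unique without any title words. This uses the familiar Smith2001a/Smith2001b convention, taken here as an assumption rather than anything agreed in this thread. A Python sketch with made-up entries:

```python
import string
from collections import Counter, defaultdict


def short_keys(entries):
    """Map (surname, year) pairs to keys, adding a, b, c... only when needed.

    Suffixes follow input order, so keys stay stable only if they are
    assigned once and never recomputed. More than 26 clashes is not handled.
    """
    totals = Counter(f"{surname}{year}" for surname, year in entries)
    used = defaultdict(int)
    keys = []
    for surname, year in entries:
        base = f"{surname}{year}"
        if totals[base] == 1:
            keys.append(base)
        else:
            keys.append(base + string.ascii_lowercase[used[base]])
            used[base] += 1
    return keys


# Made-up entries illustrating the "John Smith" case.
print(short_keys([("Smith", 2001), ("Smith", 2001), ("Wang", 2009)]))
```

The price is that the suffix conveys no meaning and depends on which entry happened to be created first.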

Regarding a Reference: namespace, I can see how this has some utility and why projects have moved to it. However, I consider it a stopgap solution that projects have implemented when what they really want is a proper wiki for citations. Here are a few quick things that you can't do (or would have to go out of your way to do) with just a Reference namespace, but that you can do with a wiki dedicated to all the world's citations:

- Custom reports that are boolean combinations of citation fields, à la SMW. This requires substantial new technology, as SMW doesn't scale.
- User bibliographies which are a logical subset of all literature ever published.
- Conduct a search of the literature.
- A new set of policies, not necessarily NPOV, regarding the creation of articles that discuss collections of literature (a lit-review-like concept). The content of these policies would emerge over years with the help of a community. These articles could, for instance, help people who are navigating a new area of the literature avoid getting stuck in local minima, and point out the true global context to them. They could also point out experimenter biases in the literature; for example, a recent article found that citation networks in academic literature can form around assumed authority even when that authority is false, casting doubt on a whole thread of publications.
- Create wiki articles about individual sources.
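As a rough illustration of the first two items, here is a Python sketch of reports as boolean combinations of citation fields. A user bibliography is just another predicate over the same data. The records reuse keys from earlier in this thread, with first-author surnames and years read off the keys. The field set is an assumption, and a linear scan like this says nothing about the scaling problem.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    authors: Tuple[str, ...]
    year: int


Predicate = Callable[[Record], bool]


def year_from(first_year: int) -> Predicate:
    return lambda r: r.year >= first_year


def has_author(surname: str) -> Predicate:
    return lambda r: surname in r.authors


def in_bibliography(keys) -> Predicate:
    """A user bibliography is just a set of keys, i.e. a subset of all records."""
    keys = frozenset(keys)
    return lambda r: r.key in keys


def all_of(*preds: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda r: not pred(r)


def report(records, pred: Predicate):
    # A linear scan: fine for a sketch, useless at the scale of all literature.
    return [r.key for r in records if pred(r)]


RECORDS = [
    Record("AdlerB2007ContentDriven", ("Adler",), 2007),
    Record("CurtissJ1941On", ("Curtiss",), 1941),
    Record("NielsenF2001ModelingfMRI", ("Nielsen",), 2001),
]

print(report(RECORDS, all_of(year_from(2000), negate(has_author("Adler")))))
print(report(RECORDS, any_of(in_bibliography({"CurtissJ1941On"}), has_author("Adler"))))
```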

While I am not committed to any of these things happening, I also do not wish to rule them out. The hope is that a new community will emerge around the project and guide it in the direction that is most useful. My hope in this thread is that we can identify some of the most likely cases and imagine what it will be like, so that we can convey this vision to the Foundation and they can get a sense of the potential importance of the project.

Brian