Re: [Wiki-research-l] [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

22 Jul 2010

On Wed, Jul 21, 2010 at 2:42 AM, Daniel Kinzler <daniel@brightbyte.de> wrote:
...
...
...

The first three author names separated by slashes

Why not separate by pluses? They don't form part of names either, and
don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL? Also note
that using pluses in page names doesn't work with all server
configurations, since plus has a special meaning in URLs.
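To make the concern about pluses concrete, here is a minimal sketch using Python's standard urllib.parse. It only illustrates the general URL behaviour (a "+" is read as a space under form/query decoding); how any particular wiki server treats page titles is not shown here and would have to be checked separately. The key string is taken from the examples later in this thread.

```python
from urllib.parse import quote, unquote, unquote_plus

key = "Kang+Hsu+Krajbich+2009"

# Form/query-string decoding turns every "+" into a space,
# so the key does not survive a round trip unescaped.
as_form = unquote_plus(key)

# Plain percent-decoding leaves "+" alone.
as_path = unquote(key)

# Percent-encoding the "+" (and "/") makes the key unambiguous in a URL,
# at the cost of legibility.
escaped_plus = quote(key, safe="")
escaped_slash = quote("Kang/Hsu/Krajbich", safe="")

print(as_form)
print(as_path)
print(escaped_plus)
print(escaped_slash)
```

Whether a "+" arrives intact therefore depends on which decoding the server applies, which is consistent with the "not all server configurations" caveat.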
...
...

Some or all of the date. For instance, if there is only one source by
this set of authors that year, we can just use YYYY. However, once
another source by that set of authors is added, the key should change to
MMDDYYYY or similar.
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.
I agree. And if you *have* to use the full date, use YYYYMMDD, not the
other way around, please.
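One practical reason to prefer YYYYMMDD is that plain string sorting then matches chronological order, which MMDDYYYY does not. A small sketch, with arbitrary dates chosen only for illustration:

```python
from datetime import date

# Arbitrary example dates (not taken from any real source list).
dates = [date(2009, 10, 11), date(2010, 1, 5), date(2009, 12, 1)]

# Year-first strings sort in chronological order.
ymd_sorted = sorted(d.strftime("%Y%m%d") for d in dates)

# Month-first strings sort by month, mixing the years together.
mdy_sorted = sorted(d.strftime("%m%d%Y") for d in dates)

print(ymd_sorted)
print(mdy_sorted)
```

In the month-first list the January 2010 entry sorts ahead of both 2009 entries.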
...
...
Since the slashes are somewhat cumbersome, perhaps we need not make them
mandatory, but similarly use them only when they are necessary in order
to "escape" a name. In the case that none of the authors has a slash in
their name - the dominant case - we can stick to the easily legible and
nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in
or
Kang+Hsu+Krajbich+2009+twi
Both seem good, though I would suggest adopting a convention of ignoring
any leading "the" and "a", to get a more distinctive 3-word suffix.
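The two key styles above, plus the suggested convention of dropping a leading "the" or "a", could be sketched roughly as follows. The function names, the fourth author name and the title string are placeholders made up for illustration (the title is chosen only so that it starts with the words "the wick in" from the example keys); none of this is a settled algorithm from the thread.

```python
def camel_key(last_names, year, max_authors=3):
    """CamelCase key: first few last names, 'EtAl' if more remain, then year."""
    head = "".join(last_names[:max_authors])
    suffix = "EtAl" if len(last_names) > max_authors else ""
    return f"{head}{suffix}{year}"


def plus_key(last_names, year, title, max_authors=3, n_words=3,
             skip_leading=("the", "a")):
    """Plus-separated key: authors, year, then the first few title words.

    Leading words listed in skip_leading are dropped before the title
    words are taken; pass an empty tuple to keep them.
    """
    words = title.lower().split()
    while words and words[0] in skip_leading:
        words = words[1:]
    parts = list(last_names[:max_authors]) + [str(year)] + words[:n_words]
    return "+".join(parts)


# Placeholder inputs, assumed for illustration only.
authors = ["Kang", "Hsu", "Krajbich", "FourthAuthor"]
title = "The wick in the candle"

print(camel_key(authors, 2009))
print(plus_key(authors, 2009, title, skip_leading=()))
print(plus_key(authors, 2009, title))
```

With the leading article skipped, the suffix starts at "wick", which is the more distinctive form argued for above.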
...
Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even
"VoB", etc. In the case of Chinese names, it's often not easy to decide
which part is the last name.
To avoid this kind of ambiguity, I suggest automatically applying some
type of normalization and/or hashing. There is quite a bit of research
about this kind of normalisation out there, generally with the aim of
detecting duplicates. Perhaps we can learn from bibsonomy.org; have a look
at how they do it: http://www.bibsonomy.org/help/doc/inside.html.
Gotta love open source university research projects :)
-- daniel
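As a rough illustration of the normalization-and-hashing idea in the message above, here is one possible sketch using only the Python standard library. It is an assumption about what such a step might look like, not a description of how BibSonomy or any other system does it, and the helper names are invented. Note that it folds "Möller" and "Moller" together but would still treat "Moeller" as a different name.

```python
import hashlib
import unicodedata


def normalize_name(name):
    """Strip diacritics and fold case, e.g. for duplicate detection."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (the diaeresis in "ö" becomes a separate mark).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() also maps the German sharp s to "ss".
    return stripped.casefold()


def name_hash(names, year):
    """Hash of the normalized author names and year (hypothetical scheme)."""
    text = "|".join(normalize_name(n) for n in names) + "|" + str(year)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


print(normalize_name("Möller"))
print(normalize_name("Voß"))
print(name_hash(["Möller", "Voß"], 2010) == name_hash(["Moller", "Voss"], 2010))
```

Such a hash is useful for spotting likely duplicates, but it is exactly the kind of opaque identifier that carries no meaning for a human reader.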
Hey Daniel,
Bibsonomy seems to suffer from the same problem as CiteULike - URLs which
convey no meaning. An example URL ID from CiteULike is 2434335, and one from
Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer
the conversation away from that direction. These IDs guarantee uniqueness,
but I believe that we can create keys that both guarantee uniqueness and
convey some meaning to humans. Consider that this key will be embedded in
wiki articles any time a source is cited. It's important that it make some
sense.
Plus signs and slashes in the key appear to be cumbersome. Perhaps we can
avoid this by truncating last names that involve a slash to either the
portion before or after the slash.
Changing the key seems to be a bad idea, so we want a key system that is
unique from the start. That means we should use the full date, YYYYMMDD as
suggested by Daniel.
In the event that multiple sources are published by the same set of authors
on the same day, we can use a, b, c disambiguation.
This gives us the following key, guaranteed to be unique:
KangHsuKrajbich20091011b
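A minimal sketch of how such a key could be assigned without ever renaming an existing one. The function name and the set of already-used keys are hypothetical, and one detail is an assumption the thread does not settle: here the first source keeps the bare key and later same-day sources take "b", "c", and so on.

```python
from datetime import date
from string import ascii_lowercase


def dated_key(last_names, published, existing, max_authors=3):
    """Return a key of the form AuthorsYYYYMMDD, adding a letter on collision.

    `existing` is the collection of keys already in use. Keys handed out
    earlier are never changed; only the new key receives a suffix.
    """
    base = "".join(last_names[:max_authors]) + published.strftime("%Y%m%d")
    if base not in existing:
        return base
    # Assumption: the bare key plays the role of "a", so suffixes start at "b".
    for letter in ascii_lowercase[1:]:
        candidate = base + letter
        if candidate not in existing:
            return candidate
    raise ValueError("too many sources share this author set and date")


used = set()
authors = ["Kang", "Hsu", "Krajbich"]
first = dated_key(authors, date(2009, 10, 11), used)
used.add(first)
second = dated_key(authors, date(2009, 10, 11), used)
used.add(second)

print(first)
print(second)
```

The uniqueness guarantee only holds relative to the keys already recorded, so assignment has to go through a single registry of used keys.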
Brian
