-----BEGIN PGP SIGNED MESSAGE-----
Hi,
On Monday 19 December 2005 01:35, Lars Aronsson wrote:
> For Google-style page ranking, it is supposedly important to have
> links from one page to another. If the word "Colombia" is
> mentioned in the article about "Bogota" but not linked, this
> relationship will be missed in the ranking. One way to avoid such
> misses would be for a robot to take the list of article titles,
> search for their occurrences in the text body of all articles, and
> insert brackets where they are missing.
>
> No, I don't suggest that such a robot should be used in Wikipedia.
> For one thing, we do have articles about many common words and for
> every year in history, but it would not make sense to make a link
> for every mention of a year or of such common words.
>
> What I would like to ask is whether this kind of text mining is
> common and has a name? So this is more of a general question
> about information retrieval (IR) in large text corpora than about
> Wikipedia. Are there arithmetic rules for when such links should
> be avoided?
>
> One place where such automatic linking could be interesting is a
> scanned paper encyclopedia, where no links exist beforehand, e.g.
> http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work
I used a technique for that for
http://search.cpan.org/~tels/Convert-Wiki-0.05/
which can be used to convert READMEs into wikitext. There are a few rules
like "don't link to the same article twice in a paragraph", and you can
supply a list of terms you want it to link. However, it is a hack, so any
insight into formal rules or techniques would be of interest to me.
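
To make the two rules above concrete, here is a rough sketch in Python. It
is not the code from Convert::Wiki, just an illustration: the function name
autolink(), its parameters, and the "skip" list (for years and common words
that should never be linked) are all made up for this example. It assumes
that paragraphs are separated by blank lines and that links use the
[[Title]] bracket syntax.

```python
import re


def autolink(text, titles, skip=()):
    """Link the first mention of each title in every paragraph as [[Title]].

    `titles` is the list of terms to link; `skip` lists terms that must
    never be linked (years, very common words). Matching is case-sensitive.
    """
    skipped = {s.lower() for s in skip}
    # Try longer titles first so "New York City" wins over "New York".
    terms = sorted((t for t in titles if t.lower() not in skipped),
                   key=len, reverse=True)
    if not terms:
        return text

    # Alternative 1 matches an existing [[link]] (group 1 = its target),
    # alternative 2 matches a bare title that is not part of a longer word.
    pattern = re.compile(
        r"\[\[([^\]|]*)[^\]]*\]\]"
        r"|(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )

    def link_paragraph(paragraph):
        seen = set()  # titles already linked in this paragraph

        def replace(match):
            if match.group(2) is None:
                # Existing link: keep it and remember its target.
                seen.add(match.group(1).strip())
                return match.group(0)
            term = match.group(2)
            if term in seen:
                return term  # rule: no second link in the same paragraph
            seen.add(term)
            return "[[" + term + "]]"

        return pattern.sub(replace, paragraph)

    return "\n\n".join(link_paragraph(p) for p in text.split("\n\n"))


if __name__ == "__main__":
    sample = ("Bogota is the capital of Colombia. Colombia borders Panama.\n\n"
              "In 2005 Bogota grew.")
    print(autolink(sample, ["Bogota", "Colombia", "2005"], skip=["2005"]))
```

The skip list is the crude part: it has to be maintained by hand, which is
exactly where a formal rule for "too common to link" would help.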
Best wishes,
Tels
- --
Signed on Mon Dec 19 18:51:35 2005 with key 0x93B84C15.
Visit my photo gallery at
http://bloodgate.com/photos/
PGP key on
http://bloodgate.com/tels.asc or per email.
"Retsina?" - "Yes, Dad?" - "Mow the lawn." - "All right, Dad."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iQEVAwUBQ6bzl3cLPEOTuEwVAQEbNgf+OlePwZIJsaAXv/LMhjFqo5mjESQEBaNQ
s/ehk7s5gDPyb2jgES6xPVArzZwd2XAm2x75qq4uHtyPe/KUEMpyWpZw5HKQqXu0
ph4vVM/3Wfv2rF4SWgfO0wq5miRBLyKykQfxPYkVcPLXSjZ4mo6xfGIXAscIM8Qi
x+ppWBntCmlFC2k12gOi3sSivvByRVi7d0rSrZMQFxVrCvjXJHEcvWyO2A42YzFi
qRc1I5pqGP+DwoAaVDNt+JlE+RZqcJCoH3rk5CR6SDD5RxeEbjowZ6cJwzAFhGOf
c/04Kv67DTdp16erYRWmuBvhFKvDNxQANc8TcpyUlumPo59P2H/B5Q==
=10Sk
-----END PGP SIGNATURE-----