[WikiEN-l] Fwd: [Foundation-l] Extensive Link Errors related to Proper Names - Needs Fixing

David Gerard dgerard at gmail.com
Sun Apr 1 17:48:30 UTC 2007


Jeff Merkey takes care not to post to en:wp or its mailing lists, but
he just posted this to foundation-l after analysing an en:wp data
dump.


- d.




---------- Forwarded message ----------
From: Jeffrey V. Merkey <jmerkey at wolfmountaingroup.com>
Date: 01-Apr-2007 06:23
Subject: [Foundation-l] Extensive Link Errors related to Proper Names
- Needs Fixing
To: Wikimedia developers <wikitech-l at lists.wikimedia.org>, Wikimedia
Foundation Mailing List <foundation-l at lists.wikimedia.org>



I have been compiling a machine-generated lexicon from the link and
disambiguation pages in the XML dumps.  Interestingly, the associations
contained in [[ARTICLE_NAME | NAME]] form a comprehensive "real time"
thesaurus of common associations used by current English speakers in
Wikipedia.  Embedded within the mesh of these links in the dumps is
perhaps the world's largest and most comprehensive thesaurus.
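
A minimal sketch of how associations of this kind could be collected
from dump wikitext, offered as an illustration and not as the method
actually used for the lexicon.  The names collect_associations and
ambiguous_labels are hypothetical, the sample text is invented, and the
regular expression only handles simple piped links with a single "|":

```python
import re
from collections import defaultdict

# Matches simple piped wikilinks such as [[Target article|display text]].
# Unpiped links ([[Target]]) and links with several pipes are skipped.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def collect_associations(wikitext, table=None):
    """Map each lower-cased display text to the set of article titles it links to."""
    if table is None:
        table = defaultdict(set)
    for target, label in PIPED_LINK.findall(wikitext):
        table[label.strip().lower()].add(target.strip())
    return table


def ambiguous_labels(table):
    """Return the display texts that point at more than one article."""
    return {label: targets for label, targets in table.items() if len(targets) > 1}


# Invented sample text, for illustration only.
sample = (
    "He worked with [[John Smith (actor)|John Smith]] in 1999. "
    "Later, [[John Smith (explorer) | John Smith]] was cited. "
    "See also [[United States|USA]]."
)
table = collect_associations(sample)
suspects = ambiguous_labels(table)
```

Run over every page in a dump, the table acts as the thesaurus of
display text to article title, and a display text that resolves to
several different articles is a candidate for manual review, not proof
of an error.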

While going through the dumps and constructing associative link maps of
all these expressions, I have noticed a serious issue with embedded
links on proper names.  It appears there may be a robot running
somewhere that takes proper names listed in articles about
relationships between people and links them blindly to any entry in
Wikipedia that matches the name.

Posting examples here could create controversy because of some of the
content, so I will complete the thesaurus compilation, and folks should
go through the encyclopedia themselves.  Articles about movie stars and
other "gossipy" articles seem to have the most errors, linking proper
names to unrelated people without proper disambiguation pages.  These
could be interpreted as violations of WP:BLP, and some of the erroneous
links could be troublesome for the foundation.

Whoever is running bots that link between articles should look at
proper name links based on categories and check into this.  I found a
large number of these types of errors.  They are subtle, and will most
probably not show up when browsing through articles unless you can
analyze the link targets and relationships in the dumps.

Jeff





