Although some links may not represent strong connections, articles written according to Wikipedia guidelines should contain only a minimal amount of such links (distractions and noise). Every article consists of words and semantic structures. If we partitioned all the articles and analysed the occurrence of these structures statistically, we would see a different distribution for every article. Going further with this analysis, we would notice that the distributions fall into distinct types and can be categorised according to what they have in common.

Let us pick some articles to represent categories, http://en.wikipedia.org/wiki/Mathematics being one of them. The system would then assign to each article a probability of, or closeness to, each category. For example, http://en.wikipedia.org/wiki/Isaac_Newton could score 15% Physics, 13% Mathematics, 11% Famous People... Numerous methods could be applied for the analysis - Bayesian probabilities, PageRank and so on, or their combinations. The percentages are purely illustrative; relevance can be expressed in more than one dimension, as a combination of vectors or functions depending on factors such as time, location, or depth of information span.
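To make the idea concrete, here is a minimal sketch of how category closeness could be scored. It builds a word-frequency profile per category from representative seed articles and compares a new article against each profile with cosine similarity. All names and the seed texts are hypothetical placeholders; a real system would use full article text and a stronger model (naive Bayes, PageRank-weighted features, etc.).

    # Hypothetical sketch: score an article's closeness to hand-picked categories.
    from collections import Counter
    import math

    def word_profile(text):
        """Lower-cased word-frequency vector for a piece of text."""
        return Counter(text.lower().split())

    def cosine(p, q):
        """Cosine similarity between two sparse frequency vectors."""
        dot = sum(p[w] * q[w] for w in p if w in q)
        norm = math.sqrt(sum(v * v for v in p.values())) * \
               math.sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    # Invented stand-ins for category seed articles (e.g. the Mathematics article).
    category_profiles = {
        "Mathematics": word_profile("theorem proof number geometry algebra function"),
        "Physics": word_profile("force motion gravity energy optics mechanics"),
        "Famous people": word_profile("born life career work influence biography"),
    }

    # Invented stand-in for the text of the article being classified.
    article = word_profile("Newton developed the laws of motion and gravity "
                           "and contributed to geometry and algebra")

    # Relative closeness to each category, normalised so the scores sum to 1.
    scores = {c: cosine(article, p) for c, p in category_profiles.items()}
    total = sum(scores.values()) or 1.0
    for category, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{category}: {score / total:.0%}")

The normalised percentages are only for readability; as noted above, the underlying relevance measure could just as well stay multi-dimensional.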
Possible applications:
- "See also" suggestions
- Search results
- Experimental semantic navigation
- Analysis of scientific papers
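As a hypothetical illustration of the first application, once every article has such a category vector, "See also" candidates could be ranked by the similarity of those vectors. The vectors below are invented for illustration only.

    # Hypothetical "See also" ranking by similarity of per-article category vectors.
    def dot(u, v):
        return sum(u.get(k, 0.0) * v.get(k, 0.0) for k in u)

    article_vectors = {
        "Isaac_Newton": {"Physics": 0.15, "Mathematics": 0.13, "Famous people": 0.11},
        "Gottfried_Leibniz": {"Mathematics": 0.18, "Philosophy": 0.12, "Famous people": 0.10},
        "Albert_Einstein": {"Physics": 0.22, "Famous people": 0.14},
        "Football": {"Sport": 0.25},
    }

    def see_also(title, vectors, k=2):
        """Return the k articles whose category vectors are closest to `title`'s."""
        target = vectors[title]
        others = ((t, dot(target, v)) for t, v in vectors.items() if t != title)
        return sorted(others, key=lambda tv: -tv[1])[:k]

    print(see_also("Isaac_Newton", article_vectors))
    # e.g. [('Albert_Einstein', 0.0484...), ('Gottfried_Leibniz', 0.0344...)]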