Hello!
We recently studied the properties of the English Wikipedia graph and observed that: (1) the graph consists of dense subgraphs (so-called "graph communities") that are in turn less densely connected to each other; (2) Wikipedia articles falling into the same community exhibit more semantic similarity to each other than randomly selected articles.
Encouraged by the above observations, I computed the community hierarchy for the English Wikipedia: http://modis.ispras.ru/wikipedia/ The hierarchy shows the grouping of similar Wikipedia articles into communities, based purely on Wikipedia link information, and reflects the link structure of the Wikipedia graph.
In your opinion, could such data organization be helpful for navigation and finding related information in Wikipedia?
Your feedback is welcome! Dmitry
2009/3/10 Dmitry Lizorkin lizorkin@ispras.ru:
The idea seems interesting, but your interface doesn't seem to work very well - as far as I can tell, there is no way to get a list of the members (and/or subcommunities) of a community, just search within it. A list would be much more useful.
One use I can see for this software is suggestions for categories: either suggesting that a particular article be added to a category (because most of the other articles in its community are already members) or suggesting that a new category be created (if there is no category containing a significant number of members of a community). The software would need to take subcategories into account, of course - e.g. if two members of a community are in different subcategories of the same category, that is probably intentional and the software shouldn't suggest changing it.
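A rough sketch of the first kind of suggestion (adding an article to a category most of its community already shares) might look like the following. The function name, the 0.5 threshold, and the sample titles and categories are all made up for illustration; nothing here reflects how the tool at modis.ispras.ru works, and subcategories are deliberately ignored, so a real version would need to handle them first.

```python
from collections import Counter


def suggest_categories(community, categories_of, threshold=0.5):
    """Return {article: [suggested categories]} for one community.

    community: iterable of article titles in the same graph community.
    categories_of: dict mapping article title -> set of category names.
    threshold: fraction of members that must share a category before it
               is suggested to the rest (hypothetical default).
    """
    members = list(community)
    counts = Counter()
    for article in members:
        counts.update(categories_of.get(article, set()))

    # Categories held by more than `threshold` of the community members.
    common = [c for c, n in counts.items() if n / len(members) > threshold]

    suggestions = {}
    for article in members:
        own = categories_of.get(article, set())
        missing = sorted(c for c in common if c not in own)
        if missing:
            suggestions[article] = missing
    return suggestions


# Hypothetical sample data, not taken from Wikipedia.
community = ["Article A", "Article B", "Article C", "Article D"]
categories_of = {
    "Article A": {"Graph algorithms", "Routing algorithms"},
    "Article B": {"Graph algorithms", "Search algorithms"},
    "Article C": {"Graph algorithms"},
    "Article D": {"Network theory"},
}

print(suggest_categories(community, categories_of))
```

If no category clears the threshold, the function returns an empty dict, which could serve as a crude signal for the second case (a community with no shared category, where a new category might be worth creating).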
On Tue, Mar 10, 2009 at 8:21 AM, Dmitry Lizorkin lizorkin@ispras.ru wrote:
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Sounds like it could be used as an automated 'See also' generator - one could write a bot that, for each article, generates a list of 'top 10 related links', checks whether they are all in the article already, and adds any missing ones to the See also section (or creates a new one).
Even more work would be on-the-fly generation of related links - so it'd be akin to Everything2's soft links (a feature I miss dearly): https://secure.wikimedia.org/wikipedia/en/wiki/Everything2#Soft_links
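To make the 'See also' idea concrete, here is a minimal sketch of the ranking step only. It assumes the community membership and the outgoing links of each article are already available as plain Python data, and it scores relatedness by the number of shared link targets, which is just one simple stand-in and not something the hierarchy itself prescribes. The function name and the sample data are hypothetical, and the bot part (editing pages) is left out entirely.

```python
def see_also_candidates(article, community, links, top_n=10):
    """Rank community members that `article` does not link to yet.

    community: iterable of article titles in the same graph community.
    links: dict mapping article title -> set of titles it links to.
    Score: number of link targets shared with `article`.
    """
    own_links = links.get(article, set())
    scored = []
    for other in community:
        # Skip the article itself and anything it already links to.
        if other == article or other in own_links:
            continue
        shared = len(own_links & links.get(other, set()))
        scored.append((shared, other))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical sample data, not taken from Wikipedia.
community = ["A", "B", "C", "D"]
links = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X"},
    "C": {"X", "Y"},
    "D": {"Z"},
}

print(see_also_candidates("A", community, links))
```

Here "B" is skipped because "A" already links to it, and "C" ranks ahead of "D" because it shares two link targets with "A". The same function could be called on the fly to produce soft-link-style suggestions instead of being run by a bot.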
Thank you for your insightful replies! Automated suggestions for categories and generation of 'See also' & soft links sound like very useful applications of the observed community structure of Wikipedia. I'll study these issues in more detail.
Thank you for your assistance! Dmitry
"Dmitry Lizorkin" lizorkin@ispras.ru wrote in message news:11d001c9a17a$c1f4ae90$4cc69553@fiona...
I was thinking that Roget (Rohzhay), whose work I have seen for free, could do a lot to help with category suggestions. His "general categories of words, then smaller categories, and then synonyms with antonyms" was an immense and long-winded stroke of organization. It needed an index at the back for the word you could think of. I did not like revisions I found in 1998. I liked the one I had in 1987. _______ http://tinyurl.com/BlakDog (Revised today. One pause had to be longer)