I want to do an analysis of Wikipedia.
I don't know what has been done before, and I was hoping to get feedback on any prior work.
What I want to do:
Each page on Wikipedia would be represented by its hyperlink, treated as an object, and placed on a grid.
An arrow would be drawn from one page to another if the first links to the second.
The pages would be arranged so that the total length of all the arrows is as small as possible. (I figure this would give a crude way of organizing the pages by how they connect to each other.)
The power of representing pages as objects on a grid is that you can get a sense of what an idea means from how it relates to other ideas. I figure more important ideas would end up towards the center.
You could also discover new ideas by browsing the grid (as opposed to knowing what you want and typing it in).
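A minimal sketch of the layout idea, assuming a tiny link graph and a simple swap heuristic: place the pages on grid cells, then keep any random swap of two pages that does not increase the total Manhattan length of the arrows. The function names, the grid width, and the step count are all hypothetical choices for illustration; an exact minimum is a hard combinatorial problem, so this only finds a local improvement and would not scale to all of Wikipedia as written.

```python
import random

def total_length(edges, pos):
    """Sum of Manhattan distances between every pair of linked pages."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in edges)

def layout(nodes, edges, width, steps=2000, seed=0):
    """Place nodes on a grid `width` cells wide and reduce total arrow length.

    nodes: list of at least two page titles; edges: list of (from, to) pairs.
    Returns (positions, total arrow length).
    """
    rng = random.Random(seed)
    rows = (len(nodes) + width - 1) // width
    cells = [(x, y) for y in range(rows) for x in range(width)]
    pos = {n: cells[i] for i, n in enumerate(nodes)}  # initial row-by-row placement
    best = total_length(edges, pos)
    for _ in range(steps):
        a, b = rng.sample(nodes, 2)
        pos[a], pos[b] = pos[b], pos[a]        # try swapping two pages
        cost = total_length(edges, pos)
        if cost <= best:
            best = cost                         # keep the swap
        else:
            pos[a], pos[b] = pos[b], pos[a]    # undo the swap
    return pos, best
```

Calling `layout(pages, links, width=3)` on a handful of pages returns a cell for each page plus the final arrow length, which can be compared against `total_length` of the starting placement.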
If anyone has suggestions, or knows whether anything like this has been done before, please comment.
Oh, and does anyone have suggestions on the best way to build the database of which pages link to which? (Spidering all of Wikipedia, or is there a better way to do it?)
Sylvan
2008/6/4 Sylvan Arevalo khakiducks@gmail.com:
Oh and if anyone has suggestions on the best way to make the database of hyperlinks that reference each other (spidering all of wikipedia, or is there a better way to do it?)
Spidering is bad!
(It's both time-consuming for you and very annoying for us)
You can get the dataset you're looking for via dumps.wikimedia.org - you want the enwiki pagelinks.sql.gz file, I believe. I'm not entirely sure what you'd do with it after that, but it ought to have the data you're looking for in a suitably stripped-down form.
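As a rough sketch of what you might do with it after that: the dump is a SQL file made of long `INSERT INTO ... VALUES (...),(...);` lines, so the links can be pulled out with a regular expression without loading anything into MySQL. This assumes each row has three columns (source page id, target namespace, target title), which I believe is how the pagelinks table looked at the time; check the `CREATE TABLE` statement at the top of your dump, since other versions may differ. The function names and the file path are hypothetical.

```python
import gzip
import re

# One row of the assumed form (source_page_id, target_namespace, 'Target_title').
# Titles are single-quoted with backslash escapes, e.g. 'O\'Reilly'.
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")

def parse_insert_line(line):
    """Return [(from_id, namespace, title), ...] for one INSERT line."""
    rows = []
    for from_id, namespace, title in ROW.findall(line):
        title = re.sub(r"\\(.)", r"\1", title)  # drop the SQL backslash escapes
        rows.append((int(from_id), int(namespace), title))
    return rows

def iter_links(path):
    """Yield link rows from a gzipped dump, one INSERT line at a time."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if line.startswith("INSERT INTO"):
                yield from parse_insert_line(line)
```

Something like `for from_id, ns, title in iter_links("enwiki-pagelinks.sql.gz")` would then stream the links; note that the source is a page id, so turning it into a title needs the separate page table dump.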
You may be interested in the RDF data source that dbpedia provides for these intrawiki links.
http://wiki.dbpedia.org/Downloads30#pagelinks
Cheers,
Peter
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
2008/6/4 Sylvan Arevalo khakiducks@gmail.com:
I want to do an analysis of wikipedia
[...]
I figure more important ideas would go towards the center.
Well, the article "2007" is the center of Wikipedia.
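"Center" can be measured in more than one way, and which one is meant here is not stated. One common reading, sketched below as an assumption, is the page with the smallest average click distance to every other page, found by breadth-first search. The sketch treats links as undirected and assumes the graph is connected; the function names and page titles are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Number of clicks from `start` to every page reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_distances(edges):
    """Map each page to its mean distance to the other reachable pages."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue                      # ignore self-links
        adj.setdefault(a, set()).add(b)   # treat links as undirected
        adj.setdefault(b, set()).add(a)
    result = {}
    for page in adj:
        dist = bfs_distances(adj, page)
        result[page] = sum(dist.values()) / (len(dist) - 1)
    return result
```

The page with the lowest value, `min(avg, key=avg.get)`, is the "center" under this measure. Running one search per page is far too slow for the full link graph, so on real data you would sample start pages or use a graph library.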
/Martin
You might also want to look at this: http://wikip.blogspot.com/2005/12/good-data-visualizations-would-be.html
On Sat, Jun 7, 2008 at 3:49 PM, Martin Møller Skarbiniks Pedersen traxplayer@gmail.com wrote:
Well, the article "2007" is the center of Wikipedia.
/Martin