Hi Mara, 

since you were asking about ontologies, let me point you to our work on computational fact checking from knowledge networks (PLoS ONE). We developed a measure of semantic similarity based on shortest paths between any two Wikipedia concepts, using the linked data from DBpedia. These are the links found in the infoboxes of Wikipedia articles, so they are a subset of the hyperlinks on the full article page. 
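As a rough illustration of the shortest-path idea (a minimal sketch only, not the exact measure from the paper), the snippet below runs a breadth-first search on a tiny made-up graph and turns the hop count into a similarity score. The graph, the function names, and the 1 / (1 + distance) scoring are all assumptions chosen for illustration.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Illustrative score: shorter paths give values closer to 1."""
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


# Hypothetical toy graph, stored as undirected adjacency sets.
toy_graph = {
    "Rome": {"Italy"},
    "Italy": {"Rome", "Europe"},
    "Europe": {"Italy", "France"},
    "France": {"Europe", "Paris"},
    "Paris": {"France"},
    "Isolated": set(),
}
```

With this toy data, directly linked concepts score higher than distant ones, and unreachable pairs score zero.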

In the article we use it as a way to check simple relational statements, but it could serve other purposes too. There are also a couple of other approaches from the literature, which we cite in the paper, that could be relevant for what you are doing.

HTH!

Giovanni 


Giovanni Luca Ciampaglia, Assistant Research Scientist, Indiana University


On Sun, Feb 19, 2017 at 2:56 PM, Mara Sorella <sorella@dis.uniroma1.it> wrote:
Hi everybody, I'm new to the list and was referred here by a comment from an SO user on my question [1], which I quote below:


I have successfully used the Wikipedia pagelinks SQL dump to obtain hyperlinks between Wikipedia pages for a specific revision time.

However, there are cases where multiple instances of such links exist between the same pair of pages, e.g. between https://en.wikipedia.org/wiki/Wikipedia and https://en.wikipedia.org/wiki/Wikimedia_Foundation. I'm interested in finding the number of links between pairs of pages for a specific revision.

Ideal solutions would involve dump files other than pagelinks (which I'm not aware of), or using the MediaWiki API.
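
To make the quantity concrete, here is a minimal sketch of counting repeated links in a page's wikitext, assuming the wikitext of the revision is available (for example from a pages-articles dump). The regular expression, the function names, and the short list of namespace prefixes are assumptions for illustration; links produced by templates do not appear in the raw wikitext and would be missed, and redirects are not resolved.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical, incomplete list of non-article namespace prefixes.
NON_ARTICLE_PREFIXES = {
    "File", "Image", "Category", "Template", "Wikipedia", "Help", "Portal",
}


def normalize_title(raw):
    """Trim, turn underscores into spaces, and upper-case the first letter."""
    title = raw.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def count_links(wikitext):
    """Count how many times each article title is linked from the text."""
    counts = Counter()
    for match in WIKILINK.finditer(wikitext):
        title = normalize_title(match.group(1))
        if not title:
            continue
        prefix = title.split(":", 1)[0] if ":" in title else None
        if prefix in NON_ARTICLE_PREFIXES:
            continue  # skip links outside the article namespace
        counts[title] += 1
    return counts


sample = (
    "[[Wikimedia Foundation]] and [[Wikimedia_Foundation|WMF]] plus "
    "[[wikimedia Foundation#History|history]] and [[File:Logo.png|thumb]]"
)
link_counts = count_links(sample)
```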



To elaborate, I need this information to weight (almost) every hyperlink between article pages (that is, in NS0) that was present in a specific Wikipedia revision (end of 2015). Therefore, I would prefer not to follow the solution suggested by the SO user, which would be rather impractical.
 
Indeed, my final aim is to use this weight as a threshold to sparsify the Wikipedia graph (which, due to its short diameter, is more or less one giant connected component) in a way that reflects the "relatedness" of the linked pages (where relatedness is not meant in a strictly semantic sense, but at a higher "concept" level, if I may say so). 
For this reason, other suggestions on how to determine such weights (possibly using other data sources -- ontologies?) are more than welcome.
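
The thresholding step could be sketched roughly as follows. This is only an illustration on made-up data: the edge weights, page names, and function names are assumptions, and the component count ignores edge direction.

```python
def sparsify(weighted_edges, threshold):
    """Keep only edges whose weight is at least `threshold`."""
    return {edge: w for edge, w in weighted_edges.items() if w >= threshold}


def count_components(nodes, edges):
    """Number of weakly connected components (edge direction ignored)."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


# Hypothetical link counts between pairs of pages.
toy_weights = {
    ("Wikipedia", "Wikimedia Foundation"): 5,
    ("Wikipedia", "Encyclopedia"): 2,
    ("Encyclopedia", "Book"): 1,
}
toy_nodes = {"Wikipedia", "Wikimedia Foundation", "Encyclopedia", "Book"}

kept = sparsify(toy_weights, threshold=2)
```

Raising the threshold drops the weakest links first, so the single connected component gradually breaks apart.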

The graph will be used as a dataset to test an event tracking algorithm I am doing research on.


Thanks,


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l