Hi Mara,
since you were asking about ontologies, let me point you to our work on
computational fact checking from knowledge networks, published in PLoS ONE
<http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128193>.
We developed a measure of semantic similarity based on shortest paths
between any two concepts of Wikipedia, using the linked data from DBpedia.
These are the links found in the infoboxes of Wikipedia articles, so they
are a subset of the hyperlinks on the whole page.
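To make the shortest-path idea concrete, here is a minimal sketch. It is not the exact measure from the paper: it assumes a tiny, made-up undirected graph of concepts and scores two concepts as 1 / (1 + d), where d is the number of hops on the shortest path between them. The names `shortest_path_length`, `path_similarity` and `TOY_GRAPH` are hypothetical and used only for illustration.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed scoring rule: 1 / (1 + hops); 0.0 when no path exists."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Toy adjacency sets (illustrative only, not real DBpedia data).
TOY_GRAPH = {
    "Rome": {"Italy"},
    "Italy": {"Rome", "Europe"},
    "Europe": {"Italy"},
    "Mars": set(),
}

print(path_similarity(TOY_GRAPH, "Rome", "Italy"))
print(path_similarity(TOY_GRAPH, "Rome", "Mars"))
```

With this rule, directly linked concepts score higher than concepts connected only through intermediaries, and disconnected concepts score zero.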
In the article we use it to check simple relational statements, but it
could serve other purposes too. There are also a couple of other approaches
from the literature, which we cite in the paper, that could be relevant
for what you are doing.
HTH!
Giovanni
Giovanni Luca Ciampaglia <http://glciampaglia.com> ∙ Assistant Research
Scientist, Indiana University
On Sun, Feb 19, 2017 at 2:56 PM, Mara Sorella <sorella(a)dis.uniroma1.it>
wrote:
Hi everybody, I'm new to the list and was referred here by a comment
from an SO user on my question [1], which I quote below:
*I have been successfully able to use the Wikipedia pagelinks SQL dump to
obtain hyperlinks between Wikipedia pages for a specific revision time.
However, there are cases where multiple instances of such links exist, e.g.
the very same https://en.wikipedia.org/wiki/Wikipedia page and
https://en.wikipedia.org/wiki/Wikimedia_Foundation. I'm interested in
finding the number of links between pairs of pages for a specific revision.
Ideal solutions would involve dump files other than pagelinks (which I'm not
aware of), or using the MediaWiki API.*
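As a rough illustration of what counting repeated links could look like, the sketch below counts how many times each target is linked from one page's wikitext. It assumes the wikitext of the revision of interest is already available as a string (however it was obtained), and it is only an approximation: links produced by templates are not expanded, redirects are not resolved, and any target containing a colon is skipped as a crude stand-in for "not in NS0", which also drops articles whose titles contain a colon. `count_links` and `normalize_title` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(raw):
    """Underscores to spaces, collapse whitespace, capitalize first letter."""
    title = " ".join(raw.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def count_links(wikitext):
    """Return a Counter mapping each linked title to its occurrence count."""
    counts = Counter()
    for match in WIKILINK.finditer(wikitext):
        title = normalize_title(match.group(1))
        # Crude namespace filter (assumption): skip File:, Category:, etc.
        if title and ":" not in title:
            counts[title] += 1
    return counts


sample = (
    "The [[Wikimedia Foundation]] hosts it. "
    "See [[wikimedia_Foundation#History|WMF]] and [[Jimmy Wales]]. "
    "[[File:Logo.png|thumb|logo]]"
)
print(count_links(sample))
```

Running this per source page would give one count per (source, target) pair, which could then serve as an edge weight.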
To elaborate, I need this information to weight (almost) every hyperlink
between article pages (that is, in NS0) that was present in a specific
Wikipedia revision (end of 2015). I would therefore prefer not to follow
the solution suggested by the SO user, which would be rather impractical.
Indeed, my final aim is to use this weight as a threshold to sparsify the
Wikipedia graph (which, due to its short diameter, is more or less one
giant connected component), in a way that should reflect the
"relatedness" of the linked pages (where relatedness is not meant as
strictly semantic, but at a higher "concept" level, if I may say so).
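The thresholding step described above can be sketched as follows. This is a minimal example under stated assumptions: edges are undirected, weights are link counts, and the toy data, `sparsify`, and `count_components` are hypothetical names introduced only for illustration.

```python
def sparsify(weighted_edges, threshold):
    """Keep only edges whose weight is at least `threshold`."""
    return {edge: w for edge, w in weighted_edges.items() if w >= threshold}


def count_components(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


# Toy weighted graph (illustrative only): weight = number of links.
TOY_EDGES = {
    ("Wikipedia", "Wikimedia Foundation"): 5,
    ("Wikipedia", "Encyclopedia"): 3,
    ("Wikipedia", "Jimmy Wales"): 1,
    ("Encyclopedia", "Book"): 1,
}
NODES = {n for edge in TOY_EDGES for n in edge}

kept = sparsify(TOY_EDGES, threshold=2)
print(count_components(NODES, TOY_EDGES), count_components(NODES, kept))
```

Comparing the component count before and after gives a quick sense of how aggressively a given threshold breaks up the giant component.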
For this reason, other suggestions on how to determine such weights (possibly
using other data sources -- ontologies?) are more than welcome.
The graph will be used as a dataset to test an event tracking algorithm I am
doing research on.
Thanks,
Mara
[1]
http://stackoverflow.com/questions/42277773/number-of-links-between-two-wikipedia-pages/
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l