Dear Haifeng,
Would you not be able to use ordinary information retrieval techniques, such as bag-of-words/phrases and TF-IDF? Explicit semantic analysis (ESA) takes this approach (though its primary focus is word-level semantic similarity).
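To make the suggestion concrete, here is a minimal sketch of bag-of-words TF-IDF with cosine similarity, using only the standard library. The example texts are hypothetical stand-ins for plain-text article content, and the whitespace tokenizer and raw-count TF are deliberate simplifications:

```python
# Minimal bag-of-words / TF-IDF cosine-similarity sketch.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical article snippets, not real Wikipedia text.
docs = ["the cat sat on the mat",
        "the cat lay on the rug",
        "stock markets fell sharply today"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # overlapping vocabulary: similar
print(cosine(vecs[0], vecs[2]))  # no shared terms: zero
```

In practice one would use a proper tokenizer, sublinear TF, and smoothed IDF, but the ranking behaviour is the same.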
There are a few papers on ESA: https://tools.wmflabs.org/scholia/topic/Q5421270
I have also used it in "Open semantic analysis: The case of word level semantics in Danish" http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/7029/pdf/imm7029.pd...
Finn Årup Nielsen http://people.compute.dtu.dk/faan/
On 04/05/2019 13:47, Haifeng Zhang wrote:
Dear folks,
Is there a way to compute content similarity between two Wikipedia articles?
For example, I can think of representing each article as a vector of likelihoods over possible topics.
But I wonder whether there is other work people have already explored in the past.
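The topic-vector idea above can be sketched directly: treat each article as a probability distribution over topics and compare distributions with cosine similarity. The topic probabilities below are made-up numbers for illustration; in practice they might come from a topic model such as LDA.

```python
# Cosine similarity between hypothetical per-article topic distributions.
import math

def cosine(p, q):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Made-up likelihoods over four topics (each vector sums to 1).
article_a = [0.70, 0.20, 0.05, 0.05]
article_b = [0.60, 0.30, 0.05, 0.05]
article_c = [0.05, 0.05, 0.20, 0.70]

print(cosine(article_a, article_b))  # similar topic mix: high
print(cosine(article_a, article_c))  # different topic mix: low
```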
Thanks,
Haifeng

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l