Hi Haifeng,
Yes, you might want to look into some of the work done by Hecht et al. on
content similarity between languages, as well as work by Sen et al. on
semantic relatedness algorithms (which are implemented in the WikiBrain
framework <http://wikibrainapi.org/>, by the way; see references below).
Some papers to start with could be:
- Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; Gergle, D.
"Omnipedia: Bridging the Wikipedia Language Gap
<http://www.brenthecht.com/publications/bhecht_CHI2012_omnipedia.pdf>"
CHI 2012
- Hecht, B.; Gergle, D. "The Tower of Babel Meets Web 2.0:
User-Generated Content and Its Applications in a Multilingual Context
<http://www.brenthecht.com/publications/bhecht_chi2010_towerofbabel.pdf>"
CHI 2010
- Sen, S.; Swoap, A. B.; Li, Q.; Boatman, B.; Dippenaar, I.; Gold, R.;
Ngo, M.; Pujol, S.; Jackson, B.; Hecht, B. "Cartograph: Unlocking
Spatial Visualization Through Semantic Enhancement
<http://www.shilad.com/static/cartograph-iui-2017-final.pdf>" IUI 2017
- Sen, S.; Johnson, I.; Harper, R.; Mai, H.; Horlbeck Olsen, S.;
Mathers, B.; Souza Vonessen, L.; Wright, M.; Hecht, B. "Towards
Domain-Specific Semantic Relatedness: A Case Study in Geography
<http://ijcai.org/papers15/Papers/IJCAI15-334.pdf>" IJCAI 2015
- Sen, S.; Lesicko, M.; Giesel, M.; Gold, R.; Hillmann, B.; Naden, S.;
Russell, J.; Wang, Z. "Ken"; Hecht, B. "Turkers, Scholars, "Arafat"
and "Peace": Cultural Communities and Algorithmic Gold Standards
<http://www-users.cs.umn.edu/~bhecht/publications/goldstandards_CSCW2015.pdf>"
CSCW 2015
- Sen, S.; Li, T. J.-J.; Lesicko, M.; Weiland, A.; Gold, R.; Li, Y.;
Hillmann, B.; Hecht, B. "WikiBrain: Democratizing Computation on
Wikipedia
<http://www-users.cs.umn.edu/~bhecht/publications/WikiBrain-WikiSym2014.pdf>"
OpenSym 2014
You can of course also use similarity measures from the recommender
systems and information retrieval fields, e.g. use edit histories to
identify articles that have been edited by the same users, or apply search
engine techniques like TF-IDF and content vectors.
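To make those two ideas concrete, here is a minimal, self-contained sketch
(standard library only) of TF-IDF content vectors compared with cosine
similarity, and editor-overlap similarity via the Jaccard index. The toy
"article" texts and editor names are placeholders, not real Wikipedia data,
and a real system would use a proper tokenizer and a large corpus for the
IDF statistics:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict.
    Uses a smoothed IDF so terms shared by all docs still carry weight."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(editors_a, editors_b):
    """Editor-overlap similarity: fraction of editors two articles share."""
    a, b = set(editors_a), set(editors_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Toy tokenized "articles"
doc1 = "the tower of babel meets web content".split()
doc2 = "multilingual web content and its applications".split()
v1, v2 = tfidf_vectors([doc1, doc2])
print(round(cosine(v1, v1), 3))  # identical articles -> 1.0
print(cosine(v1, v2))            # partial content overlap, between 0 and 1
print(jaccard({"UserA", "UserB", "UserC"},
              {"UserB", "UserC", "UserD"}))  # 2 shared of 4 total -> 0.5
```

In practice you would compute the IDF weights over a full article dump rather
than over the two articles being compared, and pull editor sets from the
revision history.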
Cheers,
Morten
On Sat, 4 May 2019 at 04:48, Haifeng Zhang <haifeng1(a)andrew.cmu.edu> wrote:
Dear folks,
Is there a way to compute content similarity between two Wikipedia
articles?
For example, I can think of representing each article as a vector of
likelihoods over possible topics.
But I wonder whether there is other work people have already explored in the
past.
Thanks,
Haifeng
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l