On 20/02/2016 02:13, Jon Katz wrote:
Also, even without boost links, there seems to be a bias towards popular (long) pages. It seems that a focus on # of words in common rather than % is one of the things leading to long articles seeing so much more traction. Would this be an easy thing to test as well?
Hi,
You're right, but I think it's because of the boost-templates feature, which is enabled even when boostlinks is not. On enwiki a few templates are configured in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates, which means that a featured article will be overboosted.
We could fine-tune the core "more like" algorithm with various params, but today I think that the rescore features (boostlinks, boost-templates) are what have the most impact.
To sum up, 2 types of score are combined when ranking articles:
- A score that computes the similarity between documents; this can be fine-tuned[1]
- A score (we call it "rescore") that uses article metadata: boostlinks, templates.
The way these scores are combined can be configured with a rescore profile, but today it's a product of all the scores, e.g.
morelike:A_Summer_Bird-Cage
The score for "I Know Why the Caged Bird Sings" with boost links is:
- similarity: 0.3457441 (terms chosen: "from", "cage", "bird")
- boostlinks: 2.807535
- boost-templates: 2
- total: 0.3457441 * 2.807535 * 2 => 1.9413773
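
To make the arithmetic concrete, here is a rough Python sketch of the product combination described above. It is not the CirrusSearch implementation: the function name `combined_score` and the dict layout are made up for illustration, and only the three numbers come from the example.

```python
from math import prod

def combined_score(similarity, rescore_factors):
    """Multiply the similarity score by every rescore factor.

    Hypothetical helper: it only mirrors the "product of all the
    scores" behaviour described in this mail.
    """
    return similarity * prod(rescore_factors.values())

# Values taken from the morelike:A_Summer_Bird-Cage example.
similarity = 0.3457441
factors = {"boostlinks": 2.807535, "boost-templates": 2}

total = combined_score(similarity, factors)

# With no rescore factors, only the similarity score remains.
similarity_only = combined_score(similarity, {})

print(f"with rescore:    {total:.7f}")
print(f"similarity only: {similarity_only:.7f}")
```

Since everything is multiplied, the two rescore factors together scale the similarity score by about 5.6 here, which is why they can outweigh differences in similarity between candidates.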
[1]: https://www.mediawiki.org/wiki/Help:CirrusSearch#morelike: