There is another similar article where they tested a different search engine: http://www.searchtechnologies.com/querying-indexing-cloudsearch

Some takeaways:
* Considers longer articles more important
* Considers shorter titles more important (e.g. "Germany" vs. "List of German Corps in World War II")
* Some hand tweaking ended up with the formula: text_relevance + 40.0*log10(content_size) - 15.0*log10(title_size)
* Defined a per-document boost from 0 to 10 based on the namespace a document belongs to
* Tweaked the formula into: text_relevance + (log10(content_size)*(doc_boost == 1 ? 25.0 : 40.0)) - (log10(title_size)*15) (a rough sketch of this is below)
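
A rough Python sketch of that final expression (everything beyond the formula itself, such as the guards against log10(0), is my addition, not from the article):

    import math

    def cloudsearch_score(text_relevance, content_size, title_size, doc_boost):
        # Hand-tuned expression from the CloudSearch article: longer content
        # helps, longer titles hurt, and the content weight drops to 25.0
        # when the per-document boost is 1 (40.0 otherwise).
        content_weight = 25.0 if doc_boost == 1 else 40.0
        return (text_relevance
                + content_weight * math.log10(max(content_size, 1))  # guard against empty content
                - 15.0 * math.log10(max(title_size, 1)))             # guard against empty title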

On Thu, Jul 7, 2016 at 10:29 AM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Semi-interesting post from Search Technologies (aka Paul Score) about indexing wikipedia data: http://www.searchtechnologies.com/wikipedia-azure-search

Takeaways:
* Automated entity detection, categorizing into person/place/organization
* Offers search facets by Wikipedia category and by detected entity
* Multiple scoring profiles are offered that change the weight between title and description (content? not clear); see the sketch below
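
For illustration only (this is not the Azure Search API, just a sketch of what a scoring profile that shifts weight between fields might compute; the field names and weights are assumptions):

    def profile_score(title_match, description_match,
                      title_weight=2.0, description_weight=1.0):
        # A "title-heavy" profile raises title_weight; a "content-heavy"
        # profile raises description_weight instead. The match arguments are
        # the per-field text relevance values returned by the engine.
        return title_weight * title_match + description_weight * description_match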