2009/11/16 Gregory Maxwell gmaxwell@gmail.com:
> For a while I thought it was just extracting text beginning at "Searchterm is something", looking backwards from the end of the article, but it seems to be more than that. Some older examples where I've seen this now seem to be returning different results; I don't know if it's a timing thing or just chance.
I think it's an unfortunate collision of your search terms and Wikipedia's preferred vocabulary. We've long standardised on "film", not "movie", so any occurrence of the latter is likely to be in a direct quote, and is very *unlikely* to be in the lead section. Direct quotes tend to come from reviews, pro or con, so when the algorithm tries to find extracts showing as many search terms as possible, it ends up apparently cherry-picking these.
The effect becomes clearer when we compare the results using "film" instead of "movie".
[the box film wikipedia]
The Box (2009 film) - Wikipedia, the free encyclopedia "The Box is a 2009 science fiction horror film based on the 1970 short story "Button, Button" by Richard Matheson, which was previously adapted into an ..."
[the box movie wikipedia]
The Box (2009 film) - Wikipedia, the free encyclopedia "And it's so rare that a movie is an F. I mean, if it's an F, it shouldn't even be released." On the topic of the negative reaction to The Box, Mintz blamed ..."
Note that the first query has all its keywords in the page title, so the result shows the first line of the article (which contains three of them anyway). The second has all the keywords *except* 'movie' in the title, so the search engine looks for an extract specifically using that word. (I don't know how it chooses that extract, though.)
We can test this by using a different keyword and seeing how it builds the extract:
[the box mintz wikipedia]
The Box (2009 film) - Wikipedia, the free encyclopedia "On the topic of the negative reaction to The Box, Mintz blamed the film's ending and was quoted as saying "People really thought this was a stinker". ..."
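To make the guess concrete, here is a toy sketch in Python of the behaviour I'm describing. It is only my guess at the heuristic, not the search engine's actual algorithm; the names `words` and `pick_snippet` are made up, and the sentences are trimmed from the snippets quoted above.

```python
import re

def words(text):
    """Return the set of lower-cased word tokens in text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def pick_snippet(query, title, sentences):
    """Guessed heuristic: if the title already covers every query term,
    show the lead sentence; otherwise show the sentence that contains
    the most query terms missing from the title."""
    missing = words(query) - words(title)
    if not missing:
        return sentences[0]
    # max() keeps the first of tied items, so earlier sentences win ties.
    return max(sentences, key=lambda s: len(missing & words(s)))

TITLE = "The Box (2009 film) - Wikipedia, the free encyclopedia"
LEAD = ('The Box is a 2009 science fiction horror film based on the 1970 '
        'short story "Button, Button" by Richard Matheson.')
REVIEW = "And it's so rare that a movie is an F."
MINTZ = ("On the topic of the negative reaction to The Box, "
         "Mintz blamed the film's ending.")
SENTENCES = [LEAD, REVIEW, MINTZ]

for query in ("the box film wikipedia",
              "the box movie wikipedia",
              "the box mintz wikipedia"):
    print("[%s]" % query)
    print("   ", pick_snippet(query, TITLE, SENTENCES))
```

With these inputs, "film" yields the lead sentence, "movie" yields the review quote, and "mintz" yields the Mintz sentence, which matches the three results above. The real extracts clearly stitch together more than one sentence, which this sketch does not attempt.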
Might this explain the effect? No idea how to *solve* it, though...