Hello.
I'm a developer at reddit, and this morning I was looking into how we
generate thumbnail images from Wikipedia (and I suppose all MediaWiki)
articles, due to a user report that they had an unexpected thumbnail on
their submission[0].
You can read my post there for more details, but essentially, since we can't
find an og:image or similar metadata tag hinting at what we should use for a
thumbnail, we iterate through all the linked images on the page
and pull out the largest one (you can view the code online if you're
curious[1]). While this works reasonably well as a general heuristic,
Wikipedia articles often have some more structure that could give us a
better image to use.
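
To make that fallback concrete, here's a minimal sketch of the idea. It is
not the actual code at [1]: the names ThumbnailHintParser and pick_thumbnail
are made up for illustration, and it assumes image size can be read from the
width/height attributes in the markup rather than by fetching each image.

```python
from html.parser import HTMLParser


class ThumbnailHintParser(HTMLParser):
    """Collects an og:image hint and <img> candidates from one HTML page."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.images = []  # list of (declared pixel area, src)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            # Keep only the first og:image we see.
            if self.og_image is None:
                self.og_image = attrs.get("content")
        elif tag == "img" and attrs.get("src"):
            # Assumption: trust the declared width/height attributes.
            # Missing or non-numeric values (e.g. "100%") count as area 0.
            try:
                width = int(attrs.get("width") or 0)
                height = int(attrs.get("height") or 0)
            except ValueError:
                width = height = 0
            self.images.append((width * height, attrs["src"]))


def pick_thumbnail(html):
    """Prefer og:image; otherwise fall back to the largest declared <img>."""
    parser = ThumbnailHintParser()
    parser.feed(html)
    if parser.og_image:
        return parser.og_image
    if parser.images:
        return max(parser.images, key=lambda pair: pair[0])[1]
    return None
```

With no og:image present, the result depends entirely on which image happens
to be biggest, which is how an article can end up with a surprising thumbnail.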
While writing this, I recalled that the Wikipedia Android app displays
thumbnails in its search results. I think that's pulling from OpenSearch
with the PageImages extension[2], but I haven't really delved into that
yet. I'm curious how those images get chosen: whether it takes infoboxes
and the like into account, just uses the first image on the page, or
something else.
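
For reference, here's a rough sketch of how I'd expect to ask PageImages for
a page's thumbnail through the regular query API (prop=pageimages). I don't
know that this is what the Android app does. The helper names are made up,
and the response shape handled in extract_thumbnail is my assumption about
the JSON format, so treat it as a starting point rather than a reference.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: English Wikipedia's API endpoint; other wikis differ.
API_URL = "https://en.wikipedia.org/w/api.php"


def pageimage_query_url(title, thumb_size=200):
    """Build a query URL asking PageImages for one page's thumbnail."""
    params = {
        "action": "query",
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": thumb_size,
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_thumbnail(response):
    """Return the first thumbnail URL in a decoded response, or None.

    Assumed shape: {"query": {"pages": {id: {"thumbnail": {"source": ...}}}}}
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        thumb = page.get("thumbnail")
        if thumb and thumb.get("source"):
            return thumb["source"]
    return None


def fetch_thumbnail(title):
    """Do the request. A real client should also send a User-Agent header."""
    with urlopen(pageimage_query_url(title)) as resp:
        return extract_thumbnail(json.load(resp))
```

A page with no thumbnail simply yields None here, in which case we'd fall
back to our existing heuristic.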
Would it be feasible to include an og:image tag on pages for which we have
a reasonable guess as to the thumbnail? Open Graph[3] is supported by what
seems, anecdotally, to be a wide range of services, so good hints there
would improve thumbnails for links not just on reddit, but also on Facebook,
Twitter, various chat clients, I think several WordPress plugins, etc.
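
To be concrete about what I'm asking for, this is roughly the markup
consumers like us look for. The sketch below is in Python only because that's
what I have at hand; og_image_tags is a made-up helper, not anything in
MediaWiki. og:image, og:image:width and og:image:height are properties from
the Open Graph protocol[3].

```python
from html import escape


def og_image_tags(image_url, width=None, height=None):
    """Render Open Graph image <meta> tags for a page's <head>."""
    tags = [
        '<meta property="og:image" content="%s">' % escape(image_url, quote=True)
    ]
    # The size hints are optional; emit them only when known.
    if width is not None:
        tags.append('<meta property="og:image:width" content="%d">' % width)
    if height is not None:
        tags.append('<meta property="og:image:height" content="%d">' % height)
    return "\n".join(tags)


# Hypothetical URL, for illustration only.
print(og_image_tags("https://example.org/thumb.jpg", width=320, height=240))
```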
Thanks,
- P
[0]: https://www.reddit.com/r/bugs/comments/317n1v/thumbnail_acquisition_from_wi…
[1]: https://github.com/reddit/reddit/blob/master/r2/r2/lib/media.py#L485-L542
[2]: https://www.mediawiki.org/wiki/API:Opensearch
[3]: http://ogp.me/