Hello.
I'm a developer at reddit, and this morning I was looking into how we generate thumbnail images from Wikipedia (and, I suppose, all MediaWiki) articles, after a user reported an unexpected thumbnail on their submission[0].
You can read my post there for more details, but essentially since we can't find an og:image or other similar metadata tag hinting to us what we should use for a thumbnail, we iterate through all the linked images on the page and pull out the largest one (you can view the code online if you're curious[1]). While this works reasonably well as a general heuristic, Wikipedia articles often have some more structure that could give us a better image to use.
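For readers who want the heuristic above in concrete form, here is a minimal sketch. It is not the code in media.py[1]; it assumes the page declares image sizes in `width`/`height` attributes (a real scraper would more likely fetch the images to measure them), and the names `ThumbnailCandidates` and `pick_thumbnail` are made up for illustration.

```python
from html.parser import HTMLParser


class ThumbnailCandidates(HTMLParser):
    """Collect an og:image hint and <img> candidates from a page."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.images = []  # (declared area in pixels, src) in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            # Keep only the first og:image hint on the page.
            if self.og_image is None:
                self.og_image = attrs.get("content")
        elif tag == "img" and attrs.get("src"):
            try:
                # Assumption: sizes come from the width/height attributes.
                width = int(attrs.get("width") or 0)
                height = int(attrs.get("height") or 0)
            except ValueError:
                width = height = 0
            self.images.append((width * height, attrs["src"]))


def pick_thumbnail(html):
    """Prefer the og:image hint; otherwise fall back to the largest image."""
    parser = ThumbnailCandidates()
    parser.feed(html)
    if parser.og_image:
        return parser.og_image
    if not parser.images:
        return None
    return max(parser.images, key=lambda candidate: candidate[0])[1]
```

With a hint present the image list is never consulted, which is why a single og:image tag would be enough to override the "largest image" guess.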
While writing this, I recalled that the Wikipedia Android app displays thumbnails in its search results. I think that's pulling from OpenSearch with the PageImages extension[2]? but I haven't really delved into that yet. I'm curious how those images get pulled - if it's taking into account infoboxes or such, or just the first image on the page, or what.
Would it be feasible to include an og:image tag on pages for which we have a reasonable guess as to the thumbnail? Open Graph[3] is supported by what seems anecdotally to me to be a wide range of services, so good hints there would improve thumbnails for links on not just reddit, but Facebook, Twitter, various chat clients, I think several Wordpress plugins, etc.
Thanks, - P
[0]: https://www.reddit.com/r/bugs/comments/317n1v/thumbnail_acquisition_from_wik...
[1]: https://github.com/reddit/reddit/blob/master/r2/r2/lib/media.py#L485-L542
[2]: https://www.mediawiki.org/wiki/API:Opensearch
[3]: http://ogp.me/
On Thu, Apr 2, 2015 at 11:35 AM, James Pearson james@reddit.com wrote:
While writing this, I recalled that the Wikipedia Android app displays thumbnails in its search results. I think that's pulling from OpenSearch with the PageImages extension[2]? but I haven't really delved into that yet. I'm curious how those images get pulled - if it's taking into account infoboxes or such, or just the first image on the page, or what.
It uses a scoring system that takes position on page, size and w:h ratio into account.
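To make the idea of such a scoring system concrete, here is a rough sketch that combines the three factors mentioned above. The weights, thresholds, and the function names `score_image` and `best_image` are invented for illustration; they are not the values or code PageImages uses.

```python
def score_image(width, height, position):
    """Score one image; higher is better. All constants are hypothetical."""
    if width <= 0 or height <= 0:
        return 0.0
    # Size: wider images score higher, capped at 400px.
    size_score = min(width, 400) / 400
    # Ratio: penalize very wide banners and very tall strips.
    ratio = width / height
    ratio_score = 1.0 if 0.5 <= ratio <= 2.0 else 0.3
    # Position: images earlier on the page score higher.
    position_score = 1.0 / (1 + position)
    return size_score * ratio_score * position_score


def best_image(candidates):
    """candidates: list of (name, width, height) tuples in page order."""
    scored = [
        (score_image(width, height, index), name)
        for index, (name, width, height) in enumerate(candidates)
    ]
    if not scored:
        return None
    return max(scored, key=lambda item: item[0])[1]
```

Multiplying the factors means an image has to do reasonably well on all three to win, so a tiny icon at the top of the page or a huge banner with an extreme ratio can both lose to a moderately sized picture further down.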
Would it be feasible to include an og:image tag on pages for which we have a reasonable guess as to the thumbnail? Open Graph[3] is supported by what seems anecdotally to me to be a wide range of services, so good hints there would improve thumbnails for links on not just reddit, but Facebook, Twitter, various chat clients, I think several Wordpress plugins, etc.
https://phabricator.wikimedia.org/T33338
Hi James! Thanks for mailing out.
On the subject of OG tags, we have a bunch of bugs around explicitly using OG tags ([1], for example). This thread is very enlightening: [2]
tldr: Essentially I think many of us, myself included, would like to add 'og:image' tags but the Wikipedia community as a whole is not 100% sure this is aligned with the mission.
I wonder if adopting http://schema.org would be a less controversial move and help you towards your goal.
[1] https://phabricator.wikimedia.org/T32113
[2] http://www.gossamer-threads.com/lists/wiki/wikitech/545421
On Thu, Apr 2, 2015 at 11:45 AM, Max Semenik maxsem.wiki@gmail.com wrote:
On Thu, Apr 2, 2015 at 11:35 AM, James Pearson james@reddit.com wrote:
While writing this, I recalled that the Wikipedia Android app displays thumbnails in its search results. I think that's pulling from OpenSearch with the PageImages extension[2]? but I haven't really delved into that yet. I'm curious how those images get pulled - if it's taking into account infoboxes or such, or just the first image on the page, or what.
It uses a scoring system that takes position on page, size and w:h ratio into account.
Would it be feasible to include an og:image tag on pages for which we have a reasonable guess as to the thumbnail? Open Graph[3] is supported by what seems anecdotally to me to be a wide range of services, so good hints there would improve thumbnails for links on not just reddit, but Facebook, Twitter, various chat clients, I think several Wordpress plugins, etc.
https://phabricator.wikimedia.org/T33338
-- Best regards, Max Semenik ([[User:MaxSem]]) _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thanks, it looks like I have some reading to do. - P
On Apr 2, 2015 2:58 PM, "Jon Robson" jdlrobson@gmail.com wrote:
Hi James! Thanks for mailing out.
On the subject of OG tags, we have a bunch of bugs around explicitly using OG tags ([1], for example). This thread is very enlightening: [2]
tldr: Essentially I think many of us, myself included, would like to add 'og:image' tags but the Wikipedia community as a whole is not 100% sure this is aligned with the mission.
What's the actual objection? Our community (or at least a significant and very vocal portion of it) does not want gaudy share buttons for various reasons, but I don't recall anyone objecting to adding metadata to allow automatic identification of the primary image of an article.
--bawolff
Brian Wolff wrote:
What's the actual objection? Our community (or at least a significant and very vocal portion of it) does not want gaudy share buttons for various reasons, but I don't recall anyone objecting to adding metadata to allow automatic identification of the primary image of an article.
There's related discussion at https://phabricator.wikimedia.org/T64811.
Whether it's Twitter, Facebook, or some future social site, MediaWiki and Wikimedia need to figure out what reasonable level of support we're willing to offer each of these services, in my opinion.
MZMcBride
On Apr 2, 2015 11:02 PM, "MZMcBride" z@mzmcbride.com wrote:
There's related discussion at https://phabricator.wikimedia.org/T64811.
Whether it's Twitter, Facebook, or some future social site, MediaWiki and Wikimedia need to figure out what reasonable level of support we're willing to offer each of these services, in my opinion.
MZMcBride
I agree that we should not add every proprietary meta tag that ever happens to exist.
However, there is clearly a desire to be able to identify a representative image for an article. This need is exhibited across many websites, including reddit, Facebook, Google Plus, etc., but also on our own site, as shown by the PageImages extension for mobile. It's clear there are multiple parties that want to be able to accurately extract such information programmatically from any arbitrary website on the internet. I would argue supporting this use case is not a Wikipedia issue, but a MediaWiki issue.
We should research which metadata scheme is the most de facto standard for declaring this sort of information (whether that be Open Graph or schema.org or something else) and implement it (and only one; implementing this 10 different ways would be silly).
In many ways I think this is similar to RSS feeds (a specific piece of info multiple people want, with somewhat competing standards to implement it).
--bawolff
On 2015-04-02 8:44 PM, Brian Wolff wrote:
However, there is clearly a desire to be able to identify a representative image for an article. This need is exhibited across many websites, including reddit, Facebook, Google Plus, etc., but also on our own site, as shown by the PageImages extension for mobile. It's clear there are multiple parties that want to be able to accurately extract such information programmatically from any arbitrary website on the internet. I would argue supporting this use case is not a Wikipedia issue, but a MediaWiki issue.
We should research which metadata scheme is the most de facto standard for declaring this sort of information (whether that be Open Graph or schema.org or something else) and implement it (and only one; implementing this 10 different ways would be silly).
Facebook exclusively supports Open Graph.
Google+ recommends schema.org microdata and uses Open Graph.
Twitter exclusively uses their proprietary Twitter cards markup ( <meta name="twitter:card" content="summary" /> ...) and requires you to validate and submit your site for approval before they'll display cards.
Reddit uses embed.ly, which is supposed to support a variety of Open Graph, oEmbed, etc...
Bing uses schema.org and Open Graph but states that they "currently only [use] this information to enhance the visual display of search results of a limited number of publishers". Bing just uses everything it can, Microdata, Microformats, RDFa, etc...
Google uses schema.org in microdata, RDFa, and JSON-LD formats for rich data (I'm not sure if they bother with page level metadata at all, standard HTML title and meta description generally covers what they output).
----
So my opinion would be to support Open Graph, optionally add some schema.org, and screw Twitter and their unwillingness to play nice with attempts to standardize metadata.
We should also consider oEmbed where it makes sense.
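As a rough illustration of what supporting Open Graph would mean for the page output, here is a small sketch that renders the relevant meta tags. It is written in Python for brevity, not as MediaWiki code; the function name `og_image_tags` is hypothetical, while `og:image`, `og:image:width`, and `og:image:height` are property names from the Open Graph protocol.

```python
from html import escape


def og_image_tags(image_url, width=None, height=None):
    """Render Open Graph image meta tags for a page's <head>."""
    tags = [
        '<meta property="og:image" content="%s">' % escape(image_url, quote=True)
    ]
    # The dimensions are optional hints; emit them only when both are known.
    if width and height:
        tags.append('<meta property="og:image:width" content="%d">' % width)
        tags.append('<meta property="og:image:height" content="%d">' % height)
    return "\n".join(tags)
```

The URL is escaped because it ends up inside an HTML attribute; an image URL containing `&` or a quote character would otherwise produce broken markup.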
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
On Thu, Apr 2, 2015 at 10:04 PM, Daniel Friesen daniel@nadir-seen-fire.com wrote:
So my opinion would be to support Open Graph, optionally add some schema.org, and screw Twitter and their unwillingness to play nice with attempts to standardize metadata.
+1 and if someone writes the patch I'll +2 it. We've been talking about this for far too long :-)
On Apr 2, 2015 10:05 PM, "Daniel Friesen" daniel@nadir-seen-fire.com wrote:
Twitter exclusively uses their proprietary Twitter cards markup ( <meta name="twitter:card" content="summary" /> ...) and requires you to validate and submit your site for approval before they'll display cards.
This isn't quite correct. They have their own thing, which allows you to give some Twitter-specific data, but attributes that are pretty standard (like thumbnail image) will fall back to open graph.
https://dev.twitter.com/cards/markup
You do have to submit a request, though, for cards to be shown. I think it's a pretty painless process, but I wasn't the one handling it when we implemented card support.
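The fallback behaviour described above can be sketched as a simple lookup order. This is an illustration only, assuming the page's meta tags have already been collected into a dict; `resolve_card_image` is a made-up name, and the exact precedence Twitter applies is documented at the link above rather than confirmed here.

```python
def resolve_card_image(meta):
    """Return the card image from a dict of meta tag name -> content.

    Assumption: the Twitter-specific tag wins when present, and the
    Open Graph tag is used as the fallback.
    """
    for key in ("twitter:image", "og:image"):
        value = meta.get(key)
        if value:
            return value
    return None
```

Under this reading, a site that only emits og:image would still get a thumbnail on a card without adding any Twitter-specific image markup.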
Reddit uses embed.ly, which is supposed to support a variety of Open Graph, oEmbed, etc...
Depending on what embedly tells us it can embed under certain conditions (for instance, the embed needs to support https if the requested page was https), we sometimes use embedly for thumbnails and sometimes use our own scraper, the code I linked to in my first email. It will currently pick up on Open Graph tags, but if you decide to implement another standard we don't currently support, I will gladly build it in (pending some project scheduling, so perhaps not immediately).
From what I've seen, the various web-chat irc-replacements support open graph as well, if they do any auto-link-embedding.
Hi!
Would it be feasible to include an og:image tag on pages for which we have a reasonable guess as to the thumbnail? Open Graph[3] is supported by what seems anecdotally to me to be a wide range of services, so good hints there would improve thumbnails for links on not just reddit, but Facebook, Twitter, various chat clients, I think several Wordpress plugins, etc.
I wonder if this can somehow be connected to Wikidata's image attribute (https://www.wikidata.org/wiki/Property:P18).
On 3 apr. 2015, at 06:27, Stas Malyshev smalyshev@wikimedia.org wrote:
I wonder if this can somehow be connected to Wikidata's image attribute (https://www.wikidata.org/wiki/Property:P18).
This is a very good idea, because it circumvents the autodetection problem that PageImages has, for instance, which takes away the ability of editors to ‘author’ the result.
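A sketch of how the P18 lookup could work, assuming the Wikidata item ID for the article is already known. It uses the `wbgetclaims` API module and Commons' `Special:FilePath` redirect; the helper names are made up for illustration, and the sketch only builds URLs and parses a response, leaving the HTTP request itself to the caller.

```python
from urllib.parse import quote, urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
COMMONS_FILE_PATH = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def claims_url(entity_id):
    """Build the API URL that asks for an item's P18 (image) claims."""
    params = {
        "action": "wbgetclaims",
        "entity": entity_id,
        "property": "P18",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def first_image_filename(response):
    """Pull the first P18 file name out of a decoded wbgetclaims response."""
    for claim in response.get("claims", {}).get("P18", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, str) and value:
            return value
    return None


def commons_file_url(filename):
    """Turn a Commons file name into a URL that redirects to the file."""
    return COMMONS_FILE_PATH + quote(filename.replace(" ", "_"))
```

Because the image comes from an explicit statement on the item, editors can change the result by editing the statement, which is the 'authoring' ability that autodetection lacks.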
DJ
On Thu, Apr 2, 2015 at 11:35 AM, James Pearson james@reddit.com wrote:
I recalled that the Wikipedia Android app displays thumbnails in its search results. I think that's pulling from OpenSearch with the PageImages extension[2]? but I haven't really delved into that yet. I'm curious how those images get pulled - if it's taking into account infoboxes or such, or just the first image on the page, or what.
I'm writing an article on that subject. It hasn't been reviewed, but it's interesting. Your feedback is welcome. https://www.mediawiki.org/wiki/API:Page_info_in_search_results