With the Structured Data for Commons project about to move into high
gear, it seems to me that there's something the Wikidata community needs
to have a serious discussion about, before APIs start getting designed
and set in stone.
Specifically: when should an object have an item with its own Q-number
created for it on Wikidata? What are the limits? (Are there any limits?)
The position so far seems to have been, essentially, that a Wikidata item
is only created for an object when it either already has a fully-fledged
Wikipedia article written about it, or reasonably could have.
So objects that aren't particularly notable typically have not had
Wikidata items made for them.
Indeed, practically the first message Lydia sent me when I started
trying to work on Commons and Wikidata was to underline that
Wikidata items should generally not be created for individual Commons files.
But, if I'm reading the initial plans and API thoughts of the Multimedia
team correctly,
there seems to be a key assumption that, for any image that contains
information relating to something beyond the immediate photograph or
scan, there will be some kind of 'original work' item on main Wikidata
that the file page will be able to reference, such that the 'original
work' Wikidata item will be able to act as a place to locate any
information specifically relating to the original work.
Now in many ways this is a very clean division to be able to make. It
removes any question of having to judge "notability"; and it removes any
ambiguity or diversity of where information might be located -- if the
information relates to the original work, then it will be stored on
Wikidata.
But it would appear to imply a potentially *huge* widening of the
inclusion criteria for Wikidata, and increase in the number of Wikidata
items created.
So it seems appropriate that the Wikidata community should discuss and
sign off just what should and should not be considered appropriate,
before things get much further.
For example, a year ago the British Library released 1 million
illustrations from out-of-copyright books, which increasingly have been
uploaded to Commons. Recently the Internet Archive has announced plans
to release a further 12 million, with more images either already
uploading or to follow from other major repositories including eg the
NYPL, the Smithsonian, the Wellcome Foundation, etc, etc.
How many of these images, all scanned from old originals, are going to
need new Q-numbers for those originals? Is this okay? Or are some of
them too much?
For example, for maps (cf. the relevant data schema),
each map sheet will have separate Northernmost, Southernmost,
Easternmost, Westernmost bounding co-ordinates. Does that mean each map
sheet should have its own Wikidata item?
For book illustrations, perhaps it would be enough just to reference
the edition of the book. But if individual illustrations have their own
artist and engraver details, does that mean the illustration needs to
have its own Wikidata item? Similarly, if the same engraving has
appeared in many books, is that also a sign that it should have its own
item?
Similarly, what about old photographs or old postcards? When should
these have their own Wikidata item? If they have their own known
creator, and creation date, then is it most simple just to give them a
Wikidata item, so that such information about an original underlying
work is always looked for on Wikidata? What if multiple copies of the
same postcard or photograph are known, published or re-published at
different times? But the potential number of old postcards and
photographs, like the potential number of old engravings, is *huge*.
What if an engraving was re-issued in different "states" (eg a
re-issued engraving of a place might have been modified after a tower had
been built)? When should these get different items?
In the thread where I raised some of these issues a couple of weeks ago, there has
even been the suggestion that particular individual impressions of an
engraving might deserve their own separate items; or even everything
with a separate accession number, so if a museum had three copies of an
engraving, we would make three separate items, each carrying their own
accession number, identifying the particular copy it belonged to.
(See also other sections of that discussion for further relevant
thoughts on how to represent often quite complicated relations with
Wikidata properties.)
With enough items, we could re-create and represent essentially the
entire FRBR tree.
We could do this. We may even need to do this, if the Multimedia team's
outline for Commons is to be implemented in its apparent current form.
But it seems to me that we shouldn't just sleepwalk into it.
It does seem to me that this represents (at least potentially) a
*very* large expansion in the number of items, and a widening of the
inclusion criteria, for what Wikidata is going to encompass.
I'm not saying it isn't the right thing to do, but given the potential
scale of the implications, I do think it is something we do need to have
properly worked through as a community, and confirmed that it is indeed
what we *want* to do.
(Note that this is a slightly different discussion, though related, to
the one I raised a few weeks ago as to whether Commons categories -- eg
for particular sets of scans -- should necessarily have their own
Q-number on Wikidata. Or whether some -- eg some intersection
categories -- should just have an item on Commons data. But it's
clearly related: is the simplest thing just to put items for everything
on Wikidata? Or does one try to keep Wikidata lean, and no larger than
it absolutely needs to be; albeit then having to cope with the
complexity that some categories would have a Q-number, and some would not.)
A little more detail from the funnel analysis of UploadWizard (if you
haven't been following the other funnel thread,
<https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis> has a quick
summary).
*Users repeat the upload process many times*
The main thing I am trying to understand at this point is why people use
the "upload another file" button so much. UploadWizard allows uploading up
to 50 files at the same time, which should be more than enough for the
average user, but our click-tracking data shows that most people click
through the tutorial-file-deed-details-thanks screens, then click on the
upload more button (which effectively resets the process and starts again
from the file screen), then click through the screens again, then click on
the upload more button again, then do the same again, and again, and again.
(Doing this fifty times in a row is not uncommon.) This suggests some
fundamental failing in UW - Sage suggested it is the instability of
uploading more than a few files at the same time. I wonder if others have
other theories.
*Errors do not seem to be the main problem*
I have tried to identify the reason for failed UploadWizard sessions (a
series of UploadWizard events logged on the same page which are not
terminated by reaching the thanks page) by checking what the last event
was, and assuming that for failed sessions caused by errors, that error
would be the last event. Assuming this is sound, errors do not seem to be
the main problem - they only appear at the end of ~25% of the failed
sessions (which is ~8% of the total sessions).
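The last-event heuristic described above can be sketched as follows; the event tuples and the `error:` prefix are hypothetical stand-ins for the real EventLogging schema:

```python
from collections import Counter

def classify_sessions(events):
    """Group (session_id, event_name) tuples into sessions, in logged
    order, and classify each failed session (one that never reached the
    thanks page) by its last event -- the heuristic described above."""
    sessions = {}
    for session_id, event_name in events:
        sessions.setdefault(session_id, []).append(event_name)

    error_tail = Counter()
    failed = 0
    for steps in sessions.values():
        if "thanks" in steps:
            continue  # reached the thanks page: successful session
        failed += 1
        if steps[-1].startswith("error:"):
            error_tail[steps[-1]] += 1  # error was the last event
    return failed, error_tail

# Toy data: one success, one failure ending in an error, one silent dropoff.
events = [
    ("a", "tutorial"), ("a", "file"), ("a", "deeds"), ("a", "details"), ("a", "thanks"),
    ("b", "tutorial"), ("b", "file"), ("b", "error:badtoken"),
    ("c", "tutorial"), ("c", "file"),
]
failed, error_tail = classify_sessions(events)
print(failed, dict(error_tail))  # 2 failed sessions, 1 ending in an error
```

On this toy data only half of the failed sessions end in an error, which mirrors the ~25% figure above: most dropoffs are silent, so errors alone cannot explain them.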
That said, here is a list of error codes (these are mostly API error codes,
but a few are internal to UploadWizard) sorted by frequency, collected over
the logging period:

| error code           | count |
|----------------------|-------|
| filename             |    20 |
| badtoken             |    19 |
| missingresult        |    14 |
| title                |    13 |
| publishfailed        |    11 |
| stasherror           |     7 |
| server-error         |     3 |
| fileexists-forbidden |     2 |
| filetype-banned-type |     1 |
| unknown              |     1 |
| verification-error   |     1 |
| unknownerror         |     1 |
A little explanation about the more frequent ones:
- filename: these seem to be user errors - most often an invalid filetype
(doc, bmp etc), sometimes no extension at all, or trying to add the same
file twice
- badtoken: some sort of CSRF token expiration; bug 69691
- missingresult: returned by the upload API in the details step when the
uploaded file has gone missing; bug 43967
- title: an error about duplicate files (i.e. the same file already
exists on Commons) that somehow happens in the details step instead of the
file step
- publishfailed: this seems to be some sort of race condition: the first API
call to publish a file from stash puts it into the job queue and sets its
status to pending; a second call will then throw this error.
- stasherror: could be lots of things. bug 56302
<https://bugzilla.wikimedia.org/show_bug.cgi?id=56302>, bug 54028
<https://bugzilla.wikimedia.org/show_bug.cgi?id=54028> and more.
*Some suggestions based on the findings so far*
- review UX for "fatal user errors" (i.e. when UploadWizard says "you
can't upload this file type") - is the error message helpful?
- review and improve API error messages (api-error-*), possibly overriding
them with UW-specific ones. Do they identify next steps? Do they even
exist? (e.g. api-error-publishfailed does not.)
- renew token on badtoken error (bug 69691)
- make sure that the specific error message thrown by
ApiUpload::dieUsage gets logged somewhere. Currently we only log a generic
message derived from the API error code, so e.g. all the dozen different
UploadStashException subclasses are reported with the same message.
- poll for success on publishfailed error (despite what its name suggests,
it seems to actually mean something like "publish in progress")
- understand better why people repeat the upload process so often. This
might reveal serious UX deficiencies or functional errors (e.g. in an older
thread about funnel analysis, Sage claims uploading more than three files
at the same time is too unreliable for him).
- Investigate if there is a low-effort way to recover entered details
when the upload process has to be restarted. (There are drop-in solutions
like garlic.js <http://garlicjs.org/> or sisyphus.js
<https://github.com/simsalabim/sisyphus> but the very dynamic nature of
UW forms might be a problem.)
- figure out why some title errors are only reported in the details step
- log information about uploaded files to better identify size- or
filetype-specific issues
Bigger / longer-term effort:
- figure out a way to retry when the user already entered all the
details but publishing the file failed. (This points towards the
- make stashed / async uploads rely on the database instead of the
session (bug 43967 <https://bugzilla.wikimedia.org/show_bug.cgi?id=43967>)
We have recently added some funnel logging to UploadWizard. A nice
dashboard is in the works, but here are some preliminary results, showing
the number of virtual pageviews for each step of UploadWizard.
mysql:email@example.com [log]> select event_step,
count(*), count(*)/3623 as survival_rate from UploadWizardStep_8612364
group by event_step order by survival_rate desc;
| event_step | count(*) | survival_rate |
|------------|----------|---------------|
| tutorial   |     3623 |        1.0000 |
| file       |     3496 |        0.9649 |
| deeds      |     2433 |        0.6715 |
| details    |     2373 |        0.6550 |
| thanks     |     2109 |        0.5821 |
This is based on about a day's worth of logs (25.5 hours) - the logging
code was deployed to Commons yesterday.
The big drop is apparently in the file upload step (almost 30% - well over
1000 uploads a day). Some of that might be intentional (upload caught by
badtitle filter etc), but even so the drop is huge. Given that that step is
rather simple from a UX point of view, it seems that upload bugs are a
bigger problem right now than design issues.
(The license selection - deeds -> details - on the other hand is
unexpectedly unproblematic; I would have expected it to be the main source
of confusion, but actually adding description etc. seems worse.)
The next step would be to log JS/upload errors, I suppose.
Also, it would be nice to know which dropoffs are final and which are
reloads/restarts. The Navigation Timing API can tell apart reloads and
normal navigation, alternatively we could maybe group by IP + useragent +
time bucket to find retries.
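The IP + useragent + time-bucket grouping mentioned above could be sketched like this; the tuple layout and the one-hour bucket are assumptions for illustration, not the real log schema:

```python
from collections import defaultdict

def group_retries(rows, bucket_seconds=3600):
    """Group upload attempts by (ip, user_agent, time bucket), so that
    several attempts in the same bucket can be treated as retries by one
    user -- a rough stand-in for real session reconstruction."""
    buckets = defaultdict(list)
    for ip, ua, ts in rows:
        buckets[(ip, ua, ts // bucket_seconds)].append(ts)
    # Keep only the groups with more than one attempt: the likely retries.
    return {key: len(times) for key, times in buckets.items() if len(times) > 1}

rows = [
    ("10.0.0.1", "Firefox", 100),
    ("10.0.0.1", "Firefox", 900),   # same user, same hour: likely a retry
    ("10.0.0.2", "Chrome", 200),    # different user: not grouped
]
print(group_retries(rows))  # {('10.0.0.1', 'Firefox', 0): 2}
```

Fixed buckets will split a retry pair that straddles a bucket boundary, so this undercounts slightly; it is only meant to separate "final dropoffs" from obvious restarts, as suggested above.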
After working on an article on English Wikipedia, I came to realise that it
might be useful if we had a slideshow feature for media for use in articles.
I was informed that Hebrew Wikipedia has a fantastic slideshow template
that can be used in articles.
The design is very sleek and it would no doubt be a fantastic addition to
English Wikipedia.
I've left a message for the person responsible for this template on he.wp
asking if they can help create it for English Wikipedia, but I have been
informed that they are basically semi-retired/on extended wikibreak.
Would anyone out there like to take this on board and get it created for
English Wikipedia at the earliest convenience? It can be tested live on the
article I am working on at the moment if need be.
CC'ing Multimedia team
Maryana, this could be something interesting for the Mobile Web team
to look at to optimize image delivery.
Have you guys done any perf work around images?
On Thu, Jun 5, 2014 at 4:10 PM, Yuri Astrakhan <yastrakhan(a)wikimedia.org> wrote:
> The reduced-quality images feature is now live in production. To see it for
> yourself, compare the original with the low-quality image (253KB => 99.9KB,
> ~60% smaller).
> The quality reduction is triggered by adding "qlow-" in front of the file
> name's pixel size.
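The naming convention described in the quoted message ("qlow-" prefixed to the pixel size in the file name) could be sketched as follows; the exact thumbnail URL layout here is an assumption for illustration:

```python
import re

def to_low_quality(thumb_url):
    """Prefix 'qlow-' to the pixel-size part of a thumbnail file name,
    e.g. '640px-Foo.jpg' -> 'qlow-640px-Foo.jpg'.
    The thumbnail URL layout used here is an assumption."""
    return re.sub(r"/(\d+px-[^/]+)$", r"/qlow-\1", thumb_url)

url = "https://upload.example.org/thumb/a/ab/Foo.jpg/640px-Foo.jpg"
print(to_low_quality(url))
# .../thumb/a/ab/Foo.jpg/qlow-640px-Foo.jpg
```

Because the variant lives at a distinct URL, either the client (JS) or an edge rewrite can select it, which is exactly the trade-off discussed below.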
> Continuing our previous discussion, now we need to figure out how to best
> use this feature. As covered before, there are two main approaches:
> * JS-based rewrite - the client picks which image to load based on
> network/device/user preference conditions. Issues may include multiple
> downloads of the same image (if the browser starts the download before JS
> runs), parser cache fragmentation.
> * Varnish-based rewrite - Varnish decides which image to serve under the
> same URL. This approach requires Varnish to know everything needed to make
> that decision.
> Zero plans to go the first route, but if we make it mobile-wide, or even
> site-wide, all the better.
As most of you know, we document WMF engineering activities on
mediawiki.org using "activity pages", which is just a fancy word for
pages that have an infobox. We can then list the activities in many
places, like the Wikimedia Engineering portal (
https://www.mediawiki.org/wiki/Wikimedia_Engineering ) and the status
dashboard ( https://www.mediawiki.org/wiki/Wikimedia_Engineering/Dashboard ).
Most of the activities are about a particular project, like
"Phabricator migration" or "Flow". Multimedia is a bit awkward because
it's about a team rather than the projects you guys work on.
It might have made sense previously (for example if the team was
touching a lot of different pieces of Multimedia) but my understanding
from the Wikimania workshops is that the Multimedia team plans to
mostly focus on two main projects this fiscal year: UploadWizard and
Structured Data.
Therefore, I'd like to recommend that we make those two projects
actual "activities", with a dedicated infobox and status updates.
Other, smaller multimedia-related bits like MediaViewer could still be
in the catch-all "Multimedia" activity.
This wouldn't change anything for most of you; the only visible
difference would be that you would report on UploadWizard and
Structured Data on a different page. It would be more consistent with
the rest of WMF engineering, and it would be easier for the rest of
the community to follow your work on each project.
Unless there are strong objections to this proposal, I'm happy to add
the infoboxes myself, but I wanted to ask here first :) Let me know if
you have any questions.
Technical Communications Manager — Wikimedia Foundation