Hi everybody,
With the Structured Data for Commons project about to move into high
gear, it seems to me that there's something the Wikidata community needs
to have a serious discussion about, before APIs start getting designed
and set in stone.
Specifically: when should an object have an item with its own Q-number
created for it on Wikidata? What are the limits? (Are there any limits?)
The position so far seems to be essentially that a Wikidata item has
only been created when an object either already has a fully-fledged
Wikipedia article written for it, or reasonably could have.
So objects that aren't particularly notable typically have not had
Wikidata items made for them.
Indeed, practically the first message Lydia sent to me when I started
trying to work on Commons and Wikidata was to underline to me that
Wikidata objects should generally not be created for individual Commons
files.
But, if I'm reading the initial plans and API thoughts of the Multimedia
team correctly, eg
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Sl…
and
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
there seems to be the key assumption that, for any image that contains
information relating to something beyond the immediate photograph or
scan, there will be some kind of 'original work' item on main Wikidata
that the file page will be able to reference, such that the 'original
work' Wikidata item will be able to act as a place to locate any
information specifically relating to the original work.
Now in many ways this is a very clean division to be able to make. It
removes any question of having to judge "notability"; and it removes any
ambiguity or diversity of where information might be located -- if the
information relates to the original work, then it will be stored on
Wikidata.
But it would appear to imply a potentially *huge* increase in the
inclusion criteria for Wikidata, and the number of Wikidata items
potentially creatable.
So it seems appropriate that the Wikidata community should discuss and
sign off just what should and should not be considered appropriate,
before things get much further.
For example, a year ago the British Library released 1 million
illustrations from out-of-copyright books, which increasingly have been
uploaded to Commons. Recently the Internet Archive has announced plans
to release a further 12 million, with more images either already
uploading or to follow from other major repositories including eg the
NYPL, the Smithsonian, the Wellcome Foundation, etc, etc.
How many of these images, all scanned from old originals, are going to
need new Q-numbers for those originals? Is this okay? Or are some of
them too much?
For example, for maps, cf this data schema
https://docs.google.com/spreadsheets/d/1Hn8VQ1rBgXj3avkUktjychEhluLQQJl5v6W…
, each map sheet will have a separate Northernmost, Southernmost,
Easternmost, Westernmost bounding co-ordinates. Does that mean each map
sheet should have its own Wikidata item?
For book illustrations, perhaps it would be enough just to reference
the edition of the book. But if individual illustrations have their own
artist and engraver details, does that mean the illustration needs to
have its own Wikidata item? Similarly, if the same engraving has
appeared in many books, is that also a sign that it should have its own
Wikidata item?
What about old photographs, or old postcards? When should
these have their own Wikidata item? If they have their own known
creator, and creation date, then is it simplest just to give them a
Wikidata item, so that such information about an original underlying
work is always looked for on Wikidata? What if multiple copies of the
same postcard or photograph are known, published or re-published at
different times? But the potential number of old postcards and
photographs, like the potential number of old engravings, is *huge*.
What if an engraving was re-issued in different "states" (eg a
re-issued engraving of a place might have been modified if a tower had
been built)? When should these get different items?
At
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts#Wikidat…
where I raised some of these issues a couple of weeks ago, there has
even been the suggestion that particular individual impressions of an
engraving might deserve their own separate items; or even everything
with a separate accession number, so if a museum had three copies of an
engraving, we would make three separate items, each carrying their own
accession number, identifying the accession number that belonged to a
particular File.
(See also other sections at
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts for
further relevant discussions on how to represent often quite complicated
relations with Wikidata properties).
With enough items, we could re-create and represent essentially the
entire FRBR tree.
We could do this. We may even need to do this, if MM team's outline for
Commons is to be implemented in its apparent current form.
But it seems to me that we shouldn't just sleepwalk into it.
It does seem to me that this does represent (at least potentially) a
*very* large expansion in the number of items, and widening of the
inclusion criteria, for what Wikidata is going to encompass.
I'm not saying it isn't the right thing to do, but given the potential
scale of the implications, I do think it is something we do need to have
properly worked through as a community, and confirmed that it is indeed
what we *want* to do.
All best,
James.
(Note that this is a slightly different discussion, though related, to
the one I raised a few weeks ago as to whether Commons categories -- eg
for particular sets of scans -- should necessarily have their own
Q-number on Wikidata. Or whether some -- eg some intersection
categories -- should just have an item on Commons data. But it's
clearly related: is the simplest thing just to put items for everything
on Wikidata? Or does one try to keep Wikidata lean, and no larger than
it absolutely needs to be; albeit then having to cope with the
complexity that some categories would have a Q-number, and some would not.)
Hi folks,
I am happy to announce that we have just released a first round of improvements to Media Viewer, based on community feedback.
The goal for these improvements is to make Media Viewer easier to use by readers and casual editors, our primary target users for this tool.
To that end, we created a new 'minimal design', with these features:
* "More Details" button: a more prominent link to the File: page
* separate icons for "Download" and "Share or Embed" features
* an easier way to enlarge images by clicking on them
* a simpler metadata panel with fewer items
* faster image load with thumbnail pre-rendering
These features are now live on Wikimedia Commons and sister projects (1), and will be deployed on all Wikipedias this Thursday by 20:00 UTC.
Next, we plan to work on these other improvements:
* an easier way to disable Media Viewer for personal use
* a caption or description right below the image
Learn more about these features on the Media Viewer Improvements page (2). They are based on findings from our recent community consultation (3) and ongoing user research (4). For more information, visit the Help FAQ page (5).
Please let us know what you think of these new features on the Media Viewer talk page (6).
We would like to thank all the community members who suggested these improvements. Our research suggests that they offer a better user experience, that is both clearer and simpler -- and that clarifies the relationship between Media Viewer and the File: description page.
We will send another update in October, once the next round of improvements has been released.
Onward!
Fabrice and the Multimedia Team
(1) Pictures of the Day on Commons:
https://commons.wikimedia.org/wiki/Commons:Picture_of_the_day#mediaviewer/F…
(2) Improvements page:
https://www.mediawiki.org/wiki/Multimedia/Media_Viewer/Improvements
(3) Community suggestions:
https://meta.wikimedia.org/wiki/Community_Engagement_(Product)/Media_Viewer…
(4) User Research:
https://www.mediawiki.org/wiki/Media_Viewer_Research_Round_2_(August_2014)
(5) Help page:
https://www.mediawiki.org/wiki/Help:Multimedia/Media_Viewer
(6) Talk page:
https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer#Media_Vie…
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
Hi folks,
There's an item that Luis Villa added to the MW Core backlog that I'd
like to move to the Multimedia backlog:
https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog#Struct…
I'm assuming everything that he describes fits nicely into what is planned
for Structured Data. Assuming that's true, should I just copy/paste into a
new card in Mingle, or a new page on mw.org or what?
Rob
Hi all,
a little more detail from the funnel analysis of UploadWizard (if you
haven't been following the other funnel thread,
[[mw:UploadWizard/Funnel_analysis]]
<https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis> has a quick
summary).
*Users repeat the upload process many times*
The main thing I am trying to understand at this point is why people use
the "upload another file" button so much. UploadWizard allows uploading up
to 50 files at the same time, which should be more than enough for the
average user, but our click-tracking data shows that most people click
through the tutorial-file-deed-details-thanks screens, then click on the
upload more button (which effectively resets the process and starts again
from the file screen), then click through the screens again, then click on
the upload more button again, then do the same again, and again, and again.
(Doing this fifty times in a row is not uncommon.) This suggests some
fundamental failing in UW - Sage suggested it is the instability of
uploading more than a few files at the same time. I wonder if others have
relevant experience?
*Errors do not seem to be the main problem*
I have tried to identify the reason for failed UploadWizard sessions (a
series of UploadWizard events logged on the same page which are not
terminated by reaching the thanks page) by checking what the last event
was, and assuming that for failed sessions caused by errors, that error
would be the last event. Assuming this is sound, errors do not seem to be
the main problem - they only appear at the end of ~25% of the failed
sessions (which is ~8% of the total sessions).
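To make the classification above concrete, here is a minimal sketch of the "last event" heuristic. The record shape (session id plus event name, already in time order) and the "error:" naming prefix are assumptions made for illustration; they are not the actual UploadWizard logging schema.

```python
from collections import Counter

# Assumed naming convention for error events; hypothetical, not the real schema.
ERROR_PREFIX = "error:"


def classify_sessions(events):
    """Classify sessions by their last logged event.

    events: iterable of (session_id, event_name) pairs in time order.
    A session counts as successful if its last event is "thanks",
    as failed-on-error if its last event is an error, and as
    failed-other in every remaining case.
    """
    last_event = {}
    for session_id, name in events:
        last_event[session_id] = name  # later events overwrite earlier ones

    outcome = Counter()
    for name in last_event.values():
        if name == "thanks":
            outcome["success"] += 1
        elif name.startswith(ERROR_PREFIX):
            outcome["failed_on_error"] += 1
        else:
            outcome["failed_other"] += 1
    return outcome
```

The share of failed sessions ending on an error is then failed_on_error / (failed_on_error + failed_other). The heuristic undercounts errors that are followed by any further logged event.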
*Top errors*
That said, here is a list of error codes (these are mostly API error codes,
but a few are internal to UploadWizard) sorted by frequency, collected over
~1000 sessions:
| filename | 20 |
| badtoken | 19 |
| missingresult | 14 |
| title | 13 |
| publishfailed | 11 |
| stasherror | 7 |
| server-error | 3 |
| fileexists-forbidden | 2 |
| filetype-banned-type | 1 |
| unknown | 1 |
| verification-error | 1 |
| unknownerror | 1 |
A little explanation about the more frequent ones:
- filename: these seem to be user errors - most often invalid filetype
(doc, bmp etc), sometimes no extension at all or trying to add the same
file twice.
- badtoken: some sort of CSRF token expiration; bug 69691
<https://bugzilla.wikimedia.org/show_bug.cgi?id=69691>
- missingresult: returned by the upload API in the details step when the
uploaded file has gone missing; bug 43967
<https://bugzilla.wikimedia.org/show_bug.cgi?id=43967>
- title: an error about duplicate files (i.e. the same file already
exists on Commons) that somehow happens in the details step instead of the
file step.
- publishfailed: this seems to be some sort of race condition: the first API
call to publish a file from stash puts it into the job queue and sets its
status to pending; a second call will then throw this error.
- stasherror: could be lots of things. bug 56302
<https://bugzilla.wikimedia.org/show_bug.cgi?id=56302>, bug 54028
<https://bugzilla.wikimedia.org/show_bug.cgi?id=54028> and more.
*Some suggestions based on the findings so far*
Quick wins:
- review UX for "fatal user errors" (i.e. when UploadWizard says "you
can't upload this file type") - is the error message helpful?
- review and improve api error messages (api-error-*), possibly override
them with UW-specific ones. Do they identify next steps? Do they even
exist? (e.g. api-error-publishfailed does not.)
- renew token on badtoken error (bug 69691
<https://bugzilla.wikimedia.org/show_bug.cgi?id=69691>)
- make sure that the specific error message thrown by
ApiUpload::dieUsage gets logged somewhere. Currently we only log a generic
message derived from the API error code, so e.g. all the dozen different
UploadStashException subclasses are reported with the same message.
- poll for success on publishfailed error (unlike what its name suggests, it
seems to actually mean something like "publish in progress")
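Two of the quick wins above (renewing the token on badtoken, polling on publishfailed) could be combined roughly as follows. This is only a sketch: publish, fetch_token and is_published are hypothetical callables standing in for the real API requests, and ApiError is an illustrative exception type, not an existing class.

```python
import time


class ApiError(Exception):
    """Illustrative error type carrying an API error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def publish_with_recovery(publish, fetch_token, is_published, token,
                          max_polls=5, poll_delay=1.0, sleep=time.sleep):
    """Publish once, recovering from badtoken and publishfailed.

    publish(token), fetch_token() and is_published() are supplied by the
    caller (hypothetical stand-ins for the real requests).
    Returns True on success; re-raises the original error otherwise.
    """
    try:
        publish(token)
        return True
    except ApiError as err:
        if err.code == "badtoken":
            # Token expired: fetch a fresh one and retry exactly once.
            publish(fetch_token())
            return True
        if err.code == "publishfailed":
            # Treat as "publish in progress": poll instead of re-submitting.
            for _ in range(max_polls):
                if is_published():
                    return True
                sleep(poll_delay)
        raise
```

Retrying only once on badtoken avoids looping forever if the token problem is something other than expiry.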
Medium wins:
- understand better why people repeat the upload process so often. This
might reveal serious UX deficiencies or functional errors (e.g. in an older
thread about funnel analysis, Sage claims uploading more than three files
at the same time is too unreliable for him).
- Investigate if there is a low-effort way to recover entered details
when the upload process has to be restarted. (There are drop-in solutions
like garlic.js <http://garlicjs.org/> or sisyphus.js
<https://github.com/simsalabim/sisyphus> but the very dynamic nature of
UW forms might be a problem.)
- figure out why some title errors are only reported in the details step
- log information
<https://meta.wikimedia.org/wiki/Schema:UploadWizardFlowEvent> about
uploaded files to better identify size- or filetype-specific issues
Bigger / longer-term effort:
- figure out a way to retry when the user already entered all the
details but publishing the file failed. (This points towards the
per-file-workflow-instead-of-global-workflow direction.)
- make stashed / async uploads rely on the database instead of the
session (bug 43967 <https://bugzilla.wikimedia.org/show_bug.cgi?id=43967>
)
Hi all,
we have recently added some funnel [1] logging to UploadWizard. A nice
dashboard is in the works, but here are some preliminary results, showing
the number of virtual pageviews for each step of UploadWizard.
mysql:research@s1-analytics-slave.eqiad.wmnet [log]> select event_step,
count(*), count(*)/3623 as survival_rate from UploadWizardStep_8612364
group by event_step order by survival_rate desc;
+------------+----------+---------------+
| event_step | count(*) | survival_rate |
+------------+----------+---------------+
| tutorial | 3623 | 1.0000 |
| file | 3496 | 0.9649 |
| deeds | 2433 | 0.6715 |
| details | 2373 | 0.6550 |
| thanks | 2109 | 0.5821 |
+------------+----------+---------------+
This is based on about a day's worth of logs (25.5 hours) - the logging
code was deployed to Commons yesterday.
The big drop is apparently in the file upload step (almost 30% - well over
1000 uploads a day). Some of that might be intentional (upload caught by
badtitle filter etc), but even so the drop is huge. Given that that step is
rather simple from a UX point of view, it seems that upload bugs are a
bigger problem right now than design issues.
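For anyone who wants to check the step-to-step drop rather than the cumulative survival rate, a small sketch using the counts from the query result above (the function name is just illustrative):

```python
# Counts copied from the query result above.
funnel = [
    ("tutorial", 3623),
    ("file", 3496),
    ("deeds", 2433),
    ("details", 2373),
    ("thanks", 2109),
]


def step_dropoffs(steps):
    """Return (from_step, to_step, lost, lost_fraction_of_from_step) tuples."""
    result = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        lost = prev_count - count
        result.append((prev_name, name, lost, lost / prev_count))
    return result


for prev_name, name, lost, fraction in step_dropoffs(funnel):
    print(f"{prev_name} -> {name}: lost {lost} ({fraction:.1%} of {prev_name})")
```

The file -> deeds transition loses 1063 pageviews, about 30% of those that reached the file step, which is where the "well over 1000 a day" figure comes from.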
(The license selection - deeds -> details - on the other hand is
unexpectedly unproblematic; I would have expected it to be the main source
of confusion, but actually adding description etc. seems worse.)
The next step would be to log JS/upload errors, I suppose.
Also, it would be nice to know which dropoffs are final and which are
reloads/restarts. The Navigation Timing API can tell apart reloads and
normal navigation, alternatively we could maybe group by IP + useragent +
time bucket to find retries.
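The IP + useragent + time bucket idea could look something like the sketch below. The event tuple layout and the one-hour bucket are assumptions for illustration only; the real log schema may differ.

```python
from collections import defaultdict


def find_retries(events, first_step="tutorial", bucket_seconds=3600):
    """Find likely restarts by grouping first-step events.

    events: iterable of (timestamp_seconds, ip, user_agent, step) tuples
    (an assumed layout). Returns a dict mapping
    (ip, user_agent, bucket_index) to the number of times the first step
    was entered, for keys where that happened more than once.
    """
    starts = defaultdict(int)
    for timestamp, ip, user_agent, step in events:
        if step == first_step:
            bucket = int(timestamp // bucket_seconds)
            starts[(ip, user_agent, bucket)] += 1
    return {key: count for key, count in starts.items() if count > 1}
```

Known weaknesses: retries that straddle a bucket boundary are missed, and users behind a shared IP with the same browser are merged into one key.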
Hi all,
After working on an article on English Wikipedia, I came to realise that it
might be useful if we had a slideshow feature for media for use in articles.
I was informed that Hebrew Wikipedia has a fantastic slideshow template
that can be used in articles.[1] The slideshow is created with this
template.[2]
The design is very sleek and it would no doubt be a fantastic addition to
all Wikipedias.
I've left a message for the person responsible for this template on he.wp
asking if they can help create it for English Wikipedia, but I have been
informed that they are basically semi-retired/on extended wikibreak.
Would anyone out there like to take this on board and get it created for
English Wikipedia at their earliest convenience? It can be tested live on the
article I am working on at the moment if need be.[3]
Cheers
Russavia
[1]
https://he.wikipedia.org/wiki/%D7%A2%D7%96%D7%A8%D7%94:%D7%9E%D7%A6%D7%92%D…
:
[2]
https://he.wikipedia.org/wiki/%D7%AA%D7%91%D7%A0%D7%99%D7%AA:%D7%9E%D7%A6%D…
[3] https://en.wikipedia.org/wiki/Dobrolet_(low-cost_airline)
Hi all,
While adding details on story #589
<https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/589> about
showing the caption above the fold in Media Viewer, I recalled that we were
considering not showing the file description below the fold for files that
have both a caption and a description. I wanted to confirm if that is the
case, and the changes it will imply.
I created story #895
<https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/895> to
capture required changes if we remove the below the fold description.
Basically it slightly reorganises the authorship information so that it
can expand into the room the description used, which avoids leaving an
empty area below the fold.
The rationale for removing the description below the fold was that for
files that have both caption and description they tend to be redundant most
of the time. I did a quick exploration by picking some random vital articles
<http://en.wikipedia.org/wiki/Wikipedia:Vital_articles> and going through
62 images:
- 46% of the images lack a description or it was not shown in Media Viewer
- Of the remaining files, which have both a caption and a description:
- 58% (26% of total) provide redundant information or the description
didn't add additional details (example
<http://en.wikipedia.org/wiki/Leonardo_da_Vinci#mediaviewer/File:Clos_luce_0…>
).
- 44% (20% of total) show more details on the description or
complementary info to the caption (example
<http://en.wikipedia.org/wiki/Mercury_(planet)#mediaviewer/File:Mercuryorbit…>
).
I'm not sure how representative this is compared to all our files (any data
on the subject is welcome), but taking into account that the file
description can be accessed through the "more details" button which we plan
to make more prominent, I'm ok with removing it if we don't find stronger
evidence of its need.
Should that be made as part of the changes of #589
<https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/589>?
I don't know, I created card #895
<https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/895> as
a separate card so that it can be done after #589 or at the same time,
whichever makes more sense in terms of development effort.
Any thoughts?
Pau
--
Pau Giner
Interaction Designer
Wikimedia Foundation
Thanks Yuri,
CC'ing Multimedia team
Maryana, this could be something interesting for the Mobile Web team
to look at to optimize image delivery.
Have you guys done any perf work around images?
--tomasz
On Thu, Jun 5, 2014 at 4:10 PM, Yuri Astrakhan <yastrakhan(a)wikimedia.org> wrote:
> The reduced quality images are now live in production. To see it for
> yourself, compare original with low quality images (253KB => 99.9KB, 60%
> reduction).
>
> The quality reduction is triggered by adding "qlow-" in front of the file
> name's pixel size.
>
> Continuing our previous discussion, now we need to figure out how to best
> use this feature. As covered before, there are two main approaches:
> * JavaScript rewrite - dynamically change <img> tag based on
> network/device/user preference conditions. Issues may include multiple
> downloads of the same image (if the browser starts the download before JS
> runs), parser cache fragmentation.
>
> * Varnish-based rewrite - varnish decides which image to serve under the
> same URL. This approach requires Varnish to know everything needed to make a
> decision.
>
> Zero plans to go the first route, but if we make it mobile, or ever site
> wide, all the better.
>
>
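As a rough illustration of the "qlow-" rule Yuri describes, here is a sketch of the URL rewrite. It assumes the thumbnail URL's last path segment starts with "<width>px-"; that layout and the example.org URL are assumptions for illustration, not taken from the message.

```python
import re

# Assumed thumbnail layout: the last path segment starts with "<width>px-".
_SIZE_SEGMENT = re.compile(r"^\d+px-")


def to_low_quality(thumb_url):
    """Insert "qlow-" before the pixel-size prefix of the final path segment.

    URLs whose last segment does not start with a pixel size (including
    ones that were already rewritten) are returned unchanged.
    """
    head, sep, last = thumb_url.rpartition("/")
    if not _SIZE_SEGMENT.match(last):
        return thumb_url
    return f"{head}{sep}qlow-{last}"


print(to_low_quality("https://example.org/thumb/a/ab/Foo.jpg/320px-Foo.jpg"))
```

The same transformation applies whether it is done by JavaScript on the client or by a server-side rewrite; only the place where the decision is made differs.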
Greetings,
As most of you know, we document WMF engineering activities on
mediawiki.org using "activity pages", which is just a fancy word for
pages that have an infobox. We can then list the activities in many
places, like the Wikimedia Engineering portal (
https://www.mediawiki.org/wiki/Wikimedia_Engineering ) and the status
dashboard ( https://www.mediawiki.org/wiki/Wikimedia_Engineering/Dashboard
)
Most of the activities are about a particular project, like
"Phabricator migration" or "Flow". Multimedia is a bit awkward because
it's about a team rather than the projects you guys work on.
It might have made sense previously (for example if the team was
touching a lot of different pieces of Multimedia) but my understanding
from the Wikimania workshops is that the Multimedia team plans to
mostly focus on two main projects this fiscal year: UploadWizard and
Structured Data.
Therefore, I'd like to recommend that we make those two projects
actual "activities", with a dedicated infobox and status updates.
Other, smaller multimedia-related bits like MediaViewer could still be
in the catch-all "Multimedia" activity.
This wouldn't change anything for most of you; the only visible
difference would be that you would report on UploadWizard and
Structured Data on a different page. It would be more consistent with
the rest of WMF engineering, and it would be easier for the rest of
the community to follow your work on each project.
Unless there are strong objections to this proposal, I'm happy to add
the infoboxes myself, but I wanted to ask here first :) Let me know if
you have any questions.
--
Guillaume Paumier
Technical Communications Manager — Wikimedia Foundation