Hi everybody,
With the Structured Data for Commons project about to move into high
gear, it seems to me that there's something the Wikidata community needs
to have a serious discussion about, before APIs start getting designed
and set in stone.
Specifically: when should an object have an item with its own Q-number
created for it on Wikidata? What are the limits? (Are there any limits?)
The position so far has essentially been that a Wikidata item is created
only when an object either already has a fully-fledged Wikipedia article
written for it, or reasonably could have one.
So objects that aren't particularly notable have typically not had
Wikidata items made for them.
Indeed, practically the first message Lydia sent to me when I started
trying to work on Commons and Wikidata was to underline to me that
Wikidata objects should generally not be created for individual Commons
files.
But, if I'm reading the initial plans and API thoughts of the Multimedia
team correctly, eg
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Sl…
and
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
there seems to be the key assumption that, for any image that contains
information relating to something beyond the immediate photograph or
scan, there will be some kind of 'original work' item on main Wikidata
that the file page will be able to reference, such that the 'original
work' Wikidata item will be able to act as a place to locate any
information specifically relating to the original work.
Now in many ways this is a very clean division to be able to make. It
removes any question of having to judge "notability"; and it removes any
ambiguity, or diversity, about where information might be located -- if
the information relates to the original work, then it will be stored on
Wikidata.
But it would appear to imply a potentially *huge* widening of Wikidata's
inclusion criteria, and of the number of Wikidata items that could be
created.
So it seems appropriate that the Wikidata community should discuss and
sign off just what should and should not be considered appropriate,
before things get much further.
For example, a year ago the British Library released 1 million
illustrations from out-of-copyright books, which increasingly have been
uploaded to Commons. Recently the Internet Archive has announced plans
to release a further 12 million, with more images either already
uploading or to follow from other major repositories including eg the
NYPL, the Smithsonian, the Wellcome Foundation, etc, etc.
How many of these images, all scanned from old originals, are going to
need new Q-numbers for those originals? Is this okay? Or would some of
them be a step too far?
For example, for maps (cf this data schema
https://docs.google.com/spreadsheets/d/1Hn8VQ1rBgXj3avkUktjychEhluLQQJl5v6W…
), each map sheet will have separate Northernmost, Southernmost,
Easternmost and Westernmost bounding co-ordinates. Does that mean each
map sheet should have its own Wikidata item?
For book illustrations, perhaps it would be enough just to reference
the edition of the book. But if individual illustrations have their own
artist and engraver details, does that mean the illustration needs to
have its own Wikidata item? Similarly, if the same engraving has
appeared in many books, is that also a sign that it should have its own
Wikidata item?
Similarly, what about old photographs, or old postcards? When should
these have their own Wikidata item? If they have a known creator and
creation date, is it simplest just to give them a Wikidata item, so that
such information about an original underlying work is always looked for
on Wikidata? What if multiple copies of the same postcard or photograph
are known, published or re-published at different times? But the
potential number of old postcards and photographs, like the potential
number of old engravings, is *huge*.
What if an engraving was re-issued in different "states"? (Eg a
re-issued engraving of a place might have been modified if a tower had
been built.) When should these get different items?
At
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts#Wikidat…
where I raised some of these issues a couple of weeks ago, there has
even been the suggestion that particular individual impressions of an
engraving might deserve their own separate items; or even everything
with a separate accession number -- so if a museum had three copies of
an engraving, we would make three separate items, each carrying its own
accession number, making it possible to identify which accession number
belonged to a particular File.
(See also other sections at
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts for
further relevant discussions on how to represent often quite complicated
relations with Wikidata properties).
With enough items, we could re-create and represent essentially the
entire FRBR tree.
We could do this. We may even need to do this, if the MM team's outline
for Commons is to be implemented in its apparent current form.
But it seems to me that we shouldn't just sleepwalk into it.
It does seem to me that this represents (at least potentially) a *very*
large expansion in the number of items, and a widening of the inclusion
criteria, for what Wikidata is going to encompass.
I'm not saying it isn't the right thing to do, but given the potential
scale of the implications, I do think it is something we need to work
through properly as a community, and confirm that it is indeed what we
*want* to do.
All best,
James.
(Note that this is a slightly different discussion, though related, to
the one I raised a few weeks ago as to whether Commons categories -- eg
for particular sets of scans -- should necessarily have their own
Q-number on Wikidata. Or whether some -- eg some intersection
categories -- should just have an item on Commons data. But it's
clearly related: is the simplest thing just to put items for everything
on Wikidata? Or does one try to keep Wikidata lean, and no larger than
it absolutely needs to be; albeit then having to cope with the
complexity that some categories would have a Q-number, and some would not.)
FYI:
http://multimedia-alpha.wmflabs.org/wiki/Lightbox_demo
Open a lightbox, click on the settings/cog icon near the top right of
your screen, and be amazed.
That is all.
(warning: alpha quality, don't expect it to be perfect, meant for testing
by our developers and designers, but no reason not to let y'all try it!)
--
Mark Holmquist
Software Engineer, Multimedia
Wikimedia Foundation
mtraceur(a)member.fsf.org
https://wikimediafoundation.org/wiki/User:MHolmquist
Greetings!
We invite you to join our next Structured Data Q&A on IRC office hours next Thursday, Oct. 16, at 18:00 UTC.
Our Multimedia team and the Wikidata team will be on hand for this discussion, as well as some of the community volunteers who are helping guide this project, such as Multichill and TheDJ.
During this hour-long IRC chat, we will discuss our next steps for this Structured Data project, and give you an update on our bootcamp in Berlin. Please RSVP here, so we know who plans to attend:
https://commons.wikimedia.org/wiki/Commons:Structured_data#Discussions
Early next week, we will update our Structured Data pages with our latest work on this project, and send another email to invite you to review them.
And if you are based in Europe, we also invite you to join the Amsterdam Hackathon on November 14-16, 2014. Many of us will be at this event, and plan to give more updates as well as do some hacking together. You can register here:
https://docs.google.com/a/wikimedia.org/forms/d/1Gpvz3BH5Y4dqSIiwv2HE_X8fOP…
Please spread the word in your community, and invite them to join this chat, and/or the hackathon.
We look forward to productive discussions with many of you tomorrow.
Regards as ever,
Fabrice — for the Structured Data team
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
Hi all,
starting this Tuesday (on Commons) or Thursday (all other wikis), files
which do not have a machine-parseable author, source, license or
description will be automatically added to tracking categories (one
category for each).
The name of the categories will be determined by the following messages:
commonsmetadata-trackingcategory-no-license
commonsmetadata-trackingcategory-no-description
commonsmetadata-trackingcategory-no-author
commonsmetadata-trackingcategory-no-source
Translatewiki link:
https://translatewiki.net/w/i.php?title=Special:Translate&group=ext-commons…
If you would rather not have these tracking categories on your wiki, you
can achieve that by setting the content of the local message to "-" (a
single dash character).
Links to the local message pages are available from
[[Special:TrackingCategories]].
Hi everyone.
tl;dr: Can we do https://gerrit.wikimedia.org/r/164476
Now that the prerequisite patches for using VIPS with TIFF have been
merged (woo!), let's use it.
So for those who don't know what VIPS is: VIPS is an alternative to
ImageMagick which can scale certain file formats in essentially
constant memory (or, to be pedantic, memory linear in the number of
pixels in the resulting file, instead of linear in the number of
pixels in the source). This means we would be able to make thumbnails
no matter how big the source file is. Which is good, because we have
lots of very high resolution TIFF files, such as [[File:Zoomit2.tif]]
and [[File:Zentralbibliothek Zürich - Mittelalterliche Stadt -
000005203.tif]]. We already use VIPS to scale PNG files larger than 20
megapixels, and non-progressive JPEG files can be scaled efficiently
with ImageMagick, so TIFF is the current pain point in terms of
scaling limits (although GIF is also painful).
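To make the memory claim concrete, here is a rough back-of-the-envelope sketch in Python. The figures and the 4-bytes-per-pixel assumption are illustrative only, not actual MediaWiki or VIPS behaviour:

```python
# Illustrative memory arithmetic for the "scale in memory linear in the
# output" argument. Assumes 4 bytes per pixel (RGBA); real decoders vary.

def decode_memory_mb(width, height, bytes_per_pixel=4):
    """Memory needed to hold a fully decoded image in RAM, as a
    whole-image decoder like ImageMagick typically requires."""
    return width * height * bytes_per_pixel / 1e6

def streaming_memory_mb(thumb_width, thumb_height, bytes_per_pixel=4):
    """Memory roughly proportional to the *output* when scaling in a
    streaming fashion, as VIPS can for suitable TIFFs."""
    return thumb_width * thumb_height * bytes_per_pixel / 1e6

# A 20,000 x 20,000 px TIFF scan, thumbnailed to 800 x 800:
print(decode_memory_mb(20000, 20000))   # 1600.0 (MB to decode it whole)
print(streaming_memory_mb(800, 800))    # 2.56 (MB for the output)
```

The exact constants don't matter; the point is that the whole-image cost grows with the source while the streaming cost grows only with the thumbnail.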
I would like to propose the following:
First we experiment with turning it on for files > 50 megapixels.
Currently we do not even try to render such files, so I doubt this
will cause any community angst. To that end I proposed a patch (
https://gerrit.wikimedia.org/r/164476 ) that uses the following
settings:
array(
    'conditions' => array(
        'mimeType' => 'image/tiff',
        'minShrinkFactor' => 1.2,
        'minArea' => 5e7,
    ),
    'sharpen' => array( 'sigma' => 0.8 ),
)
This will turn the feature on for big files (which currently do not
render at all), and also enable sharpening. Most TIFF images benefit
from sharpening and the community has asked for it repeatedly; I think
it's less disruptive to enable it at the same time as VIPS, rather than
making two separate changes to TIFF rendering.
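For anyone reading along, here is a hypothetical restatement in Python of how a condition block like the one above could be matched against a thumbnailing request. The field names mirror the patch, but the matching logic is my assumption for illustration, not the actual VipsScaler code:

```python
# Hypothetical matcher for a VipsScaler-style 'conditions' block.
# 'srcArea'/'dstArea' are pixel counts of the source and the thumbnail;
# these names are assumptions, not MediaWiki internals.

def vips_condition_matches(file_info, conditions):
    """Return True if this file/thumbnail request satisfies every
    condition in the block."""
    if 'mimeType' in conditions and file_info['mimeType'] != conditions['mimeType']:
        return False
    shrink = file_info['srcArea'] / file_info['dstArea']
    if 'minShrinkFactor' in conditions and shrink < conditions['minShrinkFactor']:
        return False
    if 'minArea' in conditions and file_info['srcArea'] < conditions['minArea']:
        return False
    return True

# A 60-megapixel TIFF thumbnailed to 800x800 passes the stage-1 block:
big_tiff = {'mimeType': 'image/tiff', 'srcArea': 6e7, 'dstArea': 6.4e5}
stage1 = {'mimeType': 'image/tiff', 'minShrinkFactor': 1.2, 'minArea': 5e7}
print(vips_condition_matches(big_tiff, stage1))  # True
```

On this reading, dropping 'minArea' in stage 2 widens the match to all TIFFs that are actually being shrunk, and the final bare-'mimeType' block catches everything else.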
I would propose we let that sit for a little while. We should then have
a community discussion (with the Commons community, since it's hard to
have a discussion with every community, and Commons (esp. GLAMs) are
the people who care most about this) to see whether the community likes
it. Hopefully, if all is well, we could move to stage 2, which would
be something like:
array(
    'conditions' => array(
        'mimeType' => 'image/tiff',
        'minShrinkFactor' => 1.2,
    ),
    'sharpen' => array( 'sigma' => 0.8 ),
),
array(
    'conditions' => array(
        'mimeType' => 'image/tiff',
    ),
),
Anyway, thoughts? Does this sound like a good plan? Does someone want
to be bold and deploy my change? ;)
--bawolff
Hi guys,
Geni brought up a good point on our talk page that Media Viewer doesn’t provide a warning when users click to enlarge huge files (e.g. 400 MB):
https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer#Media_Vie…
This is not a new issue, as this is the same functionality we have provided for years on the File: page. But Media Viewer makes it a lot easier for users to accidentally load a huge file. So I think we should seriously consider providing a warning, if it is easy to implement and if we can identify a threshold that is based on data and that is acceptable to our communities.
Do any of you have data on what the threshold might be for identifying file sizes that might crash your browser? Or do you know what best practices are on that point? It would be good if we could agree on a limit that is at least partly informed by data.
If there is no reliable data or best practices, we might have to determine this threshold together, somewhat arbitrarily, based on common sense. In that case, what do you think would be a reasonable threshold at which we would start giving the warning? 50 MB or above? 100 MB or above?
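If we do settle on a simple size cutoff, the check itself is trivial. A minimal sketch in Python (the 50 MB default below is just one of the arbitrary thresholds floated in this thread, not an agreed value):

```python
# Minimal sketch of a "warn before loading the original" check.
# 50 MB is a placeholder threshold, not a decided policy.
WARN_THRESHOLD_BYTES = 50 * 1024 * 1024

def needs_size_warning(file_size_bytes, threshold=WARN_THRESHOLD_BYTES):
    """Return True when the user should be warned before the full file
    is fetched."""
    return file_size_bytes > threshold

print(needs_size_warning(400 * 1024 * 1024))  # True: the 400 MB example above
print(needs_size_warning(5 * 1024 * 1024))    # False: a typical 5 MB JPEG
```

Whatever threshold we pick, a check like this only needs the file size from the API, so it costs nothing extra to evaluate before loading.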
For now, I just filed this ticket #933 to track this issue:
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/933
Thanks for any recommendations you might have,
Fabrice
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)