Hi Jane,
As a continuation of the other thread:
1. Change a page of a djvu
The usual procedure is to download the file, perform the changes with tools like DjVu Solo, and overwrite the previous file with the new one, adding a description of the changes ("Upload a new version of this file"). This doesn't delete the old file; it remains accessible and can be restored at any time.
2. Dirty metadata from GLAMs
That is a problem I have heard about many times, and there is no easy solution; however, there are better tools now than some years ago. Have you heard of OpenRefine? https://code.google.com/p/google-refine/
Commons needs something like that, but to annotate metadata with Wikidata concepts. Maybe you could write a description of what is needed on the IdeaLab? https://meta.wikimedia.org/wiki/Grants:IdeaLab
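To make the OpenRefine suggestion concrete: the core of its "key collision" clustering can be sketched in a few lines of plain Python. This is only a minimal illustration of the technique, not OpenRefine's actual code, and the sample creator names are invented:

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Normalize a metadata value the way key-collision clustering does:
    strip accents, lowercase, drop punctuation, sort the unique tokens."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))
    tokens = re.split(r"\W+", value.lower())
    return " ".join(sorted(set(t for t in tokens if t)))

def cluster(values):
    """Group raw values that share the same fingerprint."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

# Invented sample values, as they might appear in a dirty GLAM export:
creators = ["Gaudí, Antoni", "Antoni Gaudi", "GAUDI, ANTONI", "Jujol, Josep Maria"]
print(cluster(creators))  # the three Gaudí spellings end up in one cluster
```

A Commons-side tool would then let a human pick the canonical form for each cluster and, ideally, link it to the matching Wikidata concept.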
3. Not recognizing the good job done by Commons
This is something not addressed in the current designs of the Media Viewer. You can only thank the person who uploaded the file, not the one who curated the metadata, added categories, etc. If you have any ideas about how to send good karma to these users, please do share them on the Media Viewer feedback page: https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer
I also think that many Wikipedians have the mindset that, since English Wikipedia is the biggest project, all other projects should be subordinated to their wishes. That creates some tensions (as you have probably seen in the last few days). There is no easy solution to this en-wp-centric mentality; I just hope that more shared international projects (like Wikidata) will add perspective and a better understanding in the long run.
4. Files associated with a concept
Hopefully this will be addressed by new tools like "Wikidata for media info" https://commons.wikimedia.org/wiki/Commons:Wikidata_for_media_info
5. Users not classifying their data in proper subcategories
It is hard to educate casual users; I guess you could propose a new notification for when a file gets re-categorized. That way users would learn which category would have been better.
6. Showing gaps in our coverage
Again, this depends on "Wikidata for Commons". When that tool is in place, it will be possible to create "concept trees" and flag which branches of the tree don't have any files. It is doable, but not trivial for now.
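The "concept tree" idea can be sketched with a toy traversal: walk the tree, sum the files attached to each branch, and report branches that are completely empty. The tree, subcategory names, and file counts below are all invented for illustration:

```python
# Hypothetical concept tree: each node lists its subcategory children;
# `files` holds the number of files directly attached to a node.
tree = {
    "Sagrada Família": ["Exterior", "Interior"],
    "Exterior": ["Facades", "Towers"],
    "Interior": ["Crypt"],
    "Facades": [], "Towers": [], "Crypt": [],
}
files = {"Facades": 120, "Towers": 45}  # invented counts

def coverage(node):
    """Total files in a node's whole branch (itself plus all descendants)."""
    return files.get(node, 0) + sum(coverage(child) for child in tree[node])

def gaps(node):
    """Return the highest branches of the tree that contain no files at all."""
    if coverage(node) == 0:
        return [node]
    return [g for child in tree[node] for g in gaps(child)]

print(gaps("Sagrada Família"))  # the whole "Interior" branch is empty
```

With real data the tree would come from Wikidata concepts rather than a hard-coded dict, which is exactly why this depends on "Wikidata for Commons".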
7. Files as parts of a real object item
We are going in that direction; however, it is a long ride... again, "Wikidata for media info" is part of the solution. We'll see...
Cheers, Micru
On Fri, May 16, 2014 at 11:35 AM, Jane Darnell jane023@gmail.com wrote:
David,
I would strongly prefer a system that keeps the parts together, while at the same time, keeping all the parts separate and interchangeable. I hate that the .djvu files are blobs now, because if I find a better scan of an engraving from a book, I would like to replace the crappy scan that is in the .djvu file. I suppose you need to keep the version you uploaded, but you always want to present the best you have to the reader.
I have looked at problems with datasets for a small GLAM, and have seen just how bad the data can be. I am mostly a web-surfer of poorly designed GLAM datasets, which is why I have spent many hours thinking about these things. I have since given up trying to evangelize open data to GLAMs and started thinking more about what Wikipedia can do to curate the world's art. Many GLAMs are willing to share their data, but believe me when I say we may not want it. The backlog in batch uploads to Commons is not the technical upload queue; it's all the data massaging by hand that Wikipedians need to do beforehand. That work, done by Commons wizards, goes largely unrecognized today.
Theoretically, a specific artwork is both a data item and a dataset. If you look at our artwork template on Commons you may have noticed how it has grown in the past 4 years and is fast becoming a fairly comprehensive standard dataset for certain items. The next step is to create a way to index these per object (yes we have categories - is that really the best we can do?).
For popular artworks that are architectural features, Wiki Loves Monuments has harvested so many images from all different angles that you could probably make the case that Wikimedia Commons has more images of that specific item than any other publication. If you browse the various language versions and their representation of the object, you will notice that individual Wikipedians have selected different images, but these are rarely linked to each other, and the casual Wikipedia reader has no idea that they can probably view the object in 3-D if they want to, or see a short movie about how it was made. Indeed, let's face it: most casual readers have only heard of Wikipedia, are completely unaware of Wikimedia Commons, and have never heard of Wikimedia Commons categories.
Take the case of the Sagrada Familia: https://commons.wikimedia.org/wiki/Category:Sagrada_Fam%C3%ADlia
This category is augmented by a gallery page, with the helpful text "The Sagrada Família is an unfinished church in the Catalan city of Barcelona, considered to be architect Antoni Gaudí's masterpiece. For images of the Holy Family (Jesus, Mary, and Joseph), see Category:Holy Family." : https://commons.wikimedia.org/wiki/Sagrada_Fam%C3%ADlia
Is this really the best we can do? Has anyone ever stopped and counted the rate at which we accumulate photos of the Sagrada Familia each year? We don't want to deter people from uploading, because we are probably still missing important photos of various internal features. But how do we show the gaps in our coverage of this object, while presenting an encyclopedic view? The English Wikipedia page includes about 40 images with a link to the category, but no other hints for media navigation.
This is just one example, there are many more. I would like to see a system by which the normal Wiki-collaboration process can be used to slowly integrate all of the Commons files into datasets per item, and then include these into datasets per city or artist or GLAM or whatever. I suppose it should be lists of categories, gallery pages, and templates, most of them blank (like the artwork template - you can use the fields or not, as long as you include the minimum for the upload wizard). Wikidata can help with the template fields as properties.
Jane
2014-05-15 18:14 GMT+02:00, David Cuenca dacuetu@gmail.com:
Jane,
Thanks for your input! I never thought of datasets as incorporating images, but just as tables (whose elements might point to images, but not contain them). Are people in the GLAM scene expecting other files to be embedded when they talk about datasets?
Well, if it is a standard format (CSV or JSON), then it is easy to keep the whole dataset together: you just need to consider it a text file, and then you upload a new one, like any other file on Commons :)
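The point about a standard text format keeping the whole dataset together can be sketched with Python's csv module: the table becomes one text blob that could be uploaded and re-parsed losslessly. The sample rows below are invented:

```python
import csv
import io

# An invented mini-dataset, as rows of field/value pairs:
rows = [
    {"title": "Sagrada Família", "city": "Barcelona", "year": "1882"},
    {"title": "Casa Milà", "city": "Barcelona", "year": "1906"},
]

# Serialize to a single CSV text blob -- the one "file" that would be
# uploaded and versioned like any other file on Commons:
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "city", "year"])
writer.writeheader()
writer.writerows(rows)
blob = buf.getvalue()

# Anyone downloading the blob can reconstruct the table exactly:
parsed = list(csv.DictReader(io.StringIO(blob)))
assert parsed == rows
```

Updating the dataset is then just "Upload a new version of this file" with a fresh blob.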
Micru
On Thu, May 15, 2014 at 5:18 PM, Jane Darnell jane023@gmail.com wrote:
David, This is an interesting question. I think that a dataset is just like any other table such as the ones included in Wikipedia, with lots more entries and maybe even pieces attached that can't go on Wikipedia such as pictures, audio, short films, pieces of software code, or other media.
So I guess this page should be merged with the DataNamespace page. The problem is how to reference a dataset or table. Images on Commons are timestamped with a source link that is often {{self}}, but more often a weblink somewhere that may or may not die within a year or two. Since the image is something that you can't really change easily, this is generally not an issue, but how do you see this with data that can be manipulated? I don't really see how you can upload datasets as whole "blobs" that will keep all the pieces together the way a .djvu file keeps the text with the images.
Jane
2014-05-15 16:46 GMT+02:00, David Cuenca dacuetu@gmail.com:
On Thu, May 15, 2014 at 1:42 PM, Cristian Consonni <kikkocristian@gmail.com> wrote:
Thanks for the pointer. "How can I put this open data on Wikidata?" is a question that I have been asked many times; this page was needed.
Thanks for your comment!
On Thu, May 15, 2014 at 3:59 PM, Samuel Klein meta.sj@gmail.com wrote:
Thanks Micru! I think we should start by including datasets on Wikisource, with descriptions about them (storing the files on Commons where possible), and by adding more data formats to the formats accepted on Commons.
I don't follow you... why would you put datasets on Wikisource when they are only used on Wikipedia and have to be stored somewhere else? As it is now, it doesn't seem like a good dataset management solution. Besides, it would conflict with Wikisource's identity as a repository for textual sources... As for Commons, I don't know if it is relevant to their mission as a media-sharing platform either... I hope someone from their community can share their views.
Thanks for the input, Micru
--
Etiamsi omnes, ego non
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
David, Thanks for your thoughtful reply. Yes I think if I sat down I could probably type up about 20 things to go into the Idea Lab, but it's quite a small group of people that ever reads anything in there, and most of the readers are just quickly checking other ideas to make sure they have filled in their own idea correctly.
One problem is that there is very little human interaction at all on Commons. I was personally very sad to see this user [1] leave the project, but I can't help but agree with him/her. Sometimes I wonder what I am doing there myself! I just checked, and I have only left messages on 21 other Commons user pages since 2006. Of those 21 users, I have personally met only 4 of them. I can safely say I do not use Commons as a social media website.
On the subject of dirty metadata from GLAMs, I am afraid that we concentrate too much on metadata from GLAMs, and I would like to see some project work on our own metadata. Lots of it is dirty, but there are some featured items in there that can be used to clean much of the dirty stuff to a higher level of quality, and we should do that. I am worried, though, when I read things from experienced Wikipedians along the lines of getting rid of the creator and institution templates, or even categories. [2] The whole power of Wikipedia lies in the low threshold of technical knowledge necessary for participation. We need to keep the entrance gate as wide open as possible. I see it as a sign of failure on the part of Commons that I rarely use the new and improved default upload wizard, because it almost never fits my needs. I find it annoying that I need to use my own link to get to the old uploader. When searching for new solutions, let's not throw away what we have, but try to incorporate innovative approaches on top of the technical landscape we have already created.
I don't expect newbies to understand and use categories, but it should be easier to identify and fix uncategorized items, and we need to lower thresholds of participation. I think your suggestion of uploading a new version of a djvu file is an example of a high threshold (the participant must have djvu software to be able to update the file). In the case of the Sagrada Familia, much more than media files and text is needed to consolidate our knowledge about it. We need improved search methods and associated navigational tooling, as well as a way to aggregate the number of projects involved into a quick overview. It's more useful to see that more than 100 projects carry information about it than if it were in just one or two projects. Even Google just references the article in your default language. I would also like to see redirects and disambiguation pages exported to a central place, so search will improve across language boundaries. I've noticed that it's often spelling differences that keep Google better than Wikipedia's own search at locating Wikipedia stuff.
As far as the English-centric vision goes, I don't feel one way or another about it. English is as good a "lingua franca" as any other, as long as it's one basic language. I have said before that I am cool with this being Chinese, as long as it's consistent. My main problem with Wikidata is the spotty coverage of labels; if German were the default language and were completely filled in for all data item labels, I would be totally happy to put German in my Google settings from now on. I get frustrated when people get into arguments about the "database-ification of Wikipedia", as if that were a bad thing. It is unrealistic to think we can make everything machine-readable, just as it is unrealistic to think we can keep the reading machines outside the gates. We need to satisfy both our power users and our power editors, and the only way to do that is to keep low thresholds of participation for both groups. That low threshold of participation is first and foremost for our casual mobile readers who become Wiki(p/m)edians when they upload one thing and go away forever. We need them too, and though I don't want to force them into learning about our category structures, I do want to serve them a few tips so they can leave the default uploader and take a look around the landscape. The main thing is, we need to share more about where we are going as we get there, so users don't leave because they get tired of waiting.
[1] https://commons.wikimedia.org/w/index.php?title=User:Boo-Boo_Baroo&oldid... [2] https://commons.wikimedia.org/wiki/Commons_talk:Wikidata_for_media_info
Jane
2014-05-16 14:20 GMT+02:00, David Cuenca dacuetu@gmail.com: