On Fri, May 16, 2014 at 10:28 AM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:
> A quick visit to stats.grok.se indicates that the "Search" feature of
> English Wikipedia was used 63 million times last month [...]
>
A long time ago I played around with the idea that our endeavor for free
knowledge would result in spin-off movements, some of which have already
happened: easing access to current primary sources (Open Access),
guaranteeing the distribution of knowledge (Wikipedia Zero, the movement
for net neutrality), and making sure that it can be found with global open
search tools (?).
This last part is mostly uncharted territory, and as the diversity of our
open knowledge repository expands, it seems that we can offer much more
than "just Wikipedia text search". Commercial search engines are moving in
the direction of search-result mash-ups, mixing text, images, videos,
maps, data... so the user has an idea of which facets of the subject he or
she can explore further.
Mash-ups generated from Wikimedia sites will never be able to compete
with web search engines, but they could become the primary choice when
looking for free knowledge, especially if synergies with other free
knowledge organizations were sought.
Perhaps there will be gaps, but I am convinced that they will shrink with
time. Maps were missing, and OSM has filled that void; there were no
free scores, and then IMSLP appeared. Even for products there is Open
Product Data in the making. And regarding local businesses, it seems that
Wikivoyage and OSM have something to offer too; who knows what will
come next.
I wonder what your thoughts are on the exciting topic of joining forces
with other organizations on the search front, to become more than the sum
of the parts.
Micru
On Fri, May 16, 2014 at 1:28 AM, Pete Forsyth <peteforsyth(a)gmail.com> wrote:
> I think it is much more likely that a Wikipedia reader would expect to find
> those images *used in Wikipedia articles* than a massive collection of
> stuff that is somehow tangentially related to Wikipedia in a way that they
> don't fully understand.
>
> So why on earth does the main "multimedia" search link on Wikipedia
> automatically return unused results from Commons to begin with? Is that
> really the right way to go?
I'm breaking out this question since it's a concrete technical
proposal; it should probably also be raised on the multimedia list.
But we should answer it from the perspective "What's best for the
user", rather than have it be driven solely by the NSFW corner case
(which may also appear when searching images used on a project like
en.wp alone).
As a user, I might want to find images to add to an article. Having
results from the central repository presented locally makes it easier
to do so without visiting a separate site. (Consider this from the
perspective of smaller projects especially, where the local search
would be pretty much useless.) This is why VisualEditor presents
Commons search results, as well.
As a user, I might be interested in multimedia about a certain topic I
just read about in Wikipedia. Showing only the results already in the
Wikipedia article(s) about the topic would make it harder to find such
media. Simple example: Let's say I read an article about a city, and I
want to find other historic maps of that city. In many cases, these
maps do exist on Commons but not locally. Should we therefore force
people to visit Commons to find them?
I would argue that from the ordinary user's perspective, the
distinction between Wikipedia/Commons is less important than what they
have in common, i.e. being large repositories of useful educational
content (and hyperbole aside, 99% of Commons is pretty boring stuff).
We could default to displaying locally used media and offer to search
Commons with an extra click. From a usability perspective, you want to
minimize the steps a user has to take, so good UX design would likely
disclose results from Commons either a) always, clearly labeled or b)
when no local results are available.
There's no question that search UX, both on Commons and on Wikipedia,
can be improved. I'm just skeptical that an unbiased evaluation of the
user experience using standard UX heuristics would lead to a design
that hides explicit content from initial search results. Distinguish
different types of content more clearly and make it easier to find the
stuff you want - sure, that's doable.
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Hi Jane,
As a continuation of the other thread:
1. Change a page of a djvu
The usual procedure is to download the file, make the changes with tools
like DjVu Solo, and overwrite the previous file with the new one, adding a
description of the changes ("Upload a new version of this file"). This
doesn't delete the old file; it remains accessible and can be restored at
any time.
2. Dirty metadata from GLAMs
That is a problem I have heard about many times, and there is no easy
solution; however, there are better tools now than there were some years
ago. Have you heard of OpenRefine?
https://code.google.com/p/google-refine/
Commons needs something like that, but to annotate metadata with Wikidata
concepts. Maybe you could write a description of what is needed on the
IdeaLab? https://meta.wikimedia.org/wiki/Grants:IdeaLab
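As an aside, the kind of cleanup OpenRefine automates can be illustrated
with a toy sketch in plain Python; the creator names and the "fingerprint"
heuristic below are just illustrative, not actual Commons data or
OpenRefine code:

```python
# Toy illustration of "dirty metadata" cleanup: cluster near-duplicate
# creator names by a normalized key (case/punctuation/order-insensitive).
import re
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Lowercase, strip punctuation, and sort the remaining tokens,
    so differently formatted spellings of one name collide."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group values by fingerprint; keep only groups that actually
    contain more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]

creators = ["Gaudí, Antoni", "antoni GAUDÍ", "Antoni Gaudí", "Jujol, Josep"]
print(cluster(creators))  # the three Gaudí variants end up in one cluster
```

OpenRefine's real clustering offers several such key functions plus
edit-distance methods; this only shows the basic idea.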
3. Not recognizing the good job done by Commons
This is something not addressed in the current designs of the Media Viewer.
You can only thank the person who uploaded the file, not the one who
curated the metadata, added categories, etc. If you have any ideas about
how to send good karma to these users, please share them on the Media
Viewer feedback page:
https://www.mediawiki.org/wiki/Talk:Multimedia/About_Media_Viewer
I also think that many Wikipedians have the mindset that since English
Wikipedia is the biggest project, all other projects should be subordinated
to its wishes. That creates some tensions (as you have probably seen in the
last few days). There is no easy solution to this en-wp-centric mentality;
I just hope that more shared international projects (like Wikidata) will
add perspective and a better understanding in the long run.
4. Files associated with a concept
Hopefully this will be addressed by new tools like "Wikidata for media info"
https://commons.wikimedia.org/wiki/Commons:Wikidata_for_media_info
5. Users not classifying their data in proper subcategories
It is hard to educate casual users. I guess you could propose a new
notification for when a file gets re-categorized; that way users would
learn which category would have been better.
6. Showing gaps in our coverage
Again, this depends on "Wikidata for Commons". When that tool is in place,
it will be possible to create "concept trees" and signal which branches
of the tree don't have any files. It is doable, but not trivial for now.
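Just to make the "concept tree" idea concrete, here is a toy sketch; the
hierarchy and file counts are invented, and the real tool would of course
query Wikidata/Commons rather than hard-coded dictionaries:

```python
# Sketch of gap-finding over a concept tree: a branch whose subtree
# contains zero files is a coverage gap worth signalling.
concepts = {            # hypothetical parent -> children hierarchy
    "Sagrada Família": ["Facades", "Interior"],
    "Facades": ["Nativity Facade", "Passion Facade"],
    "Interior": [],
    "Nativity Facade": [],
    "Passion Facade": [],
}
files = {"Nativity Facade": 12, "Interior": 3}  # files tagged per concept

def coverage(node):
    """Total number of files in this node's subtree; 0 means a gap."""
    return files.get(node, 0) + sum(coverage(c) for c in concepts[node])

gaps = [n for n in concepts if coverage(n) == 0]
print(gaps)  # branches with no files at all
```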
7. Files as parts of a real object item
We are going in that direction; however, it is a long ride... again,
"Wikidata for media info" is part of the solution. We'll see...
Cheers,
Micru
On Fri, May 16, 2014 at 11:35 AM, Jane Darnell <jane023(a)gmail.com> wrote:
> David,
>
> I would strongly prefer a system that keeps the parts together, while
> at the same time, keeping all the parts separate and interchangeable.
> I hate that the .djvu files are blobs now, because if I find a better
> scan of an engraving from a book, I would like to replace the crappy
> scan that is in the .djvu file. I suppose you need to keep the version
> you uploaded, but you always want to present the best you have to the
> reader.
>
> I have looked at problems with datasets for a small GLAM, and have
> seen just how bad the data can be. I am mostly a web-surfer of
> poorly-designed GLAM datasets, which is why I have spent many hours
> thinking about these things. I have since given up trying to preach
> the evangelism of open data to GLAMs and started thinking more about
> what Wikipedia can do to curate the world's art. Many GLAMs are
> willing to share their data, but believe me when I say we may not want
> it. The backlog in batch uploads to Commons is not the technical
> upload queue, it's all the data massaging by hand that Wikipedians
> need to do beforehand. That work, which is done by Commons wizards,
> goes largely unrecognized today.
>
> Theoretically, a specific artwork is both a data item and a dataset.
> If you look at our artwork template on Commons you may have noticed
> how it has grown in the past 4 years and is fast becoming a fairly
> comprehensive standard dataset for certain items. The next step is to
> create a way to index these per object (yes we have categories - is
> that really the best we can do?).
>
> For popular artworks that are architectural features, Wiki Loves
> Monuments has harvested so many images of these from all different
> angles that you could probably make the case that Wikimedia Commons
> has more images than any other publication about that specific item.
> If you browse the various language versions and their representation
> of the object, you will notice that individual Wikipedians have
> selected different images, but these are rarely linked to each other
> and the casual Wikipedia reader has no idea that they can probably
> view the object in 3-D if they want to, or see a short movie about how
> it was made. Indeed, let's face it, most casual readers have only
> heard of Wikipedia and are completely unaware of Wikimedia Commons and
> have never heard of Wikimedia Commons categories.
>
> Take the case for the Sagrada Familia:
> https://commons.wikimedia.org/wiki/Category:Sagrada_Fam%C3%ADlia
>
> This category is augmented by a gallery page, with the helpful text
> "The Sagrada Família is an unfinished church in the Catalan city of
> Barcelona, considered to be architect Antoni Gaudí's masterpiece. For
> images of the Holy Family (Jesus, Mary, and Joseph), see Category:Holy
> Family." :
> https://commons.wikimedia.org/wiki/Sagrada_Fam%C3%ADlia
>
> Is this really the best we can do? Has anyone ever stopped and counted
> the rate at which we accumulate photos of the Sagrada Familia each
> year? We don't want to deter people from uploading, because we are
> probably still missing important photos of various internal features.
> But how do we show the gaps in our coverage of this object, while
> presenting an encyclopedic view? The English Wikipedia page includes
> about 40 images with a link to the category, but no other hints for
> media navigation.
>
> This is just one example, there are many more. I would like to see a
> system by which the normal Wiki-collaboration process can be used to
> slowly integrate all of the Commons files into datasets per item, and
> then include these into datasets per city or artist or GLAM or
> whatever. I suppose it should be lists of categories, gallery pages,
> and templates, most of them blank (like the artwork template - you can
> use the fields or not, as long as you include the minimum for the
> upload wizard). Wikidata can help with the template fields as
> properties.
>
> Jane
>
> 2014-05-15 18:14 GMT+02:00, David Cuenca <dacuetu(a)gmail.com>:
> > Jane,
> >
> > Thanks for your input! I never thought of datasets as incorporating
> > images, but just as tables (whose elements might point to images, but
> > not contain them). Are people in the GLAM scene expecting other files
> > to be embedded when talking about datasets?
> >
> > Well, if it is a standard format (CSV or JSON), then it is easy to keep
> > the whole dataset together: you just consider it a text file and upload
> > a new one, like any other file on Commons :)
> >
> > Micru
> >
> >
> >
> >
> > On Thu, May 15, 2014 at 5:18 PM, Jane Darnell <jane023(a)gmail.com> wrote:
> >
> >> David,
> >> This is an interesting question. I think that a dataset is just like
> >> any other table such as the ones included in Wikipedia, with lots more
> >> entries and maybe even pieces attached that can't go on Wikipedia such
> >> as pictures, audio, short films, pieces of software code, or other
> >> media.
> >>
> >> So I guess this page should be merged with the DataNamespace page. The
> >> problem is how to reference a dataset or table. Images on Commons are
> >> timestamped with a source link that is often {{self}}, but more often
> >> a weblink somewhere that may or may not die within a year or two.
> >> Since the image is something that you can't really change easily, this
> >> is generally not an issue, but how do you see this with data that can
> >> be manipulated? I don't really see how you can upload datasets as
> >> whole "blobs" that will keep all the pieces together the way a .djvu
> >> file keeps the text with the images.
> >>
> >> Jane
> >>
> >> 2014-05-15 16:46 GMT+02:00, David Cuenca <dacuetu(a)gmail.com>:
> >> > On Thu, May 15, 2014 at 1:42 PM, Cristian Consonni <
> >> kikkocristian(a)gmail.com>
> >> > wrote:
> >> >
> >> >> Thanks for the pointer, "How can I put this open data on Wikidata is
> a
> >> >> question that I have been asked many times", this page was needed.
> >> >>
> >> >
> >> > Thanks for your comment!
> >> >
> >> > On Thu, May 15, 2014 at 3:59 PM, Samuel Klein <meta.sj(a)gmail.com>
> wrote:
> >> >
> >> >> Thanks Micru! I think we should start by including datasets on
> >> >> wikisource, with descriptions about them (storing the files on
> commons
> >> >> where possible). And adding more data formats to the formats
> >> >> accepted on commons.
> >> >>
> >> >
> >> > I don't follow you... why would you put datasets on Wikisource when
> they
> >> > are only used in Wikipedia and have to be stored somewhere else? As it
> >> > is
> >> > now, it doesn't seem a good dataset management solution.
> >> > Besides that it would conflict with its identity as repository for
> >> textual
> >> > sources..
> >> > About Commons I don't know if it is relevant to their mission as a
> >> sharing
> >> > media platform either... I hope someone from their community can share
> >> > their views.
> >> >
> >> > Thanks for the input,
> >> > Micru
> >> > _______________________________________________
> >> > Wikimedia-l mailing list, guidelines at:
> >> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> >> > Wikimedia-l(a)lists.wikimedia.org
> >> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
> ,
> >> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
> >>
> >
> >
> >
> > --
> > Etiamsi omnes, ego non
>
>
--
Etiamsi omnes, ego non
On Thu, May 15, 2014 at 1:42 PM, Cristian Consonni <kikkocristian(a)gmail.com>
wrote:
> Thanks for the pointer. "How can I put this open data on Wikidata?" is a
> question that I have been asked many times; this page was needed.
>
Thanks for your comment!
On Thu, May 15, 2014 at 3:59 PM, Samuel Klein <meta.sj(a)gmail.com> wrote:
> Thanks Micru! I think we should start by including datasets on
> wikisource, with descriptions about them (storing the files on commons
> where possible). And adding more data formats to the formats
> accepted on commons.
>
I don't follow you... why would you put datasets on Wikisource when they
are only used in Wikipedia and have to be stored somewhere else? As it is
now, that doesn't seem like a good dataset-management solution.
Besides, it would conflict with Wikisource's identity as a repository for
textual sources...
As for Commons, I don't know if it is relevant to its mission as a
media-sharing platform either... I hope someone from that community can
share their views.
Thanks for the input,
Micru
Hi,
During the Zürich Hackathon I met several people who were looking for ways
to integrate external open datasets into our projects (mainly Wikipedia
and Wikidata). Since Wikidata is not the right tool to manage them (for the
reasons explained in the RFC, as discussed during the Wikidata session), I
felt it would be convenient to centralize the discussion about potential
requirements and needs, and about how to approach this new, changing
landscape that didn't exist a few years ago.
You will find more details here:
https://meta.wikimedia.org/wiki/Requests_for_comment/How_to_deal_with_open_…
Your comments, thoughts and ideas are appreciated!
Cheers,
Micru
Hi,
I remember once I shared here some thoughts from Dennett on the importance
of making mistakes<http://ase.tufts.edu/cogstud/dennett/papers/howmista.htm>.
Now I have seen this article about his new book, *Intuition Pumps and Other
Tools for Thinking*<http://www.amazon.com/exec/obidos/ASIN/0393082067/braipick-20>,
and I would like to share it here as well:
http://www.brainpickings.org/index.php/2014/03/28/daniel-dennett-rapoport-r…
"How to compose a successful critical commentary:
1. You should attempt to re-express your target’s position so clearly,
vividly, and fairly that your target says, “Thanks, I wish I’d thought of
putting it that way.”
2. You should list any points of agreement (especially if they are not
matters of general or widespread agreement).
3. You should mention anything you have learned from your target.
4. Only then are you permitted to say so much as a word of rebuttal or
criticism."
If anyone here has read it, please share your impressions. I'm tempted to
do so.
Tom
--
Everton Zanella Alvarenga (also Tom)
Open Knowledge Brasil - Rede pelo Conhecimento Livre
http://br.okfn.org
Re http://www.bbc.com/news/technology-27407017
Please remember that the EU court's ruling explicitly allows for a
public-interest exemption, which almost by definition covers anyone
passing Wikipedia's notability criteria.
Also, please consider the appropriate response for those of us, including
Kathy Sierra and myself, who have had our names, addresses, phone numbers,
and personal identifiers (identity-fraud magnets such as the Social
Security Numbers belonging to our children and ourselves) "doxed" on,
e.g., Encyclopedia Dramatica and similar sites.
Asserting that we should have no recourse does not seem particularly
well grounded, and seems to be happening without reasons being
offered.
Best regards,
James Salsman
Hello guys,
Since this is our first participation in an international photo contest, we
discussed locally ways to get more information about what is working well,
about the results, and about the real impact on Commons of all our efforts
around Wiki Loves Earth here in Brazil.
So the Brazilian User Group studied and specified a tool, developed by
Danilo (Danilo.mac on pt.wiki, and a member of the user group as well), to
read database information and generate a complete report about Wiki Loves
Earth, covering all participating countries listed on Commons.
The tool is hosted on the server wmflabs.org under the URL
http://tools.wmflabs.org/ptwikis/WLE
Through this tool you can get a general idea of what is going on: the
number of photos uploaded, photos used on the wikis, the number of
uploaders (with a complete list and registration dates by country), and
various percentages, including the share of uploaders who registered in
May 2014, during the contest.
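For anyone curious, the kind of aggregation such a report does can be
sketched in a few lines of Python; the records and field names below are
invented for illustration — the real tool reads the database replicas on
Labs:

```python
# Sketch of the contest metrics described above, over hypothetical
# upload records (field names invented for illustration).
from datetime import date

uploads = [
    {"user": "A", "registered": date(2014, 5, 3), "used_on_wiki": True},
    {"user": "A", "registered": date(2014, 5, 3), "used_on_wiki": False},
    {"user": "B", "registered": date(2012, 1, 9), "used_on_wiki": True},
]

total = len(uploads)                                  # photos uploaded
used = sum(u["used_on_wiki"] for u in uploads)        # photos used on wikis
uploaders = {u["user"]: u["registered"] for u in uploads}
new_in_may = sum(1 for d in uploaders.values()
                 if (d.year, d.month) == (2014, 5))   # registered during WLE

print(f"photos: {total}, used on wikis: {used}")
print(f"uploaders: {len(uploaders)}, registered in May 2014: "
      f"{100 * new_in_may / len(uploaders):.0f}%")
```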
I guess this tool can help each country define metrics to evaluate local
efforts and results, and it can be useful for other local contests as well.
This is our first time organizing this kind of project and we are learning
a lot. So far we have received more than 1,000 photos, with 94% of the
uploaders registered in May 2014 (200 new users for Wikimedia Commons in
13 days). Now we are planning a bot to provide guidance and keep in touch
with these new users after the contest ends.
Best regards!
Rodrigo Padula
WLE Brazil - Coordinator
Wikimedia Community User Group Brasil
Education Program Coordinator - Ação Educativa / Brazilian Catalyst Program
+55 21 99326-0558