Hi
In case no one noticed it: I opened this bug last week, which prevents me
from finalising my mass upload:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=68285>
Would be awesome if anyone had some time to look into it!
Thanks,
--
Jean-Frédéric
-------- Original Message --------
Subject: [Wikidata-l] Fwd: Structured Data on Commons
Date: Sun, 24 Aug 2014 14:19:15 +0200
From: Lydia Pintscher <lydia.pintscher(a)wikimedia.de>
Reply-To: Discussion list for the Wikidata project.
<wikidata-l(a)lists.wikimedia.org>
To: Discussion list for the Wikidata project.
<wikidata-l(a)lists.wikimedia.org>
Hey folks :)
We're starting to pick up speed with structured data support for
Commons now. It'd be great to have you all on board for this. Planning
documents are linked below. I hope to see many of you interested in
multimedia at the office hour on September 3rd.
Cheers
Lydia
---------- Forwarded message ----------
From: Fabrice Florin <fflorin(a)wikimedia.org>
Date: Fri, Aug 22, 2014 at 10:48 AM
Subject: Structured Data on Commons
To: Multimedia Mailing List <multimedia(a)lists.wikimedia.org>
Greetings!
We invite you to join a discussion about Structured Data on Commons,
to help us plan our next steps for this project.
The Structured Data initiative proposes to store and retrieve
information for media files in machine-readable data on Wikimedia
Commons, using Wikidata tools and practices, as described on our new
project page (1).
The purpose of this project is to make it easier for users to read and
write file information, and to enable developers to build better tools
to view, search, edit, curate and use media files. To that end, we
propose to investigate this opportunity together through community
discussions and small experiments. If these initial tests are
successful, we would develop new tools and practices for structured
data, then work with our communities to gradually migrate unstructured
data into a machine-readable format over time.
The Multimedia team and the Wikidata team are starting to plan this
project together, in collaboration with many community volunteers
active on Wikimedia Commons and other wikis. We had a truly inspiring
roundtable discussion about Structured Data at Wikimania a few weeks
ago, to define a first proposal together (2).
We would now like to extend this discussion to include more community
members that might benefit from this initiative. Please take a moment
to read the project overview on Commons, then let us know what you
think, by answering some of the questions on its talk page (3).
We also invite you to join a Structured Data Q&A on Wednesday
September 3 at 19:00 UTC, so we can discuss some of the details live
in this IRC office hours chat. Please RSVP if you plan to attend (4).
Lastly, we propose to form small workgroups to investigate workflows,
data structure, research, platform, features, migration and other open
issues. If you are interested in contributing to one of these
workgroups, we invite you to sign up directly on our hub page (5)
-- and help start a sub-page for your workgroup.
We look forward to some productive discussions with you in coming
weeks. In previous roundtables, many of you told us this is the most
important contribution that our team can make to support multimedia in
coming years. We heard you loud and clear and are happy to devote more
resources to bring it to life, with your help.
We are honored to be working with the Wikidata team and talented
community members like you to take on this challenge, improve our
infrastructure and provide a better experience for all our users.
Onward!
Fabrice — for the Structured Data team
(1) Structured Data Hub on Commons:
https://commons.wikimedia.org/wiki/Commons:Structured_data
(2) Structured Data Slides:
https://commons.wikimedia.org/wiki/File:Structured_Data_-_Slides.pdf
(3) Structured Data Talk Page:
https://commons.wikimedia.org/wiki/Commons_talk:Structured_data
(4) Structured Data Q&A (IRC chat on Sep. 3):
https://commons.wikimedia.org/wiki/Commons:Structured_data#Discussions
(5) Structured Data Workgroups:
https://commons.wikimedia.org/wiki/Commons:Structured_data#Workgroups
_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
So now Wikimania has been and gone, can we think about where we're at
with the Structured Data initiative for Commons?
In particular Liam, I think you said after one of the sessions you were
"all over this" -- it would be good to know your thoughts.
I do think, as the GW toolset community, we ought to have a lot we
should be able to offer here, because essentially we are doing big
uploads from data which is *already* structured, so
(i) we've got at least some experience already with working with data
that is at least in some form structured
(ii) we may know and be able to flag some awkward edge cases
(iii) we would like to accompany uploads with data that can be "born
structured", rather than converted later
(iv) in any case we're uploading a lot of images, which somebody is
going to have to convert to structured
(v) we may have seen (or even written) some of the gnarlier templates
on Commons, that migration will have to cope with.
It's not clear (at least not yet to me) how the Multimedia and Wikidata
teams may best want to be communicated with, but I'm including Keegan
(WMF) in cc:, who I think is the staffer with assigned community liaison
responsibility.
The biggest message to me from looking through some of the documents
after the meetings is just how much of the information is going to be
stored as part of central main Wikidata.
Essentially, if we upload an image of an object, then it is expected
that an 'item' (ie a Q-number) for that object will be added to
Wikidata, which will contain all the metadata that describes the object
rather than just the image.
The Wikidata community is already developing a very strong ontology to
describe such objects -- key resources are
https://www.wikidata.org/wiki/Wikidata:WikiProject_Visual_arts/Item_structu…
https://www.wikidata.org/wiki/Wikidata:WikiProject_Books
where there are active and friendly communities involved in refining them.
We can get involved and help the process right now, by trying to
identify and fill any gaps in these ontologies, and by being
enthusiastic early adopters -- there is no reason we should not be
getting involved right now, filling in appropriate metadata on Wikidata
right now each time we upload an image to Commons -- real-world testing
the current ontologies to see what creaks.
Data specific to the image itself (rather than what it shows) will be
stored in a separate Commons Wikibase.
This will include such things as the file name, a file description,
photographer, wikicontributor name, precise geographical location etc.
Commons Wikibase is also likely to contain a tag-like "topic list" -- a
list of all the Wikidata Q-numbers that apply to the image. These I
think will be gathered by climbing up the Wikidata tree from any
specified Subject identified for the image -- so a view of Westminster
Abbey might get topics such as "Westminster; London; England; Cathedral;
religious building" etc; and games will be invented to encourage people
to identify more such topics for the best images.
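As a rough sketch of the "climbing up the tree" idea, assuming a hand-made table of "broader topic" links (the BROADER table and collect_topics are hypothetical names for illustration, not real Wikidata properties or API calls):

```python
from collections import deque

# Hypothetical "broader topic" links, written by hand for illustration.
# Real Wikidata items and properties are not used here.
BROADER = {
    "Westminster Abbey": ["Westminster", "Cathedral"],
    "Westminster": ["London"],
    "London": ["England"],
    "Cathedral": ["religious building"],
}


def collect_topics(subject, broader=BROADER):
    """Return the subject plus every topic reachable via 'broader' links.

    Breadth-first, so nearer topics come first; each topic appears once.
    """
    seen = [subject]
    queue = deque([subject])
    while queue:
        current = queue.popleft()
        for parent in broader.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


topics = collect_topics("Westminster Abbey")
```

Here topics would start with the subject itself and then list the broader topics, nearest first. How far up the tree it is sensible to climb is an open question that this sketch does not address.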
This should allow WM to introduce a proper combinatorial search engine
based on tags for Commons; and many of the most egregious Commons
intersection categories will wither on the vine. (There is debate as to
whether Commons will end up needing *any* category pages, but I suspect
it will, because they are just so convenient to use as places for
jotting down facts -- on the other hand, it is possible one might be
forced to create an associated Commons article/gallery for that).
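To illustrate what a combinatorial search over topic lists could replace, here is a minimal sketch, assuming each file simply carries a set of topic tags (the file names, tags and the search function are invented for the example):

```python
# Hypothetical topic lists per file; names and tags are made up.
FILE_TOPICS = {
    "File:A.jpg": {"London", "Cathedral", "England"},
    "File:B.jpg": {"London", "bridge", "England"},
    "File:C.jpg": {"Paris", "Cathedral"},
}


def search(required, file_topics=FILE_TOPICS):
    """Return the files whose topic list contains every required topic."""
    required = set(required)
    return sorted(
        name for name, topics in file_topics.items() if required <= topics
    )


london_cathedrals = search(["London", "Cathedral"])
english_files = search(["England"])
```

With tags like these, an intersection category such as "Cathedrals in London" becomes just one query among many, rather than a page someone has to create and maintain.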
It would be nice (IMO) if there could be an interface to the topic list
through the wikitext source of the filepage -- I think this would be
well-received by the community, allow easy adaptation of existing bots,
etc. But this may be resisted as being too fragile a point of failure,
as it would mean that people making hand-edits would have to know (and
get right) the meaningless number-strings of individual Q-numbers.
Finally some very specific text data -- such as the EXIF data describing
shutter-speed etc, is likely to continue to live on the file description
page; because it's probably not something that people are primarily
going to want to search, and it may be a bit unpredictable.
Part of the immediate effort in the next few weeks is going to be to
produce clearer ideas about what information is going to live where, and
in particular what information is going to live on the Commons Wikibase,
and how it will be structured.
The good news is that much of the most complicated information will be
stored on Wikidata, so can be as detailed as we like (and can be
accessed live now).
On the other hand, the design for Commons Wikibase will initially aim to
be as simple as possible, with the aim to evolve it as experience is
gained, to migrate the edge cases later.
The file description page (or something not entirely unlike it) will
continue to exist as a view bringing together all the data.
Current templates will be re-written to draw information from Wikidata.
However, this won't yet be possible until the Wikidata team has
implemented the "Arbitrary Access" feature -- the ability for a wikipage
to access the properties of an arbitrary Wikidata Q-number. What's
causing the hold-up is that if the properties of the Q-number item are
edited, then all the pages that access that Q-number need to be marked
as dirty and regenerated. That's easy if you only have one page that
can access the Q-number, but hard if arbitrary pages can access it,
through a chain of properties.
(eg: the file page for a painting Q12345 may use property Pnnn to its
creator Q4567 who has property Pxxx, a date of birth. If the date of
birth gets made more precise, the system has to recurse back to indicate
that all the file pages showing pictures of that creator's work need to
be regenerated. This is tough, but file-page templates won't be able to
draw on Wikidata information until it is in place).
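The bookkeeping involved can be sketched roughly as follows. This is only an illustration of the idea and makes no claim about how the Wikidata team will actually implement it; UsageTracker and its methods are hypothetical, and the Q-numbers are the placeholders from the example above plus one more invented one.

```python
from collections import defaultdict


class UsageTracker:
    """Remember which pages read which items, so edits can mark pages dirty."""

    def __init__(self):
        self._pages_by_item = defaultdict(set)
        self.dirty = set()

    def record_usage(self, page, items):
        # Called while rendering a page: list every item it touched,
        # including items reached through a chain of properties.
        for item in items:
            self._pages_by_item[item].add(page)

    def item_edited(self, item):
        # Mark every page that used this item as needing regeneration.
        affected = set(self._pages_by_item.get(item, ()))
        self.dirty |= affected
        return affected


tracker = UsageTracker()
# The painting's file page uses the painting item and, via it, the creator.
tracker.record_usage("File:Painting.jpg", ["Q12345", "Q4567"])
# A second (invented) work by the same creator.
tracker.record_usage("File:Other_work.jpg", ["Q99999", "Q4567"])

affected_by_creator_edit = tracker.item_edited("Q4567")
```

Editing the creator item marks both file pages dirty, while editing Q12345 alone would only affect the first. The hard part in practice is recording usage reliably for arbitrary pages and arbitrary chains, which this toy version simply assumes.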
It is hoped to progressively simplify the myriad of different templates
used on the file pages as quickly as possible, to standardise them to
draw from the structured data stores.
Templates to display summary information about collection objects, which
will draw from Wikidata, may well be standardised so they can easily be
used on Wikipedias and other wikis -- or, to put that the other way
round, since Wikipedias and other wikis will also be developing
standardised templates to display summary object information, it may
well be possible to use the same code twice.
However, it would be good to get involved in the development of these
templates, to make sure they accurately reflect the information we
currently like to show in Commons.
(There may be some important details to get right -- for example the
Wikidata data-type for dates currently comprises a 'best' value, and an
optional numeric range (which is great for sorting). But if the
catalogue source data says eg "mid 17th century to early 18th century",
do we want to make sure that precise string is still stored? And should
it still be possible to make it visible? This needs close engagement;
but probably principally with the community-based development effort in
the Wikidata community groups.)
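One way to picture the question is a record that keeps the catalogue's own wording alongside the sortable values. This is a sketch only, not the actual Wikidata date datatype; the class, its field names and the specific years chosen for the example are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogueDate:
    """A 'best' year, an optional range, and the source's original wording."""

    best_year: int
    earliest_year: Optional[int] = None
    latest_year: Optional[int] = None
    source_text: Optional[str] = None

    def sort_key(self):
        # Sort on the start of the range when there is one.
        if self.earliest_year is not None:
            return self.earliest_year
        return self.best_year

    def display(self):
        # Prefer the catalogue's own wording when it was kept.
        return self.source_text or str(self.best_year)


# The numeric values here are illustrative guesses, not catalogue data.
vague = CatalogueDate(
    best_year=1680,
    earliest_year=1640,
    latest_year=1710,
    source_text="mid 17th century to early 18th century",
)
exact = CatalogueDate(best_year=1665)
ordered = sorted([exact, vague], key=CatalogueDate.sort_key)
```

The point of the extra field is that sorting can use the numbers while the file page can still show the string exactly as the institution supplied it.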
Already very standardised are the present Commons creator templates and
Commons institution templates. These are likely to be an early quick win.
Looking down a typical present-day filepage, that means that it is the
Source/Photographer information in the present "Artist" template, which
is currently free-form and often a freely composed pulling-together of
multiple different sources of metadata, that is likely to need the
most work to unpick.
This is also the field most commonly used for the credit link-back
templates to the originating GLAM institutions, which are obviously a
key consideration for our GLAM partners.
These templates may currently often be very institution-specific, and
may do quite complex stuff -- eg the present version of the British Library
https://commons.wikimedia.org/wiki/Template:British_Library_image
as used at eg
https://commons.wikimedia.org/wiki/File:Cuthbert_discovers_piece_of_timber_…
contains link-backs to a number of catalogues, each with their own
corresponding text; and as well as linking back to the information about
the underlying object (which is likely to be stored on Wikidata), it
will also likely contain a link-back to the source of the original file
(in this case the specific file at BL images online), which being
information specific to the file is likely to be living on the Commons
Wikibase.
The Source/Photographer field as a whole is (I think) likely to be one
of the last on the file page to be assimilated, because it can be so
sui-generis, and so the present rats nest of templates may continue to
be acceptable for some time -- though even they are likely to need
modification, as eg Photographer information moves to the Commons Wikibase.
That said, each institution is only going to need to manage its own
template.
But it probably would make sense to start an effort to think
* what is the structured data that typically lives in these templates?
* and is there some standardisation we could start to get into the
box, even now
Apart from anything else, something readily customisable might be much
easier for new institutions to adapt and adopt.
For the migration project as a whole, an audit of all the source
templates of this sort would be useful. That is something the MM/WD
project team could perhaps usefully encourage the community to undertake
for them.
I have to admit there are lines I am not sure about, as to what gets a
Wikidata entry and what does not.
For example, when does a photograph deserve its own entry?
Perhaps a bright line is that an image of a photograph one took oneself
doesn't get an entry on Wikidata, but a photograph by Man Ray perhaps does.
What about a photograph by a photographer of more intermediate
notability? Or instead, perhaps an engraving from a book of 19th
century engravings?
It makes sense to create an identifier for the book on Wikidata; and
also the place depicted. This is often almost enough to identify the
particular image, but really one would want to store the page number,
and perhaps the scan number as well. (Since one might well have either
one or the other or both). It would probably be good to store some
identifier for the set of scans as well -- this too probably doesn't
belong on Wikidata, (although one might identify it as set number
<identifier> from eg the Mechanical Curator collection, which itself
probably then *would* get a Wikidata identifier).
So the Commons wikibase probably needs to be able to identify images as
having a sequence in a particular set, and that set as perhaps having an
identifier that links it to a collection which has a particular Q-number
on Wikidata.
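A minimal sketch of what such a record might look like, purely as a thinking aid (the class names, field names, identifiers and the Q-number below are all hypothetical, not a proposed Commons Wikibase schema):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanSet:
    """A set of scans, optionally tied to a collection item on Wikidata."""

    set_identifier: str
    collection_item: Optional[str] = None  # Q-number of the collection


@dataclass(frozen=True)
class FileRecord:
    """Per-file data: which set it is in, and where it falls in that set."""

    file_name: str
    scan_set: ScanSet
    sequence_number: Optional[int] = None  # scan number within the set
    page_number: Optional[int] = None  # page in the book, if known


def in_set_order(records):
    """Order by set, then sequence number; unnumbered files go last."""
    return sorted(
        records,
        key=lambda r: (
            r.scan_set.set_identifier,
            r.sequence_number is None,
            r.sequence_number or 0,
        ),
    )


# "set-001" and "Q00000" are placeholders, not real identifiers.
book_scans = ScanSet(set_identifier="set-001", collection_item="Q00000")
records = [
    FileRecord("File:Plate_3.jpg", book_scans, sequence_number=3, page_number=41),
    FileRecord("File:Unnumbered.jpg", book_scans),
    FileRecord("File:Plate_1.jpg", book_scans, sequence_number=1),
]
ordered_names = [r.file_name for r in in_set_order(records)]
```

Both the page number and the scan number are optional here because, as noted above, one might well have either one or the other or both.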
This is the kind of thinking we will particularly need to be doing over
the next few weeks -- what is the metadata that will *not* be stored on
Wikidata, so will *need* to be stored on the Commons Wikibase if it is
to continue to be accessible? That is something that we as the
community need to evolve, thinking of all the use cases we can.
I am sure that there was something else I meant to say, but this email
seems long enough already.
There's a scratchpad of some bookmarks I started keeping on a subpage of
my userpage at Wikidata that people are welcome to,
https://www.wikidata.org/wiki/User:Jheald/bookmarks
This gives a nutshell of where some different fields might be stored
https://docs.google.com/presentation/d/1x-vOUr-zveLzoIP6uJC1Sz95xwuTFNwaBqJ…
This etherpad is good, esp lines immediately after 140, and "What new
fields should be created to complement the old fields?" at 156 (actually
in the context of Upload Wizard, but it gives some ideas)
http://etherpad.wikimedia.org/p/multimedia-wikidata-catchup
There's a spreadsheet showing some of the fields they're thinking about
https://docs.google.com/spreadsheets/d/1rk05EcLZpJaqOh5wymK6teIQufH9t0xn6oD…
-- though I think quite a lot of what's down as living on Wikidata
should really be Commons Wikibase --
and also a suggestion based on some simple use-cases:
https://docs.google.com/document/d/1C7UTB1kbaf_EisF3LmhpIQkb_ifSkB0rD8IGI9a…
though I think we would probably see that as *too* simple, even for a
first build, because for many of our applications
sequence-number in set
& set-identifier in collection
are probably essential quantities to have (as they probably are for the
Wikisource collection too).
Finally, this is an etherpad from the Hackathon just been, which has a
lot of useful links at the end.
http://etherpad.wikimedia.org/p/structured-data-discussion-7-august-2014
Hope this initial brain dump is of at least some use, to make it worth
its length,
All best,
James Heald. (User:Jheald).
Dear all on the Glamtools and Europeana Steering Committee mailinglists.
In preparation for Wikimania I thought I'd send out this summary of what's
going on. Obviously not everyone can attend the conference, but for those
that can, this will hopefully be of use to you. There is information here
about the *Stall*, the *Flyer*, the *GWT Wiki page*, the *Schedule*,
and the *Task Force launch*.
*Stall*
We have been confirmed to have a stall for the duration of the conference
in the main foyer. Here is the map
<https://wikimania2014.wikimedia.org/wiki/Community_Village>. We are
kind-of in the corner so I'm hoping one of the other groups that applied
for a stall doesn't show up, and we can 'borrow' their space... A banner
and lots of Europeana Labs/Europeana API etc. brochures have arrived in
London already. I will be trying to sit at this stall during most
lunches/breaks - and I would very much appreciate if YOU could join me. If
you see the desk unattended, and you want to sit and help, please do. If
past Wikimanias are any guide, chairs during break-times will be a valuable
commodity!
*Flyer*
Also we have a very pretty flyer to hand out at this stall and at every
GWToolset related presentation/hacking. No one should be able to leave this
conference without having heard about the GWT! Wikimedia-UK generously
agreed to organise and pay for the printing of this flyer - so we have 1250
of them! They are A5 size. (front side
<https://commons.wikimedia.org/wiki/File:GLAMwiki_Toolset_Flyer_-_front_side…>
image, back side
<https://commons.wikimedia.org/wiki/File:GLAMwiki_Toolset_Flyer_-_back_side.…>
image). Especial thanks go to Sebastiaan for his generous assistance taking
my design and turning it into this professional publication!
*GLAMwiki Toolset Wiki page*
As you can see, that flyer tells people to go to bit.ly/GWtoolset (which
directs to [[Commons:GLAMwiki Toolset]] ). So, I've been very bold in
completely restructuring that page to make it more attractive to new
visitors - adding some example uploads, live statistics of usage, links to
the different sub-pages (documentation, project history etc.), and trying
to give an accurate but easily digestible summary of how to use the
tool. I'm applying Cunningham's law
<https://meta.wikimedia.org/wiki/Cunningham's_Law> by including the
"instructions" section - please change it if it is incorrect. Also, anyone
who is particularly good at making mediawiki pages look *pretty* is very
welcome to go in and help improve it.
*Schedule*
GLAMwikiToolset/Europeana related events at Wikimania.
For this I can't beat the summary that Dan Entous created earlier today
which I've copied below. There are lots of relevant things of course but
these are the directly related items. (I should also add to Dan's list a
panel session about the Europeana Fashion events that is happening on
Friday at 12.)
*Wikimania - overall programme*
>
> - https://wikimania2014.wikimedia.org/wiki/Programme
>
> *GWToolset Hackathon Sprint*
>
> - this is for programmers -- the idea is for any programmers
> interested in programming for the tool to attend in order to learn
> how to program for it:
> https://wikimania2014.wikimedia.org/wiki/Hackathon#GWToolset
> - there is no fixed schedule -- just an open hacking sprint during
> the day: wednesday aug 6 and thursday august 7
>
> *Structured Data*
>
> - thursday, august 7
> - 10:00 - 11:30
> - https://wikimania2014.wikimedia.org/wiki/Programme#Thursday.2C_August_7
> - this is probably a good discussion to attend
>
> *Europeana Task Force*
>
> - thursday, august 7
> - 16:00 - 17:00
> - https://wikimania2014.wikimedia.org/wiki/Programme#Thursday.2C_August_7
>
> *GWToolset - What is It*
>
> - sunday, august 10
> - 12:00 - 12:30
> - https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10
> - this is joris' talk about the toolset
>
> *GWToolset - Training Session*
>
> - sunday, august 10
> - 12:30 - 13:00
> - https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10
> - this is fae's training session on how to use the tool
>
*Task Force launch*
As indicated in this list of Dan's, the *Task Force* will be having its
first meeting on Thursday afternoon. This is the agenda
<https://docs.google.com/document/d/1RgDUkMZX00I_IfdZgnwcYVNb4skHv1da3DsJmMD…>.
It will hopefully kick-off a very productive process to produce, in 6
months' time, a report for what the relationship between Europeana and
Wikimedia should look like in accordance with their respective strategic
plans. (The public documentation of this Task Force will be published on
the Europeana website <http://pro.europeana.eu/network/task-forces/overview>
very soon I am informed.)
[Relatedly, there will be lots of formal and informal side-meetings
happening that are related to Europeana specifically or GLAM-Wiki more
generally. Let's share notes!]
Any questions, ideas or problems, please email (or tweet to me during the
conference @Wittylama <https://twitter.com/Wittylama>).
And with that - I look forward to seeing many of you at Wikimania!
Your humble Europeana GLAM-Wiki coordinator,
- Liam
wittylama.com
Peace, love & metadata