Dear all,
First off, congratulations to everybody on creating this tool, which is
going to revolutionise uploading to WikiCommons.
Inevitably what follows is going to be largely a list of nit-picks (and
I'm sorry if I haven't tried to find your project plans or bug-tracker
first, in case some of the answers are already in the pipeline); but
don't let any of the below take away from what is a great achievement.
So, here are some issues that struck me when uploading the set now at
https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Libr…
(cat name may move, but this is where it's at for the moment).
* Filenames -- already under discussion in a different thread, at least
as regards character replacements.
I was a bit surprised to find the Artwork::title field automatically
being built into the file name -- I hadn't expected this.
On the one hand, I can see that it's an important piece of WikiCommons
culture to enforce: the name of the work comes first, because that is
what people will first see. But in my case, I had some very long
titles, so I wanted to be able to use a shortened version in the
filename. To avoid the full title being used, I found that I was
having to move the picture title into the first line of the
description field -- not ideal. So you might want to consider adding an
option to de-select this.
It would be nice for users to have a bit more information about how
filenames will be created, but this will come.
* Staging area. -- I had had the impression that the initial 3 test
uploads would be uploaded to a staging area, rather than the main live
wiki. So I was a bit surprised when I found it was indeed the main live
wiki they had been uploaded to.
Of course, this makes a lot of sense -- for example, seeing the effect
of specialist templates etc. It's just about managing expectations --
and, perhaps, reassuring people that mistakes can be easily removed eg
by tagging the wrongly named image with {{duplicate}}. (I ended up with
the unexpected title duplication causing unwanted filenames, and then a
".jpg.jpg" set of uploads). I initially wasn't very comfortable with my
mistakes happening on the live wiki for all to see, which made me feel
quite stressed to start with; but then I relaxed, and started the full
upload.
* Output. -- If outputting {{artwork}}, please include the standard
fields in the standard order, even if some of them are empty. eg:
{{Artwork
|artist =
|title =
|description =
|date =
|medium =
|dimensions =
|institution =
|location =
|references =
|object history =
|credit line =
|inscriptions =
|notes =
|accession number =
|source =
|permission =
|other_versions =
}}
& further fields have their standard places in the order; which pretty
much corresponds to the sequence they are output in, *not* alphabetical
order.
This is important, because WikiCommons is not a "write once" medium --
pages are there to be easily edited and updated, by humans.
It is useful to have all the basic fields in place, even if they are not
populated, because it makes it so much easier to fill something in later
-- for example, in my case, to move some of the 'description' back into
the 'title'; or to add references; or transcriptions of inscriptions; or
other versions, already on the Wiki.
The empty fields also help to give the edit page order and structure
when you look at it; otherwise it can get messy and harder to process
if the 'description' and 'source' fields, which can get quite long and
free-form, are allowed to dominate.
And please keep the fields in the standard order above, so that
experienced editors know exactly where to expect to look for particular
information, and where to edit it.
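
To make the request concrete, here is a minimal sketch in Python of the kind of output step I have in mind. The function name and the dict of values are purely my own assumptions for illustration; I have not looked at how GWToolset actually builds the template.

```python
# Standard {{Artwork}} fields, in the order listed above (not alphabetical).
ARTWORK_FIELDS = [
    "artist", "title", "description", "date", "medium", "dimensions",
    "institution", "location", "references", "object history",
    "credit line", "inscriptions", "notes", "accession number",
    "source", "permission", "other_versions",
]


def render_artwork(values):
    """Return {{Artwork}} wikitext with every standard field, in standard order.

    `values` is a hypothetical dict mapping field names to text; fields that
    are missing from it are still emitted, just left empty.
    """
    lines = ["{{Artwork"]
    for field in ARTWORK_FIELDS:
        value = values.get(field, "")
        # rstrip() so an empty field comes out as "|artist =" with no trailing space
        lines.append(f"|{field} = {value}".rstrip())
    lines.append("}}")
    return "\n".join(lines)


print(render_artwork({"title": "A view", "source": "British Library"}))
```

The point is only that the list of fields drives the output, rather than whichever fields happen to be populated in the input.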
* GWtoolset fields.
The unexpected fields 'gwtoolset-title-identifier' and
'gwtoolset-url-to-the-media-file' are currently causing the template to
throw warnings, which look unsightly.
If these are going to be placed in the artwork template, please edit
that template, so that it doesn't throw warnings.
But is the artwork template actually the best place for these fields?
They don't relate to a description of the artwork, rather a description
of the upload process.
The standard place to describe the history of the upload process is in
its own template, separate from the image description template --
compare for example the template left by the Flickr2Commons bot in the
'licensing' section of the page
https://commons.wikimedia.org/wiki/File:Furnival%27s_Inn,_Holborn_-_Shepher…
The advantage of this is that the 'artwork' template can be kept to a
very specific function, without having its code cluttered up by other
stuff. Think what the effect would be if every upload process wanted to
add its fields to the artwork template -- maintenance, or even reading
the code, would become a nightmare. Instead, much better to put this
content in your own template, to mark the GWtoolset upload process,
perhaps with an additional master parameter to turn visible output from
the template off or on.
* Category section
This is one of the most important sections for hand-editing. Yes, there
are nice methods to add/remove categories now built right into the
interface; but categories still get edited by hand, too. Readability is
therefore important.
So could you please add linefeed characters, so that each
[[Category:...]] directive starts on a new line?
It's a small thing. But without it the output from last night's version
is almost unreadable.
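
As a rough sketch of what I mean (Python, with a made-up function name and placeholder category names; not the tool's actual code), the change amounts to joining the links with a newline rather than nothing:

```python
def render_categories(names):
    """Return one [[Category:...]] link per line."""
    return "\n".join(f"[[Category:{name}]]" for name in names)


# Placeholder category names, purely for illustration.
print(render_categories(["Example category A", "Example category B"]))
```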
* Whitespace
I can see it's useful at the moment, in the present beta stage of the
code, to add a debugging dump of the tool's run-state to the end of the
page.
But please can you add several lines of whitespace before it.
Normally, the category section is very easy to find, being the last
thing on the page. But without whitespace, it gets buried in a big heap
of text. So, fine to keep the debugging information there, but please
add a few lines of whitespace before it, to make it easier to find the
categories section.
* Markup
I wasn't sure how to get markup onto the page. For example, the <br />
tag can be useful if one only wants a newline, not a new paragraph. (It
is only double newlines that the Wiki software treats as paragraph
breaks; single newlines get rendered as spaces. So a <br /> tag is
needed if you want to specify a linebreak).
However it appeared that <br /> tags were being eaten by the XML parser.
I also tried double single-quotes '' to indicate italicised text, but
the software carefully turned these into Unicode escapes to preserve
them. (I didn't try <i> or <em>, so maybe that would have been the way
round this).
It can also be very useful to be able to add [[wikilinks]] at the
offline, pre-upload stage. I presume the software will escape these as
well. (Though there are workaround templates, which I presume may give
a way to work round this, albeit at the expense of less readable
wiki-pages).
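
For what it's worth, the <br /> behaviour is what any XML parser will do with an unescaped tag, as this small Python sketch shows. The <description> element name is just an assumption for the example, and I have not tested whether GWToolset would pass the escaped form through to the wiki page untouched.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "First line<br />second line"

# Unescaped: the parser sees <br /> as a child element, so the element's
# own text stops at "First line" and the rest becomes the child's tail.
unescaped = ET.fromstring(f"<description>{raw}</description>")
print(repr(unescaped.text))      # text before the <br /> element
print(repr(unescaped[0].tail))   # text after it

# Escaped (&lt;br /&gt;): the tag survives as literal text in the field.
escaped = ET.fromstring(f"<description>{escape(raw)}</description>")
print(repr(escaped.text))
```

So escaping the angle brackets in the XML file (or wrapping the text in a CDATA section) is probably what I should have tried on my side.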
* Enhancements
** {{DEFAULTSORT:}}
It would be nice to be able to specify a field in the XML to be put into
a Defaultsort for the page.
For example, for anything over 100 years old, I tend to find that it's
useful to specify a default sort-key of the form "DATE ITEM SEQ" --
where DATE is a 4-digit numerical date (perhaps with a suffix to
indicate imprecision), ITEM is some identifier for the series or item,
eg a book, that the images are drawn from; and SEQ is a padded number to
indicate a sequence within that item.
Last night I got round this by smuggling my Defaultsort into one of the
fields in the Artwork template; but really it ought to be placed
immediately above the Category information, so it would be good to be
able to load it there directly.
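
A minimal sketch of the sort-key idea, in Python. The function names, the "~" suffix for an imprecise date, and the example values are all my own assumptions for illustration, not anything the tool currently does.

```python
def default_sort_key(date, item, seq, width=4, uncertain=False):
    """Build a 'DATE ITEM SEQ' sort key.

    date:      year as an integer, written as 4 digits
    item:      identifier for the book or series the image comes from
    seq:       position within that item, zero-padded to `width` digits
    uncertain: append '~' to the date (a hypothetical imprecision marker)
    """
    date_part = f"{date:04d}" + ("~" if uncertain else "")
    return f"{date_part} {item} {seq:0{width}d}"


def render_footer(sort_key, categories):
    """Place {{DEFAULTSORT:}} immediately above the categories, one per line."""
    lines = ["{{DEFAULTSORT:" + sort_key + "}}"]
    lines += [f"[[Category:{name}]]" for name in categories]
    return "\n".join(lines)


key = default_sort_key(1829, "Example item", 7)
print(render_footer(key, ["Example category"]))
```

The zero-padding matters because sort keys are compared as text, so "0007" sorts before "0010" where "7" would not.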
** Free text
It might be good to also be able to have the general ability to load
text (eg arbitrary templates) from the XML file into the various other
parts of the page outside the Artwork template. For example, particular
credit templates or notes, or bespoke 'permissions' templates.
Of course it would be nice if the tool already knew about such
templates; but for when it doesn't, it would be a useful option to be
able to place free text in different parts of the standard page.
** Compound fields
As well as Defaultsort above, there were a number of other entries in my
upload last night that were compound fields.
For example,
Description = Title + (line break) + Description
Filename = Short_Name - Short_Item_Name (Date), Page - Shelfmark
while 'Source' was built from two fields plus two further templates,
each of which had various input fields.
Some of this is always going to be best pre-processed offline. But for
simple cases, it would be nice to be able to specify multiple fields
with separators, that could then be baked into the JSON file.
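
One simple way this could work, sketched in Python: a template string that names the fields and the separators between them, filled in per record. The function name and the example record are hypothetical; the field names are the ones from my own upload above.

```python
def build_compound(template, record):
    """Fill a template that names fields in braces from one record's fields.

    Raises KeyError if the template names a field the record lacks, which
    seems safer than silently emitting a half-built value.
    """
    return template.format_map(record)


# Hypothetical record, for illustration only.
record = {
    "Short_Name": "Example view",
    "Short_Item_Name": "Example book",
    "Date": "1829",
    "Page": "p. 12",
    "Shelfmark": "X.000",
}

filename_template = "{Short_Name} - {Short_Item_Name} ({Date}), {Page} - {Shelfmark}"
print(build_compound(filename_template, record))
```

Anything more complicated than this (conditionals, nested templates) is probably still best pre-processed offline.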
** Non-XML forms of input.
JSON seems increasingly popular; and might not have so many issues with
escaped characters (and escapes for the escape mechanisms) as XML. Or
perhaps it's just that I write simple XML by hand, but for JSON I tend
to leave it to a library call to worry about...
So there are some issues. The (non-)allowed filename characters, and
the presentation/layout of the final wikitext page were the ones that
gave me actual unhappiness. The rest is there as a raw user's initial
impressions.
But really I want to thank you for this tool, which makes batch
uploading accessible to anyone who can write an XML file, rather
than having to write bespoke bots and get specific bot approval for each
little thing.
Hope this is useful,
All best,
James.