Could you include the raw bytes of your Æ character, if possible? (On Unix computers, and possibly also Macs, the hd or hexdump command can show you this.) Alternatively, just attach the XML in question to a bug, since your email client might change the character.
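
If hd or hexdump is not handy, here is a rough Python 3 sketch of the same check. The helper name hex_bytes is made up for illustration, and the character is written as an escape so that a mail client cannot alter it. The second and third lines show two common ways the bytes can go wrong (these are guesses at possible causes, not a diagnosis of your file).

```python
def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


AE = "\u00c6"  # LATIN CAPITAL LETTER AE

# Correct UTF-8 encoding of the character.
print(hex_bytes(AE))  # c3 86

# Latin-1 encoding: a single byte, which is not valid UTF-8 on its own.
print(hex_bytes(AE, "latin-1"))  # c6

# Double encoding: UTF-8 bytes misread as Latin-1, then encoded as UTF-8 again.
print(hex_bytes(AE.encode("utf-8").decode("latin-1")))  # c3 83 c2 86
```

Seeing anything other than c3 86 at that spot in the XML would suggest the file is not plain UTF-8 there.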

The encoding should be UTF-8 with NFC, but even if it's not quite correct, gw/mw should convert it.
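
A minimal sketch of how one could test for "UTF-8 with NFC" in Python 3, using only the standard unicodedata module. The function name check_utf8_nfc and the sample byte strings are assumptions for illustration; this is not how gw/mw performs the check.

```python
import unicodedata


def check_utf8_nfc(data: bytes) -> str:
    """Classify raw bytes as 'not utf-8', 'utf-8, not NFC', or 'utf-8, NFC'."""
    try:
        text = data.decode("utf-8")  # strict decoding: raises on invalid bytes
    except UnicodeDecodeError:
        return "not utf-8"
    if unicodedata.normalize("NFC", text) != text:
        return "utf-8, not NFC"
    return "utf-8, NFC"


print(check_utf8_nfc(b"\xc3\x86gean"))  # utf-8, NFC
print(check_utf8_nfc(b"e\xcc\x81"))     # utf-8, not NFC (e + combining acute)
print(check_utf8_nfc(b"\xc6gean"))      # not utf-8 (Latin-1 byte for the ligature)
```

To check a whole file, the bytes could be read with open(path, "rb").read() and passed to the same function, where path is whatever your XML file is called.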

--bawolff
On May 2, 2014 3:42 AM, "Fæ" <faewik@gmail.com> wrote:
>
> Thanks for the detailed investigation Dan. There must be some oddity in the way I'm creating my xml (Python generated, then edited in JEdit for any tweaks, should be 'utf-8') so I'll continue plugging at it.
>
> I keep missing your emails and finding them under my spam folder, no idea why.
>
> Fae
>
>
> On 1 May 2014 20:04, dan entous <d_entous@yahoo.com> wrote:
>>
>> characters
>> ----------
>> i have a test xml i use to test titles and added the characters you mentioned. i had no problem uploading the test xml file. here are 2 results that seem to indicate that there should not be an issue with the characters:
>>
>> http://commons.wikimedia.beta.wmflabs.org/wiki/File:The_%22King%E2%80%99s_of_Hungary_-_%C3%86%22_holding_c%C3%B6uncil_%26_in_his_tent_%26_on_the_battl%C3%A9field_-_Froissart%27s_Chronicles_(Volume_IV,_part_2)_(1470-1475),_f.84_-_BL_Harley_MS_4380.jpg
>>
>> http://commons.wikimedia.beta.wmflabs.org/wiki/File:Dice_players_-_Lo_L%C3%ADbro_de_Multi_B%C8%A9lli_Mirac%C3%BCli_(14th_C),_f.9v_-_BL_%C3%85dd_MS_22557.jpg
>>
>>
>> example record
>> --------------
>> i tested the example record locally and after about 2 minutes i got the message:
>>
>> The file you submitted was too large. original URL: http://link.nypl.org/2Qqj_oLvSbWRwPxtB1rq_wZ evaluated URL: http://link.nypl.org/2Qqj_oLvSbWRwPxtB1rq_wZ
>>
>> my wiki was set to a limit of 100mb, so i up’d it to 1000mb.
>>
>> i also switched to the new preview branch i have in gerrit, https://gerrit.wikimedia.org/r/#/c/127839/, for bug https://bugzilla.wikimedia.org/show_bug.cgi?id=63864, which no longer downloads an image to the wiki during the preview step. instead it downloads all mediafiles in a background job.
>>
>> the job successfully completed after 3 minutes and the image was viewable in my local wiki.
>>
>> i also took a look at our wikitech instance and saw that you had uploaded the image there without issue. i also repeated the upload but got the message:
>>
>> “This file did not pass file verification.”
>>
>> this seems to have been thrown by UploadBase.php, so i'd have to look further into that issue. but i also suspect that commons may have just timed out on the download of the image in the preview step. this type of error seems similar to bug 63864. i just need someone to +2 the patch i made so that we can test the new preview step on the beta cluster.
>>
>>
>> with kind regards,
>> dan
>>
>>
>> On May 1, 2014, at 18:35 , Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
>>
>> > Fæ, 01/05/2014 14:59:
>> >> Instead, a good example of characters giving a problem is the file at
>> >> [1]. This caused the GWT run to halt but was successfully loaded once
>> >> I changed the "Æ" (ae ligature) character in Ægean to a simple "A".
>> >> The only cause of this failure must have been the character, which is
>> >> allowed in the mediawiki software.
>> >>
>> >> Links
>> >> 1.https://commons.wikimedia.org/wiki/File:A_new_map_of_the_islands_of_the_Agean_Sea,_together_with_the_island_of_Crete,_and_the_adjoining_isles._NYPL1630716.tiff
>> >
>> > Thanks, this gives you clear steps to reproduce and makes a valuable bug report. Please file. :)
>> >
>> > Nemo
>> >
>> >
>> > _______________________________________________
>> > Glamtools mailing list
>> > Glamtools@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/glamtools
>>
>>
>
>
>
>
> --
> faewik@gmail.com https://commons.wikimedia.org/wiki/User:Fae
> Personal and confidential, please do not circulate or re-quote.
>
>