Re: [Analytics] Analytics Digest, Vol 48, Issue 10 - Analytics

4 Feb 2016

...
  Date: Thu, 4 Feb 2016 08:22:01 +0100
 From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
 To: A mailing list for the Analytics Team at WMF and everybody who has
         an interest in Wikipedia and "analytics."
         <analytics(a)lists.wikimedia.org>
 Subject: Re: [Analytics] Pagecounts dumps page title UTF-8 escaping
 Message-ID: <56B2FC19.6090105(a)gmail.com>
 Content-Type: text/plain; charset=utf-8; format=flowed

 Bo Han, 04/02/2016 00:40:
  Is the logic for the escaping available somewhere?
 MediaWiki API does https://phabricator.wikimedia.org/T29849
 For the new pageviews API I got this reply on Unicode normalisation:
 https://phabricator.wikimedia.org/T44259#1351880

 (Phabricator is down right now; wait a couple hours or check
 web.archive.org.)

 Nemo

Thanks for the reply, Nemo. I read over the two links but am still a
little confused about the case of "Мстители (фильм, 2012)" on domain
ru, which shows up escaped in three different ways:
"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_%28%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012%29"
(everything but comma escaped)
"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_(%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012)"
(everything but comma+parens escaped)
"Мстители_(фильм,_2012)" (nothing escaped)

Shouldn't the comma and parens be escaped as well, or is there a
special case for reserved characters? If so, why are parens sometimes
escaped and sometimes not? Maybe some of the variation has to do with
how browsers encode/send the request?
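
For illustration, the three forms above are what you get when the same
UTF-8 title is percent-encoded with different sets of characters left
unescaped. Below is a minimal Python sketch using the standard library's
urllib.parse. The "safe" sets and the canonical_title helper are
hypothetical: they only reproduce the observed variation and are not
taken from MediaWiki or from any particular browser.

```python
from urllib.parse import quote, unquote

# Title as it appears in the unescaped variant (underscores for spaces).
TITLE = "Мстители_(фильм,_2012)"

# Hypothetical "safe" sets, chosen only to reproduce the observed forms.
comma_only = quote(TITLE, safe=",")          # parens escaped, comma kept
comma_and_parens = quote(TITLE, safe=",()")  # parens and comma kept
raw = TITLE                                  # nothing escaped

def canonical_title(logged: str) -> str:
    """Decode percent-escapes as UTF-8 and use underscores for spaces.

    An assumed normalisation for grouping variants, not MediaWiki's own.
    """
    return unquote(logged).replace(" ", "_")

variants = [comma_only, comma_and_parens, raw]
for v in variants:
    print(v)

# All three variants collapse to a single key after decoding.
print({canonical_title(v) for v in variants})
```

Decoding every title before aggregating counts would at least merge the
three rows into one, whatever the reason for the different escaping.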

Bo