>I filed that ticket almost a month ago, without a reaction so far. I take it that for the purpose of getting the attention of Analytics engineers towards such widespread anomalies, this mailing list is a far better venue than Phabricator.

Not really, it just means that in the last month we had higher-priority items to work on than this one. You can see our work from the last month here: https://phabricator.wikimedia.org/project/board/1030/query/all/ (scroll right to the "done" column; the items in gray are the ones completed).

On Wed, Dec 2, 2015 at 10:43 AM, Tilman Bayer <tbayer@wikimedia.org> wrote:


On Wed, Dec 2, 2015 at 9:38 AM, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
Food for thought:

Regarding the "???.." entries below, see also https://phabricator.wikimedia.org/T117945 ; the list there shows the prevalence of the "-"s across languages, too.

(I filed that ticket almost a month ago, without a reaction so far. I take it that for the purpose of getting the attention of Analytics engineers towards such widespread anomalies, this mailing list is a far better venue than Phabricator.)
 

SELECT
  uri_host, uri_path, uri_query, COUNT(1) as c 
FROM wmf.webrequest
WHERE webrequest_source IN ('text', 'mobile') AND <YEAR/MONTH/DAY/HOUR restricted>
AND is_pageview AND pageview_info['page_title'] = '-'
GROUP BY uri_host, uri_path, uri_query
ORDER BY c DESC LIMIT 100;



en.m.wikipedia.org /w/index.php ?search=&fulltext=Search 1986
fr.m.wikipedia.org /w/index.php ?search=&fulltext=Rechercher 476
ar.m.wikipedia.org /w/index.php ?search=&fulltext=%D8%A7%D8%A8%D8%AD%D8%AB 461
en.wikipedia.org /w/index.php ?redirs=0&search=Calvary%20%2B%20Film%20%2B%20Brendan%20Gleeson%20%2B%202014&fulltext=Search&ns0=1 356
en.wikipedia.org /w/index.php ?redirs=0&search=The%20Woman%20in%20Black%202%3A%20Angel%20of%20Death%20%2B%20Film%20%2B%20Jeremy%20Irvine%20%2B%202014&fulltext=Search&ns0=1 317
ja.wikipedia.org /wiki/ ??? 307
ja.wikipedia.org /wiki/ ???? 258
pt.m.wikipedia.org /w/index.php ?search=&fulltext=Pesquisar 253
en.wikipedia.org /w/index.php ?redirs=0&search=We%20Are%20the%20Best%21%20%2B%20Film%20%2B%20Vanja%20Engstr%C3%B6m%20%2B%202013&fulltext=Search&ns0=1 207
en.wikipedia.org /w/index.php ?redirs=0&search=Manhattan%20%2B%20Film%20%2B%20John%20Benjamin%20Hickey%20%2B%202015&fulltext=Search&ns0=1 200
es.m.wikipedia.org /w/index.php ?search=&fulltext=Buscar 162
en.wikipedia.org /w/index.php ?redirs=0&search=Angus,%20Thongs%20and%20Perfect%20Snogging%20%2B%20Film%20%2B%20Aaron%20Taylor-Johnson%20%2B%202008&fulltext=Search&ns0=1 145
en.wikipedia.org /w/index.php 140
de.m.wikipedia.org /wiki ?search=Mordkommission%20BERLIN%201 139
ja.wikipedia.org /wiki/ ????? 134
en.m.wikipedia.org /w/index.php ?search= 120
ja.wikipedia.org /wiki/ ?? 117
ru.m.wikipedia.org /w/index.php ?search=&fulltext=%D0%9D%D0%B0%D0%B9%D1%82%D0%B8 112
de.wikipedia.org /w/index.php 100
ja.wikipedia.org /wiki/ ??????? 93
ar.wikipedia.org /wiki/ ???? 93
ja.wikipedia.org /wiki/ ???????? 83
m.wikimediafoundation.org /w/index.php ?search=&fulltext=Search 82
fa.m.wikipedia.org /w/index.php ?search=&fulltext=%D8%AC%D8%B3%D8%AA%D8%AC%D9%88 82
zh.wikipedia.org /wiki/ ??? 80
ar.m.wikipedia.org /w/index.php ?search&fulltext=%D8%A7%D8%A8%D8%AD%D8%AB 77
zh.wikipedia.org /wiki/ ???? 76
ja.wikipedia.org /wiki/ ?????? 76
ar.wikipedia.org /wiki/ ????? 72
ar.wikipedia.org /wiki/ ?????_?????? 69
en.wikipedia.org /w/index.php ?redirs=0&search=Once%20Upon%20a%20Time%20%2B%20Film%20%2B%20Ginnifer%20Goodwin%20%2B%202015&fulltext=Search&ns0=1 69
commons.m.wikimedia.org /wiki/ ?uselang=it 69
en.wikipedia.org /w/index.php ?redirs=0&search=The%20Birdcage%20%2B%20Film%20%2B%20Robin%20Williams%20%2B%201996&fulltext=Search&ns0=1 68
commons.wikimedia.org /wiki/ ?uselang=it 66
ar.wikipedia.org /wiki/ ????_????? 63
de.m.wikipedia.org /wiki ?search=Helmar%20B%C3%BCchel 58
en.m.wikipedia.org /w/index.php ?search&fulltext=Search 56
ar.wikipedia.org /wiki/ ????_??????? 54
en.m.wikipedia.org /wiki ?search=%3Cnomatch%2F%3E 53
en.wikipedia.org / ?search=+ 53
en.wikipedia.org /w/index.php ?redirs=0&search=We%20Are%20What%20We%20Are%20%2B%20Film%20%2B%20Vonia%20Arslanian%20%2B%202013&fulltext=Search&ns0=1 51
en.wikipedia.org /w/index.php ?diff=693310399&oldid=662731788 50
en.wikipedia.org /w/index.php ?redirs=0&search=We're%20the%20Millers%20%2B%20Film%20%2B%20Jason%20Sudeikis%20%2B%202013&fulltext=Search&ns0=1 48
ar.wikipedia.org /wiki/ ?????_???? 48
en.wikipedia.org /w/index.php ?diff=693310404&oldid=693207084 47
commons.m.wikimedia.org /w/index.php ?search=&fulltext=Search 44
ar.wikipedia.org /wiki/ ????_?????? 44
ar.wikipedia.org /wiki/ ?????_??????? 43
en.wikipedia.org /w/index.php ?do=/user/login/ 43
zh.wikipedia.org /zh-tw/ 43
ar.wikipedia.org /wiki/ ?????_????? 42
en.wikipedia.org /w/index.php ?diff=693313743&oldid=693310399 42
ar.wikipedia.org /wiki/ ?????? 41
en.m.wikipedia.org /w/index.php ?search=Mukamwezi.Com&fulltext=Search 41
en.wikipedia.org /w/index.php ?redirs=0&search=Marina%20%2B%20Film%20%2B%20Matteo%20Simoni%20%2B%202013&fulltext=Search&ns0=1 41
de.m.wikipedia.org /w/index.php ?search=&fulltext=Volltext 40
ar.wikipedia.org /wiki/ ????_???? 40
en.wikipedia.org /w/index.php ?do=/blog/ 37
ja.wikipedia.org /wiki/ ????????? 37
en.wikipedia.org /w/index.php ?redirs=0&search=Zurich%20%2B%20Film%20%2B%20Wende%20Snijders%20%2B%202015&fulltext=Search&ns0=1 37
ar.wikipedia.org /wiki/ ??????? 36
ar.m.wikipedia.org /w/index.php ?search=%D8%B3%D9%83%D8%B3&fulltext=%D8%A7%D8%A8%D8%AD%D8%AB 35
tr.m.wikipedia.org /w/index.php ?search=&fulltext=Ara 35
en.wikipedia.org /w/index.php ?redirs=0&search=Extraterrestre%20%2B%20Film%20%2B%20Juli%C3%A1n%20Villagr%C3%A1n%20%2B%202011&fulltext=Search&ns0=1 34
uk.wikipedia.org /w/index.php ?curid=2070033 33
ar.wikipedia.org /wiki/ ???_????? 33
es.wikipedia.org /w/index.php 33
ar.wikipedia.org /wiki/ ???_???? 32
en.wikipedia.org /w/index.php ?redirs=0&search=The%20Book%20Thief%20%2B%20Film%20%2B%20Sophie%20N%C3%A9lisse%20%2B%202013&fulltext=Search&ns0=1 32
en.wikipedia.org / ?search=Fn,..;;;: 31
en.wikipedia.org /w/index.php ?do=/user/profile/ 31
de.m.wikipedia.org /w/index.php ?search=Rtl+Deutschland 31
id.m.wikipedia.org /w/index.php ?search=&fulltext=Cari 30
de.wikipedia.org /w/index.php ?search=Mordkommission+Berlin+1 30
de.m.wikipedia.org /w/index.php ?search= 30
en.wikipedia.org /w/index.php ?do=/user/privacy/ 30
en.wikipedia.org /w/index.php ?redirs=0&search=Exodus%3A%20Gods%20and%20Kings%20%2B%20Film%20%2B%20Christian%20Bale%20%2B%202014&fulltext=Search&ns0=1 29
ar.wikipedia.org /wiki/ ??_?????? 28
ar.wikipedia.org /wiki/ ??? 28
ar.wikipedia.org /wiki/ ?????_???????? 27
en.wikipedia.org / ?curid=3392129 26
es.m.wikipedia.org /w/index.php ?search= 26
en.wikipedia.org /w/index.php ?redirs=0&search=Event%20Horizon%20%2B%20Film%20%2B%20Laurence%20Fishburne%20%2B%201997&fulltext=Search&ns0=1 26
en.m.wiktionary.org /w/index.php ?search=&fulltext=Search 25
meta.m.wikimedia.org /w/index.php ?search=&fulltext=Search 25
ar.wikipedia.org /wiki/ ??????_???????? 24
ar.wikipedia.org /wiki/ ???????? 24
zh.wikipedia.org /wiki/ ?%E6%9C%88? 24
sw.m.wikipedia.org /w/index.php ?search=&fulltext=Tafuta 24
de.m.wikipedia.org /wiki ?search=Mordkommission%20Berlin%201 24
en.m.wikipedia.org /w/index.php ?search=xnxx+xnxx+xnxx 24
zh.wikipedia.org /wiki/ ?? 24
en.wikipedia.org /w/index.php ?redirs=0&search=Hart's%20War%20%2B%20Film%20%2B%20Bruce%20Willis%20%2B%202002&fulltext=Search&ns0=1 24
m.mediawiki.org /w/index.php ?search=&fulltext=Search 23
en.wikipedia.org /w/index.php ?redirs=0&search=Selma%20%2B%20Film%20%2B%20David%20Oyelowo%20%2B%202014&fulltext=Search&ns0=1 23
en.wikipedia.org /w/index.php ?redirs=0&search=The%20Bridge%20%2B%20Film%20%2B%20Sofia%20Helin%20%2B%202011&fulltext=Search&ns0=1 23
en.wikipedia.org /w/index.php ?redirs=0&search=Rob%20the%20Mob%20%2B%20Film%20%2B%20Michael%20Pitt%20%2B%202014&fulltext=Search&ns0=1 23
fr.wikipedia.org / ?search=+ 23
it.m.wikipedia.org /w/index.php ?search=&fulltext=Ricerca 22


Perhaps not a very big deal in the case of "Mordkommission BERLIN", but in general, search terms are considered private data.
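To illustrate the privacy point, here is a minimal sketch of how the `search` value in a `uri_query` string could be masked before result rows like the ones above are shared. The function name `redact_search` and the `REDACTED` placeholder are made up for this example; this is not an existing Analytics tool, and re-encoding the query may normalize its escaping (for instance, spaces become `+`).

```python
from urllib.parse import parse_qsl, urlencode


def redact_search(uri_query: str, placeholder: str = "REDACTED") -> str:
    """Mask non-empty `search` values in a query string (hypothetical helper)."""
    had_qmark = uri_query.startswith("?")
    # keep_blank_values=True so that "search=" survives the round trip.
    pairs = parse_qsl(uri_query.lstrip("?"), keep_blank_values=True)
    cleaned = [
        (key, placeholder if key == "search" and value else value)
        for key, value in pairs
    ]
    out = urlencode(cleaned)
    return "?" + out if had_qmark else out


if __name__ == "__main__":
    # Rows taken from the result list above.
    print(redact_search("?search=Mordkommission%20BERLIN%201"))
    print(redact_search("?search=&fulltext=Search"))
    print(redact_search("?diff=693310399&oldid=662731788"))
```

Empty searches and non-search parameters (diff, oldid, curid) pass through unchanged, so the shape of the request stays visible while the term itself does not.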


On Wed, Dec 2, 2015 at 6:16 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
I mean, now I want to know how we can have a condition where there's
no page title but it registers as a pageview.
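One way to picture this, going by Joseph's note quoted below that "-" is the default when no page title is found: the sketch below is a deliberately simplified, hypothetical title extraction. It assumes the title is looked up first in a `title=` query parameter and then in the path after `/wiki/`; that order and the name `guess_page_title` are assumptions for illustration and are not taken from PageviewDefinition.java.

```python
from urllib.parse import parse_qs

# Default value reported in the thread when no page title is found.
UNKNOWN_TITLE = "-"


def guess_page_title(uri_path: str, uri_query: str) -> str:
    """Simplified, hypothetical title lookup; NOT the refinery implementation."""
    # 1. Prefer an explicit title= query parameter, if it is non-empty.
    params = parse_qs(uri_query.lstrip("?"), keep_blank_values=True)
    titles = params.get("title", [])
    if titles and titles[0]:
        return titles[0]
    # 2. Otherwise take whatever follows /wiki/ in the path.
    prefix = "/wiki/"
    if uri_path.startswith(prefix) and len(uri_path) > len(prefix):
        return uri_path[len(prefix):]
    # 3. Nothing usable: fall back to the default.
    return UNKNOWN_TITLE


if __name__ == "__main__":
    # (uri_path, uri_query) pairs taken from the thread.
    rows = [
        ("/w/index.php", "?search=&fulltext=Search"),
        ("/w/index.php", "?diff=693310399&oldid=662731788"),
        ("/wiki/", "???"),
        ("/w/index.php", "?title=-&redirect=no"),
    ]
    for path, query in rows:
        print(path, query, "->", guess_page_title(path, query))
```

Under these assumptions, search, diff, and bare `/wiki/` requests all fall through to "-", and so does the real page titled "-" that Dan mentions, which is why the two cases cannot be told apart from the title alone.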

On 2 December 2015 at 12:14, Joseph Allemandou
<jallemandou@wikimedia.org> wrote:
> Double checked:
> https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L117
>
> This value is the default when no page title is found.
> I agree it's not very explicit.
> Any suggestions on changing it, or should we just make sure it is documented?
>
> On Wed, Dec 2, 2015 at 6:10 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> Can someone dig into it? We should really be excluding that (unless it
>> is the page on the dash ;p)
>>
>> On 2 December 2015 at 12:00, Dan Andreescu <dandreescu@wikimedia.org>
>> wrote:
>> > I always wonder about that.  There's also an actual page that could
>> > theoretically be hit:
>> > https://en.wikipedia.org/w/index.php?title=-&redirect=no
>> >
>> > On Wed, Dec 2, 2015 at 11:58 AM, Gabriel Wicke <gwicke@wikimedia.org>
>> > wrote:
>> >>
>> >> Historically, I vaguely remember that we have used that title for user
>> >> script / style loading with action=raw. I think that's gone from the
>> >> skin code, but it's possible that user scripts still reference this
>> >> title.
>> >>
>> >> Gabriel
>> >>
>> >> On Wed, Dec 2, 2015 at 8:41 AM, Oliver Keyes <okeyes@wikimedia.org>
>> >> wrote:
>> >> > One of the most prominent top articles has no page; it's "-". What is
>> >> > this?
>> >> >
>> >> > --
>> >> > Oliver Keyes
>> >> > Count Logula
>> >> > Wikimedia Foundation
>> >> >
>> >> > _______________________________________________
>> >> > Analytics mailing list
>> >> > Analytics@lists.wikimedia.org
>> >> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>
>> >>
>> >>
>> >> --
>> >> Gabriel Wicke
>> >> Principal Engineer, Wikimedia Foundation
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>
>
>
>
> --
> Joseph Allemandou
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>
>



--
Oliver Keyes
Count Logula
Wikimedia Foundation




--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal





--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
