Hi Tilman,
Your assumption is correct: you can trust projectview_hourly :)

On Wed, Mar 2, 2016 at 4:22 AM, Tilman Bayer <tbayer@wikimedia.org> wrote:
Thanks Joseph! Is it reasonable to assume that the aggregate data in projectview_hourly has not been affected? 

On Tue, Mar 1, 2016 at 7:24 AM, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
Hey Oliver,
It depends on what data you've used: if page_title or other 'encoding-sensitive' data (I can't think of any other, but ...) is part of it, then yes, you should!

On Tue, Mar 1, 2016 at 3:27 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Hey Joseph,

Thanks for letting us know. So we should delete and backfill last
week's data, for our regularly scheduled scripts?

On 1 March 2016 at 08:26, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
> Hi,
>
> TL;DR: Please don't use Hive / Spark / Hadoop before next week.
>
> Last week the Analytics Team performed an upgrade to the Hadoop Cluster.
> It went reasonably well, except that many of the Hadoop processes were
> launched with a special option to NOT use UTF-8 as the default encoding.
> This issue caused trouble particularly in page title extraction and was
> detected last Sunday (many kudos to the people who filed bugs on the
> Analytics API about encoding :)
> We found the bug and fixed it yesterday, and backfill starts today, with the
> cluster recomputing every dataset from 2016-02-23 onward.
> This means you shouldn't query last week's data during this week, first
> because it is incorrect, and second because you'll curse the cluster for
> being too slow :)
>
> We are sorry for the inconvenience.
> Don't hesitate to contact us if you have any questions.
>
>
> --
> Joseph Allemandou
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>
> _______________________________________________
> Engineering mailing list
> Engineering@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering
>



--
Oliver Keyes
Count Logula
Wikimedia Foundation



--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB


--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal