Upon further digging, this might actually be expected:
* The hash calculation has not changed
* I recently updated many graphs (about 25k) manually by changing the Lua code, which could be why so many cached HTML pages were referring to older versions of the images. The image URL contains a hash, but that hash is no longer present in SQL (pageprops), hence the failure.
* A semi-related bug was discovered: several non-Wikipedia wikis that allow the "Graph" namespace do not save graph data to the pageprops SQL table. Since the Graph namespace is not used much, I think I will disable this support entirely rather than try to figure it out.
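
To illustrate the stale-hash failure described above, here is a minimal sketch in Python. It is not the actual Graph extension code: the use of SHA-1 over canonical JSON, the `page_props` dict, and the `save_page` / `lookup` helpers are all assumptions made for illustration. It only shows why a hash baked into cached HTML stops resolving once the graph definition changes.

```python
import hashlib
import json


def graph_hash(spec):
    # Assumption: the hash is SHA-1 over a canonical JSON dump of the spec.
    # The real extension may derive the hash differently.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the per-page pageprops storage: hash -> graph spec.
page_props = {}


def save_page(specs):
    """Re-saving a page replaces its stored graphs, so old hashes disappear."""
    page_props.clear()
    for spec in specs:
        page_props[graph_hash(spec)] = spec


def lookup(hash_):
    """Mimic the API answer seen in the log when the hash is unknown."""
    if hash_ not in page_props:
        return {"error": {"code": "invalidhash", "info": "No graph found."}}
    return {"graph": page_props[hash_]}


old_spec = {"width": 400, "marks": []}
save_page([old_spec])
cached_hash = graph_hash(old_spec)  # this value ends up in cached HTML image URLs

new_spec = {"width": 500, "marks": []}  # e.g. the output changed after a Lua edit
save_page([new_spec])

stale = lookup(cached_hash)           # the cached page still asks for the old hash
fresh = lookup(graph_hash(new_spec))  # a freshly rendered page asks for the new one
print(stale)
print("graph" in fresh)
```

In this sketch the hash function itself never changes; only the stored data does, which matches the first bullet above.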

On Wed, Dec 16, 2015 at 1:48 AM, Yuri Astrakhan <yastrakhan@wikimedia.org> wrote:
I'm debugging the issue

On Wed, Dec 16, 2015 at 1:47 AM, Daniel Zahn <dzahn@wikimedia.org> wrote:
Hi, I see these

"graphoid endpoints health"

/{domain}/v1/{format}/{title}/{revid}/{id} is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)

on sca1001 and sca1002.

Is that the same thing?

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1001&service=graphoid+endpoints+health

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1002&service=graphoid+endpoints+health
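
For readers unfamiliar with the endpoint template in the alert, the following Python sketch splits a Graphoid URL into the fields named by /{domain}/v1/{format}/{title}/{revid}/{id}, using the failing URL quoted further down in this thread. The helper names `parse_graphoid_url` and `describe_check` are hypothetical, and the status message only imitates the wording of the alert; this is not the actual monitoring code.

```python
from urllib.parse import urlsplit, unquote

# Field order taken from the template in the alert; "version" is the literal "v1".
TEMPLATE_FIELDS = ("domain", "version", "format", "title", "revid", "id")


def parse_graphoid_url(url):
    """Split the URL path into the template fields (hypothetical helper)."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != len(TEMPLATE_FIELDS):
        raise ValueError(f"expected {len(TEMPLATE_FIELDS)} path segments, got {len(parts)}")
    fields = dict(zip(TEMPLATE_FIELDS, (unquote(p) for p in parts)))
    if fields["version"] != "v1":
        raise ValueError(f"unexpected API version: {fields['version']}")
    return fields


def describe_check(status, expected=200):
    """Turn an HTTP status into an alert-style message (wording is an imitation)."""
    if status == expected:
        return "OK"
    return f"CRITICAL: returned the unexpected status {status} (expecting: {expected})"


url = ("http://graphoid.wikimedia.org/mediawiki.org/v1/png/"
       "Extension:Graph/0/be66c7016b9de3188ef6a585950f10dc83239837.png")
fields = parse_graphoid_url(url)
print(fields)
print(describe_check(400))
```

Note that the {id} segment is the graph hash plus the format extension, so a check with a hard-coded hash breaks whenever the stored hash for that page changes.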



On Tue, Dec 15, 2015 at 4:52 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
+cc ops-l.

It seems that the Graph extension and Graphoid have changed the way the hash is being calculated, which resulted in 404s for RESTBase's check of Graphoid.

I have changed the hash in RESTBase's monitoring specification and deployed it. All checks are passing now.

Yuri, given that Graph(oid) is using RESTBase for its everyday operation, you need to notify us (Ops/Services) of such fundamental changes so that we can prepare and act accordingly (and synchronise deployment with you).

Cheers,
Marko

On 15 December 2015 at 06:11, Gabriel Wicke <gwicke@wikimedia.org> wrote:
From the graphoid log:

{"name":"graphoid","hostname":"sca1002","pid":187,"level":30,
 "domain":"en.wikipedia.org","format":"png","title":"User:Pchelolo/Graph",
 "revid":"670213569","id":"1533aaad45c733dcc7e07614b54cbae4119a6747.png",
 "apicall":{"format":"json","formatversion":"2","action":"graph",
  "title":"User:Pchelolo/Graph","hash":"1533aaad45c733dcc7e07614b54cbae4119a6747"},
 "apiRetError":{"code":"invalidhash","info":"No graph found.",
  "docref":"See https://en.wikipedia.org/w/api.php for API usage"},
 "levelPath":"info/mwapi-error","request_id":"2645c162-a2ea-11e5-a91c-8766eb5ddb47",
 "msg":"[object Object]","time":"2015-12-15T05:10:25.341Z","v":0}
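
A quick way to pull the relevant fields out of a log entry like the one above is sketched below in Python. The log line is the same entry with the mail line wrapping undone; the `summarize` helper and the field selection are my own assumptions, not part of Graphoid.

```python
import json

# The log entry from above, rejoined into one JSON document.
LOG_LINE = (
    '{"name":"graphoid","hostname":"sca1002","pid":187,"level":30,'
    '"domain":"en.wikipedia.org","format":"png","title":"User:Pchelolo/Graph",'
    '"revid":"670213569","id":"1533aaad45c733dcc7e07614b54cbae4119a6747.png",'
    '"apicall":{"format":"json","formatversion":"2","action":"graph",'
    '"title":"User:Pchelolo/Graph","hash":"1533aaad45c733dcc7e07614b54cbae4119a6747"},'
    '"apiRetError":{"code":"invalidhash","info":"No graph found.",'
    '"docref":"See https://en.wikipedia.org/w/api.php for API usage"},'
    '"levelPath":"info/mwapi-error","request_id":"2645c162-a2ea-11e5-a91c-8766eb5ddb47",'
    '"msg":"[object Object]","time":"2015-12-15T05:10:25.341Z","v":0}'
)


def summarize(entry):
    """Reduce one log entry to the fields that matter for this incident."""
    call = entry.get("apicall") or {}
    err = entry.get("apiRetError") or {}
    expected_id = f"{call.get('hash')}.{entry.get('format')}"
    return {
        "domain": entry.get("domain"),
        "title": entry.get("title"),
        "hash": call.get("hash"),
        "error_code": err.get("code"),
        # True when the requested id is exactly "<hash>.<format>".
        "id_matches_hash": entry.get("id") == expected_id,
    }


summary = summarize(json.loads(LOG_LINE))
print(summary)
```

Here the requested id matches the hash sent to the MediaWiki API, and the API itself answers "invalidhash", so the request is well formed and the hash is simply unknown on the MediaWiki side.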

On Mon, Dec 14, 2015 at 9:03 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
> This errors out:
>
> http://graphoid.wikimedia.org/mediawiki.org/v1/png/Extension:Graph/0/be66c7016b9de3188ef6a585950f10dc83239837.png
>
> On Mon, Dec 14, 2015 at 8:58 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
>> Hi Yuri,
>>
>> there were just some graphoid endpoint alerts from the RESTBase health
>> checks. Is this known / expected?
>>
>> Gabriel
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation






-- 
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation

_______________________________________________
Ops mailing list
Ops@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops




--
Daniel Zahn <dzahn@wikimedia.org>
Operations Engineer