Hi, I see these:
"graphoid endpoints health"
/{domain}/v1/{format}/{title}/{revid}/{id} is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
on sca1001 and sca1002.
Is that the same thing?
https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1...
https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1...
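For anyone following along, here is a minimal sketch of what a check like the one above does: fill the route template, request it, and compare the status with the expected one. The function names, the percent-encoding of the title, and the `example-hash.png` id are assumptions made for illustration; this is not the actual Icinga or service-checker code.

```python
from urllib.parse import quote

# Route template copied from the alert text above.
ROUTE = "/{domain}/v1/{format}/{title}/{revid}/{id}"


def build_path(domain, fmt, title, revid, graph_id):
    """Fill the route template.

    Assumption: the title is percent-encoded so that a slash inside it
    stays within a single path segment.
    """
    return ROUTE.format(
        domain=domain,
        format=fmt,
        title=quote(title, safe=":"),
        revid=revid,
        id=graph_id,
    )


def check(fetch_status, path, expected=200):
    """Return (ok, message); fetch_status is any callable path -> HTTP status."""
    status = fetch_status(path)
    if status == expected:
        return True, f"{path} is OK"
    return False, (
        f"{path} is CRITICAL: returned the unexpected status "
        f"{status} (expecting: {expected})"
    )


# Hypothetical stand-in for the real service: it always answers 400.
# "example-hash.png" is a placeholder id, not a real graph hash.
path = build_path("mediawiki.org", "png", "Extension:Graph", 0, "example-hash.png")
ok, message = check(lambda p: 400, path)
print(message)
```

Passing the fetcher in as a callable keeps the sketch runnable without network access; a real check would issue an HTTP request instead.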
On Tue, Dec 15, 2015 at 4:52 AM, Marko Obrovac mobrovac@wikimedia.org wrote:
+cc ops-l.
It seems that the Graph extension and Graphoid have changed the way the hash is being calculated, which resulted in 404s for RESTBase's check of Graphoid.
I have changed the hash in RESTBase's monitoring specification and deployed it. All checks are passing now.
Yuri, given that Graph(oid) is using RESTBase for its everyday operation, you need to notify us (Ops/Services) of such fundamental changes so that we can prepare and act accordingly (and synchronise deployment with you).
Cheers, Marko
On 15 December 2015 at 06:11, Gabriel Wicke gwicke@wikimedia.org wrote:
From the graphoid log:
{"name":"graphoid","hostname":"sca1002","pid":187,"level":30,"domain":"en.wikipedia.org","format":"png","title":"User:Pchelolo/Graph","revid":"670213569","id":"1533aaad45c733dcc7e07614b54cbae4119a6747.png","apicall":{"format":"json","formatversion":"2","action":"graph","title":"User:Pchelolo/Graph","hash":"1533aaad45c733dcc7e07614b54cbae4119a6747"},"apiRetError":{"code":"invalidhash","info":"No graph found.","docref":"See https://en.wikipedia.org/w/api.php for API usage"},"levelPath":"info/mwapi-error","request_id":"2645c162-a2ea-11e5-a91c-8766eb5ddb47","msg":"[object Object]","time":"2015-12-15T05:10:25.341Z","v":0}
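The log entry above is a single JSON object, so the failing call can be pulled out of it mechanically. The sketch below is only an illustration: the `summarise` helper is hypothetical, and the line wraps from the email have been removed so the entry parses.

```python
import json

# The log entry quoted above, with the email line wraps removed.
LOG_LINE = (
    '{"name":"graphoid","hostname":"sca1002","pid":187,"level":30,'
    '"domain":"en.wikipedia.org","format":"png","title":"User:Pchelolo/Graph",'
    '"revid":"670213569","id":"1533aaad45c733dcc7e07614b54cbae4119a6747.png",'
    '"apicall":{"format":"json","formatversion":"2","action":"graph",'
    '"title":"User:Pchelolo/Graph","hash":"1533aaad45c733dcc7e07614b54cbae4119a6747"},'
    '"apiRetError":{"code":"invalidhash","info":"No graph found.",'
    '"docref":"See https://en.wikipedia.org/w/api.php for API usage"},'
    '"levelPath":"info/mwapi-error",'
    '"request_id":"2645c162-a2ea-11e5-a91c-8766eb5ddb47",'
    '"msg":"[object Object]","time":"2015-12-15T05:10:25.341Z","v":0}'
)


def summarise(line):
    """Extract the fields that explain the failure from one log entry."""
    entry = json.loads(line)
    error = entry.get("apiRetError", {})
    return {
        "host": entry["hostname"],
        "page": f'{entry["domain"]}/{entry["title"]}',
        "requested_hash": entry["apicall"]["hash"],
        "error_code": error.get("code"),
        "error_info": error.get("info"),
    }


summary = summarise(LOG_LINE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this entry the `id` from the URL is the hash sent to the MediaWiki API plus the `.png` extension, and the API answers `invalidhash`, meaning it has no graph stored under that hash for the page.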
On Mon, Dec 14, 2015 at 9:03 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
This errors out:
http://graphoid.wikimedia.org/mediawiki.org/v1/png/Extension:Graph/0/be66c70...
On Mon, Dec 14, 2015 at 8:58 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
Hi Yuri,
there were just some graphoid endpoint alerts from the RESTBase health checks. Is this known / expected?
Gabriel
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
-- Marko Obrovac, PhD Senior Services Engineer Wikimedia Foundation
Ops mailing list Ops@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/ops
I'm debugging the issue
On Wed, Dec 16, 2015 at 1:47 AM, Daniel Zahn dzahn@wikimedia.org wrote:
Hi, I see these:
"graphoid endpoints health"
/{domain}/v1/{format}/{title}/{revid}/{id} is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
on sca1001 and sca1002.
Is that the same thing?
https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1...
https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=sca1...
-- Daniel Zahn dzahn@wikimedia.org Operations Engineer
Upon further digging, this might actually be expected:
- The hash calculation has not changed.
- I recently updated many graphs manually (about 25k) by changing the Lua code, which could be why so many cached HTML pages were referring to older versions of the images. The image URL contains a hash, but that hash is no longer present in the SQL (pageprops), hence the failure.
- A semi-related bug, https://phabricator.wikimedia.org/T121603, was discovered: several non-Wikipedia wikis that allow the "Graph" namespace do not save graph data to the page_props SQL table. Since the Graph namespace is not used much, I think I will disable this support entirely rather than try to figure it out.
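To illustrate the failure mode described here, a small sketch: the hash function stays the same, but editing a graph definition changes its input, so a hash baked into cached HTML (or pinned in a monitoring spec) no longer matches anything stored. Everything in it is an assumption made for illustration. The 40-hex-digit ids in this thread are consistent with SHA-1, but what exactly the Graph extension hashes is not stated here, and the dictionary is only a stand-in for the page props storage.

```python
import hashlib
import json


def graph_hash(spec):
    """Content hash of a graph definition.

    Assumption: SHA-1 over a canonical JSON encoding of the definition.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Stand-in for the per-page store of graph definitions, keyed by hash.
page_props = {}


def save_page(spec):
    """Re-saving a page replaces its stored graph (simplified to one graph)."""
    page_props.clear()
    h = graph_hash(spec)
    page_props[h] = spec
    return h


def lookup(h):
    """Mimic the API answer: an unknown hash yields 'invalidhash'."""
    if h in page_props:
        return {"graph": page_props[h]}
    return {"error": {"code": "invalidhash", "info": "No graph found."}}


old_hash = save_page({"width": 400, "marks": []})  # hash ends up in cached HTML
new_hash = save_page({"width": 500, "marks": []})  # definition edited later

print(old_hash != new_hash)         # same function, different input
print(lookup(old_hash))             # stale URL: invalidhash
print("graph" in lookup(new_hash))  # URL with the current hash resolves
```

The same reasoning applies to any test or health check that pins a specific hash: it has to be updated whenever the referenced graph definition is edited.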
On Wed, Dec 16, 2015 at 1:48 AM, Yuri Astrakhan yastrakhan@wikimedia.org wrote:
I'm debugging the issue
On Wed, Dec 16, 2015 at 1:24 AM, Yuri Astrakhan yastrakhan@wikimedia.org wrote:
Upon further digging, this might actually be expected:
- The hash calculation has not changed
- I manually updated many graphs recently, by changing the Lua code - about 25k graphs, which could be the reason why so many cached HTML pages were referring to older version of the images. The image URL contains a hash, but that hash is no longer present in the SQL (pageprops), hence the failure.
- A semi-related bug was discovered - several non-wikipedia wikis that allow "Graph" namespace do not save graph data to the prop-pages sql table. Since graph namespace is not used much, I think I will disable this support entirely, rather than try to figure it out.
So, when will you release a new version of graphoid which doesn't expose a spec that fails to be verified?
Cheers,
Giuseppe
On 16 December 2015 at 09:34, Giuseppe Lavagetto glavagetto@wikimedia.org wrote:
On Wed, Dec 16, 2015 at 1:24 AM, Yuri Astrakhan yastrakhan@wikimedia.org wrote:
Upon further digging, this might actually be expected:
- The hash calculation has not changed
Indeed, the hash calculation hasn't changed, but you changed the graph defs, which altered the hash [1], thus leading to monitoring failures for RESTBase. I suspect the same thing happened with the specification tests for Graphoid.
Marko
[1] https://en.wikipedia.org/w/index.php?title=User:Pchelolo/Graph&action=hi...
The graph hash used in the graphoid unit test has been updated and deployed.
On Wed, Dec 16, 2015 at 2:42 PM, Marko Obrovac mobrovac@wikimedia.org wrote:
On 16 December 2015 at 09:34, Giuseppe Lavagetto glavagetto@wikimedia.org wrote:
On Wed, Dec 16, 2015 at 1:24 AM, Yuri Astrakhan yastrakhan@wikimedia.org wrote:
Upon further digging, this might actually be expected:
- The hash calculation has not changed
Indeed, the hash calculation hasn't changed, but you changed the graph defs, which altered the hash [1], thus leading to monitoring failures for RESTBase. I suspect the same thing happened with the specification tests for Graphoid.
Marko
[1] https://en.wikipedia.org/w/index.php?title=User:Pchelolo/Graph&action=hi...
-- Marko Obrovac, PhD Senior Services Engineer Wikimedia Foundation