Dear All,
I need some help tracking down a DB deadlock on a single-server, "private" Wiki installation running latest MediaWiki download from http://www.mediawiki.org/wiki/Download
I have been referred to this mailing list from #mediawiki at irc.freenode.org, thanks for the pointer!
Installed versions: -------------------
- MediaWiki: 1.23.2
- VisualEditor from Git, branch REL1_23, ID 9883566
- Parsoid from Git, branch REL1_23, ID 8da3673
- parsoid service from Git, branch master, ID df35443
- mysql: 5.5.35, Debian wheezy package
- apache: 2.2.22, Debian wheezy package
- PHP: 5.4.4, Debian wheezy package
- OS: Linux 3.2.0-4-686-pae #1 SMP Debian 3.2.57-3 i686 GNU/Linux
MediaWiki, Parsoid (nodejs) service and mysql database are all on the same server. Setup followed the instructions on https://www.mediawiki.org/wiki/Parsoid/Setup
Problem description: --------------------
Logging in, reading and editing using the standard (not visual) edit works fine.
Clicking "Edit", which starts the visual editor, results in no reaction at all for approx 5 seconds, then the page goes dim and a progress indicator appears at the top right corner.
After another approx. 100 seconds, the page returns to normal state as if opened or reading.
Directly accessing the Parsoid service, here running on port 8000, promptly delivers the page.
Preliminary investigation: --------------------------
Enabling debugging in LocalSettings.php and tweaking wfDebugTimer() to get absolute millisecond timestamps and client address/port info in the log file showed the following sequences of activities (anonymized log files available if needed):
Summarized:
- client request page by calling "GET /wiki/api.php?format=json&action=visualeditor&paction=parse..."
- MediaWiki starts processing and class the Parsoid service with "GET /testwiki/TestPage?oldid=4053 HTTP/1.0"
- Parsoid service calls "GET /wiki/api.php?format=json&action=query&prop=revisions&..."
- call from Parsoid server into MediaWiki API deadlocks with client call into API.
The question is why is it deadlocking and how to prevent the deadlock.
Detailed activities analysis:
- (1) client accesses "GET /wiki/api.php?format=json&action=visualeditor&paction=parse..."
- (2) MediaWiki starts some session bookkeeping, including opening a DB transaction with "BEGIN" and performing several queries in the transaction.
- (3) MediaWiki/1.23.2 calls the parsoid service with "GET /testwiki/TestPage?oldid=4053 HTTP/1.0"
- (4) Parsoid/0.1 calls "GET /wiki/api.php?format=json&action=query&prop=revisions&..."
- (5) MediaWiki starts processing (in a new apache/PHP slave) and opens a new DB transaction with "BEGIN" performing the same initial "UPDATE `wiki_user` SET user_touched ..." query as first query in the transaction as in step (2).
- (6) approx. 40 seconds pass
- (7) Parsoid service drops connection from (4) to MediaWiki (TCP FIN packet observed by tcpdump while capturing the on-wire conversation between MediaWiki and the client / the Parsiod service)
- (8) Parsoid/0.1 calls again "GET /wiki/api.php?format=json&action=query&prop=revisions&..." from a new connection (client port number changed)
- (9) MediaWiki starts processing (in a new apache/PHP slave) and opens a new DB transaction with "BEGIN" performing the same initial "UPDATE `wiki_user` SET user_touched ..." query as first query in the transaction as in step (2) and (5).
- (10) approx. 10 seconds pass
- (11) MediaWiki receives a "SQL ERROR: Lock wait timeout exceeded; try restarting transaction" relating to transaction from step (5)
- (12) MediaWiki attempts a ROLLBACK of (5) and tries to send a "HTTP 500" (content captured by tcpdump) but the connection is already dead.
- (13) approx. 30 seconds pass
- (14) Parsoid service drops connection from (8) to MediaWiki (TCP FIN packet observed by tcpdump while capturing the on-wire conversation between MediaWiki and the client / the Parsiod service)
- (15) Parsoid/0.1 calls again "GET /wiki/api.php?format=json&action=query&prop=revisions&..." from a new connection (client port number changed)
- (16) MediaWiki starts processing (in a new apache/PHP slave) and opens a new DB transaction with "BEGIN" performing the same initial "UPDATE `wiki_user` SET user_touched ..." query as first query in the transaction as in step (2), (5) and (9).
- (17) approx. 10 seconds pass
- (18) MediaWiki receives a "SQL ERROR: Lock wait timeout exceeded; try restarting transaction" relating to transaction from step (9)
- (19) MediaWiki attempts a ROLLBACK of (9) and tries to send a "HTTP 500" (content captured by tcpdump) but the connection is already dead.
- (20) approx. 10 seconds pass
- (21) MediaWiki closes the connection to the Parsoid service from step (3), without any data exchange after receiving the initial "GET /testwiki/TestPage?oldid=4053 HTTP/1.0"
- (22) MediaWiki performs a DB "COMMIT" in the original client session from step (2), after 104 seconds.
- (23) MediaWiki continues processing the session from step (16), at least in part executing the same set of queries as in step (2).
- (24) MediaWiki performs a DB "COMMIT" in the Parsoid client session from step (15), after 20 seconds.
- (25) MediaWiki sends a "HTTP 200" with a JSON body to the Parsoid service as answer to request from step (15).
- (26) Parsoid service logs messages "starting parsing", a parse tree and "completed parsing in 101234 ms". (Note: It is unclear when in the range (21) to (26) these messages are logged, but I think this position "(26)" is most logical.) At earlier times, Parsoid logs the two messages "Failed API request, 8 retries remaining." and "Failed API request, 7 retries remaining." and not precisely known time steps, probably between (7)/(8) and (14)/(15), respectively.
Question: ---------
What could cause this behavior and how should I configure my system to prevent the deadlocks? If this is a Bug in either MediWiki or the VisualEditor or Parsoid, how to further investigate and fix it?
Best regards
Björn