I just glanced at the codebase to see exactly what gets cached. If my understanding is correct (and it's possible that it's not, since I only looked for about five minutes), the article wikitext is reparsed on every page view[1].
Since disk space is cheap and disks are relatively fast, why not do the following (assume a dedicated box is designated as the "read" database server):
1. User begins to edit a page.
2. The software loads the MediaWiki markup for the page from the "write" database and displays it.
3. User makes changes and submits them.
4. MediaWiki stores the page, verbatim, into the "write" database.
5. MediaWiki also runs the page through the Parser, takes the resulting HTML, and feeds it to a separate "read" database.
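In rough Python pseudocode (using sqlite3 files as stand-ins for the two databases and a placeholder parse_to_html() where the real parser would go; every name here is hypothetical), the save path would look something like this:

    import sqlite3

    # Hypothetical stand-ins for the "write" and "read" databases.
    write_db = sqlite3.connect("write.db")
    read_db = sqlite3.connect("read.db")
    write_db.execute("CREATE TABLE IF NOT EXISTS page (title TEXT PRIMARY KEY, wikitext TEXT)")
    read_db.execute("CREATE TABLE IF NOT EXISTS page_html (title TEXT PRIMARY KEY, html TEXT)")

    def parse_to_html(wikitext):
        # Placeholder for the real (expensive) parser.
        return "<p>%s</p>" % wikitext

    def save_page(title, wikitext):
        # Step 4: store the raw markup verbatim in the "write" database.
        write_db.execute("REPLACE INTO page (title, wikitext) VALUES (?, ?)", (title, wikitext))
        write_db.commit()
        # Step 5: parse once, at save time, and push the HTML to the "read" database.
        html = parse_to_html(wikitext)
        read_db.execute("REPLACE INTO page_html (title, html) VALUES (?, ?)", (title, html))
        read_db.commit()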
Subsequently, all article views are routed to the "read" database, reducing every non-edit pageview to a simple database fetch. That fetch is further sped up by memcached and the MySQL query cache, and can be cheaply made faster still through RAID striping and similar measures.
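Continuing the same sketch, the view path becomes a single lookup against the "read" database, with a one-off fallback to the parser if the stored HTML is somehow missing:

    def view_page(title):
        # Non-edit pageview: one fetch from the "read" database.
        row = read_db.execute("SELECT html FROM page_html WHERE title = ?", (title,)).fetchone()
        if row is not None:
            return row[0]
        # Miss (e.g. the HTML was invalidated): parse once from the "write"
        # database and repopulate the "read" database.
        row = write_db.execute("SELECT wikitext FROM page WHERE title = ?", (title,)).fetchone()
        if row is None:
            raise KeyError(title)
        html = parse_to_html(row[0])
        read_db.execute("REPLACE INTO page_html (title, html) VALUES (?, ?)", (title, html))
        read_db.commit()
        return html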
Of course, this depends on no user-configurable setting changing the HTML once it's produced. That conflicts with at least one feature I'm aware of: the per-user style for displaying links to non-existent pages. There are two ways around it: one, scrap the feature for a massive performance boost; two, have a special miniparser that post-processes the raw HTML from the "read" database, applying only those per-user settings which modify the HTML output.
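As a very rough illustration of the second option, and assuming (purely for the sake of the example) that links to missing pages are marked with class="new" in the stored HTML, the miniparser could be little more than a lightweight pass over the cached output:

    import re

    def apply_user_link_style(html, user_prefers_question_marks):
        # Hypothetical miniparser pass: the cached HTML is served untouched for
        # the default preference, and only lightly rewritten for users who chose
        # a different broken-link style (here, the old "trailing ?" look).
        if not user_prefers_question_marks:
            return html
        return re.sub(
            r'<a class="new" href="([^"]*)">([^<]*)</a>',
            r'\2<a class="new" href="\1">?</a>',
            html,
        )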
Either way, the performance penalty of the parser is reduced to a one-time hit when the article is committed, and the "our parser is slow" discussion becomes moot.
Thoughts? If this is what we need, I might be nudged into providing a patch.
Cheers, Ivan.
[1] For simplicity, I'm leaving the squids out of the equation here.