Hey, thanks for the tip. I tried grepping with the following command:

grep -Rl "SELECT" . | grep -v "/.svn/" | grep -v "/docs/" | grep -v "/maintenance/"

and got the list below. It's really long. I have excluded the maintenance and docs directories completely. The files in the includes directory are the first place I'll be looking.
./extensions/CrossNamespaceLinks/SpecialCrossNamespaceLinks_body.php
./extensions/CategoryTree/CategoryTreeFunctions.php
./includes/SearchTsearch2.php
./includes/SpecialAncientpages.php
./includes/SpecialLonelypages.php
./includes/SpecialWithoutinterwiki.php
./includes/ImagePage.php
./includes/SearchOracle.php
./includes/Export.php
./includes/SpecialUncategorizedpages.php
./includes/SpecialRecentchanges.php
./includes/SpecialMostlinked.php
./includes/Block.php
./includes/Sanitizer.php
./includes/SpecialRecentchangeslinked.php
./includes/SpecialWantedcategories.php
./includes/FileStore.php
./includes/LinkCache.php
./includes/SpecialUnusedcategories.php
./includes/SpecialDeadendpages.php
./includes/BagOStuff.php
./includes/SpecialShortpages.php
./includes/SpecialFewestrevisions.php
./includes/filerepo/File.php
./includes/filerepo/ICRepo.php
./includes/filerepo/LocalFile.php
./includes/SpecialUnusedimages.php
./includes/QueryPage.php
./includes/SiteStats.php
./includes/SpecialUnwatchedpages.php
./includes/Parser.php
./includes/SpecialExport.php
./includes/DatabaseOracle.php
./includes/Parser_OldPP.php
./includes/SearchPostgres.php
./includes/SpecialMostcategories.php
./includes/SpecialListredirects.php
./includes/SpecialLog.php
./includes/SpecialMostlinkedtemplates.php
./includes/Title.php
./includes/SpecialDisambiguations.php
./includes/SpecialDoubleRedirects.php
./includes/SkinTemplate.php
./includes/SpecialRandompage.php
./includes/SpecialMIMEsearch.php
./includes/SpecialPopularpages.php
./includes/LinkBatch.php
./includes/SpecialWantedpages.php
./includes/api/ApiQueryRecentChanges.php
./includes/Database.php
./includes/SpecialMostlinkedcategories.php
./includes/SpecialMostimages.php
./includes/Skin.php
./includes/SpecialBrokenRedirects.php
./includes/SpecialWatchlist.php
./includes/SearchMySQL.php
./includes/DatabasePostgres.php
./includes/SpecialNewimages.php
./includes/SpecialUnusedtemplates.php
./includes/SpecialMostrevisions.php
./includes/Categoryfinder.php
./includes/SpecialAllmessages.php
./includes/SpecialNewpages.php
./includes/SpecialUncategorizedimages.php
./includes/SpecialUpload.php
./includes/LinksUpdate.php
./includes/Article.php
./includes/WatchlistEditor.php
./skins/disabled/MonoBookCBT.php
./config/index.php
./tests/DatabaseTest.php
./tests/MediaWiki_TestCase.php
./profileinfo.php
------------------------------------------
However, it turned out that access to bzipped files was way too slow, unzipped data was way too large to be of use, and re-indexing would take ages. I even tried sqlite, which bogged down. Maybe sqlite3 does better these days.
The KDE app for reading wiki dumps does exactly that, reading directly from the bz2 files, and it is not slow: http://www.kde-apps.org/content/show.php?content=65244
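My guess at why it isn't slow: it presumably keeps an index over the compressed stream boundaries, so it can seek to and decompress only the chunk holding the wanted page instead of the whole dump. A minimal sketch of that idea, assuming the dump is split into independently decompressible bz2 streams and a hypothetical pre-built sqlite table mapping titles to byte offsets:

<?php
// Sketch only. Assumed index schema (built in a separate pass):
//   CREATE TABLE page_index (title TEXT PRIMARY KEY,
//                            start INTEGER, end INTEGER);
// start/end are byte offsets of the bz2 stream holding the page.
function fetchPageChunk( $dumpFile, $indexFile, $title ) {
    $db = new SQLite3( $indexFile );
    $stmt = $db->prepare(
        'SELECT start, end FROM page_index WHERE title = :t' );
    $stmt->bindValue( ':t', $title, SQLITE3_TEXT );
    $row = $stmt->execute()->fetchArray( SQLITE3_ASSOC );
    if ( !$row ) {
        return false; // page not in the index
    }
    // Seek straight to the stream and decompress only that chunk,
    // rather than the whole multi-gigabyte dump.
    $fh = fopen( $dumpFile, 'rb' );
    fseek( $fh, $row['start'] );
    $compressed = fread( $fh, $row['end'] - $row['start'] );
    fclose( $fh );
    return bzdecompress( $compressed ); // XML fragment with the page
}

With an index like that, a lookup costs one sqlite query plus decompressing a single stream, regardless of dump size.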
On 2/19/08, Magnus Manske magnusmanske@googlemail.com wrote:
On Feb 19, 2008 3:04 AM, Apple Grew applegrew@gmail.com wrote:
On Feb 19, 2008 3:25 AM, Roan Kattouw roan.kattouw@home.nl wrote:
This more or less exists already in the API:
http://en.wikipedia.org/w/api.php?action=parse&text=%5B%5Bhello%5D%5D&am...
Roan Kattouw (Catrope)
The problem with this is that it needs a working MediaWiki install with a database, and the database must also contain the necessary template pages. If we use the API on the official website instead, we need a working internet connection (at least while parsing the XML file, not constantly), and it is pointless anyway, since the XML file already contains the template information.
One approach I took some time ago was to alter the database access script. As a quick hack, use regexps to find queries that want text or data, then return bogus data (where it's unimportant for the rendering) or real text (retrieved from the XML dump). Ignore anything that doesn't start with "SELECT" ;-)
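Very roughly, the shape of it is something like this (DumpBackedDB and the dump reader are made-up names for illustration, not actual MediaWiki classes):

<?php
// Illustrative sketch: sniff each SQL string with regexps, answer
// revision-text queries from the XML dump, and hand back
// empty-but-valid results for everything else. $dumpReader is an
// assumed helper that can pull revision text out of the dump.
class DumpBackedDB {
    private $dump;

    function __construct( $dumpReader ) {
        $this->dump = $dumpReader;
    }

    function query( $sql ) {
        // Ignore anything that doesn't start with SELECT.
        if ( !preg_match( '/^\s*SELECT/i', $sql ) ) {
            return array();
        }
        // Queries for revision text (text table: old_id, old_text)
        // get the real content from the dump.
        if ( preg_match( '/old_text.+?old_id\s*=\s*(\d+)/is', $sql, $m ) ) {
            $text = $this->dump->getText( (int)$m[1] );
            return array( array( 'old_text' => $text ) );
        }
        // Everything else gets bogus data; what these return is
        // unimportant for the rendering.
        return array();
    }
}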
However, it turned out that access to bzipped files was way too slow, unzipped data was way too large to be of use, and re-indexing would take ages. I even tried sqlite, which bogged down. Maybe sqlite3 does better these days.
Magnus
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l