Hello,
I have used Special:Export at en.wikipedia to export
"Diabetes_mellitus" and ticked the box "include templates" (I'm only
really after the templates).
The resulting XML file is 40.1 MB, so I decided to go with mwdumper
rather than Special:Import.
I'm working on a fresh build of mediawiki on my local system. When
running the command:
java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml | mysql -u root -p wiki
It is returning the following error:
1 pages (0.102/sec), 1,000 revs (102.062/sec)
ERROR 1062 (23000) at line 99: Duplicate entry '45970' for key 1
Exception in thread "main" java.io.IOException: XML document structures must start and end within the same entity.
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        ... 2 more
Can anyone please advise? After some googling, the only advice I
managed to find was:
"Before you start, try clearing the tables that mwdumper works in:
DELETE FROM page; DELETE FROM revision; DELETE FROM text; "
I have done this and tried again, but the same error continues.
Many thanks, Dawson
Just some basic comments; I'm sure Brion has more.
leonsp(a)svn.wikimedia.org schreef:
> Revision: 45755
> Author: leonsp
> Date: 2009-01-14 22:20:15 +0000 (Wed, 14 Jan 2009)
>
> Log Message:
> -----------
> (bug 17028) Added support for IBM DB2 database. config/index.php has new interface elements that only show up if PHP has ibm_db2 module enabled. AutoLoader knows about the new DB2 classes. GlobalFunctions has a new constant for DB2 time format. Revision class fixed slightly. Also includes new PHP files containing the Database and Search API implementations for IBM DB2.
>
> [...]
> Modified: trunk/phase3/includes/Revision.php
> ===================================================================
> --- trunk/phase3/includes/Revision.php 2009-01-14 22:15:50 UTC (rev 45754)
> +++ trunk/phase3/includes/Revision.php 2009-01-14 22:20:15 UTC (rev 45755)
> @@ -961,6 +961,10 @@
> */
> static function getTimestampFromId( $title, $id ) {
> $dbr = wfGetDB( DB_SLAVE );
> + // Casting fix for DB2
> + if ($id == '') {
> + $id = 0;
> + }
>
You should probably use intval($id) here.
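For instance (just a sketch of the suggested change):

    // intval() maps '' to 0 as well, so the explicit special case
    // is unnecessary, and non-numeric garbage is handled too
    $id = intval( $id );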
> [...]
>
> Added: trunk/phase3/includes/SearchIBM_DB2.php
> ===================================================================
> --- trunk/phase3/includes/SearchIBM_DB2.php (rev 0)
> +++ trunk/phase3/includes/SearchIBM_DB2.php 2009-01-14 22:20:15 UTC (rev 45755)
> @@ -0,0 +1,247 @@
> +<?php
> +# Copyright (C) 2004 Brion Vibber <brion(a)pobox.com>
>
If you wrote this file, you should attribute yourself.
>
> Added: trunk/phase3/includes/db/DatabaseIbm_db2.php
> ===================================================================
> --- trunk/phase3/includes/db/DatabaseIbm_db2.php (rev 0)
> +++ trunk/phase3/includes/db/DatabaseIbm_db2.php 2009-01-14 22:20:15 UTC (rev 45755)
>
> +/**
> + * Utility class for generating blank objects
> + * Intended as an equivalent to {} in Javascript
> + * @ingroup Database
> + */
> +class BlankObject {
> +}
>
Just use $obj = new stdClass; here.
> +
> +/**
> + * This represents a column in a DB2 database
> + * @ingroup Database
> + */
> +class IBM_DB2Field {
> + private $name, $tablename, $type, $nullable, $max_length;
> +
> + /**
> + * Builder method for the class
> + * @param Object $db Database interface
> + * @param string $table table name
> + * @param string $field column name
> + * @return IBM_DB2Field
> + */
> + static function fromText($db, $table, $field) {
> + [...]
> + }
> + /**
> + * Get column name
> + * @return string column name
> + */
> + function name() { return $this->name; }
> + /**
> + * Get table name
> + * @return string table name
> + */
> + function tableName() { return $this->tablename; }
> + /**
> + * Get column type
> + * @return string column type
> + */
> + function type() { return $this->type; }
> + /**
> + * Can column be null?
> + * @return bool true or false
> + */
> + function nullable() { return $this->nullable; }
> + /**
> + * How much can you fit in the column per row?
> + * @return int length
> + */
> + function maxLength() { return $this->max_length; }
> +}
>
Why do you need this? The other Database backends don't have it.
> +
> +/**
> + * Wrapper around binary large objects
> + * @ingroup Database
> + */
> +class IBM_DB2Blob {
> + private $mData;
> +
> + function __construct($data) {
> + $this->mData = $data;
> + }
> +
> + function getData() {
> + return $this->mData;
> + }
> +}
>
Why do you need these wrapper objects?
> [...]
> + public function is_numeric_type( $type ) {
> + switch (strtoupper($type)) {
> + case 'SMALLINT':
> + case 'INTEGER':
> + case 'INT':
> + case 'BIGINT':
> + case 'DECIMAL':
> + case 'REAL':
> + case 'DOUBLE':
> + case 'DECFLOAT':
> + return true;
> + }
> + return false;
> + }
>
Indentation looks wrong here.
> + /**
> + * Construct a LIMIT query with optional offset
> + * This is used for query pages
> + * $sql string SQL query we will append the limit too
> + * $limit integer the SQL limit
> + * $offset integer the SQL offset (default false)
> + */
> + public function limitResult($sql, $limit, $offset=false) {
> + if( !is_numeric($limit) ) {
> + throw new DBUnexpectedError( $this, "Invalid non-numeric limit passed to limitResult()\n" );
> + }
> + if( $offset ) {
> + wfDebug("Offset parameter not supported in limitResult()\n");
> + }
> + // TODO implement proper offset handling
> + // idea: get all the rows between 0 and offset, advance cursor to offset
> + return "$sql FETCH FIRST $limit ROWS ONLY ";
> + }
>
So DB2 spells LIMIT $n as FETCH FIRST $n ROWS ONLY, and this wrapper
doesn't even implement offset handling?
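If offsets are wanted, DB2 can emulate them with ROW_NUMBER(); a rough,
untested sketch (the mw_rownum alias is made up, and the ordering must
come from the inner query for the slice to be deterministic):

    public function limitResult( $sql, $limit, $offset = false ) {
        if ( !is_numeric( $limit ) ) {
            throw new DBUnexpectedError( $this,
                "Invalid non-numeric limit passed to limitResult()\n" );
        }
        if ( $offset ) {
            // Number the rows of the inner query, then keep only the
            // window ($offset, $offset + $limit]
            $min = intval( $offset ) + 1;
            $max = intval( $offset ) + intval( $limit );
            return "SELECT * FROM ( "
                . "SELECT sub.*, ROW_NUMBER() OVER() AS mw_rownum "
                . "FROM ( $sql ) AS sub ) AS tmp "
                . "WHERE mw_rownum BETWEEN $min AND $max";
        }
        return "$sql FETCH FIRST $limit ROWS ONLY ";
    }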
> + /**
> + * USE INDEX clause
> + * DB2 doesn't have them and returns ""
> + * @param sting $index
> + */
> + public function useIndexClause( $index ) {
> + return "";
> + }
>
What do you mean DB2 "doesn't have them"? FORCE INDEX isn't supported in
DB2? Then unless its index choosing algorithm is extremely good, it
won't be able to run certain queries with satisfactory efficiency.
> + public function select( $table, $vars, $conds='', $fname = 'DatabaseIbm_db2::select', $options = array(), $join_conds = array() )
> + {
> + $res = parent::select( $table, $vars, $conds, $fname, $options, $join_conds );
> +
> + // We must adjust for offset
> + if ( isset( $options['LIMIT'] ) ) {
> + if ( isset ($options['OFFSET'] ) ) {
> + $limit = $options['LIMIT'];
> + $offset = $options['OFFSET'];
> + }
> + }
>
This only sets $limit if both $options['LIMIT'] and $options['OFFSET']
are set, which I'm pretty sure is not what you want.
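Presumably it should default the offset instead, along these lines:

    if ( isset( $options['LIMIT'] ) ) {
        $limit = $options['LIMIT'];
        // OFFSET is optional; don't require it for LIMIT to take effect
        $offset = isset( $options['OFFSET'] ) ? $options['OFFSET'] : 0;
    }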
> +
> +
> + // DB2 does not have a proper num_rows() function yet, so we must emulate it
> + // DB2 9.5.3/9.5.4 and the corresponding ibm_db2 driver will introduce a working one
> + // Yay!
>
You probably want to detect the version and use num_rows() if it's
available.
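For example (untested sketch; I'm assuming the ibm_db2 connection
resource lives in $this->mConn and $stmt holds the statement resource):

    // DBMS_VER is a dotted string such as '09.05.0003', so a plain
    // string comparison is enough to gate on 9.5.3+
    $info = db2_server_info( $this->mConn );
    if ( strcmp( $info->DBMS_VER, '09.05.0003' ) >= 0 ) {
        $numRows = db2_num_rows( $stmt );
    }
    // on older servers, fall through to the COUNT(*) emulation below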
> +
> + // we want the count
> + $vars2 = array('count(*) as num_rows');
> + // respecting just the limit option
> + $options2 = array();
> + if ( isset( $options['LIMIT'] ) ) $options2['LIMIT'] = $options['LIMIT'];
>
Can't you just rewrite LIMIT n to FETCH FIRST n ROWS ONLY here?
> Added: trunk/phase3/maintenance/ibm_db2/README
> ===================================================================
> --- trunk/phase3/maintenance/ibm_db2/README (rev 0)
> +++ trunk/phase3/maintenance/ibm_db2/README 2009-01-14 22:20:15 UTC (rev 45755)
> @@ -0,0 +1,41 @@
> +== Syntax differences between other databases and IBM DB2 ==
> +{| border cellspacing=0 cellpadding=4
> +!MySQL!!IBM DB2
> +|-
> +
> +|SELECT 1 FROM $table LIMIT 1
> +|SELECT COUNT(*) FROM SYSIBM.SYSTABLES ST
> +WHERE ST.NAME = '$table' AND ST.CREATOR = '$schema'
> [...]
>
This is probably better off as plain text than as wikitext.
> Added: trunk/phase3/maintenance/ibm_db2/tables.sql
> ===================================================================
> --- trunk/phase3/maintenance/ibm_db2/tables.sql (rev 0)
> +++ trunk/phase3/maintenance/ibm_db2/tables.sql 2009-01-14 22:20:15 UTC (rev 45755)
> @@ -0,0 +1,604 @@
> +-- DB2
> +
> +-- SQL to create the initial tables for the MediaWiki database.
> +-- This is read and executed by the install script; you should
> +-- not have to run it by itself unless doing a manual install.
> +-- This is the IBM DB2 version.
> +-- For information about each table, please see the notes in maintenance/tables.sql
> +-- Please make sure all dollar-quoting uses $mw$ at the start of the line
> +-- TODO: Change CHAR/SMALLINT to BOOL (still used in a non-bool fashion in PHP code)
> +
> +
> +
> +
> +CREATE SEQUENCE user_user_id_seq AS INTEGER START WITH 0 INCREMENT BY 1;
> +CREATE TABLE mwuser ( -- replace reserved word 'user'
> + user_id INTEGER NOT NULL PRIMARY KEY, -- DEFAULT nextval('user_user_id_seq'),
> + user_name VARCHAR(255) NOT NULL UNIQUE,
> + user_real_name VARCHAR(255),
> + user_password clob(1K),
> + user_newpassword clob(1K),
> + user_newpass_time TIMESTAMP,
> + user_token VARCHAR(255),
> + user_email VARCHAR(255),
> + user_email_token VARCHAR(255),
> + user_email_token_expires TIMESTAMP,
> + user_email_authenticated TIMESTAMP,
> + user_options CLOB(64K),
> + user_touched TIMESTAMP,
> + user_registration TIMESTAMP,
> + user_editcount INTEGER
> +);
> +CREATE INDEX user_email_token_idx ON mwuser (user_email_token);
>
You shouldn't rename indices like that, because index names are used in
FORCE INDEX clauses (oh wait, but they weren't supported, right?)
> +-- should be replaced with OmniFind, Contains(), etc
> +CREATE TABLE searchindex (
> + si_page int NOT NULL,
> + si_title varchar(255) NOT NULL default '',
> + si_text clob NOT NULL
> +);
>
Don't you need some index on this table to enable efficient searching?
Again, this is only a very shallow look on my part; Brion will
probably have more interesting things to say.
Roan Kattouw (Catrope)
Hi,
Just completed a project using the Wikipedia page counters made available by
Domas Mituzas (
http://lists.wikimedia.org/pipermail/wikitech-l/2007-December/035435.html).
WikiGeist is an attempt to build the Wikipedia equivalent of Google's Hot
Trends or other websites' "most popular" widgets. It tracks, aggregates,
ranks and reports the page views on en.wikipedia.org. There are three types
of report: Top Pages by Count (ranks the articles according to the number of
page views during the past hour), Top New Entries (ranks the articles by
page views where the prior page views were 0) and Top Pages by Page Count
Increase. When articles are accessed individually, an excerpt of the
Wikipedia page is shown, as well as a graph of the trend during the past 24
hours.
Let me know what you think of it.
Thanks.
willy -- [[user:Tookam]]
It seems image redirects are somehow broken. You cannot view history
diffs (nothing shows up), and categories don't work on image redirects.
Is this bug known? An example is visible at
<http://test.wikipedia.org/w/index.php?title=File:ARTI.svg>; see my
edits in the history there. It's broken on all wikis, not only on test.
I made my test there to see whether the new code to move images,
activated on test, would cure the problem. It seems it will not.
Marcus Buck
Hello,
People on Persian Wikipedia are eager to know if there is a way to let
MediaWiki's search feature suggest some keywords when misspelled words are
searched for on Persian Wikipedia. As I'm not that familiar with the
MediaWiki search suggestions feature, I would be thankful if someone could
share their thoughts on this idea.
Best,
Hojjat (aka Huji)
Hello,
I am looking for a way to download just the templates; even a way
to download only the basic templates needed to make an infobox would
be fine.
I have set up a MediaWiki and I'm looking to use the infobox template
from Wikipedia. I've tried to manually copy the template but then
realised there were too many transcluded pages and this would take me
forever, which led me to http://en.wikipedia.org/wiki/Wikipedia_talk:Database_download#Downloading_t…
and it appears that quite a few other people are asking the same
question as me. One answer that's been given is to download
pages-articles.xml.bz2, however it's 4.1 GB and includes all Wikipedia
articles, not just the templates.
Could someone advise?
Thank you, Dawson
Hi,
I have an extension that needs to parse some wiki text. I started with:
$title = $parser->internalParse($title);
but found that links were not always processed correctly: [[Main Page]] was
staying as-is, though interestingly the first [[:image: ]] tag was parsed.
However,
$title =
$parser->parse($title,$wgTitle,ParserOptions::newFromUser($wgUser),false,false)->getText();
did work. This is with MW 1.13.1.
However, now the batch processes take forever, and the culprit is the
"->parse" call - it seems to take exponentially longer to run. internalParse
runs quicker, so for the time being I have detected command-line functions
and NS_SPECIAL pages and sent those through internalParse (as I don't really
care whether the pages parse correctly there).
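Roughly, the branching looks like this (simplified sketch):

    // fast path where rendering fidelity doesn't matter
    if ( $wgCommandLineMode || $wgTitle->getNamespace() == NS_SPECIAL ) {
        $html = $parser->internalParse( $text );
    } else {
        $html = $parser->parse( $text, $wgTitle,
            ParserOptions::newFromUser( $wgUser ), false, false )->getText();
    }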
Are there any known limitations of internalParse? Should it always
output the same as parse, or are there certain cases which will never work
with it?
Kind regards,
Alex
--
Alex Powell
Exscien Training Ltd
Tel: +44 (0) 1865 876562
Mob: +44 (0) 759 5048178
skype: alexp700
mailto:alexp@exscien.com
http://www.exscien.com
Registered in England and Wales 05927635, Unit 10 Wheatley Business Centre,
Old London Road, Wheatley, OX33 1XW, England
Hello,
if you look at http://download.wikipedia.org/dawiki/20090109/ the dump
process for the new dawiki dump seems to be frozen. Perhaps it should be
killed manually.
Best regards
Andim