On Mon, 20-05-2013, at 13:18 +0200, Michael Tsikerdekis wrote:
33 pages (0.593/sec), 25,374 revs (455.695/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
...
The file itself is fine: I isolated the problematic page and removed its first revision (which had been processed without problems), and all the remaining revisions, including the 'bad' one, were then handled properly.
This is most likely a regression: http://www.gossamer-threads.com/lists/wiki/mediawiki/128069 Our build spec says to build against Maven's Xerces version 2.7.1, and I expect that version never got the patch [1]. I'm not sure which version of the Xerces library is good [2].
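
In case it helps with checking which parser build a given run actually picks up, here is a minimal sketch that uses only the standard JAXP API. The class name ParserInfo and the method describeParser are made up for illustration and are not part of our tool. The version string comes from the jar manifest, so it may well print as "unknown" depending on how the jar was packaged.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of any existing tool.
final class ParserInfo {

    /** Describes the SAX parser implementation that JAXP selects at runtime. */
    static String describeParser() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Class<?> impl = parser.getClass();

        // Implementation-Version is read from the jar manifest; it can be absent.
        Package pkg = impl.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();

        // The code source is null for classes loaded by the bootstrap loader
        // (for example the parser bundled with the JDK).
        CodeSource src = impl.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "unknown (possibly bundled with the JDK)"
                : src.getLocation().toString();

        return "parser class: " + impl.getName()
                + ", version: " + (version == null ? "unknown" : version)
                + ", loaded from: " + location;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeParser());
    }
}
```

Running this with the same classpath as the failing job should show whether the parser comes from the Xerces jar named in the build spec or from somewhere else.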
I'm adding Chad back on the cc, though, since he'll have to update the build specs. Chad, do you want a Bugzilla report for this?
Ariel
[1] http://www.gossamer-threads.com/lists/wiki/mediawiki/128069
[2] https://issues.apache.org/jira/browse/XERCESJ-1257?page=com.atlassian.jira.p...