An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r23725).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 526 of 544 tests (96.69%)... 18 tests failed!
Hi.
In the coming days we will release a major update of WikiXRay, along with several new and very interesting graphs.
The most important new features are explained in http://meta.wikimedia.org/wiki/Talk:WikiXRay#Following_Updates
Of special relevance will be the new Python parser: as far as we know, it is the most complete parser available for processing Wikipedia XML dumps for research purposes.
It can also be easily tweaked to serve as a common parser for importing Wikipedia dumps back into the database (though I haven't run precise performance tests yet; someone else could try).
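To give a rough idea of the kind of work such a parser does, here is a purely illustrative PHP sketch (not WikiXRay's actual code, which is Python): it streams a pages-articles dump with XMLReader and visits each page's title and text. The dump file name is a placeholder.

<?php
// Purely illustrative, not WikiXRay itself: stream a Wikipedia XML dump
// and visit each page's title and wikitext. 'pages-articles.xml' is a
// placeholder; a real importer would write into the page/revision/text
// tables instead of printing.
$reader = new XMLReader();
$reader->open( 'pages-articles.xml' );
$title = null;
while ( $reader->read() ) {
    if ( $reader->nodeType !== XMLReader::ELEMENT ) {
        continue;
    }
    if ( $reader->name === 'title' ) {
        $reader->read(); // advance to the text node inside <title>
        $title = $reader->value;
    } elseif ( $reader->name === 'text' ) {
        $reader->read(); // advance to the text node inside <text>
        // For research you might count words or extract links here.
        printf( "%s: %d bytes\n", $title, strlen( $reader->value ) );
    }
}
$reader->close();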
As I will have some spare time this summer (for the first time in years, I think), I will gladly make good on my earlier unfulfilled promises to write about our tool in Wikipedia channels.
More good news soon.
Regards,
Felipe Ortega.
robchurch(a)svn.wikimedia.org wrote:
> trunk/extensions/ExportWatchlist/ExportWatchlist.i18n.php
> trunk/extensions/ExportWatchlist/ExportWatchlist.php
> trunk/extensions/ExportWatchlist/SpecialExportWatchlist.php
> trunk/extensions/ExportWatchlist/SpecialImportWatchlist.php
Neat!
I think it might be a little easier to handle if it's just presented as
a textarea. Instead of going through "import" and "export" processes,
you could just edit the list directly.
This would be really handy for bulk-*unwatch*es also. You could clear
your whole list or do search-and-replace to remove a lot of pages.
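To sketch the idea (purely illustrative, not the extension's actual code; the function name is made up and the form handling is omitted, but the watchlist columns are the real MediaWiki schema):

<?php
// Hypothetical sketch: render the user's watchlist as an editable
// textarea, one title per line. Saving it back (not shown) would diff
// the submitted lines against the watchlist table.
function wfShowRawWatchlistForm( $user ) {
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select(
        'watchlist',
        array( 'wl_namespace', 'wl_title' ),
        array( 'wl_user' => $user->getId() ),
        __METHOD__
    );
    $lines = array();
    while ( $row = $dbr->fetchObject( $res ) ) {
        $title = Title::makeTitle( $row->wl_namespace, $row->wl_title );
        $lines[] = $title->getPrefixedText();
    }
    // One title per line: the user edits the text directly, so clearing
    // the list or search-and-replacing away a batch of pages is trivial.
    return '<form method="post">' .
        '<textarea name="titles" rows="25" cols="80">' .
        htmlspecialchars( implode( "\n", $lines ) ) .
        '</textarea>' .
        '<input type="submit" value="Save list" />' .
        '</form>';
}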
-- brion vibber (brion @ wikimedia.org)
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r23693).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 526 of 544 tests (96.69%)... 18 tests failed!
Hi there,
We run a small MediaWiki farm with a shared commons wiki for common media files, much as Wikipedia does.
We noticed that the file description pages on the commons wiki show an automatically generated list of the articles in which the file is used.
However, the list only includes articles from the commons wiki itself.
We would very much appreciate a way to include all articles from across our farm.
Any suggestions for configuration or code changes?
THX!
BTW:
All wikis reside on the same server but in different databases.
Necessary database user accounts and passwords are known, of course.
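To illustrate the kind of query we have in mind (just a sketch; the database names, credentials, and file name are placeholders, while imagelinks and page are the standard MediaWiki tables):

<?php
// Sketch only: list every article in the farm that uses a given file.
// $farmDbs and the connection details are placeholders; il_to holds the
// file's DB key, and il_from is the page_id of the page using it.
$farmDbs = array( 'wiki_one', 'wiki_two', 'wiki_commons' );
$file = 'Example.jpg'; // DB key form, without the "Image:" prefix
$conn = mysql_connect( 'localhost', 'wikiuser', 'secret' );
foreach ( $farmDbs as $db ) {
    $sql = sprintf(
        "SELECT page_namespace, page_title
           FROM %s.imagelinks JOIN %s.page ON page_id = il_from
          WHERE il_to = '%s'",
        $db, $db, mysql_real_escape_string( $file, $conn ) );
    $res = mysql_query( $sql, $conn );
    while ( $row = mysql_fetch_assoc( $res ) ) {
        echo "$db: {$row['page_namespace']}:{$row['page_title']}\n";
    }
}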
Uwe (Baumbach)
U.Baumbach(a)web.de
Brion Vibber wrote:
>> Lucene cannot be included as it is a program, and requires either
>> precomputed binaries (every operating system needs different ones) or
>> the source code and a compiler.
>
> Well, it's Java at least, so only one binary needed on most systems. :)
>
> Still, that's an external dependency, and running a Java daemon is not
> an out-of-the-box task on your standard LAMP server.
>
> I would like to see some improvements to the built-in search, though;
> interface improvements and some better category tagging would help a lot.
I've been playing with zend_search_lucene (for category intersections...
still), and it might make sense to think about writing an extension on top
of it for general search. I've been avoiding the Java version of Lucene for
exactly the reason mentioned ("... running a Java daemon is not an
out-of-the-box task...").
Search performance on my dataset (4 million records - just categories and
page ids) is not exactly impressive (~10 seconds for a worst-case scenario
like "Living_People + some other big category"), and Luke gives much faster
results. But I've been emailing off and on with Alexander at Zend; he says
this is important to Zend and they're putting effort into improving the
code. I'm also trying CLucene, which should be pretty easy to just compile
and run, if I knew what I was doing.
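Boiled down, my experiment looks roughly like this (a sketch, not my real code; the field names are made up, and note that Zend's default analyzer splits on non-letters, so a real version would need a custom analyzer to keep underscores and digits intact):

<?php
// Hypothetical sketch of a category-intersection index with
// Zend_Search_Lucene: one document per page, with all of the page's
// categories in a single tokenized field.
require_once 'Zend/Search/Lucene.php';

// Build the index (one-off).
$index = Zend_Search_Lucene::create( '/tmp/catindex' );
$doc = new Zend_Search_Lucene_Document();
$doc->addField( Zend_Search_Lucene_Field::Keyword( 'page_id', '12345' ) );
$doc->addField( Zend_Search_Lucene_Field::Text( 'cats',
    'Living_People American_writers 1960_births' ) );
$index->addDocument( $doc );
$index->commit();

// A category intersection is just an all-terms-required query.
$index = Zend_Search_Lucene::open( '/tmp/catindex' );
$hits = $index->find( 'cats:(+Living_People +American_writers)' );
foreach ( $hits as $hit ) {
    // Stored field access is proxied from the underlying document.
    echo $hit->page_id, "\n";
}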
So, to make a long story short: I think there are options for Lucene-based
but non-Java search. If anyone else is working in these areas, I'd love to
hear from them.
Aerik
--
http://www.wikidweb.com - the Wiki Directory of the Web
Tim and I upgraded the Lucene search engine over the weekend. It's
currently live for en.wiki; the rest will follow in the next few days.
I've already highlighted some of the features, but here's an updated list:
* Improved scoring - a document's score is now a function of how many
other documents link to it. Same-namespace redirects no longer show up
in the search results; their names are indexed alongside the article
they point to, so searching for USA will give you United States as the
first hit. Further, links from the beginning of the article are weighted
more heavily (as they are assumed to give a short keyword-like
description of the article).
* Prefix searches - the searcher now understands namespaces, i.e. if you
enter help:images, it will search the Help namespace. You can also use
the 'all' prefix, which searches everything. Prefixes are customizable,
i.e. you can define your own (see below).
I hope this will make it easier to search for help and to search the
project/wikipedia namespace.
* Accentless search - accents are stripped (this includes Hebrew pointing
and the like), and common transliterations are added (e.g. ü -> ue).
* Numbers/stemming - numbers are now included in the index, and the
stemming issues have been resolved...
If you want to install or customize the new search extension, take
a look at: http://www.mediawiki.org/Extensions:LuceneSearch
Robert