We upgraded to 1.9 yesterday and launched a new design if anyone is
interested in checking it out:
http://www.wikihow.com/Main-Page
We hit the 20,000 article mark just yesterday as well, and should pass
5 million unique readers for the month of May.
Thanks,
Travis
Hi,
There's no way at present to customise the sidebar display depending
on whether the user is logged in, is there?
Would it be possible, given traffic/caching/whatever? It would
certainly be handy to be able to give more relevant links to different
audiences.
cheers,
Brianna
user:pfctdayelise
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r22594).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
30 previously passing test(s) now FAILING! :(
* Template with thumb image (with link in description) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Simple image [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Right-aligned image [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Image with caption [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Image with frame and link [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Frameless image caption with a free URL [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Thumbnail image caption with a free URL [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1887: A ISBN with a thumbnail [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1887: A RFC with a thumbnail [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1887: A mailto link with a thumbnail [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1887: A <math> with a thumbnail- we don't render math in the parsertests by default,
so math is not stripped and turns up as escaped <math> tags. [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 648: Frameless image caption with a link [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 648: Frameless image caption with a link (suffix) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 648: Frameless image caption with an interwiki link [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 648: Frameless image caption with a piped interwiki link [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Escape HTML special chars in image alt text [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 499: Alt text should have &#1234;, not &amp;1234; [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Image caption containing another image [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Image caption containing a newline [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Bug 3090: External links other than http: in image captions [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1219 URL next to image (good) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* BUG 1219 URL next to image (broken) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Media link [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Media link with text [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Media link with nasty text
fixme: doBlockLevels won't wrap this in a paragraph because it contains a div [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Media link to nonexistent file (bug 1702) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Centre-aligned image [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* None-aligned image [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* Width + Height sized image (using px) (height is ignored) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
* <references> after <gallery> (bug 6164) [Introduced between 30-May-2007 07:15:22, 1.11alpha (r22553) and 31-May-2007 07:15:31, 1.11alpha (r22594)]
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 465 of 513 tests (90.64%)... 48 tests failed!
Hi all,
My friend Prof. Wu and his students developed Wikigazer
( http://wil.csie.cyut.edu.tw/Wikigazer.php?hl=en ),
a cross-lingual Wikipedia search engine based on Lucene.
I would like to know whether you think it's useful, and whether it is
fast enough from your location.
Thank you!
Best Regards,
/Mike/
Hi Robert,
Yes, there are multiple queries. In my scenario, "precision first"
usually implies that the number of returned results is limited. Users
may not have the patience either to wait for responses or to read
through pages of results. That's why I prefer a sequential process to
a parallel one: I can fetch a small, hopefully precise result set
first, then query for more if that set seems too small, i.e. if the
recall is not high enough.
For example, a query in Chinese is first run through a word-based
analyzer, with a limit of, say, 1000:
    static int m_limit = 1000;  // upper bound on hits kept per user query
    // Stage 1: the word-based (more precise) query.
    Query query = _a_word_based_Chinese_query_here_;
    ArrayList<MyResult> resultList = new ArrayList<MyResult>();
    TopDocs topDocs = m_standardSearcher.search(query, (Filter)null, m_limit);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Load the stored document now so it can be merged with later hits.
        Document doc = m_standardSearcher.doc(scoreDoc.doc);
        float score = scoreDoc.score;
        MyResult aResult = new MyResult(doc, score);
        resultList.add(aResult);
    }
If the size of resultList does not reach 1000, a second,
character-based query is fired to fetch more results, up to
(1000 - current size).
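The fallback described above can be sketched roughly as follows. This is
only an illustration under assumptions, not the code of the actual system:
`Stage`, `Hit` and `SequentialSearch` are hypothetical names standing in for
a Lucene searcher bound to one analyzer and for its results, and the
de-duplication by document id is an assumption that the message does not
mention.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SequentialSearch {

    /** One hit: a document id and the score assigned by the stage that found it. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a searcher tied to one analyzer (not a Lucene type). */
    interface Stage {
        /** Returns at most {@code limit} hits, best first. */
        List<Hit> search(String userQuery, int limit);
    }

    /**
     * Runs the precise stage first. The broad stage runs only when the
     * precise stage returned fewer than {@code limit} hits, and its hits
     * are appended after the precise ones rather than merged by score.
     */
    static List<Hit> search(String userQuery, int limit, Stage precise, Stage broad) {
        List<Hit> results = new ArrayList<>(precise.search(userQuery, limit));
        if (results.size() >= limit) {
            return results; // precise stage was enough, skip the second query
        }
        Set<String> seen = new HashSet<>();
        for (Hit hit : results) {
            seen.add(hit.docId);
        }
        // Assumption: ask for up to `limit` hits so that enough remain
        // after dropping documents the precise stage already returned.
        for (Hit hit : broad.search(userQuery, limit)) {
            if (results.size() >= limit) {
                break;
            }
            if (seen.add(hit.docId)) {
                results.add(hit);
            }
        }
        return results;
    }
}
```

Because the second stage only fills the remaining slots, the scores of the
two stages are never compared with each other in this sketch.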
It's a very simple heuristic, and it proved fast enough on a single
P4 2GHz machine with 2GB RAM serving a 3GB Lucene index: results came
back within 1 second on average.
The problem with all multiple, parallel, or distributed Lucene queries
is that merging scores may not be meaningful, especially when the
indexes use different tokenization strategies.
You may also be interested in
http://issues.apache.org/jira/browse/NUTCH-92 ,
http://hellonline.com/blog/?p=55 , and
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html
Thank you!
Cheers,
/Mike/
Robert Stojnic wrote:
>
> Hm, wouldn't that require running multiple queries for a single user
> query? If I'm understanding it correctly it refines search by trying
> different queries, and merges the results?
> For the wikipedia system, speed is of utmost importance, since it's
> a high traffic site, and has very few resources (compared to other
> sites of same traffic).
>
> r.
>
> On 5/23/07, *Tian-Jian Barabbas Jiang@Gmail* <barabbas(a)gmail.com
> <mailto:barabbas@gmail.com>> wrote:
>
> Although I bet you have already done it, here's my
> 2 cents:
> I usually adapt a concept to my IR system:
> Precision first, Recall next.
> For example, my system may do exact match first, get
> the results from
>
> searcher.doc(topDocs.scoreDocs[i].doc)
>
> and save them externally.
> It allows me to merge some more partial matched
> results later.
> Apparently these can be done by something like parallel
> queries, but I like to merge them sequentially by myself.
Hi all,
Tian-Jian "Barabbas" Jiang@Gmail wrote:
> It's a very simple heuristic and proved to be fast enough on single
> P4 2GHz
> machine with 2GB RAM, which served for a 3GB Lucene index file. Results
> returned within 1 sec, in average.
BTW, in case you are not satisfied with 1 sec, there is some reasonable
room for improvement:
1. My environment is, frankly, lame. It is a Windows 2003 box, without the
benefit of an inode-like file system for Lucene's index file format. Worse,
due to a lack of storage space, some secondary index files may be located
on separate machines and accessed via SLOW Windows network disks.
2. There is neither pagination nor caching in my system.
I bet anyone who has a FreeBSD 6.1 server with nice pagination will get
much better performance than I do. ;-)
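As a rough idea of what the missing pagination and cache from item 2 could
look like, here is a minimal sketch. It is an assumption for illustration
only, not part of the system described: `PagedResultCache` is a hypothetical
class that keeps the full hit list of the most recently used queries in
memory and hands it out one page at a time. It is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PagedResultCache {
    private final int pageSize;
    private final Map<String, List<String>> cache;

    PagedResultCache(final int maxQueries, int pageSize) {
        this.pageSize = pageSize;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                // Evict the least recently used query once the cache is full.
                return size() > maxQueries;
            }
        };
    }

    /** Stores a copy of the complete, ordered list of document ids for a query. */
    void put(String query, List<String> docIds) {
        cache.put(query, new ArrayList<>(docIds));
    }

    /** Returns page {@code page} (0-based), or null when the query is not cached. */
    List<String> page(String query, int page) {
        if (page < 0) {
            throw new IllegalArgumentException("page must be >= 0");
        }
        List<String> all = cache.get(query);
        if (all == null) {
            return null; // caller has to run the search and call put()
        }
        int from = Math.min(page * pageSize, all.size());
        int to = Math.min(from + pageSize, all.size());
        return Collections.unmodifiableList(all.subList(from, to));
    }
}
```

With something like this, only the first request for a query pays for the
search; later pages of the same query are served from memory.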
Regards,
/Mike/
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r22553).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 495 of 513 tests (96.49%)... 18 tests failed!
Added SVN account for Daniel Cannon (amidaniel), working on API and
other stuff. :)
-- brion vibber (brion @ wikimedia.org)