Subject: Re: [Mediawiki-l] MediaWiki + Lucene-Search2 + MWSearch extension = ZERO search results
From: Tim Starling <tstarling(a)wikimedia.org>
Date: Tue, 01 Jul 2008 08:02:05 +1000

> It sounds like you've isolated the problem to within a couple of
> hundred lines of code. Maybe you should spend less time searching the
> web for someone with your exact problem, and more time reading that code.
> =)

I'd agree with ya, if I weren't such a PHP newbie... I'd consider
myself more of a Perl and Bash type coder, but I definitely understand
where you're coming from with your suggestion. Luckily, I found someone
on the MWSearch Extension_talk page at MediaWiki.org who helped me
troubleshoot my issue!

>> Follow me here: if I load the URL from the debug log above (or the
>> one logged *every time* I search now) in a web browser like 'lynx',
>> I see this (or something similar):
>> 1
>> 1.0 0 Main_Page

> Is this the same response text that MWSearch sees? If yes, where does
> MWSearch go wrong in interpreting it? If no, what is different about
> the way MWSearch requests pages compared to lynx? Is it timing out?
> You can use tcpdump to snoop on the communication between MWSearch and
> the search server. You can use telnet to generate requests manually
> and see how the search daemon responds.
> -- Tim Starling

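For the telnet test Tim describes, a small standalone PHP script can send the same request and, unlike a single fread() call, keep reading until the daemon closes the connection, reporting a timeout if the daemon stalls. This is a minimal sketch, not MediaWiki or MWSearch code: the helper names buildRawRequest, splitResponse and rawFetch are hypothetical, and the address 127.0.0.1:8123 and the /search/svnwikidb/loopback path are taken from this thread.

```php
<?php
// Hypothetical helper: the exact bytes you would type into a telnet session
// to the search daemon (HTTP/1.0, so the server closes the connection after replying).
function buildRawRequest(string $path, string $host): string
{
    return "GET {$path} HTTP/1.0\r\nHost: {$host}\r\n\r\n";
}

// Hypothetical helper: split a raw HTTP response into [headers, body] at the
// first blank line; the body is '' if no blank line is present.
function splitResponse(string $raw): array
{
    $parts = preg_split("/\r?\n\r?\n/", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Hypothetical helper: send the request over a plain TCP socket and keep reading
// until the server closes the connection or a read times out, instead of relying
// on a single fread(). Returns null if the connection cannot be opened.
function rawFetch(string $ip, int $port, string $path, string $host, int $timeout = 5): ?string
{
    $sock = @fsockopen($ip, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        fwrite(STDERR, "connect to $ip:$port failed: $errstr ($errno)\n");
        return null;
    }
    stream_set_timeout($sock, $timeout);
    fwrite($sock, buildRawRequest($path, $host));

    $response = '';
    while (!feof($sock)) {
        $chunk = fread($sock, 8192);
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
        if (stream_get_meta_data($sock)['timed_out']) {
            // The daemon stopped sending without closing the connection.
            fwrite(STDERR, "read timed out after {$timeout}s\n");
            break;
        }
    }
    fclose($sock);
    return $response;
}
```

Run with the php CLI, `rawFetch('127.0.0.1', 8123, '/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10', 'localhost')` should return the status line, the Content-Type header and the `1` / `1.0 0 Main_Page` body that lynx shows; passing the result through splitResponse() isolates the body a search client would have to parse. If this completes promptly, the daemon side is probably fine and the difference lies in how the other client makes its request.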
Here's what "Brian" on the MWSearch Extension_talk page helped identify.
Summing up his last post and the results we found from some
troubleshooting:

"we can conclude from this that: 1) PHP can connect to Lucene properly
and 2) your HTTP fetch capabilities are broken. I'm not sure what we can
do about it. The proper way is of course to fix the HTTP functions, but
I don't know how we can do that. The other option is to write a new HTTP
layer which will surely work."

<(root@/var/www/htdocs/wiki-svn06252008)> cd /var/www/htdocs/wiki-svn06252008
<(root@/var/www/htdocs/wiki-svn06252008)> php maintenance/eval.php
$sock = fsockopen('127.0.0.1', 8123);
fwrite($sock, "GET /search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10 HTTP/1.0\r\nHost: localhost\r\n\r\n"); print fread($sock, 8192);
HTTP/1.1 200 OK
Content-Type: text/plain

1
1.0 0 Main_Page
print Http::get('http://127.0.0.1:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');
print Http::get('http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');

What are the chances some kind soul on the Mediawiki-l mailing list
knows about, or can point me to more in-depth information on,
MediaWiki's HTTP get function? It may be what's stripping out the
responses to my queries to my Lucene-Search2 daemon on port 8123.

When I use PHP to talk directly to my Lucene-Search2 daemon, I get a
valid response and everything works great: the response comes back
containing the search results. The problem shows up within my MediaWiki
site once I enable the MWSearch extension (ZERO search results), or, as
seen above, when I start MediaWiki's eval.php script and use Http::get
to talk to the Lucene-Search2 daemon: I seem to get no response... but
my LS2 daemon is definitely responding to the Http::get request! It
looks like MediaWiki is the culprit, and MW's HTTP fetch function is
somehow stripping out the search results, as demonstrated above. I can
also get the search results from my LS2 daemon with the 'lynx' web
browser, with telnet, or with plain PHP.

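One thing worth ruling out in the eval.php session: `print` outputs nothing for both an empty string and boolean false, so a blank result alone cannot tell a failed request from an empty body. If Http::get in this MediaWiki version returns false when the underlying fetch fails (an assumption worth confirming in the source of the Http class), that distinction points at the HTTP layer rather than at the daemon. The sketch uses two hypothetical helpers: describeFetchResult() reports which case occurred, and parseSearchBody() splits a body in the format assumed from the single sample in this thread, a hit count on the first line followed by one `score namespace title` line per hit. That format is inferred, not taken from Lucene-Search2 documentation.

```php
<?php
// Hypothetical helper: describe what an HTTP helper returned, distinguishing
// false (request failed) from '' (request succeeded but the body was empty).
function describeFetchResult($result): string
{
    if ($result === false) {
        return 'false (the request failed)';
    }
    if ($result === null) {
        return 'null';
    }
    if ($result === '') {
        return 'empty string (connected, but no body)';
    }
    return 'body of ' . strlen($result) . ' bytes';
}

// Hypothetical parser, ASSUMING the body format "<hit count>\n<score> <namespace> <title>\n..."
// inferred from the one sample response ("1" then "1.0 0 Main_Page").
function parseSearchBody(string $body): array
{
    $lines = preg_split('/\r?\n/', trim($body));
    $total = (int) array_shift($lines);
    $hits = [];
    foreach ($lines as $line) {
        $fields = explode(' ', trim($line), 3);
        if (count($fields) < 3) {
            continue; // skip lines that do not match the assumed format
        }
        [$score, $ns, $title] = $fields;
        $hits[] = ['score' => (float) $score, 'namespace' => (int) $ns, 'title' => $title];
    }
    return ['total' => $total, 'hits' => $hits];
}
```

Without any helper, `var_dump(Http::get('http://127.0.0.1:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10'));` in eval.php prints `bool(false)` or `string(0) ""` respectively, which is often enough to decide whether to look at the HTTP layer or at how the response is interpreted.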
I really hope someone can point me in the right direction, or help a
fella' out with diagnosing the issue! Thanks for your time, peace -
agentdcooper(a)gmail.com