Subject: Re: [Mediawiki-l] MediaWiki + Lucene-Search2 + MWSearch extension = ZERO search results
From: Tim Starling <tstarling@wikimedia.org>
Date: Tue, 01 Jul 2008 08:02:05 +1000
It sounds like you've isolated the problem to within a couple of hundred lines of code. Maybe you should spend less time searching the web for someone with your exact problem, and more time reading that code.
=) I'd agree with ya, if I wasn't so much of a PHP newbie... I'd consider myself more of a Perl and Bash type coder, but I definitely understand where you're coming from with your suggestion. Luckily, I found someone over @ MediaWiki.org's MWSearch Extension_talk page who helped me troubleshoot my issue!
Follow me here :: if I load up the URL from the debug log above (or the URL logged every time I search) in a web browser like 'lynx', I see this (or something similar):
1 1.0 0 Main_Page
Is this the same response text that MWSearch sees? If yes, where does MWSearch go wrong in interpreting it? If no, what is different about the way MWSearch requests pages compared to lynx? Is it timing out? You can use tcpdump to snoop on the communication between MWSearch and the search server. You can use telnet to generate requests manually and see how the search daemon responds.
-- Tim Starling
Here's what "Brian" from MWSearch Extension_talk page helped identify, summing up his last post, and the results we found from some troubleshooting ;
"we can conclude from this that: 1) PHP can connect to Lucene properly and 2) Your HTTP fetch capabilities are broken. I'm not sure what we can do about it. The proper way is of course to fix the HTTP functions, but I don't know how we can do that. The other option is to write a new HTTP layer which will surely work."
<(root@/var/www/htdocs/wiki-svn06252008)> cd /var/www/htdocs/wiki-svn06252008
<(root@/var/www/htdocs/wiki-svn06252008)> php maintenance/eval.php
$sock = fsockopen('127.0.0.1', 8123);
fwrite($sock, "GET /search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10 HTTP/1.0\r\nHost: localhost\r\n\r\n");
print fread($sock, 8192);
HTTP/1.1 200 OK
Content-Type: text/plain
1 1.0 0 Main_Page
Http::get('http://127.0.0.1:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');
Http::get('http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');
What's the chance some kind soul on the Mediawiki-l mailing list knows what's going on, or can point me toward more in-depth information about the MediaWiki HTTP get function that may be causing my queries to my Lucene-Search-2 daemon on port 8123 to get stripped out?
When I use PHP to talk directly to my LuceneSearch2 daemon I get a valid response, and everything works great: the response is displayed as search results. The problem comes into play within my MediaWiki site once I enable the MWSearch extension (ZERO search results), or, as seen above, when I start up MediaWiki's PHP debug script and try to use Http::get to talk to the LuceneSearch2 daemon: it seems I get no response, even though my LS2 daemon is definitely responding to the Http::get request! It sounds like MediaWiki is the culprit, and MW's HTTP fetch function is somehow stripping the search results, as demonstrated above. I can also get the search results from my LS2 daemon with the "lynx" web browser, with telnet, or with PHP.
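(For anyone who wants to poke at the same thing, a check along these lines in maintenance/eval.php should at least show whether Http::get() is returning false outright or just an empty string, and whether the cURL extension is even loaded; the exact var_dump() calls are just my illustration.)

// if this is true, Http::get() presumably takes the curl_exec() branch rather than file_get_contents()
var_dump( function_exists( 'curl_init' ) );
// bool(false) means an error branch was hit; string(0) "" means an empty body came back
var_dump( Http::get( 'http://127.0.0.1:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10' ) );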
I really hope someone can point me in the right direction, or help a fella' out with diagnosing the issue! Thanks for your time, peace -
agentdcooper@gmail.com
agent dale cooper wrote:
It sounds like you've isolated the problem to within a couple of hundred lines of code. Maybe you should spend less time searching the web for someone with your exact problem, and more time reading that code.
=) I'd agree with ya, if I wasn't so much of a PHP newbie... I'd consider myself more of a Perl and Bash type coder, but I definitely understand where you're coming from with your suggestion.
Just pretend it's Perl; it's pretty much the same for these purposes.
It sounds like MediaWiki is the culprit and MW's HTTP fetch function is somehow stripping the search results
Well, Http::get() is only 68 lines. It has two branches: one uses file_get_contents(), which should emit errors if display_errors is on, and the other uses curl_exec(), which has two error branches that return false silently:
if ( curl_getinfo( $c, CURLINFO_HTTP_CODE ) != 200 ) {
if ( curl_errno( $c ) != CURLE_OK ) {
You should determine which one of these MediaWiki is using, and either enable display_errors, or add debugging statements to the two curl error branches.
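For example, something along these lines inside each branch (a sketch only; wfDebug() output goes to $wgDebugLogFile, and the variable being assigned here is a stand-in for whatever the real function uses):

if ( curl_getinfo( $c, CURLINFO_HTTP_CODE ) != 200 ) {
	// log the actual status code before bailing out
	wfDebug( "Http::get: HTTP code was " . curl_getinfo( $c, CURLINFO_HTTP_CODE ) . "\n" );
	$text = false;
}
if ( curl_errno( $c ) != CURLE_OK ) {
	// log the curl error message instead of returning false silently
	wfDebug( "Http::get: curl error " . curl_errno( $c ) . ": " . curl_error( $c ) . "\n" );
	$text = false;
}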
Or, again, you could use tcpdump, which would probably determine the problem without dealing with the source code.
-- Tim Starling
mediawiki-l@lists.wikimedia.org