Hello. I have an intranet set up that uses MediaWiki behind the scenes for content. I set up a search box on the intranet pages that calls the API query module twice, first for title, then for text. I then take all the text matches and call index.php with action=render to get the page text, so I can parse it for the searched term and highlight it in the results. I then sort all the title and page text matches alphabetically by page title. This all kind of works as intended, but it seems like a crazy amount of hackery, so I'm hoping there's a better way. If not, then maybe you can help me solve these issues:
1) The highlighted search results include HTML and wikitext because they come from index.php's render output. Using strip_tags() helps a little, but only when the matched string contains both brackets ( < > ).
2) Categories show up as page title matches if I search on the regular wiki page, but not when I go through the API. I assume the wiki code is just also doing a category search and displaying it in the page title section?
I think I'm also going to split up my title and text search results. I had them combined as that's what the users are used to in a previous system, but I think that just destroys whatever ranking system the search is using. Right?
Thanks all.
- Will
2010/3/11 Will Preston <Will@olsonkundigarchitects.com>:
> Hello. I have an intranet set up that is using mediawiki behind the scenes for content. I set up a search box on the intranet pages, that calls the api query module twice, first for title, then for text. I then take all the text matches and call the index.php render module to get the page text, so I can parse it for the searched term and highlight it in the results. I then sort all the title and page text matches alphabetically by the page titles. This all kind of works as intended, but seems like a crazy amount of hackery, so I’m hoping there’s a better way.
list=search has an srprop=snippet parameter and other goodies in srprop.
> 1) The highlighted search results include html and wikitext code because it’s produced by index.php’s render. Using strip_tags() helps a little, but only when the matched string has both brackets ( < > ).
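... which srprop=snippet solves, since the API then returns a short excerpt for each hit and you no longer have to parse the rendered page yourself.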
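A minimal sketch of what such a request could look like, assuming a hypothetical endpoint at `http://intranet.example/w/api.php` and a wiki new enough to support `srprop`. The helper names `build_search_url` and `extract_snippets` are illustrative only. The snippet text typically arrives with the matched term already wrapped in a `searchmatch` span, so it can be restyled rather than re-parsed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the wiki's own api.php location.
API_URL = "http://intranet.example/w/api.php"


def build_search_url(term, what="text", api_url=API_URL):
    """Build a list=search request that asks for a snippet per hit."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srwhat": what,        # "title" or "text"
        "srprop": "snippet",   # excerpt with the match marked up
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_snippets(response):
    """Return (title, snippet) pairs in the order the API ranked them."""
    hits = response.get("query", {}).get("search", [])
    return [(hit["title"], hit.get("snippet", "")) for hit in hits]
```

Keeping the pairs in the returned order, rather than sorting by title, preserves whatever ranking the search backend applied.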
> 2) Categories show up as page title matches if I search on the regular wiki page, but not when I go through the api. I assume the wiki code is just also doing a category search and displaying it in the page title section?
This is because srnamespace defaults to 0 (main namespace only).
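As a rough illustration, the namespaces to search can be passed explicitly as a pipe-separated list of namespace IDs. The sketch below assumes the standard numbering, where 0 is the main namespace and 14 is Category; `search_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_params(term, namespaces=(0, 14)):
    """Parameters for list=search across several namespaces.

    0 is the main namespace and 14 is Category in a default install.
    """
    return {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "format": "json",
    }


# urlencode escapes the pipe as %7C, which the API accepts.
query_string = urlencode(search_params("stairs"))
```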
> I think I’m also going to split up my title and text search results. I had them combined as that’s what the users are used to in a previous system, but I think that just destroys whatever ranking system the search is using. Right?
I have no idea how the search code works internally; I only know about the code that reformats its results into an API response.
Roan Kattouw (Catrope)
-----Original Message-----
From: Roan Kattouw [mailto:roan.kattouw@gmail.com]
Sent: Thursday, March 11, 2010 12:33 PM
To: MediaWiki API announcements & discussion
Subject: Re: [Mediawiki-api] search results
Thanks, that sounds good. I found the bug report and the linked example, but does that only work in the bleeding edge release? My wiki is version 1.15 and srprop isn't available. Wikipedia's version is apparently 1.16alpha-wmf. Do you suggest I get that, or would something in between be better?
- Will
_______________________________________________ Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
2010/3/12 Will Preston <Will@olsonkundigarchitects.com>:
> Thanks, that sounds good. I found the bug report and the linked example, but does that only work in the bleeding edge release? My wiki is version 1.15 and srprop isn't available. Wikipedia's version is apparently 1.16alpha-wmf. Do you suggest I get that, or would something in between be better?
I suggest you wait for 1.16 to be released; it shouldn't take very long now AFAIK.
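In the meantime, a client could check the wiki's version before relying on srprop. The version string is reported in the `generator` field of `action=query&meta=siteinfo&siprop=general`. The sketch below only parses that string; the helper names are hypothetical, and the 1.16 threshold is an assumption taken from this thread rather than from release notes.

```python
import re


def parse_generator(generator):
    """Extract (major, minor) from a string such as 'MediaWiki 1.15.1'."""
    match = re.search(r"(\d+)\.(\d+)", generator)
    if match is None:
        raise ValueError("no version found in %r" % generator)
    return int(match.group(1)), int(match.group(2))


def has_srprop(generator, minimum=(1, 16)):
    """Assumed rule: srprop is usable from 1.16 onward."""
    return parse_generator(generator) >= minimum
```

On an older wiki the client could fall back to the two-step approach of searching first and fetching the page text separately.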
Roan Kattouw (Catrope)