Hi,
Are there any MediaWiki extensions that allow for searching through the text of uploaded files, such as Word documents, PDFs, etc. - whether it's part of the standard search results or in a separate interface? Or is anyone working on such a thing?
Thanks, Yaron
On Fri, Apr 27, 2012 at 1:59 PM, Yaron Koren <yaron@wikiworks.com> wrote:
> Hi,
> Are there any MediaWiki extensions that allow for searching through the text of uploaded files, such as Word documents, PDFs, etc. - whether it's part of the standard search results or in a separate interface? Or is anyone working on such a thing?
Aren't the contents of PDFs indexed in Lucene?
-Chad
On Fri, Apr 27, 2012 at 11:02 AM, Chad <innocentkiller@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 1:59 PM, Yaron Koren <yaron@wikiworks.com> wrote:
>> Are there any MediaWiki extensions that allow for searching through the text of uploaded files, such as Word documents, PDFs, etc. - whether it's part of the standard search results or in a separate interface? Or is anyone working on such a thing?
> Aren't the contents of PDFs indexed in Lucene?
Nope, they're extracted and stuffed in metadata but not yet stored for search -- see https://bugzilla.wikimedia.org/show_bug.cgi?id=21061 and related.
I have seen a couple of one-off extensions for indexing .doc files or such, but I don't recall specifically what they are; they should be floating around somewhere on www.mediawiki.org, but I don't know how up to date or reliable they are.
-- brion
If you're using Oracle DB, I have an extension (I'm just not sure I've published it yet).
On 27. 04. 2012 19:59, Yaron Koren wrote:
> Hi,
> Are there any MediaWiki extensions that allow for searching through the text of uploaded files, such as Word documents, PDFs, etc. - whether it's part of the standard search results or in a separate interface? Or is anyone working on such a thing?
> Thanks, Yaron
mediawiki-l@lists.wikimedia.org