[MediaWiki-l] Search - Index uploaded files

Chad innocentkiller at gmail.com
Thu Sep 8 15:39:15 UTC 2016

On Thu, Sep 8, 2016 at 1:22 AM Dr. Hirn <drhirn at gmail.com> wrote:

> Hi Chad,
> > So Cirrus will index file contents for which we have a media handler
> > defined.
> > Right now, Pdf and Djvu files have specific media handlers that can
> > extract their text contents.
> Do I have to configure something more? My uploaded PDFs don't get indexed.
> The relevant lines in my LocalSettings.php:
> wfLoadExtension( 'Elastica' );
> require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
> $wgCirrusSearchServers = array('xxx.xxx.xxx.xxx');
> $wgSearchType = 'CirrusSearch';
Do you have the PdfHandler extension installed as well? If that's installed
then this should Just Work without any additional configuration. Unless
something has changed recently....
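
For reference, a minimal sketch of the extra LocalSettings.php lines. It assumes PdfHandler sits in the usual extensions/ directory and that your version supports wfLoadExtension (older releases use require_once instead). The binary paths are assumptions for a typical Linux box, so adjust them; pdftotext is the tool that does the text extraction.

```php
# Load PdfHandler alongside Elastica and CirrusSearch.
wfLoadExtension( 'PdfHandler' );

# External tools PdfHandler shells out to (paths are assumptions).
$wgPdfProcessor = '/usr/bin/gs';
$wgPdfPostProcessor = '/usr/bin/convert';
$wgPdfInfo = '/usr/bin/pdfinfo';
$wgPdftoText = '/usr/bin/pdftotext';
```

Files uploaded before the handler was installed probably need their metadata refreshed (maintenance/refreshImageMetadata.php) and a reindex (CirrusSearch's maintenance/forceSearchIndex.php) before their text shows up in search.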

> > If you have an additional media type you want to extract text from,
> > that's what would need implementing.
> Any hints on that?
Sure. We've got a class in MediaWiki called ImageHandler. Media types that
require special handling have a subclass of that. Here are the ones for PDF
and DjVu, for example:


If you wanted to index, say, Word documents, you'd need a similar class in
an extension to provide that support (there might be an extension for Word
docs already, I dunno).
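
A rough sketch of what such a class could look like for .docx files. This is not an existing extension: the WordHandler name is hypothetical, it assumes the indexer reads file text through the handler's getEntireText() method, and it needs PHP's zip extension. Thumbnailing is deliberately left out.

```php
<?php
// Hypothetical handler for .docx uploads; illustrative only.
class WordHandler extends ImageHandler {
	// Return the document's plain text for the search index,
	// or false if nothing could be extracted.
	public function getEntireText( File $file ) {
		$path = $file->getLocalRefPath();
		if ( !$path ) {
			return false;
		}
		// A .docx is a zip archive; the body lives in word/document.xml.
		$zip = new ZipArchive();
		if ( $zip->open( $path ) !== true ) {
			return false;
		}
		$xml = $zip->getFromName( 'word/document.xml' );
		$zip->close();
		if ( $xml === false ) {
			return false;
		}
		// Turn paragraph ends into newlines, then drop the markup.
		$text = strip_tags( str_replace( '</w:p>', "\n", $xml ) );
		return trim( html_entity_decode( $text, ENT_QUOTES | ENT_XML1, 'UTF-8' ) );
	}

	// No thumbnails in this sketch; always report an error.
	public function doTransform( $image, $dstPath, $dstUrl, $params, $flags = 0 ) {
		return new MediaTransformError( 'thumbnail_error', 0, 0, 'not supported' );
	}
}

// Map the .docx MIME type to the handler class.
$wgMediaHandlers['application/vnd.openxmlformats-officedocument.wordprocessingml.document'] = 'WordHandler';
```

You'd also have to allow the upload type itself (e.g. add 'docx' to $wgFileExtensions) for any of this to matter.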

