[MediaWiki-l] Search - Index uploaded files

Chad innocentkiller at gmail.com
Thu Sep 8 15:39:15 UTC 2016


On Thu, Sep 8, 2016 at 1:22 AM Dr. Hirn <drhirn at gmail.com> wrote:

> Hi Chad,
>
> > So Cirrus will index file contents for which we have a media handler
> > defined. Right now, PDF and DjVu files have specific media handlers
> > that can extract their text contents.
>
> Do I have to configure something more? My uploaded PDFs don't get indexed.
>
> The relevant lines in my LocalSettings.php:
>
> wfLoadExtension( 'Elastica' );
> require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
> $wgCirrusSearchServers = array('xxx.xxx.xxx.xxx');
> $wgSearchType = 'CirrusSearch';
>
>
>
Do you have the PdfHandler extension installed as well? If that's installed
then this should Just Work without any additional configuration. Unless
something has changed recently....
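
For reference, a minimal sketch of what I'd expect next to the lines you
quoted from LocalSettings.php. The pdftotext path is an assumption about
your system, not something taken from your setup:

```php
// Load PdfHandler next to the existing Elastica / CirrusSearch lines.
// On older installs the legacy entry point may be needed instead:
// require_once "$IP/extensions/PdfHandler/PdfHandler.php";
wfLoadExtension( 'PdfHandler' );

// PdfHandler shells out to pdftotext to extract the text layer.
// Assumed path; adjust to wherever poppler-utils is installed.
$wgPdftoText = '/usr/bin/pdftotext';
```

Files uploaded before the handler was enabled probably won't have any
extracted text stored, so you'd likely need to refresh their metadata
(maintenance/refreshImageMetadata.php) and then reindex
(CirrusSearch's maintenance/forceSearchIndex.php).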


> > If you have an additional media type you want to extract text from,
> > that's what would need implementing.
>
> Any hints on that?
>
>
Sure. We've got a class in MediaWiki called ImageHandler. Media types that
require special handling have a subclass of that. Here are the ones for PDF
and DjVu, for example:

https://phabricator.wikimedia.org/diffusion/EPHD/browse/master/PdfHandler_body.php
https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/media/DjVu.php

If you wanted to index, say, Word documents, you'd need a similar class in
an extension to provide that support (there might be an extension for Word
docs already, I dunno).
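
Very roughly, such a class could look like the untested sketch below.
WordHandler is a made-up name, and shelling out to antiword is my
assumption for getting text out of a .doc file; neither ships with
MediaWiki or CirrusSearch. It also assumes the indexer asks the handler
for text through getPageText(), either directly or via getEntireText():

```php
<?php
// Sketch only: WordHandler is a hypothetical class, and antiword is an
// assumed external tool that must be installed separately.
class WordHandler extends ImageHandler {
	// No thumbnail rendering in this sketch, so always report an error.
	public function doTransform( $image, $dstPath, $dstUrl, $params, $flags = 0 ) {
		return new MediaTransformError( 'thumbnail_error', 0, 0, 'Word thumbnails not supported' );
	}

	// Return the document text so the search indexer has something to index.
	public function getPageText( File $image, $page ) {
		$path = $image->getLocalRefPath();
		if ( !$path ) {
			return false;
		}
		$retval = 0;
		// antiword prints the document's plain text on stdout.
		$text = wfShellExec( 'antiword ' . wfEscapeShellArg( $path ), $retval );
		return $retval === 0 ? $text : false;
	}
}

// Map the MIME type to the handler, e.g. from the extension's setup file.
$wgMediaHandlers['application/msword'] = 'WordHandler';
```

Unlike PdfHandler, this runs the extraction on every call instead of
caching the text in the file metadata, so treat it as a starting point
only.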

-Chad

