Re: [MediaWiki-l] Search - Index uploaded files

8 Sep 2016

On Thu, Sep 8, 2016 at 1:22 AM Dr. Hirn <drhirn(a)gmail.com> wrote:

...
 Hi Chad,

  So Cirrus will index file contents for which we have a media handler
  defined. Right now, Pdf and Djvu files have specific media handlers that
  can extract their text contents.

 Do I have to configure something more? My uploaded PDFs don't get indexed.

 The relevant lines in my LocalSettings.php:

 wfLoadExtension( 'Elastica' );
 require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
 $wgCirrusSearchServers = array('xxx.xxx.xxx.xxx');
 $wgSearchType = 'CirrusSearch';

Do you have the PdfHandler extension installed as well? If that's installed,
then this should Just Work without any additional configuration. Unless
something has changed recently...
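
For reference, a minimal sketch of what the PdfHandler part of
LocalSettings.php might look like. It assumes a PdfHandler version that ships
extension.json (older versions are loaded with require_once instead) and that
gs, pdfinfo and pdftotext are installed and on the PATH. The command names
below are assumptions about your system, so adjust them as needed:

```php
// LocalSettings.php: load PdfHandler next to CirrusSearch (sketch, not a tested setup)
wfLoadExtension( 'PdfHandler' );
$wgPdfProcessor = 'gs';         // Ghostscript, renders page thumbnails
$wgPdfInfo      = 'pdfinfo';    // reads page count and metadata
$wgPdftoText    = 'pdftotext';  // extracts the text that ends up in the search index
```

Files uploaded before PdfHandler was enabled probably have no extracted text
stored yet; running maintenance/refreshImageMetadata.php and then CirrusSearch's
maintenance/forceSearchIndex.php should pick them up.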

...
  If you have an additional media type you want to extract text from,
  that's what would need implementing.

 Any hints on that?

Sure. We've got a class in MediaWiki called ImageHandler. Media types that
require special handling have a subclass of that. Here are the ones for PDF
and DjVu, for example:

https://phabricator.wikimedia.org/diffusion/EPHD/browse/master/PdfHandler_b…
https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/media…

If you wanted to index, say, Word documents, you'd need a similar class in
an extension to provide that support (there might be an extension for Word
docs already, I dunno).
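
As a rough illustration of the wiring only: MediaWiki picks a handler class
per MIME type through $wgMediaHandlers. WordHandler below is a hypothetical
class name (nothing in this thread refers to such a class); it would have to
exist in your extension and override getPageText() so that there is text to
index.

```php
// LocalSettings.php (sketch): WordHandler is a hypothetical, not-yet-written class
$wgFileExtensions[] = 'doc';                            // allow .doc uploads
$wgMediaHandlers['application/msword'] = 'WordHandler'; // map the MIME type to the handler
```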

-Chad
