(1) seems like the right way to go to me too.
There may be other ways, but puppet/files/lucene/lucene.jobs.sh has a function called import-db() which creates a dump like this:
php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname --current > $dumpfile
Ram
On Thu, Mar 7, 2013 at 1:05 PM, Daniel Kinzler daniel@brightbyte.de wrote:
On 07.03.2013 20:58, Brion Vibber wrote:
- The indexer code (without plugins) should not know about Wikibase, but it may have hard-coded knowledge about JSON. It could have a special indexing mode for JSON, in which the structure is deserialized and traversed, and any values are added to the index (while the keys used in the structure would be ignored). We may still be indexing useless internals from the JSON, but at least there would be a lot fewer false negatives.
Indexing structured data could be awesome -- again I think of file metadata as well as wikidata-style stuff. But I'm not sure how easy that'll be. It should probably be in addition to the text indexing, rather than replacing it.
Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want indexed structured data, the question is just how to get that into the LSearch infrastructure.
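For concreteness, here is a rough sketch of what that blind JSON mode could look like (Java, using org.json for parsing; collecting values into a plain text blob stands in for however the LSearch indexer actually adds terms, so the names here are illustrative, not the real plugin API):

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class BlindJsonIndexer {

        // Recursively walk a parsed JSON structure and collect every scalar
        // value as index text, ignoring the keys entirely.
        static void collectValues(Object node, StringBuilder indexText) {
            if (node instanceof JSONObject) {
                JSONObject obj = (JSONObject) node;
                String[] keys = JSONObject.getNames(obj);
                if (keys != null) {
                    for (String key : keys) {
                        collectValues(obj.get(key), indexText);
                    }
                }
            } else if (node instanceof JSONArray) {
                JSONArray arr = (JSONArray) node;
                for (int i = 0; i < arr.length(); i++) {
                    collectValues(arr.get(i), indexText);
                }
            } else if (node != null && !JSONObject.NULL.equals(node)) {
                // Scalars (strings, numbers, booleans) become index terms.
                indexText.append(node.toString()).append(' ');
            }
        }

        public static void main(String[] args) {
            String json = "{\"labels\":{\"en\":\"Berlin\"},"
                    + "\"descriptions\":{\"en\":\"capital of Germany\"}}";
            StringBuilder text = new StringBuilder();
            collectValues(new JSONObject(json), text);
            // Output contains "Berlin" and "capital of Germany" (order may vary);
            // keys like "labels" and "en" are dropped, only the values get indexed.
            System.out.println(text.toString());
        }
    }

Everything that happens to be a value in the JSON, including internal bookkeeping, would end up in the index; that is the "useless internals" caveat above.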
-- daniel