Thanks for the response, Erik; it's been very informative. I have a few follow-up
questions (inline).
On 29 October 2015 17:56:25 EET, Erik Bernhardson <ebernhardson(a)wikimedia.org>
wrote:
On Thu, Oct 29, 2015 at 8:47 AM, Strainu
<strainu10(a)gmail.com> wrote:
Hi,
I've been reading the mw.org and wikitech pages on CirrusSearch (and
the code) in the hope of understanding how the page content is
transformed before being sent to ES and how it is stored in ES, and I
have a few questions:
1. Is the documentation available anywhere? I don't see it on
https://doc.wikimedia.org/
Feature documentation is at
https://www.mediawiki.org/wiki/Help:CirrusSearch,
operational documentation is at
https://wikitech.wikimedia.org/wiki/Search
I was referring to the code docs; they make it easier to follow the class hierarchy.
2. What part of the whole ecosystem transforms the wikitext into
indexable text? Where can I find it? It should be somewhere downstream
from CirrusSearch\Updater::updateFromTitle(), but I can't figure out
where exactly.
The documents are built using the classes in
https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/tree/master/…
I see you use already-parsed text. I'm wondering whether using the output of
mwparserfromhell would work: I have some wikitext that is not in a MediaWiki
database that I would like to index. I'm guessing I'll have to write some code,
but the idea would be the same.
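As a rough illustration of that idea, the Python sketch below strips wiki markup with mwparserfromhell's strip_code() and wraps the result in a JSON document that could then be sent to ES. The helper names and the field names (title, text, source_text) are made up for this example and are not the CirrusSearch schema. The output will likely differ from what CirrusSearch indexes, since strip_code() drops templates rather than expanding them.

```python
import json


def wikitext_to_plain(wikitext):
    """Strip wiki markup using mwparserfromhell (third-party package)."""
    # Imported lazily so the rest of the sketch works without the package.
    import mwparserfromhell

    return mwparserfromhell.parse(wikitext).strip_code()


def build_document(title, wikitext, to_plain=wikitext_to_plain):
    """Build a JSON-serializable document for indexing.

    The field names are hypothetical, not the CirrusSearch schema.
    """
    return {
        "title": title,
        "text": to_plain(wikitext).strip(),
        "source_text": wikitext,
    }


def to_json(doc):
    """Serialize a document, keeping non-ASCII characters readable."""
    return json.dumps(doc, ensure_ascii=False)
```

With mwparserfromhell installed, to_json(build_document("Example", "'''Bold''' text")) yields one JSON document per page; how it is then indexed (index name, mapping, analyzers) is left open here.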
If this transformation doesn't happen, from where is the searchable
text obtained?
3. Where can I find the ES schema used for wikipages? Is it different
for images/categories?
The ES schema is the same everywhere. The easiest way to see what the data
looks like is to request a dump for a particular page. This outputs
JSON; I use a Chrome extension called JsonView to make it look nice:
https://wikitech.wikimedia.org/wiki/Search?action=cirrusdump
That is very cool indeed.
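The dump request can also be scripted. This Python sketch builds the action=cirrusdump URL in the same form as the link above and pretty-prints the returned JSON, standing in for the browser extension. The function names and the User-Agent string are assumptions for illustration, the /wiki/ path prefix is taken from the example URL and may differ on other wikis, and nothing is assumed about the layout of the returned JSON.

```python
import json
import urllib.parse
import urllib.request


def cirrusdump_url(wiki_base, title):
    """Return the dump URL for a page, e.g. .../wiki/Search?action=cirrusdump."""
    page = urllib.parse.quote(title.replace(" ", "_"))
    return f"{wiki_base.rstrip('/')}/wiki/{page}?action=cirrusdump"


def fetch_dump(wiki_base, title):
    """Fetch and decode the JSON dump (performs a network request)."""
    request = urllib.request.Request(
        cirrusdump_url(wiki_base, title),
        headers={"User-Agent": "cirrusdump-example/0.1"},  # hypothetical value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pretty(data):
    """Format decoded JSON with indentation and sorted keys."""
    return json.dumps(data, indent=2, ensure_ascii=False, sort_keys=True)
```

For example, print(pretty(fetch_dump("https://wikitech.wikimedia.org", "Search"))) should print the dump for the page linked above.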
Thanks again,
Strainu
Thanks,
Strainu
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l