Lucene Indexer Error

Emufarmers Sangly

29 Jun 2007

11:30 a.m.

I've set up Lucene following the instructions on http://meta.wikimedia.org/wiki/Installing_lucene_search

When I get to the indexing stage, I get this error:

Unhandled Exception: java.io.IOException: no root element: U+58
  at org.mediawiki.importer.XmlDumpReader.readDump () [0x00000]
  at MediaWiki.Search.SearchTool.SearchTool.ImportDump (System.String dumpfile, System.String database) [0x00000]
  at MediaWiki.Search.SearchTool.SearchTool.Main (System.String[] args) [0x00000]

My friend has also set up Lucene on his machine, and he gets the same error. We're both at a loss about what the problem is.

Brion Vibber

29 Jun

1:41 p.m.

New subject: [Mediawiki-l] Lucene Indexer Error

Emufarmers Sangly wrote:

I've set up Lucene following the instructions on http://meta.wikimedia.org/wiki/Installing_lucene_search When I get to the indexing stage, I get this error: Unhandled Exception: java.io.IOException: no root element: U+58

You might want to confirm that your XML dump file is ok.

-- brion vibber (brion @ wikimedia.org)

Emufarmers Sangly

5:44 p.m.

On 6/29/07, Brion Vibber brion@wikimedia.org wrote:

You might want to confirm that your XML dump file is ok.

How can I do this? We took the dumps two weeks apart, so I don't see how there could be a problem unless there's a problem in the database itself, or with the dumping maintenance tool.

Brion Vibber

2 Jul

10:30 a.m.

Emufarmers Sangly wrote:

How can I do this? We took the dumps two weeks apart, so I don't see how there could be a problem unless there's a problem in the database itself, or with the dumping maintenance tool.

Try looking at the file.

Is it properly-formatted XML?

Or does it have, say, CGI headers at the start of the file?

Or is it compressed?

Or does it have a big error message?

Or something else?

-- brion vibber (brion @ wikimedia.org)
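
One way to work through these questions is to inspect the first few bytes of the dump. The U+58 in the exception is the letter X, so the reader apparently found an X where it expected the opening < of the root element; that would fit a header line such as X-Powered-By, though that is only a guess. The Python sketch below is not part of MWSearch: the function names and result labels are hypothetical, and the checks are rough heuristics rather than a real XML validation.

```python
import re

# A "Name: value" line, as found in CGI/HTTP headers (heuristic only;
# an error message such as "Warning: ..." would also match).
HEADER_LINE = re.compile(rb"^[A-Za-z][A-Za-z0-9-]*:[ \t]")


def classify_dump_head(head: bytes) -> str:
    """Guess what the first bytes of a supposed XML dump contain."""
    if head.startswith(b"\x1f\x8b"):      # gzip magic number
        return "gzip-compressed"
    if head.startswith(b"BZh"):           # bzip2 magic number
        return "bzip2-compressed"
    text = head
    if text.startswith(b"\xef\xbb\xbf"):  # optional UTF-8 byte order mark
        text = text[3:]
    text = text.lstrip()
    if not text:
        return "empty"
    if text.startswith(b"<"):
        return "looks like XML"
    if HEADER_LINE.match(text):
        return "header lines before the XML"
    return "unexpected leading text"


def classify_dump_file(path: str, size: int = 512) -> str:
    """Classify a dump on disk by reading only its first `size` bytes."""
    with open(path, "rb") as fh:
        return classify_dump_head(fh.read(size))
```

Calling classify_dump_file("dump.xml") (the file name is a placeholder) on a dump that begins with header lines should return "header lines before the XML". Anything other than "looks like XML" suggests the file needs attention before indexing.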

Emufarmers Sangly

1:59 p.m.

On 7/2/07, Brion Vibber brion@wikimedia.org wrote:

Try looking at the file.

Is it properly-formatted XML?

Or does it have, say, CGI headers at the start of the file?

Or is it compressed?

Or does it have a big error message?

Or something else?

Oh, yes, I got it sorted a couple days ago: The server was adding headers to the dumps, so I used -q for getting them (sorry for not posting back sooner).
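
The -q mentioned here is presumably PHP's quiet option, which suppresses HTTP header output under the CGI binary; the thread does not give the full command. Regenerating the dump that way is the cleaner fix. For a dump that already has headers in it, a rough Python sketch of stripping them follows. The function name is hypothetical, and it assumes the headers form a block of "Name: value" lines ended by a blank line.

```python
import re

# One CGI/HTTP-style header line: "Name: value" (heuristic).
HEADER_LINE = re.compile(rb"^[A-Za-z][A-Za-z0-9-]*:[ \t]?.*$")


def strip_leading_headers(data: bytes) -> bytes:
    """Remove a leading header block, if there is one.

    The data is returned unchanged unless everything before the first
    blank line looks like header lines. For illustration this works on
    bytes already in memory; a large dump would be better streamed.
    """
    # First blank line, with either LF or CRLF line endings.
    blank = re.search(rb"\r?\n\r?\n", data)
    if blank is None:
        return data
    lines = data[:blank.start()].splitlines()
    if lines and all(HEADER_LINE.match(line) for line in lines):
        return data[blank.end():]
    return data
```

Because the function only strips when every line before the first blank line looks like a header, a clean dump that happens to contain blank lines is left alone.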

The issue I'm presently grappling with is how to get a fresh index every 24 hours and have the daemon recognize it. A cronjob and a script get the index and import it okay, but it seems as though I need to restart the daemon for it to use the new index. Right now I just have the script do:

killall mono
MWDaemon

(Just doing killall MWDaemon didn't kill all the necessary processes.)

Of course, this probably isn't the best solution, since it would cause problems in the unlikely event I ever run something else with mono. I see that there's some sort of update daemon included with the package, but I'm not sure how to run it properly. I'm also trying to find if there's some sort of shutdown signal that the daemon will accept over GET.

Brion Vibber

2:17 p.m.

Emufarmers Sangly wrote:

The issue I'm presently grappling with is how to get a fresh index every 24 hours and have the daemon recognize it. A cronjob and a script get the index and import it okay, but it seems as though I need to restart the daemon for it to use the new index.

That's right.

Right now I just have the script do: killall mono (just doing killall MWDaemon didn't kill all the necessary processes)

/etc/init.d/mwsearch restart

Assuming you installed the init scripts.

-- brion vibber (brion @ wikimedia.org)
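
A nightly job could call the init script after the import instead of running killall mono. The Python sketch below only illustrates that idea: the function names are hypothetical, the default path is the one given in this message, and the script must run with enough privileges to restart the service. It does nothing when the init script is missing, so the caller can decide on a fallback.

```python
import os
import subprocess
from typing import List, Optional


def restart_argv(init_script: str = "/etc/init.d/mwsearch") -> Optional[List[str]]:
    """Return the restart command, or None if the init script is not installed."""
    if os.path.isfile(init_script):
        return [init_script, "restart"]
    return None


def restart_daemon(init_script: str = "/etc/init.d/mwsearch") -> bool:
    """Restart the search daemon through its init script.

    Returns False without running anything if the script is missing;
    raises subprocess.CalledProcessError if the restart itself fails.
    """
    argv = restart_argv(init_script)
    if argv is None:
        return False
    subprocess.run(argv, check=True)
    return True
```

Going through the init script restarts only the search daemon, so other programs running under mono are left untouched.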

Emufarmers Sangly

4:20 p.m.

On 7/2/07, Brion Vibber brion@wikimedia.org wrote:

Right now I just have the script do: killall mono (just doing killall MWDaemon didn't kill all the necessary processes)

/etc/init.d/mwsearch restart

Assuming you installed the init scripts.

It doesn't appear that I did. Where can I get them/find instructions for setting them up?

I just noticed the Extension:LuceneSearch (looks like somebody added it to the Lucene page just yesterday); should I be using this instead of mwsearch?

Brion Vibber

3 Jul

11:58 a.m.

Emufarmers Sangly wrote:

It doesn't appear that I did. Where can I get them/find instructions for setting them up?

make install

I just noticed the Extension:LuceneSearch (looks like somebody added it to the Lucene page just yesterday); should I be using this instead of mwsearch?

I'd recommend giving the new one a try.

-- brion vibber (brion @ wikimedia.org)


mediawiki-l@lists.wikimedia.org