So, here is the repo:
https://github.com/benapetr/wikimedia-botslogs
I will move it to Wikimedia git later :) It's pretty ugly code that I forked from a previous project of mine (huggle wa), and it's poorly commented. I will try to improve that.
On Wed, Mar 21, 2012 at 12:06 AM, MZMcBride z@mzmcbride.com wrote:
Hi. I'm going to combine replies just so I don't hit wikitech-l a dozen times.
Petr Bena wrote:
I created a search engine for IRC logs. It works much faster than the current engine and it's written in PHP. I will send the source code soon.
Fantastic! It seems to be much better than the old search. :-)
I don't understand HTML, so the output is ugly. If someone wants to help improve it, let me know.
You've got an XSS vulnerability in the current script. You need to escape all output! In this case, you need to be sure to escape quotation marks in particular.
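To make the escaping point concrete, here is a minimal sketch in Python (not the PHP of the actual script, which I am not quoting here). The function name `render_search_box` and the form field name `q` are made up for illustration. The idea is that anything echoed back into the page, especially inside an attribute value, goes through an escaping function that also handles quotation marks. In PHP the comparable call would be `htmlspecialchars()` with `ENT_QUOTES`.

```python
import html


def render_search_box(query: str) -> str:
    """Return an HTML input element pre-filled with the user's query.

    Hypothetical helper for illustration only; the field name "q" is an
    assumption, not taken from the real script.
    """
    # quote=True also escapes " and ', which is what keeps the value from
    # breaking out of the surrounding attribute.
    safe = html.escape(query, quote=True)
    return f'<input type="text" name="q" value="{safe}">'


if __name__ == "__main__":
    # A query that would close the attribute and inject a script if unescaped.
    print(render_search_box('"><script>alert(1)</script>'))
```

The same rule applies to the search results themselves: log lines come from IRC users, so they are untrusted input too and should be escaped before being written into the page.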
Bergi wrote:
Cool. I didn't know that there had already been an engine?
Yes, I hacked one up some time ago. It lives at https://toolserver.org/~mwbot/.
Petr Bena wrote:
Yes, there is a Python script, but it always took so long to search for something (10+ minutes to execute a search) that I always ended up just closing the browser.
Yes, it's just a very simple (and quite hackish) Python CGI wrapper for the operating system's grep. As the logs have grown, grepping has taken longer and longer. Plus the results truncation is done at the Python level, not the grep level, so a search with a lot of results takes much longer to return results, as I recall. A proper search index is going to be much better. :D
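For anyone curious what "a proper search index" could look like in its simplest form, here is a rough sketch of an inverted index in Python. This is not how either tool is actually implemented; the names `build_index` and `search`, the word-based tokenization, and the default `limit` are all assumptions for illustration. The index is built once, so each search only intersects small sets of line numbers instead of rescanning every log, and the result limit is applied before any lines are fetched.

```python
import re
from collections import defaultdict

# Assumption: words are runs of letters, digits, and underscores.
TOKEN = re.compile(r"\w+")


def build_index(lines):
    """Map each lowercased word to the set of line numbers containing it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for word in TOKEN.findall(line.lower()):
            index[word].add(lineno)
    return index


def search(index, lines, query, limit=50):
    """Return up to `limit` lines that contain every word in the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    # Use .get() so that unknown words do not add empty entries to the index.
    postings = [index.get(word, set()) for word in words]
    hits = set.intersection(*postings)
    # Truncate the list of line numbers first, then fetch only those lines.
    return [lines[i] for i in sorted(hits)[:limit]]


if __name__ == "__main__":
    log = [
        "<petr> search engine for irc logs",
        "<mz> grep is slow on big logs",
        "<bergi> cool engine",
    ]
    idx = build_index(log)
    for line in search(idx, log, "engine"):
        print(line)
```

If the grep wrapper is kept around, GNU grep's `-m NUM` (`--max-count`) option stops after NUM matching lines per file, which would move the truncation down to the grep level.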
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l