Hi,
I created a search engine for the IRC logs. It works much faster than the current engine and it's written in PHP. I will send the source code soon.
http://bots.wmflabs.org/~wm-bot/searchlog/
I don't understand HTML, so the output is ugly. If someone wants to help improve it, let me know.
I would like to help you. I'll take a look at it today.
On Tue, Mar 20, 2012 at 12:02 PM, Petr Bena benapetr@gmail.com wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Petr Bena wrote:
Hi,
I created a search engine for the IRC logs. It works much faster than the current engine and it's written in PHP. I will send the source code soon.
Cool. I didn't know that there had already been an engine?
http://bots.wmflabs.org/~wm-bot/searchlog/
I don't understand HTML, so the output is ugly. If someone wants to help improve it, let me know.
Just one little CSS thing: div.content could use "overflow: auto;" instead of "overflow: scroll;".
Bergi
Yes, there is a Python script, but it always took so long to search for anything (10+ minutes to execute a search) that I always ended up just closing the browser.
On Tue, Mar 20, 2012 at 5:18 PM, Bergi a.d.bergi@web.de wrote:
Cool. I didn't know that there had already been an engine?
We could use a better MIME type for the actual log files themselves so we're not asked to download them in Opera and Chrome. We could also add fragment links that point directly to a line. Since no one seems to support RFC 5147, how about converting all the old logs into an HTML or XML+XSLT document that we can create anchors for fragments in? Maybe we could also try formatting things much more like IRC: turning links into rel=nofollow anchors like IRC clients do, converting the color codes that wikibugs uses, and trying to adapt the name highlighting IRC clients use.
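To make the suggestion above concrete, here is a minimal Python sketch of turning plain-text log lines into HTML with one anchor per line and rel=nofollow links. The function names, the `L<number>` id scheme, and the URL pattern are assumptions made for illustration; nothing here is taken from the actual search engine.

```python
import html
import re

# Deliberately simple URL pattern; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def linkify(text):
    """Escape text for HTML and wrap URLs in rel="nofollow" anchors."""
    parts = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Escape the plain text between the previous URL and this one.
        parts.append(html.escape(text[pos:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append('<a rel="nofollow" href="{0}">{0}</a>'.format(url))
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


def log_to_html(lines):
    """Render log lines as an HTML fragment with one anchor per line."""
    rows = []
    for number, line in enumerate(lines, start=1):
        # Hypothetical id scheme: line 12 becomes reachable as "#L12".
        rows.append('<div id="L{0}"><a href="#L{0}">{0}</a> {1}</div>'.format(
            number, linkify(line.rstrip("\n"))))
    return "\n".join(rows)
```

With something like this, a search result could link to `logfile.html#L12` instead of relying on RFC 5147 text fragments, and serving HTML also sidesteps the download prompt.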
Converting it to HTML is no problem; however, the color stuff and such is probably a bit harder :) I will try to improve this.
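For the color part, a rough Python sketch follows. It assumes the logs contain the common mIRC-style control codes (`\x03` followed by an optional foreground and background number, `\x0f` for reset); whether wikibugs uses exactly these is an assumption. The palette uses approximate CSS color names, background colors are ignored, and bold, italic, underline and reverse codes are simply dropped.

```python
import html
import re

# Conventional 16-colour mIRC palette mapped to approximate CSS names.
MIRC_COLORS = [
    "white", "black", "navy", "green", "red", "maroon", "purple", "orange",
    "yellow", "lime", "teal", "aqua", "blue", "fuchsia", "gray", "silver",
]

# \x03 with optional "fg" or "fg,bg" digits, the reset code \x0f,
# or one of the bold / italic / underline / reverse codes.
TOKEN_RE = re.compile(r"\x03(?:(\d{1,2})(?:,(\d{1,2}))?)?|\x0f|[\x02\x1d\x1f\x16]")


def irc_to_html(line):
    """Convert colour codes in one log line to <span> elements."""
    out = []
    span_open = False
    pos = 0
    for match in TOKEN_RE.finditer(line):
        out.append(html.escape(line[pos:match.start()]))
        pos = match.end()
        token = match.group(0)
        if token[0] == "\x03" or token == "\x0f":
            # Any colour change or reset ends the current span.
            if span_open:
                out.append("</span>")
                span_open = False
            if match.group(1) is not None:
                fg = int(match.group(1))
                if fg < len(MIRC_COLORS):
                    out.append('<span style="color:{}">'.format(MIRC_COLORS[fg]))
                    span_open = True
        # Other formatting codes fall through and are dropped.
    out.append(html.escape(line[pos:]))
    if span_open:
        out.append("</span>")
    return "".join(out)
```

A known ambiguity remains: a color code directly followed by digits that belong to the message text will have up to two of those digits read as the color number.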
On Tue, Mar 20, 2012 at 5:46 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
-- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Throw whatever you've got into a git repo somewhere; I might help tune that up too.
On Tue, 20 Mar 2012 09:52:11 -0700, Petr Bena benapetr@gmail.com wrote:
I would rather put it into Wikimedia SVN. I hope you have an account there.
On Tue, Mar 20, 2012 at 6:04 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
Sure... ;) I just like git.
On Tue, 20 Mar 2012 10:06:35 -0700, Petr Bena benapetr@gmail.com wrote:
Hi. I'm going to combine replies just so I don't hit wikitech-l a dozen times.
Petr Bena wrote:
I created a search engine for the IRC logs. It works much faster than the current engine and it's written in PHP. I will send the source code soon.
Fantastic! It seems to be much better than the old search. :-)
I don't understand HTML, so the output is ugly. If someone wants to help improve it, let me know.
You've got an XSS vulnerability in the current script. You need to escape all output! In this case, you need to be sure to escape quotation marks in particular.
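As an illustration of the escaping point, here is a small Python sketch; in PHP the rough equivalent would be `htmlspecialchars()` with `ENT_QUOTES`. The function `render_result` and the markup it produces are hypothetical and are not taken from the script under discussion.

```python
import html


def render_result(query, line):
    """Build one result row; both values are untrusted and get escaped."""
    # quote=True also escapes " and ', which matters inside attributes.
    safe_query = html.escape(query, quote=True)
    safe_line = html.escape(line, quote=True)
    return '<li title="{}">{}</li>'.format(safe_query, safe_line)
```

Without the quote escaping, a search term such as `" onmouseover="alert(1)` would close the attribute early and inject its own.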
Bergi wrote:
Cool. I didn't know that there had already been an engine?
Yes, I hacked one up some time ago. It lives at https://toolserver.org/~mwbot/.
Petr Bena wrote:
Yes, there is a Python script, but it always took so long to search for anything (10+ minutes to execute a search) that I always ended up just closing the browser.
Yes, it's just a very simple (and quite hackish) Python CGI wrapper for the operating system's grep. As the logs have grown, grepping has taken longer and longer. Plus the results truncation is done at the Python level, not the grep level, so a search with a lot of results takes much longer to return results, as I recall. A proper search index is going to be much better. :D
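To show why an index helps compared with grepping every file on each request, here is a minimal in-memory inverted index in Python. It is only a sketch of the general idea: the names `build_index` and `search`, the word-splitting rule, and the default `limit` are assumptions, and neither search tool mentioned in this thread is known to work this way.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


def build_index(lines):
    """Map each lowercased word to the set of line numbers containing it."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for word in WORD_RE.findall(line.lower()):
            index[word].add(number)
    return index


def search(index, lines, query, limit=50):
    """Return up to `limit` lines that contain every word of the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    # Intersect the posting sets; a missing word yields an empty result.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    # Truncate the hit list before fetching any line text.
    return [lines[n] for n in sorted(hits)[:limit]]
```

The index is built once, so each query only touches the sets for its own words, and the result limit is applied before any matching lines are collected.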
MZMcBride
So, here is the repo:
https://github.com/benapetr/wikimedia-botslogs
I will move it to Wikimedia git later :) It's pretty ugly code that I forked from a previous project I've been working on (huggle wa), and it's poorly commented. I will try to improve that.
On Wed, Mar 21, 2012 at 12:06 AM, MZMcBride z@mzmcbride.com wrote: