Hi, I've traditionally used analog (http://www.analog.cx/) for my log file analysis, but I've recently started using MediaWiki as my CMS for my sites, and now I'm not sure what the best way to analyze the logs is. I use all the pretty url rewriting stuff, so my pages look like this:
http://blah.com/This_Is_a_Page
so I can't use .html, or .php, or all the analog defaults to determine what a "page" is. I'm currently just setting analog to count everything as a page view, but that obviously grossly overestimates the actual page views due to all the css, js, images, and whatnot.
Does anybody have an analog config file they use with MediaWiki that works well? Is there something better than analog to use with MediaWiki? I'm talking just about server log analysis, not client side analytics stuff like urchin/google/etc., that's a different issue.
Thanks for any advice you have, Chris
Chris Hecker:
CH> I use all the pretty url rewriting stuff, so my pages look like this:
CH> http://blah.com/This_Is_a_Page
CH> so I can't use .html, or .php, or all the analog defaults to determine what a "page" is.
we usually recommend people not put pages in the site's root directory. there are many reasons for this, but in this case, if you used "/wiki/Some_page", you would be able to designate /wiki/* as pages.
- river.
River Tarnell wrote:
Chris Hecker:
CH> I use all the pretty url rewriting stuff, so my pages look like this:
CH> http://blah.com/This_Is_a_Page
we usually recommend people not put pages in the site's root directory. there are many reasons for this
Really? Why? I didn't see any mention of this when I was originally setting things up and getting the url rewrite stuff working. Are the reasons documented somewhere?
Also, if the wiki was in a subdir, then all of the css/js stuff and images would also be there, right, so I'd have the same problem. Or are you saying use rewrite to put only the index.php?title= queries into a virtual subdir? It doesn't really matter, because I can't change that now, but I'm curious.
Thanks, Chris
Chris Hecker schreef:
Also, if the wiki was in a subdir, then all of the css/js stuff and images would also be there, right, so I'd have the same problem. Or are you saying use rewrite to put only the index.php?title= queries into a virtual subdir? It doesn't really matter, because I can't change that now, but I'm curious.
The point is that you redirect e.g. /wiki/Main_Page to /w/index.php?title=Main_Page. CSS/JS stuff is in /w/skins/, and is not affected by the pretty URL stuff. You can easily set up this scheme in Apache by adding "Alias /wiki /w/index.php" (without the quotes) to httpd.conf. The disadvantage of redirecting /Main_Page to /index.php?title=Main_Page is that you have to write your RewriteRules well: you don't want "index.php?title=Main_Page" to be interpreted as a page name, resulting in an infinite loop. Also, you have to watch out for /api.php, /skins/ and /images. It's possible, but it's more trouble than it's worth.
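To make the separation concrete, here is a minimal Python sketch of the path mapping described above. It only models the translation from pretty URL to script URL; it is not how Apache or MediaWiki implement it. The name `resolve`, the two constants, and the fallback to Main_Page for a bare /wiki/ are assumptions made up for illustration.

```python
ARTICLE_PREFIX = "/wiki/"      # pretty article URLs live here (assumed layout)
SCRIPT_PATH = "/w/index.php"   # real MediaWiki entry point (assumed layout)


def resolve(path: str) -> str:
    """Map a requested path to what would actually be served."""
    if path.startswith(ARTICLE_PREFIX):
        # Anything under /wiki/ is a page title, even odd ones like "api.php".
        title = path[len(ARTICLE_PREFIX):] or "Main_Page"
        return f"{SCRIPT_PATH}?title={title}"
    # Everything else (/w/skins/..., /w/api.php, /robots.txt) is left alone.
    return path


if __name__ == "__main__":
    for p in ("/wiki/Main_Page", "/wiki/api.php",
              "/w/skins/common/main.css", "/robots.txt"):
        print(p, "->", resolve(p))
```

Because article titles and real files live under different prefixes, no exclusion list is needed and a rewritten URL can never be mistaken for a page name.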
On 05/01/2008, Roan Kattouw roan.kattouw@home.nl wrote:
Chris Hecker schreef:
Also, if the wiki was in a subdir, then all of the css/js stuff and images would also be there, right, so I'd have the same problem. Or are you saying use rewrite to put only the index.php?title= queries into a virtual subdir? It doesn't really matter, because I can't change that now, but I'm curious.
The point is that you redirect e.g. /wiki/Main_Page to /w/index.php?title=Main_Page. CSS/JS stuff is in /w/skins/, and is not affected by the pretty URL stuff. You can easily set up this scheme in Apache by adding "Alias /wiki /w/index.php" (without the quotes) to httpd.conf. The disadvantage of redirecting /Main_Page to /index.php?title=Main_Page is that you have to write your RewriteRules well: you don't want "index.php?title=Main_Page" to be interpreted as a page name, resulting in an infinite loop. Also, you have to watch out for /api.php, /skins/ and /images. It's possible, but it's more trouble than it's worth.
I had to do something like this recently - a page at internal.foo.com/ that was supposed to have a wiki hanging off it at internal.foo.com/wiki - meaning the main page was internal.foo.com/wiki/Main_Page and *not* internal.foo.com/ itself. This URL arrangement was out of my hands, so I had to somehow make this thing work.
(Now they've discovered wikis are cool(tm) and are considering making internal.foo.com/ be the wiki Main_Page after all ...)
I eventually wrote some insanely complicated rewrite rules :-) They looked a bit like:
Everyone ends up with a twisty maze of little rewrite rules, all different, but you can do most of the awful things you may have to.
- d.
On 06/01/2008, David Gerard dgerard@gmail.com wrote:
I eventually wrote some insanely complicated rewrite rules :-) They looked a bit like:
RewriteEngine On
RewriteRule ^/wiki/skins/(.+)$ /w/skins/$1 [L]
RewriteRule ^/wiki/images/(.+)$ /w/images/$1 [L]
RewriteRule ^/wiki/Images/(.+)$ /w/images/$1 [L]
RewriteRule ^/wiki$ /w/index.php?title=Main_Page [PT,L,QSA]
RewriteRule ^/wiki/$ /w/index.php?title=Main_Page [PT,L,QSA]
# RewriteRule ^/wiki/index.php?title=(.+)$ /w/index.php?title=$1
RewriteRule ^/wiki/(.+)$ /w/index.php?title=$1 [PT,L,QSA]
As you can see, I was basically experimenting to see what the hell worked, commenting and uncommenting lines as I went.
Everyone ends up with a twisty maze of little rewrite rules, all different, but you can do most of the awful things you may have to.
- d.
Everyone ends up with a twisty maze of little rewrite rules, all different, but you can do most of the awful things you may have to.
Just say NO. Long live long URLs! http://www.mediawiki.org/wiki/Manual:Short_URL#Forcing_long_URLs
http://www.mediawiki.org/wiki/Manual:Short_URL#Forcing_long_URLs
Wait, don't you mean http://www.mediawiki.org/w/index.php?title=Manual:Short_URL#Forcing_long_URL... ;)
On Jan 7, 2008 2:12 PM, jidanni@jidanni.org wrote:
Everyone ends up with a twisty maze of little rewrite rules, all different, but you can do most of the awful things you may have to.
Just say NO. Long live long URLs! http://www.mediawiki.org/wiki/Manual:Short_URL#Forcing_long_URLs
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hmm, all of those problem examples work fine for me, and I just followed the main mediawiki pages on the subject. Here are my rewrite rules:
RewriteEngine On
# test if rewrite should stop for special directories
RewriteRule ~$ - [F]
RewriteRule #$ - [F]
RewriteRule ^(images|skins|mailman|pipermail)/ - [L]
# all php scripts.
RewriteRule .php$ - [L]
# do the rewrite
RewriteRule ^/?(.*)$ /index.php?title=$1 [L,QSA]
See any problems?
Assuming this doesn't have big unfixable problems (and if it does we should update the mediawiki doc pages on this subject), I would definitely say it's worth the trouble because a) it wasn't much trouble, and b) it's just so nice to have the squeaky clean URLs.
Anyway, I'm still a long way from figuring out how to do the log analysis given my current situation, anybody got any ideas on that? If not, I guess I can just look at the hits and grind out a zillion regexps that cut it down to just the pages. I was just hoping somebody had already been through this. :)
Thanks, Chris
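For the "grind out regexps" route, a rough Python sketch of the idea follows. It assumes Common/Combined Log Format lines and a root-level wiki like the one described above. The prefix and suffix lists are guesses at what is not a page and would need adjusting to the real site; the function names are made up for illustration.

```python
import re
from collections import Counter

# Pulls the method, path and status out of a Common/Combined Log Format line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Assumed non-page locations and file types; adjust to the actual layout.
NON_PAGE_PREFIXES = ("/images/", "/skins/", "/mailman/", "/pipermail/")
NON_PAGE_SUFFIXES = (".php", ".css", ".js", ".png", ".gif", ".jpg", ".ico", ".txt")


def is_page_view(method: str, path: str, status: str) -> bool:
    """Return True if the request looks like a successful article view."""
    if method != "GET" or status != "200":
        return False
    path = path.split("?", 1)[0]              # ignore the query string
    if path.startswith(NON_PAGE_PREFIXES):
        return False
    return not path.lower().endswith(NON_PAGE_SUFFIXES)


def count_page_views(lines):
    """Count page views per path from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and is_page_view(m["method"], m["path"], m["status"]):
            counts[m["path"].split("?", 1)[0]] += 1
    return counts
```

Usage would be something like `count_page_views(open("access.log"))`, where the file name is a placeholder. Note that this drops direct /index.php?title=... requests along with the other .php hits, which may or may not be what you want.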
Chris Hecker schreef:
Assuming this doesn't have big unfixable problems (and if it does we should update the mediawiki doc pages on this subject), I would definitely say it's worth the trouble because a) it wasn't much trouble, and b) it's just so nice to have the squeaky clean URLs.
Using Alias /wiki /w/index.php and putting all your stuff in /w is *much* easier. You just have to adjust $wgArticlePath and $wgScript in LocalSettings.php to make it work. A problem with your RewriteRules is that you can't create a page called "api.php" on your wiki (or anything ending in ".php", really).
Roan Kattouw (Catrope)
On 1/6/08, Roan Kattouw roan.kattouw@home.nl wrote:
Using Alias /wiki /w/index.php and putting all your stuff in /w is *much* easier. You just have to adjust $wgArticlePath and $wgScript in LocalSettings.php to make it work. A problem with your RewriteRules is that you can't create a page called "api.php" on your wiki (or anything ending in ".php", really).
And you're going to severely confuse search engines that look for /robots.txt, and user agents that look for /favicon.ico.
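A quick way to see these effects is to run a few request paths through a simplified model of the root-level rules Chris posted. The Python sketch below is only a first-match simulation, assuming the rules sit in a per-directory (.htaccess) context where the leading slash is stripped. It does not model Apache restarting the rule set after a rewrite, and `apply_rules` is a made-up helper, not an Apache API.

```python
import re

# The posted rules, in order, as (pattern, substitution, flag).
# "-" means "leave the URL alone"; "F" means "forbidden".
RULES = [
    (r"~$", "-", "F"),
    (r"#$", "-", "F"),
    (r"^(images|skins|mailman|pipermail)/", "-", "L"),
    (r".php$", "-", "L"),                  # the dot is unescaped, as posted
    (r"^/?(.*)$", r"/index.php?title=\1", "L"),
]


def apply_rules(path: str) -> str:
    """Return what the first matching rule does to a request path."""
    subject = path.lstrip("/")             # per-directory rules see no leading slash
    for pattern, substitution, flag in RULES:
        match = re.search(pattern, subject)
        if not match:
            continue
        if flag == "F":
            return "403 Forbidden"
        if substitution == "-":
            return path                    # passed through to the filesystem
        return match.expand(substitution)
    return path


if __name__ == "__main__":
    for p in ("/This_Is_a_Page", "/robots.txt", "/favicon.ico",
              "/api.php", "/skins/common/main.css", "/Learn_xphp"):
        print(p, "->", apply_rules(p))
```

In this model /robots.txt and /favicon.ico are handed to index.php as page titles, while any title that ends in "php" after one arbitrary character is treated as a file because of the unescaped dot.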
A problem with your RewriteRules is that you can't create a page called "api.php" on your wiki
You can't create a page called api.php anyway, since it'd have to be Api.php unless I'm mistaken. :) But seriously, given the way I use MediaWiki that's not a big problem for me, the nice urls are way higher priority given my use cases.
Simetrical wrote:
I recently added an Official Developer-Recommended Rewrite Rule System (TM) section to the page on mediawiki.org:
You should not rewrite articles to be in the document root.
Thanks, but bummer. Well, I guess I'll just hope it doesn't get broken by future releases, because MediaWiki currently makes a great CMS for a "regular looking website", and it would be a shame if everything had to be in a subdirectory for it to work in the future since that's a bit more limiting for the site design. I guess I will live on the edge for a while and hope for the best.
but the former causes a variety of problems, including problems with robots.txt, favicon.ico, and script paths.
I understand the problem with the robots.txt and favicon.ico (or any file in the root), but what do you mean by "script paths"?
Chris
On 1/6/08, Chris Hecker checker@d6.com wrote:
I understand the problem with the robots.txt and favicon.ico (or any file in the root), but what do you mean by "script paths"?
Things like skins/ that must be outwardly accessible. Again, you wouldn't really be able to create a "Skins" page.
Of course, with more software support, all of this would be perfectly possible. Things like robots.txt and the PHP files can be added to a blacklist checked by the Title constructor, so no page by that name can be created. The software could even be nice enough to transparently serve the correct files if something asks for robots.txt and the admin didn't specifically exempt it from the rewrite rule. There doesn't, however, seem to be much interest in doing any of that at present, and the configuration I posted remains the recommended one.
Simetrical schreef:
On 1/6/08, Chris Hecker checker@d6.com wrote:
I understand the problem with the robots.txt and favicon.ico (or any file in the root), but what do you mean by "script paths"?
Things like skins/ that must be outwardly accessible. Again, you wouldn't really be able to create a "Skins" page.
You would be able to create it, but it would only be accessible through /Skins and not through /skins, which people might find weird. It's best to just blacklist things like that (skins, images, api.php, etc.), though, to avoid confusion.
Roan Kattouw (Catrope)
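As a rough illustration of the blacklist idea, here is a small Python sketch. It is not MediaWiki code: the list of names and both helper functions are hypothetical, and the only behaviour borrowed from MediaWiki is its default of capitalising the first letter of a title.

```python
# Hypothetical names that collide with real files or directories in the
# document root; this is not an actual MediaWiki setting.
RESERVED_ROOT_NAMES = {
    "skins", "images", "api.php", "index.php", "robots.txt", "favicon.ico",
}


def normalize_title(title: str) -> str:
    """Mimic the default first-letter capitalisation and space handling."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def is_reserved(title: str) -> bool:
    """Compare the first path segment against the blacklist, ignoring case."""
    first_segment = normalize_title(title).split("/", 1)[0]
    return first_segment.lower() in RESERVED_ROOT_NAMES


if __name__ == "__main__":
    for t in ("api.php", "Skins", "Skins/Monobook", "Main Page"):
        print(repr(t), "->", normalize_title(t), "reserved:", is_reserved(t))
```

The comparison ignores case on purpose, so that "Skins" is rejected even though only /skins exists on disk, which avoids the /Skins versus /skins confusion mentioned above.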
On 1/5/08, Chris Hecker checker@d6.com wrote:
Really? Why? I didn't see any mention of this when I was originally setting things up and getting the url rewrite stuff working. Are the reasons documented somewhere?
I recently added an Official Developer-Recommended Rewrite Rule System (TM) section to the page on mediawiki.org:
http://www.mediawiki.org/wiki/Manual:Short_URL#Recommended_setup_.28Wikipedi...