What is the percentage of Wikipedia pages that get edited each day (or week, or month), and has that percentage changed over time?
Or what are the daily numbers of page views, page edits, and searches?
Of course, on the first day, 100 % of the pages were altered... so it would be reasonable to think that the rate of change slowly decreases as a wiki site grows. Is this so? Do wiki pages find their final form, never to be edited again, or at least more and more seldom? Could we get some numbers on this?
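One way to get numbers like these is to count, for each day, how many distinct pages were saved at least once and divide by the number of pages that existed that day. The following is only a sketch: the function name `daily_edit_fraction`, the shape of the edit log, and the sample data are all assumptions made for illustration, not anything taken from the Wikipedia software.

```python
from collections import defaultdict
from datetime import date


def daily_edit_fraction(edits, page_counts):
    """Return {day: fraction of existing pages edited at least once that day}.

    edits: iterable of (day, page_title) pairs, one per saved revision.
    page_counts: {day: total number of pages existing on that day}.
    Both inputs are hypothetical; a real wiki would derive them from its logs.
    """
    edited = defaultdict(set)
    for day, title in edits:
        edited[day].add(title)  # a set counts each page once per day
    return {
        day: len(edited[day]) / total
        for day, total in page_counts.items()
        if total > 0
    }


# Made-up sample data, for illustration only.
edits = [
    (date(2002, 1, 25), "Physics"),
    (date(2002, 1, 25), "Physics"),  # second save of the same page, same day
    (date(2002, 1, 25), "Sweden"),
    (date(2002, 1, 26), "Physics"),
]
page_counts = {date(2002, 1, 25): 4, date(2002, 1, 26): 5}

fractions = daily_edit_fraction(edits, page_counts)
for day, fraction in sorted(fractions.items()):
    print(day, f"{fraction:.0%}")
```

Plotting these fractions over the life of a wiki would show whether the rate of change really does fall as the site grows.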
Would it be reasonable to update the search index each time a new version of a page is saved? In that case, the search would still be indexed (and fast), but it would always be up-to-date.
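To make the idea concrete, here is a minimal sketch of an index that is adjusted on every save instead of being rebuilt from scratch. The class `IncrementalIndex` and its methods are hypothetical names chosen for this example; they do not describe how the Wikipedia software works.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Keyword index that is updated each time a page is saved (sketch)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> titles containing it
        self.words_of = {}                # title -> words currently indexed

    @staticmethod
    def tokenize(text):
        # Lowercased runs of ASCII letters and digits; a deliberate simplification.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def save_page(self, title, text):
        """Index a new version of a page, touching only the words that changed."""
        new_words = self.tokenize(title + " " + text)
        old_words = self.words_of.get(title, set())
        for word in old_words - new_words:
            self.postings[word].discard(title)
        for word in new_words - old_words:
            self.postings[word].add(title)
        self.words_of[title] = new_words

    def search(self, query):
        """Return the titles of pages that contain every word of the query."""
        words = self.tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self.postings.get(w, set()) for w in words))


index = IncrementalIndex()
index.save_page("Physics", "The study of matter and energy")
index.save_page("Physics", "The study of matter and motion")  # new version
print(index.search("motion"))  # the latest version is searchable immediately
print(index.search("energy"))  # the word removed by the edit no longer matches
```

The cost of a save is proportional to the number of words that changed in that one page, which is why updating on every save can stay cheap even when a full rebuild is slow.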
The search engine actually doesn't use a separately updated index anymore. It searches directly through the MySQL database now, so it is always up-to-date.
Magnus
Lars Aronsson wrote:
Would it be reasonable to update the search index each time a new version of a page is saved? In that case, the search would still be indexed (and fast), but it would always be up-to-date.
This is true now, since the pages are in a true database with Magnus's new software. In the old version, all the data was just stored in text files on disk. I wrote a program to go through and analyze the keywords from all the pages and titles, and construct a search index from that.
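For readers who want a picture of that kind of batch indexer, the following is a rough sketch, not the original program. It assumes one `.txt` file per page with the file name as the page title; the function name `build_index` and that directory layout are assumptions made for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(page_dir):
    """Scan every page file and map each keyword to the titles that contain it.

    Assumes (hypothetically) one UTF-8 text file per page, named <title>.txt.
    The whole directory is re-read on every run, which is what makes a batch
    rebuild expensive as the number of pages grows.
    """
    index = defaultdict(set)
    for path in sorted(Path(page_dir).glob("*.txt")):
        title = path.stem
        text = path.read_text(encoding="utf-8")
        # Index words from both the title and the page body.
        for word in re.findall(r"[a-z0-9]+", (title + " " + text).lower()):
            index[word].add(title)
    return dict(index)
```

Calling `build_index("pages")` on a directory of page files returns a plain dictionary, so a lookup such as `index.get("energy", set())` gives the matching titles as of the last rebuild.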
I always wanted to put it on a cron job to update nightly, but it was so inefficient that I didn't feel comfortable letting it run without supervision, and I didn't feel comfortable running it all that often.
Now that everything is in a real database, it should be true that with a little playing around and tweaking, we can get decent results that are fast and also always instantly updated.
The current version is a very simple SQL query. It doesn't work so well in terms of being intelligent about returning what you probably want.
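As an illustration of what a very simple SQL search can look like, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The `pages` table, its columns, and the `search_pages` function are assumptions for this example, not the actual Wikipedia schema or query. Ordering title matches first is just one possible small tweak toward returning what you probably want.

```python
import sqlite3


def search_pages(conn, term):
    """Return titles of pages whose title or body contains the search term.

    Sketch only: '%' and '_' inside the term are not escaped, and LIKE
    scans every row rather than using an index.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM pages "
        "WHERE title LIKE ? OR body LIKE ? "
        "ORDER BY (title LIKE ?) DESC, title",  # title matches come first
        (pattern, pattern, pattern),
    )
    return [title for (title,) in rows]


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("Physics", "The study of matter"),
        ("Isaac Newton", "Worked on physics and mathematics"),
        ("Sweden", "A country in Europe"),
    ],
)
before = search_pages(conn, "physics")

# Saving a new version of a page changes the search results right away,
# because the query reads the live table instead of a separate index.
conn.execute(
    "UPDATE pages SET body = ? WHERE title = ?",
    ("A country in Europe known for physics prizes", "Sweden"),
)
after = search_pages(conn, "physics")
print(before)
print(after)
```

The trade-off is the one described above: the results are always current, but a plain substring match knows nothing about relevance beyond whatever ordering the query imposes.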
Jimbo, I sent you a zillion mails today (your night;), with a better search engine among them.
Magnus
-----Original Message-----
From: wikipedia-l-admin@nupedia.com [mailto:wikipedia-l-admin@nupedia.com] On Behalf Of Jimmy Wales
Sent: Saturday, January 26, 2002 10:06 PM
To: wikipedia-l@nupedia.com
Subject: Re: [Wikipedia-l] rate of change
Cool, and I'm going through them now, making updates. :-)
Magnus Manske wrote:
Jimbo, I sent you a zillion mails today (your night;), with a better search engine among them.
Magnus
[Wikipedia-l] To manage your subscription to this list, please go here: http://www.nupedia.com/mailman/listinfo/wikipedia-l
wikipedia-l@lists.wikimedia.org