mmm, yummy. When will we get up the nerve to turn full-text searching back on?
--Jimbo
Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
Or has this been fully discussed here already, long ago?
Andrew Alder aka user:andrewa
At 07:10 AM 3/12/03 -0800, Jimmy Wales wrote:
> mmm, yummy. When will we get up the nerve to turn full-text searching back on?
> --Jimbo
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@Wikipedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l
**** andrewa @ alder . ws http://www.zeta.org.au/~andrewa Phone 9441 4476 Mobile 04 2525 4476 ****
I wrote:
> mmm, yummy. When will we get up the nerve to turn full-text searching back on?
Andrew Alder wrote:
> Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
> Or has this been fully discussed here already, long ago?
As for me, I always just assumed it. There are some big drawbacks to google, namely that it isn't realtime, which makes doing certain kinds of study difficult. Also, Michael Hardy has reported to me that one page he used to find in Google can no longer be found in Google, due presumably to the vagaries of Google indexing.
--Jimbo
On Wednesday 03 December 2003 14:55, Jimmy Wales wrote:
> I wrote:
>> mmm, yummy. When will we get up the nerve to turn full-text searching back on?
> Andrew Alder wrote:
>> Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
>> Or has this been fully discussed here already, long ago?
> As for me, I always just assumed it. There are some big drawbacks to google, namely that it isn't realtime, which makes doing certain kinds of study difficult. Also, Michael Hardy has reported to me that one page he used to find in Google can no longer be found in Google, due presumably to the vagaries of Google indexing.
I have stumbled upon quite a few Wikipedia pages that were not being indexed by Google, while the same pages on one of the many mirror sites (nationmaster, etc.) were being indexed. My take on the whole situation is that Google is treating en.wikipedia.org and en2.wikipedia.org as two different entities. Search for 'Rivers of France wikipedia' (http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=R...) to see an example of both en and en2 competing for the top spot.
Example for knock-offs scoring higher than wikipedia: Search for 'Napoleonic code' (http://www.google.com/search?q=Napoleonic+code&sourceid=mozilla-search&a...) where both sciencedaily.com (IIRC a rather new mirror) and nationmaster.com rank higher than wikipedia. This could not be explained with the google pagerank algorithm, because wikipedia surely gets a lot more quality and quantity links than others do - But it would make sense if the wikipedia ranking essentially gets divided by two.
This would seem to reduce traffic to wikipedia, which would obviously be a bad thing. Is there some different load-balancing scheme that could be implemented that would be transparent to google?
Best, Sascha Noyes
> This would seem to reduce traffic to wikipedia, which would obviously be a bad thing. Is there some different load-balancing scheme that could be implemented that would be transparent to google?
Well, this has been discussed before. Either some kind of load balancing equipment (Cisco), load-balancing software (on Linux, say), reverse proxy software (such as in Apache), or round-robin DNS (easiest). I forget why we decided to do en and en2 in the first place...
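To illustrate the round-robin idea only: the point of any of these schemes is that successive requests are handed to each server in turn behind a single public name, so a crawler never sees two competing hostnames. The sketch below is a toy under stated assumptions, not how the site is or was configured. The function name is hypothetical, and real round-robin DNS does the rotation in the name server rather than in application code; the two hostnames are the ones named in this thread and merely stand in for backend servers.

```python
from itertools import cycle

# Stand-ins for backend servers; the thread only names en and en2.
BACKENDS = ["en.wikipedia.org", "en2.wikipedia.org"]


def make_round_robin(backends):
    """Return a function that hands out the backends in rotation."""
    rotation = cycle(backends)

    def next_backend():
        return next(rotation)

    return next_backend


pick = make_round_robin(BACKENDS)
# Four successive requests alternate between the two servers.
first_four = [pick() for _ in range(4)]
```

With a rotation like this sitting behind one public name, both machines share the load without search engines seeing en and en2 as separate sites.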
On Dec 3, 2003, at 12:55, Nick Reinking wrote:
> Well, this has been discussed before. Either some kind of load balancing equipment (Cisco), load-balancing software (on Linux, say), reverse proxy software (such as in Apache), or round-robin DNS (easiest).
Yup. Also, it would not be impossible to let clients that don't claim to be Mozilla (most browsers do claim to be) see only en, while the Mozillas (and compatibles) get bounced to en2 when necessary.
> I forget why we decided to do en and en2 in the first place...
Because we don't have a load-balancing router and I have no access to our DNS server, and it was quick to set up in the meantime.
-- brion vibber (brion @ pobox.com)
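A rough sketch of the User-Agent idea above, assuming the decision is made per request from the User-Agent header and some overload signal. The function name, the `en_overloaded` flag, and the sample User-Agent strings are all illustrative assumptions; nothing here is taken from the actual site code.

```python
def choose_host(user_agent, en_overloaded):
    """Pick the hostname that should serve a request.

    Clients that do not claim to be Mozilla (for example, many crawlers)
    always stay on en. Clients that do claim to be Mozilla are bounced to
    en2, but only when en is overloaded.
    """
    claims_mozilla = (user_agent or "").startswith("Mozilla/")
    if claims_mozilla and en_overloaded:
        return "en2.wikipedia.org"
    return "en.wikipedia.org"


# Illustrative User-Agent strings (assumptions, not captured traffic).
crawler = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
browser = "Mozilla/5.0 (X11; Linux i686)"

crawler_host = choose_host(crawler, en_overloaded=True)
browser_host = choose_host(browser, en_overloaded=True)
```

The heuristic only works while crawlers do not identify themselves as Mozilla; any crawler that does would be bounced like a browser.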
Sascha Noyes wrote:
> Example for knock-offs scoring higher than wikipedia: Search for 'Napoleonic code' (http://www.google.com/search?q=Napoleonic+code&sourceid=mozilla-search&a...) where both sciencedaily.com (IIRC a rather new mirror) and nationmaster.com rank higher than wikipedia. This could not be explained with the google pagerank algorithm, because wikipedia surely gets a lot more quality and quantity links than others do - But it would make sense if the wikipedia ranking essentially gets divided by two.
Sciencedaily.com seems to get referenced a lot by the scientific and education communities - I got the impression that the couple running it have a lot of friends and contacts in those communities, but that might just be a side-effect of being journalists.
New WP slogan idea: "why look at a mirror when you can bask in the sun of the real Wikipedia?"
(Yeah yeah, sunburned by vandalism then incinerated in the nuclear furnace of an edit war... :-) )
Stan
G'day Jimmy and the Group (and Ray and Brion whose posts crossed this one)
At 11:55 AM 3/12/03 -0800, Jimmy Wales wrote:
> I wrote:
>> mmm, yummy. When will we get up the nerve to turn full-text searching back on?
> Andrew Alder wrote:
>> Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
>> Or has this been fully discussed here already, long ago?
> As for me, I always just assumed it. There are some big drawbacks to google, namely that it isn't realtime, which makes doing certain kinds of study difficult. Also, Michael Hardy has reported to me that one page he used to find in Google can no longer be found in Google, due presumably to the vagaries of Google indexing.
Hmmmm. Not being realtime may be a plus or a minus; this is the very thing we tossed around a little in the Pump.
Certainly for Wikipedia contributors, realtime is best. But perhaps not so for readers, who are after stable content.
And there may be another advantage in using Google. AFAIK Google doesn't publish their ranking algorithms, or even say if they change, so as to impede attempts to rig the rankings. But from time to time they may well take notice of the number of different IPs using their search engine through Wikipedia. This isn't rigging the rankings; on the contrary, it's providing Google with accurate and relevant information, which they may use. It can't do our rankings any harm!
Food for thought? I'm not at all opposed to having an in-house search engine, I just thought it might be good to consider the pros, cons, and alternatives.
I've had no trouble using Google; in fact, I've had a couple of instances where "go to" didn't find an article I knew was there, but Google did! I'm afraid I haven't documented these; they were no big hassle and may have been down to capitalisation or the like.
Andrew A
Full-text searching can make some editing a lot easier, for such housekeeping as fixing misspellings and broken links.
Ec
Andrew Alder wrote:
> Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
> Or has this been fully discussed here already, long ago?
> Jimmy Wales wrote:
>> mmm, yummy. When will we get up the nerve to turn full-text searching back on?
On Dec 3, 2003, at 12:04, Andrew Alder wrote:
> Is this even a good idea? I know everyone has assumed we will, but the current use of Google has its advantages too (see the Village Pump).
It doesn't update live. It can't reach some pages at all. There's no control over how to cull results by namespace (i.e., articles only). And so on.
-- brion vibber (brion @ pobox.com)
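As a sketch of what culling results by namespace could look like, assume each search result is a page title and that non-article pages carry a namespace prefix before the first colon. The prefix set and function names below are illustrative assumptions, not the wiki software's actual list or API.

```python
# Assumed, illustrative set of non-article namespace prefixes.
NON_ARTICLE_NAMESPACES = {
    "Talk",
    "User",
    "User talk",
    "Wikipedia",
    "Wikipedia talk",
    "Image",
    "Image talk",
}


def is_article(title):
    """Return True unless the title starts with a known namespace prefix."""
    prefix, colon, _rest = title.partition(":")
    return not (colon and prefix in NON_ARTICLE_NAMESPACES)


def articles_only(titles):
    """Keep only the titles that are in the article namespace."""
    return [title for title in titles if is_article(title)]


results = [
    "Napoleonic code",
    "Talk:Napoleonic code",
    "User:Jimbo Wales",
    "Star Trek: The Next Generation",
]
filtered = articles_only(results)
```

Checking the prefix against a known set, rather than dropping every title that contains a colon, keeps article titles such as "Star Trek: The Next Generation" in the results.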
wikitech-l@lists.wikimedia.org