G'day Peter and the Group
At 07:39 AM 4/12/03 +0000, Peter Bartlett wrote:
. Not being realtime may be a plus or a minus; this is the very thing we tossed around a little on the Pump.
Certainly for Wikipedia contributors, realtime is best. But perhaps
not so for readers, who are after stable content.
As was pointed out on the Pump, there is no reason to suppose that the pedia was any more "stable" when Google took its snapshot than it is at any other time.
True. No argument at all with this.
But the probability that Google indexes a particular version is roughly proportional to the time for which that version is the current version. Therefore, the version presented by Google is on average more stable than the "current" version. I tried to point this out, but I'm afraid I didn't do it very clearly.
These two effects tend to cancel each other out. The version presented by Google is not necessarily better than the current version, but for readers it's probably no worse and may even be better. That was my point.
Andrew A
**** andrewa @ alder . ws http://www.zeta.org.au/~andrewa Phone 9441 4476 Mobile 04 2525 4476 ****
--- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.532 / Virus Database: 326 - Release Date: 27/10/03
On Dec 4, 2003, at 01:47, Andrew Alder wrote:
But the probability that Google indexes a particular version is roughly proportional to the time for which that version is the current version. Therefore, the version presented by Google is on average more stable than the "current" version. I tried to point this out, but I'm afraid I didn't do it very clearly.
When somebody clicks the link in the Google search results they see the *current* version, not Google's crawl-time cached copy (unless they happen to know what the Google cache is, how to use it, and prefer to do so instead of clicking through to the page itself, which is sure to be a vanishingly small proportion of visitors). The key words in the search are likely to be in the title itself or general description, and will probably be fairly stable across revisions.
Google is how people get _to_ Wikipedia and does a good job at it; as an internal navigation mechanism it's wholly unsatisfactory for contributors who need to be able to check the current state of things in detail.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Google is how people get _to_ Wikipedia and does a good job at it; as an internal navigation mechanism it's wholly unsatisfactory for contributors who need to be able to check the current state of things in detail.
Why not offer all: Search box - search button - radio button "Go" (preselected) - radio button "fulltext" - radio button "Google"
Magnus
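A minimal sketch of how the three radio buttons Magnus suggests could map onto three search targets. The host `wiki.example.org` and the `/wiki/` and `/search` paths are placeholders invented for illustration, not the real Wikipedia URLs; only the Google `q=... site:...` query form is standard.

```python
from urllib.parse import quote, urlencode

# Hypothetical host, used only for illustration.
WIKI_HOST = "wiki.example.org"


def search_url(query, mode="go"):
    """Return the URL a search form could redirect to for the chosen mode.

    mode is one of "go" (preselected), "fulltext" or "google",
    mirroring the three radio buttons proposed in the thread.
    """
    query = query.strip()
    if not query:
        raise ValueError("empty query")

    if mode == "go":
        # Jump straight to the article whose title matches the query.
        title = quote(query.replace(" ", "_"))
        return f"https://{WIKI_HOST}/wiki/{title}"

    if mode == "fulltext":
        # In-house, realtime full-text search (placeholder path).
        return f"https://{WIKI_HOST}/search?" + urlencode({"q": query})

    if mode == "google":
        # Hand the query to Google, restricted to this site.
        return "https://www.google.com/search?" + urlencode(
            {"q": f"{query} site:{WIKI_HOST}"}
        )

    raise ValueError(f"unknown mode: {mode!r}")
```

With "Go" as the default, a reader who just types a title and presses the button never touches the full-text search, which is one way the load concern could be kept in check.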
G'day Magnus, Brion and the Group
At 02:26 PM 4/12/03 +0100, Magnus Manske wrote:
Why not offer all: Search box - search button - radio button "Go" (preselected) - radio button "fulltext" - radio button "Google"
That's a very good suggestion IMO.
Andrew A
Perhaps internal realtime searches should be available only to registered members to keep load down? Also, there may not be a contradiction between using Google and having realtime updates. Perhaps you could provide Google with information directly by arrangement and inform them of changes? Given the freedom their research people are reputed to have and the number of 'beta' projects they've had, I'd be surprised if they were unable to do this. Whether they're willing is another matter.
An XML service or similar would be useful to people running 'pedias off-line and synching with the main server at intervals. Running off-line would lower lag time for editing users and (back to the point) allow searches of the local copy without hitting the server.
OTOH, updates could be irrelevant to the person running the off-line copy and be wasting resources. I need to think about this some more. Index updates? On demand checking (a local but internetworked copy)? Hmmm...
Russell
Wikitech-l mailing list Wikitech-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
G'day Brion and the Group
Hmmm...
At 02:10 AM 4/12/03 -0800, Brion Vibber wrote:
When somebody clicks the link in the Google search results they see the *current* version, not Google's crawl-time cached copy (unless they happen to know what the Google cache is, how to use it, and prefer to do so instead of clicking through to the page itself, which is sure to be a vanishingly small proportion of visitors). The key words in the search are likely to be in the title itself or general description, and will probably be fairly stable across revisions.
No argument at all with any of this. Again, perhaps I haven't expressed myself very well.
Assuming also that our metadata, or whatever their bots follow, keeps the spider out of the history (a good assumption I hope), the link from the Google search results will point to the current version, regardless of which version was retrieved by the crawler. The only difference is in the searching that produces the result list. So, as far as which version is presented (=targeted) by the *link* is concerned, Google is realtime.
But surely this supports the idea that, for a *reader*, Google is *no worse* than an in-house search?
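A toy illustration of the distinction drawn here, as a sketch only: the result list comes from an index built at crawl time, while the link always lands on the live page. The page title and texts are made up, and `build_index` and `search` are hypothetical helpers, not anything Google or Wikipedia actually runs.

```python
def build_index(pages):
    """Build a word -> set of titles index from a snapshot of the pages."""
    index = {}
    for title, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(title)
    return index


def search(index, word):
    """Return the titles whose snapshot contained the word."""
    return sorted(index.get(word.lower(), set()))


def demo():
    # State of the wiki when the crawler visited (made-up content).
    crawled = {"Alder": "alder is a tree"}
    index = build_index(crawled)

    # The article is edited after the crawl; the index is not updated.
    live = dict(crawled)
    live["Alder"] = "alder is a genus of flowering plants"

    # The search runs against the stale index...
    hits = search(index, "tree")
    # ...but following the link shows the current text.
    still_matches = "tree" in live[hits[0]].lower().split()
    return hits, still_matches
```

In this toy case the stale index still returns the article for "tree", yet the page the reader lands on is the current one, which no longer contains that word. Only the result list lags; the content shown does not.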
Google is how people get _to_ Wikipedia and does a good job at it; as an internal navigation mechanism it's wholly unsatisfactory for contributors who need to be able to check the current state of things in detail.
Again, agree. I've now been through this both on the list and on the Pump. My comments relate to *readers*, not *contributors*.
Contributors certainly need fully realtime tools, which means in-house. I said that before.
Andrew A
On Dec 4, 2003, at 16:36, Andrew Alder wrote:
Again, agree. I've now been through this both on the list and on the Pump. My comments relate to *readers*, not *contributors*.
Contributors certainly need fully realtime tools, which means in-house. I said that before.
So why are we talking about this? Google isn't going to vanish from the web because we get the internal search back online.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
So why are we talking about this? Google isn't going to vanish from the web because we get the internal search back online.
Obviously, Brion, you must have missed the last Cabal memo in the next step in world domination, bwah ha ha ha! ;-)
--Jimbo
Jimmy Wales wrote:
Obviously, Brion, you must have missed the last Cabal memo in the next step in world domination, bwah ha ha ha! ;-)
On cabal-l@wikipedia.org ? :-)
Magnus
From: Magnus Manske
Jimmy Wales wrote:
Brion Vibber wrote:
So why are we talking about this? Google isn't going to vanish from the web because we get the internal search back online.
Obviously, Brion, you must have missed the last Cabal memo in the next step in world domination, bwah ha ha ha! ;-)
On cabal-l@wikipedia.org ? :-)
You mean cabal-tinc-l@wikipedia.org.
G'day Brion and the Group
At 04:42 PM 4/12/03 -0800, Brion Vibber wrote:
So why are we talking about this? Google isn't going to vanish from the web because we get the internal search back online.
We can drop it anytime you like.
I don't understand that last quip. I don't think anyone proposed anything like that.
In summary, questions that have been raised (with any useful answers I think are agreed) were:
* Has there been previous discussion or thought as to whether the in-house search (henceforth IHS) offers any advantages over Google? A: No.
* Are there any advantages to IHS? A: For contributors, yes; for readers, no.
* Are there any disadvantages to IHS? A: Yes.
* Is it a case of one or the other? A: Not necessarily, there may be other options that offer the advantages of both.
It's just an attempt at some lateral thinking.
The discussion would be better on the Wiki than on the email list IMO, but apart from myself nobody was interested in talking about it on the Village Pump (where someone else raised it), while here we've had a discussion which I at least find quite a clutter in the mailbox. On the Wiki I can easily scroll up and down the discussion in a single screen, which I like, and refactor useful content with a few clicks. It takes many times more time and trouble to do this on an email list. My personal productivity drops to somewhere between one-third and one-tenth, I estimate, depending on the subject, which is annoying.
I guess you're finding the same problem.
I guess also that the easiest thing would be just to go back to what we had before performance problems led to disabling the internal search. I can't even remember what that was, so I'll have no complaints either way.
Anyway, that's what I think we've been talking about, and I think the "why" is obvious: To get the best possible Wikipedia.
Andrew A aka user:andrewa
On Thu, 2003-12-04 at 01:47, Andrew Alder wrote:
But the probability that Google indexes a particular version is roughly proportional to the time for which that version is the current version. Therefore, the version presented by Google is on average more stable than the "current" version. I tried to point this out, but I'm afraid I didn't do it very clearly.
No, it's not; it's exactly the same. Consider (for ease of exposition) an article in the middle of an edit war. Three-quarters of the time it has version "A" (the "stable" version); one-quarter of the time it has version "B" (the "unstable" version). Then, three-quarters of the time, when the Google spider grabs and indexes this article, it will get version "A". Three-quarters of the time, if somebody did a full-text search on the "current" Wikipedia database, they would get version "A". Exactly the same.
There's also the effect that Brion mentioned: even if Google happened to index a "stable" version when the current version was "unstable", the person would (by default) still end up reading the current, unstable version.
Carl Witty
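The edit-war argument above can be checked with a quick simulation. This is only a sketch under stated assumptions: version "A" is live for stretches averaging three time units and version "B" for stretches averaging one (so "A" is current about three-quarters of the time), stretch lengths are exponentially distributed, and both the crawler and the reader are assumed to arrive at uniformly random moments. All function names and parameters are invented for the illustration.

```python
import bisect
import random


def build_timeline(rng, n_cycles=2000, mean_a=3.0, mean_b=1.0):
    """Alternate A and B; versions[i] is live until time boundaries[i]."""
    boundaries, versions, t = [], [], 0.0
    for _ in range(n_cycles):
        for version, mean in (("A", mean_a), ("B", mean_b)):
            t += rng.expovariate(1.0 / mean)  # random length of this stretch
            boundaries.append(t)
            versions.append(version)
    return boundaries, versions


def version_at(boundaries, versions, t):
    """Return the version that is current at time t."""
    i = bisect.bisect_right(boundaries, t)
    return versions[min(i, len(versions) - 1)]


def fraction_stable(rng, boundaries, versions, n_samples):
    """Fraction of uniformly random arrival times that see version A."""
    end = boundaries[-1]
    hits = sum(
        version_at(boundaries, versions, rng.uniform(0.0, end)) == "A"
        for _ in range(n_samples)
    )
    return hits / n_samples


def compare(seed=0, n_samples=20000):
    """Return (fraction seen by the crawler, fraction seen by a live search)."""
    rng = random.Random(seed)
    boundaries, versions = build_timeline(rng)
    crawl = fraction_stable(rng, boundaries, versions, n_samples)
    reader = fraction_stable(rng, boundaries, versions, n_samples)
    return crawl, reader
```

Calling `compare()` should give two fractions that are both close to 0.75 and close to each other: a snapshot taken at a random moment is no more likely to catch the stable version than a live query made at a random moment.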
G'day Carl and the Group
At 10:06 AM 4/12/03 -0800, Carl Witty wrote:
No, it's not; it's exactly the same. Consider (for ease of exposition) an article in the middle of an edit war. Three-quarters of the time it has version "A" (the "stable" version); one-quarter of the time it has version "B" (the "unstable" version). Then, three-quarters of the time, when the Google spider grabs and indexes this article, it will get version "A". Three-quarters of the time, if somebody did a full-text search on the "current" Wikipedia database, they would get version "A". Exactly the same.
There's also the effect that Brion mentioned: even if Google happened to index a "stable" version when the current version was "unstable", the person would (by default) still end up reading the current, unstable version.
I think you're right. I said my Math Stats were 20 years ago!
But, if you are right, it means that there's neither an advantage nor a disadvantage in the delay Google gives.
Andrew A
wikitech-l@lists.wikimedia.org