It looks like google is pulling some information from wikipedia in a pseudo semantic way,
http://www.google.com/search?c2coff=1&q=hd+dvd+capacity&btnG=Search
from
http://en.wikipedia.org/wiki/HD_DVD
Am I just noticing this, or is it new?
Judson [[:en:User:Cohesion]]
On 1/3/07, cohesion cohesion@sleepyhead.org wrote:
> It looks like google is pulling some information from wikipedia in a pseudo semantic way,
> http://www.google.com/search?c2coff=1&q=hd+dvd+capacity&btnG=Search
> from
> http://en.wikipedia.org/wiki/HD_DVD
> Am I just noticing this, or is it new?
It has been around at least a few months. They might have recently improved on it though.
Anthony
On 1/3/07, Anthony wikilegal@inbox.org wrote:
> On 1/3/07, cohesion cohesion@sleepyhead.org wrote:
> > It looks like google is pulling some information from wikipedia in a pseudo semantic way,
> > http://www.google.com/search?c2coff=1&q=hd+dvd+capacity&btnG=Search
> > from
> > http://en.wikipedia.org/wiki/HD_DVD
> > Am I just noticing this, or is it new?
> It has been around at least a few months. They might have recently improved on it though.
BTW, it's not limited to Wikipedia: http://www.google.com/search?hl=en&q=mars+radius&btnG=Search
Mars — Radius: 3,397 KM According to http://www.schoolsobservatory.org.uk/astro/textb/solsys/mars.htm
On 1/3/07, Thomas Dalton thomas.dalton@gmail.com wrote:
> > Am I just noticing this, or is it new?
> It's been around for a while. Try asking google questions - "When was Celebrity X born?", "How tall is the Empire State Building?", that kind of thing.
Those examples don't work.
But it is interesting seeing the difference in the response of radius mars
and
radius of mars
> Those examples don't work.
The first one does, I think. The second one doesn't... oh well. There are quite a few questions that google can parse, though.
> But it is interesting seeing the difference in the response of radius mars
> and
> radius of mars
The first one uses Google's semantic fact-finding feature; the second one uses its calculator feature (which has lots of constants built in, including astronomical values).
On 1/3/07, cohesion cohesion@sleepyhead.org wrote:
> It looks like google is pulling some information from wikipedia in a pseudo semantic way,
> http://www.google.com/search?c2coff=1&q=hd+dvd+capacity&btnG=Search
> from
> http://en.wikipedia.org/wiki/HD_DVD
> Am I just noticing this, or is it new?
Well, the feature itself is not new. It seems, however, that google is now automatically parsing info-box information, in this case: http://en.wikipedia.org/wiki/Template:Infobox_media
This way, http://www.google.com/search?sourceid=navclient-ff&ie=UTF-8&rls=GGGL... is working too, since http://en.wikipedia.org/wiki/Betamax contains
Betamax
Media type: Video recording media
Encoding: Magnetic tape
Developed by: Sony
Usage: Video storage
Try googling:
Betamax Usage
GD-ROM media type
VHS encoding
and so on (http://en.wikipedia.org/wiki/Special:Whatlinkshere/Template:Infobox_media)
So in order to find more of these shortcuts, one should check the infoboxen on en.wp
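For anyone curious how such a lookup might work, here is a minimal Python sketch. It is only a guess at the general idea, not Google's actual method: it assumes the infobox has already been flattened to plain text and that the field labels are known in advance. The names LABELS, BETAMAX, parse_infobox and answer are made up for illustration.

```python
import re

# Hypothetical label set, taken from the Betamax infobox quoted above.
LABELS = ["Media type", "Encoding", "Developed by", "Usage"]

# The infobox text as it appears once flattened to a single line.
BETAMAX = ("Betamax Media type: Video recording media Encoding: Magnetic tape "
           "Developed by: Sony Usage: Video storage")


def parse_infobox(text, labels=LABELS):
    """Split flattened 'Label: value' text into (title, {label: value})."""
    # One capturing group, so re.split keeps the matched labels in the result.
    pattern = re.compile("(" + "|".join(re.escape(l) for l in labels) + "):")
    parts = pattern.split(text)
    title = parts[0].strip()
    fields = {}
    # After the title, parts alternate: label, value, label, value, ...
    for label, value in zip(parts[1::2], parts[2::2]):
        fields[label.lower()] = value.strip()
    return title, fields


def answer(query, title, fields):
    """Return the field value for a '<title> <label>' query, else None."""
    q = query.lower().strip()
    prefix = title.lower()
    if not q.startswith(prefix):
        return None
    label = q[len(prefix):].strip()
    return fields.get(label)


title, fields = parse_infobox(BETAMAX)
print(answer("Betamax Usage", title, fields))  # prints "Video storage"
```

A query that does not start with the article title, or that names a label the infobox lacks, simply returns None here; a real system would presumably fall back to ordinary search results.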
Mathias
Mathias Schindler wrote:
> On 1/3/07, cohesion cohesion@sleepyhead.org wrote:
> > It looks like google is pulling some information from wikipedia in a pseudo semantic way,
> > http://www.google.com/search?c2coff=1&q=hd+dvd+capacity&btnG=Search
> > from
> > http://en.wikipedia.org/wiki/HD_DVD
> > Am I just noticing this, or is it new?
> Well, the feature itself is not new. It seems, however, that google is now automatically parsing info-box information, in this case: http://en.wikipedia.org/wiki/Template:Infobox_media
> This way, http://www.google.com/search?sourceid=navclient-ff&ie=UTF-8&rls=GGGL... is working too, since http://en.wikipedia.org/wiki/Betamax contains
> Betamax
> Media type: Video recording media
> Encoding: Magnetic tape
> Developed by: Sony
> Usage: Video storage
> Try googling:
> Betamax Usage
> GD-ROM media type
> VHS encoding
> and so on (http://en.wikipedia.org/wiki/Special:Whatlinkshere/Template:Infobox_media)
> So in order to find more of these shortcuts, one should check the infoboxen on en.wp
> Mathias
So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
Aside from that, something concerns me here. These snippets are displayed right at the top of search pages, above the search results, even when the Wikipedia article itself is nowhere near the top search result. In other words, anyone who manages to sneak the right value in at the critical moment when Google is re-indexing the page can achieve an effect similar to a [[Googlebomb]], but even more powerful. How long before people start craftily changing infobox labels and values in an attempt to abuse this?
*blocks self for WP:BEANS violation*
-Gurch
On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
Facts aren't copyrightable, and even to the extent any of this was found copyrightable it would almost surely be fair use.
Anthony
Anthony wrote:
> On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> > So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
> Facts aren't copyrightable, and even to the extent any of this was found copyrightable it would almost surely be fair use.
> Anthony
I was thinking more about the restrictions we impose on mirror sites (including but not limited to the GFDL), but I accept your point that facts aren't copyrightable. Whether a collection of Wikipedia infoboxes can be equated with a collection of facts is another matter, though; if that was true, surely it could be argued that the rest of the article is also no more than a collection of facts, and hence not subject to copyright?
-Gurch
On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> facts aren't copyrightable. Whether a collection of Wikipedia infoboxes can be equated with a collection of facts is another matter, though; if that was true, surely it could be argued that the rest of the article is also no more than a collection of facts, and hence not subject to copyright?
In case of doubt, only a judge can give you an answer, and even that answer is only preliminary as long as there is a higher court.
Mathias
On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> I was thinking more about the restrictions we impose on mirror sites (including but not limited to the GFDL), but I accept your point that facts aren't copyrightable.
The GFDL is only enforceable to the extent of copyright law. Not sure what other restrictions you're talking about. Trademark law doesn't seem to apply here either, though.
> Whether a collection of Wikipedia infoboxes can be equated with a collection of facts is another matter, though; if that was true, surely it could be argued that the rest of the article is also no more than a collection of facts, and hence not subject to copyright?
An encyclopedia article is more than just a collection of facts. Those facts are arranged in a particular manner, and expressed in a particular way. If you could somehow reduce the encyclopedia article to the raw facts, and then re-express them in your own words and in a random order, then you *might* be able to escape the copyright of the original. Of course even then, collections of facts are themselves copyrightable *as a collection*, to the extent that the selection of which facts to present is creative.
So an encyclopedia article is almost surely copyrightable, whereas a single fact from an encyclopedia article is almost surely not. In between is where lawyers make their money.
All of this is completely US-centric, of course. I can't imagine a jurisdiction where what Google is doing would actually result in a copyright infringement lawsuit, but there are some pretty strange laws out there.
Anthony
On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
They are extracting mere facts, and there is no copyright on facts. US law (AFAIK) does not recognize a copyright on databases as such (this is different in the EU). They are attributing the source along with a link to the Wikipedia site. IMHO, they are compliant in both the legal and the moral sense.
> Aside from that, something concerns me here. These snippets are displayed right at the top of search pages, above the search results, even when the Wikipedia article itself is nowhere near the top search result. In other words, anyone who manages to sneak the right value in at the critical moment when Google is re-indexing the page can achieve an effect similar to a [[Googlebomb]], but even more powerful. How long before people start craftily changing infobox labels and values in an attempt to abuse this?
That would be pretty much pointless. In case you are worrying, you might want to help bring the "stable version" feature back to life...
Mathias Schindler wrote:
> On 1/3/07, Gurch matthew.britton@btinternet.com wrote:
> > So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
> They are extracting mere facts, and there is no copyright on facts. US law (AFAIK) does not recognize a copyright on databases as such (this is different in the EU). They are attributing the source along with a link to the Wikipedia site. IMHO, they are compliant in both the legal and the moral sense.
> > Aside from that, something concerns me here. These snippets are displayed right at the top of search pages, above the search results, even when the Wikipedia article itself is nowhere near the top search result. In other words, anyone who manages to sneak the right value in at the critical moment when Google is re-indexing the page can achieve an effect similar to a [[Googlebomb]], but even more powerful. How long before people start craftily changing infobox labels and values in an attempt to abuse this?
> That would be pretty much pointless. In case you are worrying, you might want to help bring the "stable version" feature back to life...
How would I go about doing that? Also, when you say "back to life", are you suggesting it is somehow "dead"? I was under the impression the devs were planning to implement stable versions properly as soon as they finished single login.
On 04/01/07, Gurch matthew.britton@btinternet.com wrote:
> So Google has basically created a database of all Wikipedia's infoboxes, and is serving up entries from them as search results on its website? Does this count as mirroring our content? Are they in compliance with whatever rules apply to whatever it is they're doing?
> Aside from that, something concerns me here. These snippets are displayed right at the top of search pages, above the search results, even when the Wikipedia article itself is nowhere near the top search result. In other words, anyone who manages to sneak the right value in at the critical moment when Google is re-indexing the page can achieve an effect similar to a [[Googlebomb]], but even more powerful. How long before people start craftily changing infobox labels and values in an attempt to abuse this?
> *blocks self for WP:BEANS violation*
What is wrong with Google manipulating data? They do provide attribution for it, and they are not creating new works, just pulling facts from old works as-is.
A googlebomb is still possible if you know how to do it and don't get caught. And Wikipedia has always known that Google can index any version, even the vandalised ones, so it's nothing new in that respect.
Peter Ansell
On 03/01/07, cohesion cohesion@sleepyhead.org wrote:
> It looks like google is pulling some information from wikipedia in a pseudo semantic way,
I don't see how this is faking semantics. They are taking semantically tagged data and answering questions with it. That is as close as AI will get to semantics for a long time. The "Semantic Mediawiki" does exactly this, although they do it in a clunkier, harder-to-use manner where you enter fact-relationships in-text.
Peter Ansell
On 1/3/07, Peter Ansell ansell.peter@gmail.com wrote:
> On 03/01/07, cohesion cohesion@sleepyhead.org wrote:
> > It looks like google is pulling some information from wikipedia in a pseudo semantic way,
> I don't see how this is faking semantics. They are taking semantically tagged data and answering questions with it.
Well, the data isn't really semantically tagged at all, they are just assuming things based on proximity.
Anyway, I don't think this is a bad thing at all. Maybe if we knew more about what they are looking for, we could help the index out. Helping people find information is what we're all about, right? :D
Judson [[:en:User:Cohesion]]
> Well, the data isn't really semantically tagged at all, they are just assuming things based on proximity.
It's tagged using the semantics of the English language. The idea behind the "semantic web" is that we tag facts using a system that is easier for a computer to parse than English, but if Google can parse English (in this limited context), then good for them.