I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.
The entire site is simply a bad scrape of Wikipedia, with all attributions removed.
Maury
On Tue, 07 Jun 2005 11:05:05 -0400, Maury Markowitz wrote:
I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.
They are a known annoyance [1] and under investigation at some WP page that I for some reason can't get access to right now.
Roger
[1] We had several WP pages mistakenly reported as copyvios based on biography.ms.
Roger Luethi wrote:
They are a known annoyance [1] and under investigation at some WP page that I for some reason can't get access to right now.
Cache of the discussion page here:
Strangely, when viewing their site I sometimes find this notice at the bottom of the page and sometimes not (on the same page): "The Wikipedia <http://www.wikipedia.org> content included on this page is licensed under the GFDL <http://david-kahn.biography.ms/redirect.asp?direction=http://www.gnu.org/copyleft/fdl.html>". Sometimes it appears only after refreshing the page several times.
Roger Luethi wrote:
Strangely, when viewing their site I sometimes find this notice at the bottom of the page and sometimes not (on the same page): "The Wikipedia <http://www.wikipedia.org> content included on this page is licensed under the GFDL <http://david-kahn.biography.ms/redirect.asp?direction=http://www.gnu.org/copyleft/fdl.html>". Sometimes it appears only after refreshing the page several times.
I get the same thing, one or two refreshes and then the Wikipedia credit shows up. Clear cookies and cache and return to the page and the credit is gone.
Fibonacci sent them a non-compliance letter about a month ago (in Spanish). I don't know if the 'credit on refresh' is a recent enhancement, or if it's been there all along.
I've cleared four to six Wikipedia pages in the last couple of months that had been mistakenly tagged as copyright violations because biography.ms wasn't crediting us.
Funny that you mention it. Yesterday, I was looking for an old PDF version of Donella Meadows' work. I had a hard time finding it because all that was on the net about her was copies of the Wikipedia articles (mine in good part, so this was not what I was looking for). One of the most prominent copies was precisely this website.
It was also one of the first times I regretted that articles existed in Wikipedia... because I could not find the original paper I was looking for, because the web was so infested with the Wikipedia articles...
I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.
Ant
PS: I must not forget to add the PDF link to the article so I can find it next time.
I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.
It seems this is solvable though, if Google could simply read the attribution links. Of course in this particular case it wouldn't work, because biography.ms has scraped them off, but in general it should work well.
I'll write them.
Maury
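As an aside on the duplicate-detection point: the sketch below shows one common way to measure how similar two pages are, by comparing overlapping word sequences ("shingles") with a Jaccard score. This is only an illustration of the general idea; nothing in this thread says how Google actually handles mirrors, and the function names `shingles` and `jaccard` and the shingle size of 3 are assumptions made for the example.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences in the text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, not taken from any real page.
original = "the quick brown fox jumps over the lazy dog"
mirror = original + " mirrored copy"
unrelated = "an entirely different page about something else"

print(jaccard(original, mirror))     # high score: near-duplicate
print(jaccard(original, unrelated))  # no shared shingles
```

A verbatim mirror with a little added boilerplate scores close to 1, so a search engine using something along these lines could group the copies together even when the attribution links have been stripped.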
On 6/7/05, Maury Markowitz maury_markowitz@hotmail.com wrote:
It seems this is solvable though, if Google could simply read the attribution links. Of course in this particular case it wouldn't work, because biography.ms has scraped them off, but in general it should work well.
I'll write them.
And give sites another incentive to remove the attribution links? ... These sites mirror Wikipedia to gain traffic, ... not out of the goodness of their hearts. :)
I notice a recent change to the pages: they now include a boilerplate link to Wikipedia as a whole rather than to the individual articles. This means the site remains non-compliant. I will contact them again.
Maury
Hi Ant,
I actually encounter this problem a lot more often.
This is presumably because more of the topics I search for have a Wikipedia article but not a lot of other sites about them, or only other sites with a low PageRank on Google.
Since there are of course so many verbatim copies of en.wikipedia, a Google search for "yonaguni language" will leave you almost exclusively with copies of the same few pages on en:, although it may or may not include two separate pages on Wikitravel (brand new), and some pages from my blog.
Mark
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
Actually, I searched for it, and Wikipedia copies appear to be the *only* results (a search for the Japanese equivalent is more fruitful however).
Sadly, the actual en.wiki page titled "yonaguni language" is the penultimate result. It can be found on the 7th page.
Mark
-- SI HOC LEGERE SCIS NIMIVM ERVDITIONIS HABES QVANTVM MATERIAE MATERIETVR MARMOTA MONAX SI MARMOTA MONAX MATERIAM POSSIT MATERIARI ESTNE VOLVMEN IN TOGA AN SOLVM TIBI LIBET ME VIDERE
On 6/7/05, Anthere anthere9@yahoo.com wrote:
Funny that you mention it. Yesterday, I was looking for an old PDF version of Donella Meadows' work. I had a hard time finding it because all that was on the net about her was copies of the Wikipedia articles (mine in good part, so this was not what I was looking for). One of the most prominent copies was precisely this website.
It was also one of the first times I regretted that articles existed in Wikipedia... because I could not find the original paper I was looking for, because the web was so infested with the Wikipedia articles...
I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.
Ant
Google search string:
"Donella Meadows" pdf -wikipedia
Puddl Duk wrote:
Google search string;
"Donella Meadows" pdf -wikipedia
/me is astonished. I just did not *think* of adding the "pdf" in the search area. This is perfect. Thanks :-)
ant
On 09/06/05, Anthere anthere9@yahoo.com wrote:
/me is astonished. I just did not *think* of adding the "pdf" in the search area. This is perfect. Thanks :-)
Of course, if you want to go one step further, you can use "filetype:pdf" (which also eliminates any need to exclude Wikipedia and its mirrors since they don't host PDFs) :D
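For anyone who wants to script the two searches suggested in this thread, here is a minimal sketch that turns each set of terms into a search URL. The helper name `google_search_url` is made up for this example, and the `https://www.google.com/search?q=...` URL form is an assumption about the public search page, not something stated in the thread.

```python
from urllib.parse import urlencode


def google_search_url(*terms: str) -> str:
    """Join the search terms with spaces and URL-encode them as the q parameter."""
    query = " ".join(terms)
    # Assumed endpoint: the ordinary public Google search page.
    return "https://www.google.com/search?" + urlencode({"q": query})


# First suggestion: add the keyword "pdf" and exclude pages mentioning wikipedia.
keyword_url = google_search_url('"Donella Meadows"', "pdf", "-wikipedia")

# Second suggestion: restrict results to PDF files with the filetype: operator.
filetype_url = google_search_url('"Donella Meadows"', "filetype:pdf")

print(keyword_url)
print(filetype_url)
```

The first query still matches any page that merely contains the word "pdf", while the second restricts results to actual PDF files, which is why the explicit `-wikipedia` exclusion becomes unnecessary.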
wikipedia-l@lists.wikimedia.org