www.biography.ms is illegally using Wikipedia content

Scheduled global downtime

Maury Markowitz

7 Jun 2005

9:05 p.m.

I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.

The entire site is simply a bad scraping of the wikipedia, with all attributions removed.

Maury


Roger Luethi

7 Jun

9:25 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

On Tue, 07 Jun 2005 11:05:05 -0400, Maury Markowitz wrote:

...

I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.

They are a known annoyance [1] and under investigation at some WP page that I for some reason can't get access to right now.

Roger

[1] We had several WP pages mistakenly reported as copyvios based on biography.ms.

Andrew Venier

10:45 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

Roger Luethi wrote:

...

On Tue, 07 Jun 2005 11:05:05 -0400, Maury Markowitz wrote:

...
I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.

They are a known annoyance [1] and under investigation at some WP page that I for some reason can't get access to right now.

Roger

[1] We had several WP pages mistakenly reported as copyvios based on biography.ms.

Cache of the discussion page here:

http://tinyurl.com/8hgrx

Strangely, when viewing their site I sometimes find notice: "The Wikipedia http://www.wikipedia.org content included on this page is licensed under the GFDL http://david-kahn.biography.ms/redirect.asp?direction=http://www.gnu.org/copyleft/fdl.html" at the bottom of the page and sometimes not (on the same page). Sometimes it appears after refreshing the page several times.

Puddl Duk

11:37 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

...

Roger Luethi wrote:

Strangely, when viewing their site I sometimes find notice: "The Wikipedia http://www.wikipedia.org content included on this page is licensed under the GFDL http://david-kahn.biography.ms/redirect.asp?direction=http://www.gnu.org/copyleft/fdl.html" at the bottom of the page and sometimes not (on the same page). Sometimes it appears after refreshing the page several times.

I get the same thing, one or two refreshes and then the Wikipedia credit shows up. Clear cookies and cache and return to the page and the credit is gone.

Fibonacci sent them a non-compliance letter about a month ago (in Spanish). I don't know if the 'credit on refresh' is a recent enhancement, or if it's been there all along.

I've cleared 4 - 6 Wikipedia pages in the last couple of months that were mistakenly tagged as copyright violations, because biography.ms wasn't crediting us.

Anthere

11:32 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

Funny that you mention it. Yesterday, I was looking for an old PDF version of Donella Meadows's work. I had a hard time finding it because all that was on the net about her was copies of the Wikipedia articles (mine in good part, so this was not what I was looking for). One of the most prominent copies was precisely this website.

It was also one of the first times I regretted that the articles existed in Wikipedia... because I could not find the original paper I was looking for, because the web was so infested with the Wikipedia articles...

I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.

Ant

PS : I must not forget to add the pdf link to the article so I can find it next time.

Maury Markowitz wrote:

...

I was doing a little googling this morning while the wiki was down, and came across a reference to a site, www.biography.ms.

The entire site is simply a bad scraping of the wikipedia, with all attributions removed.

Maury

Maury Markowitz

11:39 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

...

I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.

It seems this is solvable though, if Google could simply read the attribution links. Of course in this particular case it wouldn't work, because biography.ms has scraped them off, but in general it should work well.

I'll write them.

Maury
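For illustration only, here is a minimal sketch of what "reading the attribution links" could mean for a crawler: collect the links on a mirrored page and check whether any of them points at the source domain. The names `LinkCollector` and `links_to_source` are hypothetical, and nothing here describes how Google actually works. It also only detects a link to the domain, not a proper link back to the specific article.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_source(html, source_domain="wikipedia.org"):
    """Return True if any link's host is source_domain or a subdomain of it."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.hrefs:
        host = urlparse(href).hostname or ""
        if host == source_domain or host.endswith("." + source_domain):
            return True
    return False
```

A page with the links scraped off, or with links routed through the mirror's own redirect script, would come back `False`, which is the limitation noted above.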

Gregory Maxwell

8 Jun

3:16 a.m.

New subject: www.biography.ms is illegally using Wikipedia content

On 6/7/05, Maury Markowitz maury_markowitz@hotmail.com wrote:

...

It seems this is solvable though, if Google could simply read the attribution links. Of course in this particular case it wouldn't work, because biography.ms has scraped them off, but in general it should work well.

I'll write them.

And give sites another incentive to remove the attribution links? ... These sites mirror wikipedia to gain traffic, ... not out of the goodness of their heart. :)

Maury Markowitz

9 Jun

8:29 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

I notice a recent change to the pages, including a boilerplate link to the wikipedia as a whole. This means the site remains non-compliant. I will contact them again.

Maury

Mark Williamson

8 Jun

2:43 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

Hi Ant,

I actually encounter this problem a lot more.

This is presumably because more of the topics I search for have a Wikipedia article but not a lot of other sites about them, or other sites with low pagerank on Google.

Since of course there are so many verbatim copies of en.wikipedia, a search on "yonaguni language" on Google will leave you almost exclusively with copies of the same few pages on en:, although it may or may not include two separate pages on Wikitravel (brand new), and some pages from my blog.

Mark

Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l

-- SI HOC LEGERE SCIS NIMIVM ERVDITIONIS HABES QVANTVM MATERIAE MATERIETVR MARMOTA MONAX SI MARMOTA MONAX MATERIAM POSSIT MATERIARI ESTNE VOLVMEN IN TOGA AN SOLVM TIBI LIBET ME VIDERE

Mark Williamson

2:47 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

Actually, I searched for it, and Wikipedia copies appear to be the *only* results (a search for the Japanese equivalent is more fruitful however).

Sadly, the actual en.wiki page titled "yonaguni language" is the penultimate result. It can be found on the 7th page.

Mark

Puddl Duk

9 Jun

11 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

On 6/7/05, Anthere anthere9@yahoo.com wrote:

...

Funny that you mention it. Yesterday, I was looking for an old PDF version of Donella Meadows's work. I had a hard time finding it because all that was on the net about her was copies of the Wikipedia articles (mine in good part, so this was not what I was looking for). One of the most prominent copies was precisely this website.

It was also one of the first times I regretted that the articles existed in Wikipedia... because I could not find the original paper I was looking for, because the web was so infested with the Wikipedia articles...

I suddenly regretted that Google was not able to recognize that all the articles were the same, and was hiding other articles which are unique but simply less visible. I actually wonder if this is not a bit problematic. It looks as if the search system is not able to handle this properly any more.

Ant

Google search string:

"Donella Meadows" pdf -wikipedia

Anthere

11:18 p.m.

New subject: www.biography.ms is illegally using Wikipedia content

Puddl Duk wrote:

...

On 6/7/05, Anthere anthere9@yahoo.com wrote:

...

Google search string;

"Donella Meadows" pdf -wikipedia

/me is astonished. I just did not *think* of adding the "pdf" in the search area. This is perfect. Thanks :-)

ant

Rowan Collins

10 Jun

4:59 a.m.

New subject: www.biography.ms is illegally using Wikipedia content

On 09/06/05, Anthere anthere9@yahoo.com wrote:

...

/me is astonished. I just did not *think* of adding the "pdf" in the search area. This is perfect. Thanks :-)

Of course, if you want to go one step further, you can use "filetype:pdf" (which also eliminates any need to exclude Wikipedia and its mirrors since they don't host PDFs) :D

-- Rowan Collins BSc [IMSoP]
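The two search strings from this thread can be assembled programmatically. The following is only a sketch: `build_query` and `build_search_url` are hypothetical helper names, and the only things assumed about Google are the quoted-phrase, `-term` and `filetype:` operators mentioned above plus the usual `q` parameter of its search URL.

```python
from urllib.parse import urlencode


def build_query(phrase, keywords=(), filetype=None, exclude=()):
    """Combine an exact phrase, plain keywords, a filetype filter and exclusions."""
    terms = [f'"{phrase}"']                       # exact-phrase match
    terms.extend(keywords)                        # plain keywords such as "pdf"
    if filetype:
        terms.append(f"filetype:{filetype}")      # restrict results to one file type
    terms.extend(f"-{word}" for word in exclude)  # drop results containing these words
    return " ".join(terms)


def build_search_url(query):
    """URL-encode the query into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_query("Donella Meadows", keywords=["pdf"], exclude=["wikipedia"])` reproduces Puddl Duk's string, and `build_query("Donella Meadows", filetype="pdf")` gives the stricter variant.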

wikipedia-l@lists.wikimedia.org

12 comments

participants (8)

Andrew Venier
Anthere
Gregory Maxwell
Mark Williamson
Maury Markowitz
Puddl Duk
Roger Luethi
Rowan Collins