The "Google test" used to be a tool for checking the notability of a subject or finding sources about it. For some languages it can also be used for other purposes: in Hebrew, for example, whose spelling is not well standardized, it is very frequently used to find the most common spelling, especially for article titles. It was never the ultimate tool, of course, but it was useful. With the proliferation of sites that indiscriminately copy Wikipedia content, it is becoming less and less useful.
For some time I fought this problem by adding "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but I hit a wall: Google limits the search string to 32 words, and today there are many more than 32 sites that clone Wikipedia, so this trick is also becoming useless.
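A minimal sketch of the workaround described above, assuming the 32-word cap applies to the whole query (as stated in this thread for Google in 2010; current engines may differ). The helper name `build_exclusion_query` is hypothetical. It appends `-site:` operators until the budget runs out and reports which mirrors did not fit, which is exactly where the trick breaks down.

```python
def build_exclusion_query(term, mirror_sites, max_words=32):
    """Append -site: operators for as many mirrors as fit in the word budget.

    Returns (query, skipped_sites). max_words=32 is the limit described in
    this thread; it is an assumption, not a documented current value.
    """
    words = term.split()
    skipped = []
    for site in mirror_sites:
        if len(words) < max_words:
            # No space between "-site:" and the domain, or the operator breaks.
            words.append(f"-site:{site}")
        else:
            skipped.append(site)
    return " ".join(words), skipped


query, skipped = build_exclusion_query(
    "example term",
    ["wikipedia.org", "wapedia.mobi", "miniwiki.org"],
)
print(query)    # example term -site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org
print(skipped)  # []
```

Once the mirror list is longer than the remaining budget, `skipped` grows and those mirrors leak back into the results.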
I know that some Wikipedias have customized Special:Search, adding search engines other than Wikipedia's built-in one. I tried to see whether any Wikipedia has added the ability to search using Google (or Bing, or Yahoo, or any other search engine) while excluding Wikipedia clones. Does anyone know whether it's possible to build such a thing? Or maybe it already exists and I didn't search well enough?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore
On 12/08/2010 12:46 PM, Amir E. Aharoni wrote:
The "Google test" used to be a tool for checking the notability of a subject or finding sources about it. For some languages it can also be used for other purposes: in Hebrew, for example, whose spelling is not well standardized, it is very frequently used to find the most common spelling, especially for article titles. It was never the ultimate tool, of course, but it was useful. With the proliferation of sites that indiscriminately copy Wikipedia content, it is becoming less and less useful.
For some time I fought this problem by adding "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but I hit a wall: Google limits the search string to 32 words, and today there are many more than 32 sites that clone Wikipedia, so this trick is also becoming useless.
You may try "-wikipedia -ויקיפדיה" (the second term is "Wikipedia" in Hebrew) to narrow it down further, but I don't think there is any full solution.
On Wed, Dec 8, 2010 at 10:46 PM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
For some time I fought this problem by adding "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but I hit a wall: Google limits the search string to 32 words, and today there are many more than 32 sites that clone Wikipedia, so this trick is also becoming useless.
If you use Firefox, there's an add-on that will let you filter out mirrors (among other things). See:
http://meta.wikimedia.org/wiki/Mirror_filter
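Filtering after the search avoids the word limit entirely, since the blocklist never goes into the query. The sketch below is not the add-on's code; `MIRROR_DOMAINS` is a hypothetical list and both functions are illustrative. A result is dropped when its host is a listed domain or one of its subdomains (so `he.wikipedia.org` matches `wikipedia.org`, but `notwikipedia.org` does not).

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained blocklist; it can grow without any query limit.
MIRROR_DOMAINS = {"wikipedia.org", "wapedia.mobi", "miniwiki.org"}


def is_mirror(url, mirrors=MIRROR_DOMAINS):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in mirrors)


def filter_results(urls, mirrors=MIRROR_DOMAINS):
    """Keep only result URLs that are not hosted on a known mirror."""
    return [u for u in urls if not is_mirror(u, mirrors)]
```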
If the copyright license has been followed, -wikipedia should exclude all clones, because compliant copies credit Wikipedia. However, material is often copied without crediting Wikipedia.
Fred
User:Fred Bauder
_______________________________________________ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On Wed, Dec 8, 2010 at 15:42, Fred Bauder fredbaud@fairpoint.net wrote:
If the copyright license has been followed, -wikipedia should exclude all clones, because compliant copies credit Wikipedia. However, material is often copied without crediting Wikipedia.
Yes, but that may also exclude useful, original sites that merely happen to mention Wikipedia.
On 8 December 2010 15:26, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Yes, but that may also exclude useful, original sites that merely happen to mention Wikipedia.
Add -"quoted sentence from article intro" to the search?
- d.
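David's suggestion can be sketched as follows. The helper `exclude_phrase` and its 8-word cap are hypothetical choices: a long sentence would otherwise consume much of the query's word budget (assuming quoted words count toward it), and double quotes inside the sentence are stripped because they would end the phrase early.

```python
def exclude_phrase(query, sentence, max_words=8):
    """Append -"..." so pages containing the article's opening words are dropped."""
    # Remove embedded double quotes, then keep only the first max_words words.
    words = sentence.replace('"', "").split()[:max_words]
    phrase = " ".join(words)
    return f'{query} -"{phrase}"'


print(exclude_phrase("Foo", 'The quick "brown" fox jumps', max_words=3))
# Foo -"The quick brown"
```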
Sounds like we need a notable search engine that includes only "approved and allowed" sources. That would be nice to have.
On Wed, Dec 8, 2010 at 5:08 PM, David Gerard dgerard@gmail.com wrote:
On 8 December 2010 15:26, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Yes, but that may also exclude useful, original sites that merely happen to mention Wikipedia.
Add -"quoted sentence from article intro" to the search?
- d.
On Thu, Dec 9, 2010 at 9:55 AM, Domas Mituzas midom.lists@gmail.com wrote:
On Dec 8, 2010, at 6:21 PM, Mike Dupont wrote:
Sounds like we need a notable search engine that includes only "approved and allowed" sources. That would be nice to have.
Sounds like a great community project, Wiki Search!
Yes, it would be great. As I said, it could just include all pages cited in <ref> tags; that would allow people to review the results and find pages that do not belong.
We would also need to cache all these pages, ideally with a revision history, similar to archive.org or using it directly.
The search could use Lucene or some other project; it does not have to be Google.
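To illustrate the proposed architecture (index only the pages Wikipedia cites, then search that set), here is a tiny in-memory inverted index. It is a stand-in for Lucene, not Lucene's API; `cached_pages` is hypothetical input mapping each cited URL to its cached text, and the tokenizer is deliberately crude (real engines use language-aware analyzers).

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude tokenizer: lowercase, then split on runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(cached_pages):
    """Map each word to the set of URLs whose cached text contains it."""
    index = defaultdict(set)
    for url, text in cached_pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word (simple AND semantics)."""
    sets = [index.get(w, set()) for w in tokenize(query)]
    return set.intersection(*sets) if sets else set()
```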
On this note, I would really like to see a word index for OpenStreetMap as well: OSM holds a huge amount of potentially relevant information that should be easier to use in WP.
mike
OpenStreetMap is a wiki still in its "Wild West" phase. Words cannot express the nonsense it hosts.
Fred
User:Fred Bauder
On Thu, Dec 9, 2010 at 12:52 PM, Fred Bauder fredbaud@fairpoint.net wrote:
OpenStreetMap is a wiki still in its "Wild West" phase. Words cannot express the nonsense it hosts.
If you are looking for a place named "X", or for a location for some article, it would be nice to have a better search engine for that content. Wikipedia can help. Of course, WP articles are of a higher standard than a lot of OSM data, but OSM's coverage is greater. No doubt many articles without coordinates could be fixed, or at least helped, by editors having a faster and better index to the OSM data. mike
On 8 December 2010 11:46, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
For some time I fought this problem by adding "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but I hit a wall: Google limits the search string to 32 words, and today there are many more than 32 sites that clone Wikipedia, so this trick is also becoming useless.
As noted above, you can use -wikipedia; alternatively, excluding keywords common on mirrors, such as -mediawiki or -gfdl, could be worth trying.
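The same keywords can also be checked on a page you have already fetched. In this sketch, `MIRROR_HINTS` is a hypothetical list drawn from the terms mentioned in this thread, and the function only reports which hints occur. Given the objection above that original sites may mention Wikipedia too, a hit is better treated as "review this" than as "drop this".

```python
# Hypothetical hint list: Wikipedia (English and Hebrew) plus mirror boilerplate.
MIRROR_HINTS = ("wikipedia", "ויקיפדיה", "mediawiki", "gfdl")


def mirror_hints(page_text):
    """Return the hint words that occur in the page text (case-insensitive)."""
    lowered = page_text.lower()
    return [h for h in MIRROR_HINTS if h in lowered]
```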
On Wednesday 08 December 2010 05:16 PM, Amir E. Aharoni wrote:
I know that some Wikipedias have customized Special:Search, adding search engines other than Wikipedia's built-in one. I tried to see whether any Wikipedia has added the ability to search using Google (or Bing, or Yahoo, or any other search engine) while excluding Wikipedia clones. Does anyone know whether it's possible to build such a thing? Or maybe it already exists and I didn't search well enough?
http://ml.wikipedia.org/w/index.php?title=Special%3ASearch
That doesn't exclude other sites; it only includes results from ml.wikipedia.org, by adding site:ml.wikipedia.org to the query.
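Restricting a web search to one site, as described above, amounts to adding a `site:` operator to the query. With Google's `/search?q=` URL it can be sketched as below; the function name is hypothetical, and this is not the code behind ml.wikipedia's Special:Search.

```python
from urllib.parse import urlencode


def site_restricted_search_url(term, site="ml.wikipedia.org"):
    """Build a Google web-search URL limited to one site via the site: operator."""
    # urlencode escapes the space as "+" and the colon as "%3A".
    return "https://www.google.com/search?" + urlencode({"q": f"{term} site:{site}"})
```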
I thought about this more. It would be possible to extract a list of all pages that are cited in <ref> tags in WP, and we would use that list for the search engine. We should also make sure that all referenced pages (not merely linked ones) are stored in archive.org or someplace permanent. I wonder if there is some API to extract this list easily? mike
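As for extracting the list: the MediaWiki API's `prop=extlinks` returns all external links on a page, including plain links, so on its own it does not separate references from other links. A rough alternative, sketched below assuming you already have a page's wikitext, is to pull URLs out of `<ref>...</ref>` bodies with regular expressions. `ref_urls` is a hypothetical helper, not an existing tool, and it misses links that citation templates build from parameters (such as DOIs).

```python
import re

# A self-closing <ref name="x" /> only reuses an earlier citation and has no
# body; removing it first keeps the main pattern from treating it as an opener.
SELF_CLOSING_REF = re.compile(r"<ref[^>]*/>", re.IGNORECASE)
REF_BODY = re.compile(r"<ref(?:\s[^>]*)?>(.*?)</ref\s*>", re.IGNORECASE | re.DOTALL)
# A URL ends at whitespace or at wiki/template syntax characters.
URL = re.compile(r"https?://[^\s|\]}<]+")


def ref_urls(wikitext):
    """Return URLs found inside <ref>...</ref> bodies, in order, without duplicates."""
    text = SELF_CLOSING_REF.sub("", wikitext)
    urls = []
    for body in REF_BODY.findall(text):
        for url in URL.findall(body):
            if url not in urls:
                urls.append(url)
    return urls
```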
On Wed, Dec 8, 2010 at 6:49 PM, praveenp me.praveen@gmail.com wrote:
On Wednesday 08 December 2010 05:16 PM, Amir E. Aharoni wrote:
I know that some Wikipedias have customized Special:Search, adding search engines other than Wikipedia's built-in one. I tried to see whether any Wikipedia has added the ability to search using Google (or Bing, or Yahoo, or any other search engine) while excluding Wikipedia clones. Does anyone know whether it's possible to build such a thing? Or maybe it already exists and I didn't search well enough?
http://ml.wikipedia.org/w/index.php?title=Special%3ASearch
That doesn't exclude other sites; it only includes results from ml.wikipedia.org, by adding site:ml.wikipedia.org to the query.
Hello,
Could you change the URL for Wikiwix? Just remove "lang=fr": currently the search results are in French, not in Malayalam (ml) as expected.
Regards, Pascal Martin 06 13 89 77 32 02 32 40 23 69
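Removing a single parameter such as `lang=fr` from a search URL can be done with Python's standard `urllib.parse`. The Wikiwix URL itself is not given in this thread, so the example uses a made-up address, and `drop_query_param` is a hypothetical helper. Note that `urlencode` re-encodes the remaining values, so their escaping may change slightly (for example, spaces become `+`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(drop_query_param("http://search.example/?q=test&lang=fr", "lang"))
# http://search.example/?q=test
```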
----- Original Message ----- From: "Mike Dupont" jamesmikedupont@googlemail.com To: "Wikimedia Foundation Mailing List" foundation-l@lists.wikimedia.org Sent: Wednesday, December 08, 2010 7:58 PM Subject: Re: [Foundation-l] excluding Wikipedia clones from searching
-- James Michael DuPont Member of Free Libre Open Source Software Kosova and Albania flossk.org flossal.org