I agree that this sounds like an interesting experiment. I hope that you get good faith
editors. I worry that you’ll get COI editors playing with the search rankings. Do you have
a way in mind to deal with that issue?
Pine
From: emijrp
Sent: Monday, 22 October, 2012 08:29
To: Research into Wikimedia content and communities
Subject: [Wiki-research-l] A wiki search engine
Hi all,
I'm starting a new project, a wiki search engine. It uses MediaWiki, Semantic
MediaWiki and other minor extensions, and some tricky templates and bots.
I remember Wikia Search and how it failed. It had the mini-article thing as an
introduction, then a long list of links compiled by a crawler, and also something similar
to a social network.
My project idea (which still needs a cool name) is different. Although it uses an
introduction and images copied from Wikipedia, and some links from the "External
links" sections, that is only a starting point. The purpose is for the community to add,
remove and reorder the results for each term, and to create redirects for similar terms to
avoid duplicates.
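To make that concrete, here is a rough sketch (Python with pywikibot; the page layout,
titles and helper names are only illustrative, not the real bot code) of how a seeding bot
could create an initial results page and redirects for similar terms:

    import pywikibot

    def seed_results_page(site, term, intro, links, synonyms):
        """Seed a starting results page for `term` plus redirects for synonyms."""
        page = pywikibot.Page(site, term)
        if not page.exists():
            body = [intro, '', '== Results ==']
            body += ['* %s' % url for url in links]  # the community reorders these later
            page.text = '\n'.join(body)
            page.save(summary='Bot: seeding initial results page')
        # Redirects keep similar terms pointing at one canonical results page.
        for synonym in synonyms:
            redirect = pywikibot.Page(site, synonym)
            if not redirect.exists():
                redirect.text = '#REDIRECT [[%s]]' % term
                redirect.save(summary='Bot: redirect to canonical term')

    site = pywikibot.Site()  # assumes the target wiki is configured in user-config.py
    seed_results_page(site, 'Shakira',
                      'Colombian singer-songwriter ...',  # intro copied from Wikipedia
                      ['http://www.shakira.com', 'https://es.wikipedia.org/wiki/Shakira'],
                      ['Shakira Mebarak'])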
Why this? I think that Google PageRank isn't enough. It is frequently abused by
link farms, SEOs and other people trying to push their websites to the top.
Search "Shakira" in Google for example. You see 1) Official site, 2) Wikipedia
3) Twitter 4) Facebook, then some videos, some news, some images, Myspace. It wastes 3 or
more results in obvious nice sites (WP, TW, FB). The wiki search engine puts these sites
in the top, and an introduction and related terms, leaving all the space below to not so
obvious but interesting websites. Also, if you search for "semantic queries"
like "right-wing newspapers" in Google, you won't find real newspapers but
"people and sites discussing about ring-wing newspapers". Or latex and LaTeX
being shown in the same results pages. These issues can be resolved with disambiguation
result pages.
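For the latex/LaTeX case, a disambiguation results page could be as simple as a list of
links to one results page per sense; here is a tiny, purely illustrative helper that
renders such a page in wikitext (the titles are made up):

    def disambiguation_page(term, senses):
        """Render a disambiguation page pointing to one results page per sense."""
        lines = ["'''%s''' may refer to:" % term]
        for title, note in senses:
            lines.append('* [[%s]] - %s' % (title, note))
        return '\n'.join(lines)

    print(disambiguation_page('latex', [
        ('Latex (material)', 'the natural or synthetic rubber'),
        ('LaTeX (typesetting)', 'the document preparation system'),
    ]))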
How do we choose which results go above or below? The rules are not fully designed yet,
but we could put official sites first, then .gov or .edu domains (which tend to be
important), and later unofficial websites and blogs, giving priority to the local
language, etc., reaching consensus along the way.
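As a sketch of what such a default ordering could look like before the community reorders
things by hand, here is a small heuristic in Python; the tiers, field names and weights
are only assumptions, nothing is decided yet:

    from urllib.parse import urlparse

    def rank_key(result, official_sites, local_lang='es'):
        """Lower tuples sort first: official site, then .gov/.edu, then other
        sites, then blogs; local-language results are preferred within a tier."""
        host = urlparse(result['url']).hostname or ''
        if host in official_sites:
            tier = 0
        elif host.endswith(('.gov', '.edu')):
            tier = 1
        elif result.get('kind') == 'blog':
            tier = 3
        else:
            tier = 2
        lang_penalty = 0 if result.get('lang') == local_lang else 1
        return (tier, lang_penalty)

    results = [
        {'url': 'http://someblog.example.com/shakira', 'kind': 'blog', 'lang': 'en'},
        {'url': 'http://www.shakira.com', 'lang': 'en'},
        {'url': 'https://www.loc.gov/item/shakira', 'lang': 'en'},
    ]
    results.sort(key=lambda r: rank_key(r, official_sites={'www.shakira.com'}))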
We can control aggressive spam with spam blacklists, by semi-protecting or protecting
highly visible pages, and by using bots or tools to check changes.
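A change-checking tool could be as simple as diffing the external links of two revisions
against blacklist patterns. This is a toy sketch; the patterns and the helper are
hypothetical, but the regex-per-domain style matches how spam blacklists usually work:

    import re

    BLACKLIST = [r'\bcheap-pills\.example\b', r'\bseo-farm\.example\b']

    def added_spam_links(old_text, new_text):
        """Return links added in the new revision that match a blacklisted pattern."""
        url_re = re.compile(r'https?://\S+')
        added = set(url_re.findall(new_text)) - set(url_re.findall(old_text))
        return [url for url in added
                if any(re.search(pat, url) for pat in BLACKLIST)]

    suspicious = added_spam_links(
        '* http://www.shakira.com',
        '* http://www.shakira.com\n* http://cheap-pills.example/buy')
    if suspicious:
        print('Revert candidate, blacklisted links added:', suspicious)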
It obviously has a CC BY-SA license and the results can be exported. I think this
approach is the opposite of Google's today.
For "weird" queries like "Albert Einstein birthplace" we can redirect to the most
obvious results page (in this case Albert Einstein), either with a hand-made redirect or
in software (a small change to MediaWiki).
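In software it could be little more than stripping modifier words from the query and
falling back to an existing results page; a hypothetical sketch (the word list and the
function are made up, not part of MediaWiki):

    STOP_MODIFIERS = {'birthplace', 'birthday', 'age', 'height', 'wife', 'husband'}

    def fallback_title(query, existing_pages):
        """Map a query like 'Albert Einstein birthplace' to an existing page, if any."""
        words = [w for w in query.split() if w.lower() not in STOP_MODIFIERS]
        candidate = ' '.join(words)
        return candidate if candidate in existing_pages else None

    print(fallback_title('Albert Einstein birthplace', {'Albert Einstein', 'Shakira'}))
    # -> 'Albert Einstein'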
You can check out a very early alpha version here: http://www.todogratix.es (only in
Spanish for now, sorry), which I'm feeding with some bots.
I think that it is an interesting experiment. I'm open to your questions and
feedback.
Regards,
emijrp
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
Personal website:
https://sites.google.com/site/emijrp/