Re: [Wiki-research-l] A wiki search engine

4 Aug 2013


Hi all again,
After some months, we now have the domain for LibreFind[1] and some usable
results[2][3] (the bot is running). There is also a mailing list[4] and a
Google Code project[5].
I would like you to join the brainstorming. We need to establish some
policies on how to sort results, and we need bots to check for dead links,
crawlers to improve the results, and much more. You can request an account
for the closed beta.
Thanks for your time,
emijrp
[1] http://www.librefind.org
[2] http://www.librefind.org/wiki/Spain
[3] http://www.librefind.org/wiki/Edgar_Allan_Poe
[4] http://groups.google.com/group/librefind
[5] https://code.google.com/p/librefind/
2012/10/27 emijrp emijrp@gmail.com
...
After some tests and usability improvements, I'm going to launch an
English alpha version.
I still need a cool name for the project, any idea?
Stay tuned.
2012/10/23 emijrp emijrp@gmail.com
...
Yes, there are some options: (semi-)protection, blocks, spam blacklists,
FlaggedRevs, AbuseFilter and some more. All of them are well-known MediaWiki
features and extensions.
Thanks for your interest.
2012/10/23 ENWP Pine deyntestiss@hotmail.com
...
I agree that this sounds like an interesting experiment. I hope that you
get good-faith editors, but I worry that you'll get conflict-of-interest (COI)
editors manipulating the search rankings. Do you have a way in mind to deal
with that issue?
Pine
*From:* emijrp emijrp@gmail.com
*Sent:* Monday, 22 October, 2012 08:29
*To:* Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>
*Subject:* [Wiki-research-l] A wiki search engine
Hi all,
I'm starting a new project, a wiki search engine. It uses MediaWiki,
Semantic MediaWiki and other minor extensions, and some tricky templates
and bots.
I remember Wikia Search and how it failed. It had a mini-article as an
introduction, followed by a lot of links compiled by a crawler, plus
something similar to a social network.
My project idea (which still needs a cool name) is different. Although
it uses an introduction and images copied from Wikipedia, and some links
from the "External links" sections, that is only a starting point. The
purpose is for the community to add, remove and order the results for each
term, and to create redirects for similar terms to avoid duplicates.
Why do this? I think that Google PageRank isn't enough. It is frequently
abused by link farms, SEOs and other people trying to push their websites
higher in the results.
Search for "Shakira" in Google, for example. You see 1) the official site,
2) Wikipedia, 3) Twitter, 4) Facebook, then some videos, some news, some
images, and Myspace. It wastes three or more results on obvious, well-known
sites (WP, TW, FB). The wiki search engine puts these sites at the top,
together with an introduction and related terms, leaving all the space below
for less obvious but interesting websites. Also, if you run "semantic
queries" like "right-wing newspapers" in Google, you won't find actual
newspapers but "people and sites discussing right-wing newspapers".
Similarly, latex (the material) and LaTeX (the typesetting system) are shown
on the same results page. These issues can be resolved with disambiguation
result pages.
How do we choose which results go higher or lower? The rules are not fully
designed yet, but we could put official sites in first place, then
important .gov or .edu domains, and later unofficial websites and blogs,
giving priority to the local language, etc., and decide by consensus.
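The ordering rules above are still undecided, but as a rough illustration they could be expressed as a sort key. This is a minimal Python sketch under stated assumptions: the `Result` record, its fields (`official`, `lang`, `community_rank`), `domain_tier` and `sort_results` are all hypothetical, not part of LibreFind or MediaWiki, and consensus is modelled simply as an editor-chosen position used to break ties.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical record for one link on a results page (not a LibreFind API).
@dataclass
class Result:
    url: str
    official: bool = False   # editors flagged it as the topic's official site
    lang: str = "en"         # language of the linked page
    community_rank: int = 0  # position agreed by consensus (lower = higher up)


def domain_tier(result: Result) -> int:
    """0 = official site, 1 = .gov/.edu domain, 2 = everything else."""
    if result.official:
        return 0
    host = urlparse(result.url).hostname or ""
    if host.endswith((".gov", ".edu")):
        return 1
    return 2


def sort_results(results: list[Result], local_lang: str) -> list[Result]:
    # Sort by tier, then pages in the local language first (False sorts
    # before True), then by the community-agreed position.
    return sorted(
        results,
        key=lambda r: (domain_tier(r), r.lang != local_lang, r.community_rank),
    )
```

With `local_lang="es"`, an official site comes first, a `.gov` page second, and within the remaining tier a Spanish blog ranks above an English one regardless of their community positions. Whether language should outrank consensus like this is exactly the kind of policy question still open.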
We can control aggressive spam with spam blacklists, semi-protect or
protect highly visible pages, and use bots or tools to check changes.
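MediaWiki's spam blacklist works, roughly, by matching the external links added in an edit against a list of regular-expression fragments. The Python sketch below imitates that idea so a checking bot could flag suspicious edits; it is not the extension's code, and the blacklist entries, `compile_blacklist` and `blocked_links` are hypothetical.

```python
import re

# Hypothetical blacklist text: one regex fragment per line, '#' starts a comment.
BLACKLIST_TEXT = r"""
# example entries (illustrative only)
cheap-pills\.example
\bcasino-[a-z]+\.example\b
"""

# Rough external-link matcher: stops at whitespace and wiki/HTML delimiters.
URL_RE = re.compile(r'https?://[^\s\]|<>"]+')


def compile_blacklist(text: str):
    """Join the non-comment lines into one case-insensitive regex (or None)."""
    fragments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            fragments.append(f"(?:{line})")
    return re.compile("|".join(fragments), re.IGNORECASE) if fragments else None


def blocked_links(old_text: str, new_text: str, blacklist) -> list[str]:
    """Return the URLs newly added by an edit that match the blacklist."""
    if blacklist is None:
        return []
    added = set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))
    return sorted(url for url in added if blacklist.search(url))
```

Only links that are new in the edit are checked, so a page that already contains a link added before it was blacklisted is not blocked on every later edit.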
It obviously has a CC BY-SA license, and the results can be exported. I think
this approach is the opposite of what Google does today.
For odd queries like "Albert Einstein birthplace", we can redirect to
the most obvious results page (in this case, Albert Einstein) using a
hand-made redirect or in software (with a small change to MediaWiki).
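One possible software approach, assuming nothing about how MediaWiki itself would be changed, is to drop trailing words from the query until what remains matches an existing page title. The `resolve_query` function below is a hypothetical Python sketch of that heuristic; a hand-made redirect (a page containing `#REDIRECT [[Albert Einstein]]`) could still take precedence over it.

```python
from typing import Optional


def resolve_query(query: str, titles: list[str]) -> Optional[str]:
    """Return the existing title that is the longest word prefix of the query.

    Hypothetical heuristic: 'Albert Einstein birthplace' -> 'Albert Einstein'.
    Matching is case-insensitive; returns None if no prefix is a known title.
    """
    lookup = {title.casefold(): title for title in titles}
    words = query.split()
    # Try the full query first, then drop one trailing word at a time.
    for n in range(len(words), 0, -1):
        candidate = " ".join(words[:n]).casefold()
        if candidate in lookup:
            return lookup[candidate]
    return None
```

Because the longest prefix wins, "Albert Einstein birthplace" lands on Albert Einstein even if a shorter "Albert" page also exists; the flip side is that "Albert Camus" would fall back to "Albert" when no Camus page exists, which is where a disambiguation page or a hand-made redirect would be needed.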
You can check a very early alpha version here: http://www.todogratix.es
(only in Spanish for now, sorry), which I'm feeding with some bots.
I think that it is an interesting experiment. I'm open to your questions
and feedback.
Regards,
emijrp
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT http://code.google.com/p/avbot/ | StatMediaWiki http://statmediawiki.forja.rediris.es
| WikiEvidens http://code.google.com/p/wikievidens/ | WikiPapers http://wikipapers.referata.com
| WikiTeam http://code.google.com/p/wikiteam/
Personal website: https://sites.google.com/site/emijrp/


Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l