Also, Lucene is just a search library. Internally, I use Lucene for
indexing the crawled contents.

A nice thing about crawling versus indexing the MediaWiki database dump:
the rendered HTML and its link structure carry a lot of useful information
that a database dump won't give you. That extra HTML context can be a
better source for search relevancy.
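To make the point about link structure concrete, here is a minimal Java sketch (the class and method names are my own illustration, not part of my actual crawler) of pulling href targets out of crawled HTML. Links and their anchor text extracted this way can then be indexed as extra fields for relevancy:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates the kind of signal a crawler can pull out of raw HTML that a
// database dump would not carry: the outgoing links of each page.
public class LinkExtractor {

    // Naive href extraction with a regex -- fine for a sketch, though a real
    // crawler would want an HTML parser to handle unquoted or relative links.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p>See <a href=\"http://example.org/a\">A</a> and "
                    + "<a href=\"http://example.org/b\">B</a>.</p>";
        System.out.println(extractLinks(page));
        // prints [http://example.org/a, http://example.org/b]
    }
}
```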
Cheers,
Jian
On 10/11/07, Emufarmers Sangly <emufarmers(a)gmail.com> wrote:
On 10/11/07, jian chen <chenjian1227(a)gmail.com> wrote:
The search engine is built using Java and has three components: Crawler,
Indexer, and Searcher.

Right now I have a question for the community. We have a requirement to
lock down the wiki so that only logged-in users can see the wiki content.
But in order for the crawler to download the content, I need to find a way
to enable access for the crawler based on IP address.

Does MediaWiki support such a feature to turn access on/off based on IP
address?
The Python Wikipediabot (http://meta.wikimedia.org/wiki/Pywikipedia) can
log in as a user; I'm sure you can add its login functionality. But I'm
not sure whether it's such a good idea to crawl a live wiki: the Lucene
search engine (http://www.mediawiki.org/wiki/Lucene) forms an index of a
database dump instead. Is there some reason that you need the crawling
functionality?
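That said, if you do want IP-based access, something along these lines in
LocalSettings.php might work -- an untested sketch, and the address and
whitelisted page are placeholders you'd substitute for your setup:

```php
# LocalSettings.php -- restrict reading to logged-in users.
$wgGroupPermissions['*']['read'] = false;

# Keep the login page reachable for anonymous visitors.
$wgWhitelistRead = array( 'Special:Userlogin' );

# Hypothetical crawler address: re-enable anonymous reading for it only.
$crawlerIp = '192.0.2.10';
if ( isset( $_SERVER['REMOTE_ADDR'] ) && $_SERVER['REMOTE_ADDR'] === $crawlerIp ) {
    $wgGroupPermissions['*']['read'] = true;
}
```

Keep in mind REMOTE_ADDR can be spoof-prone behind proxies, so this is
weaker than having the crawler actually log in.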
--
Arr, ye emus,
http://emufarmers.com
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l