Re: [Mediawiki-api] Search page content without download the page?

17 Aug 2015

On Sun, Aug 16, 2015 at 1:25 PM, ArtGiray . <giraybal(a)gmail.com> wrote:

...

 My question is "how can I do a regex search on the server side" (without
 downloading the page content).

 So my query should look like:

https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=…

As you can see, that doesn't work. As
https://www.mediawiki.org/wiki/Help:CirrusSearch#insource: suggests, the
closest you can come is

https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=…

The prefix search narrows the search to pages whose titles start with "Car",
but you'll get Carnivàle, Carbon dioxide, etc. too. I _think_ "Car" would be
the first page returned in the set of matching pages; otherwise you might
have to process continuation requests before you can conclusively determine
that the page titled "Car" doesn't match the regex.
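
A minimal Python sketch of that check follows. The URLs above are truncated, so the exact search string here is an assumption: it combines the `insource:/regex/` and `prefix:` keywords from the CirrusSearch help page, with `prefix:` placed last. The helper names (`build_params`, `fetch_json`, `page_matches`), the `srlimit` value, and the User-Agent string are made up for illustration, and a regex containing "/" would need escaping first.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(title, regex):
    """Query parameters for a regex search limited to a title prefix (assumed form)."""
    return {
        "action": "query",
        "list": "search",
        "srwhat": "text",
        # prefix: goes last because it consumes the rest of the search string.
        "srsearch": f"insource:/{regex}/ prefix:{title}",
        "srlimit": "50",
        "format": "json",
    }


def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "regex-check-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def page_matches(title, regex, fetch=fetch_json):
    """Return True if the page with exactly this title is among the hits."""
    params = build_params(title, regex)
    while True:
        reply = fetch(params)
        for hit in reply.get("query", {}).get("search", []):
            if hit["title"] == title:
                return True
        if "continue" not in reply:
            # No more batches: the exact title never showed up.
            return False
        # Merge the continuation values into the next request.
        params = {**params, **reply["continue"]}
```

Passing `fetch` as a parameter keeps the continuation loop testable without touching the network; `page_matches("Car", "(a+b+c)?")` would use the real API.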

...
 This means I want to search only the "Car" page
 and check whether it contains the regex
 (a+b+c)? (so this search *should work on the server side*).
 If it does, the query returns the title and status = found etc., then I
 will download the "raw wikitext" with a second query.

 Summary: I will search 1,000,000 pages one by one, and I don't want to
 download each page to my computer and then search it locally (that is too
 wasteful of bandwidth).

That's kinda weird. Perhaps if your regex only matches a small number of
pages, you can search for it and then intersect the set of pages returned
with your list of a million (!!?) pages.
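
Roughly, the intersection idea could look like the Python sketch below. It assumes the result set is small enough for the API to return in full, and that `fetch` is a caller-supplied function (hypothetical here) that takes a dict of query parameters and returns the decoded JSON reply of one `list=search` request. The function name and the `srlimit` value are also just for illustration.

```python
def matching_titles(regex, wanted_titles, fetch):
    """Run one site-wide regex search and keep only the titles we care about."""
    wanted = set(wanted_titles)
    params = {
        "action": "query",
        "list": "search",
        "srwhat": "text",
        "srsearch": f"insource:/{regex}/",
        "srlimit": "50",
        "format": "json",
    }
    found = set()
    while True:
        reply = fetch(params)
        # Collect the titles from this batch of hits.
        found.update(hit["title"] for hit in reply.get("query", {}).get("search", []))
        if "continue" not in reply:
            break
        # Ask for the next batch using the continuation values.
        params = {**params, **reply["continue"]}
    # Set intersection: pages that matched AND are on the caller's list.
    return found & wanted
```

This trades a million per-page queries for one paginated search, which only pays off when the regex is selective.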

-- 
=S Page  WMF Tech writer
