On Sun, Aug 16, 2015 at 1:25 PM, ArtGiray . <giraybal(a)gmail.com> wrote:
My question is "how can I run a regex search on the server side" (without
downloading the page content),
so my query should look like:
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=…
As you see, that doesn't work. As
https://www.mediawiki.org/wiki/Help:CirrusSearch#insource: suggests, the
closest you can come is
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=…
The prefix search narrows the search to pages whose titles start with "Car",
but you'll get Carnivàle, Carbon dioxide, etc. too. I _think_ "Car" would be
the first page returned in the set of matching pages; otherwise you might
have to process the continuation results before you can conclude that the
page titled "Car" doesn't match the regex.
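
A rough Python sketch of that check, in case it helps. The assumptions are mine: the search string combines insource:/regex/ with prefix: (placed last, since prefix: takes the rest of the query), and `fetch` is a hypothetical callable you supply that performs the HTTP GET and returns the decoded JSON. It follows the API's "continue" values until it either sees the exact title or runs out of results. The regex is inserted as-is, so a "/" inside it would need escaping.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_params(title, regex, limit=50):
    """Build list=search parameters combining insource: and prefix:."""
    return {
        "action": "query",
        "list": "search",
        # Assumption: prefix: goes last because it consumes the rest of the query.
        "srsearch": f"insource:/{regex}/ prefix:{title}",
        "srlimit": limit,
        "format": "json",
    }


def build_url(params):
    """Return the full request URL for the given parameters."""
    return API_URL + "?" + urlencode(params)


def title_matches(title, regex, fetch):
    """Return True if the page titled exactly `title` is among the hits.

    `fetch` is a hypothetical callable: it takes a params dict and
    returns the decoded JSON response as a dict.
    """
    params = build_search_params(title, regex)
    while True:
        data = fetch(params)
        for hit in data.get("query", {}).get("search", []):
            if hit["title"] == title:
                return True
        if "continue" not in data:
            # No more result batches: the exact title did not match.
            return False
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

With the requests library, `fetch` could be something like `lambda p: requests.get(API_URL, params=p).json()`.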
This means I want to search only the "Car" page
and check whether it contains the regex
(a+b+c)?* (so this search *should work on the server side*).
If it does, the query returns the title and status = found etc., and then I
will download the "raw wikitext" with a second query.
Summary: I will search 1,000,000 pages one by one, and I don't want to
download each page to my computer and then search it locally (that costs
too much bandwidth).
That's kinda weird. Perhaps if your regex only matches a small number of
pages, you can search for it once and then intersect the set of pages
returned with your list of a million (!!?) pages.
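
The intersection step could look roughly like this in Python. It assumes you have already collected the titles returned by the regex search into a list; the function name and the sample titles are made up for illustration.

```python
def pages_to_fetch(search_hits, wanted_titles):
    """Return the wanted titles that also appear among the search hits."""
    # Set intersection keeps only titles present in both collections.
    return set(wanted_titles) & set(search_hits)


# Illustrative data only: hits from the regex search vs. your own list.
hits = ["Car", "Carbon dioxide", "Boat"]
wanted = ["Car", "Boat", "Train"]
to_download = pages_to_fetch(hits, wanted)
```

Only the titles in `to_download` would then need the second query for their raw wikitext.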
--
=S Page WMF Tech writer