On Sun, Aug 16, 2015 at 1:25 PM, ArtGiray . giraybal@gmail.com wrote:
My question is: how can I do a regex search on the server side (without downloading the page content)?
So my query should look like:
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
As you can see, that doesn't work. As https://www.mediawiki.org/wiki/Help:CirrusSearch#insource: suggests, the closest you can come is:
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
The prefix search narrows the search to pages whose titles start with "Car", but you'll also get Carnivàle, Carbon dioxide, etc. I _think_ "Car" would be the first page returned in the set of matching pages. Otherwise you might have to follow the API's continuation (the "continue" parameter) before you can conclusively determine that the page titled "Car" doesn't match the regex.
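A minimal Python sketch of that prefix-narrowed search, assuming the CirrusSearch `insource:/…/` and `prefix:` keywords from the help page above and the API's standard `continue`/`sroffset` continuation. The helper names and the caller-supplied `fetch_json` function are hypothetical, and `a+b+c` is just the example regex from this thread. It stops as soon as the exact title shows up and only follows continuation when it has to:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def build_search_url(regex, title_prefix, offset=None):
    """Build a list=search URL combining insource:/regex/ with prefix:."""
    params = {
        "action": "query",
        "list": "search",
        # keep prefix: last in the search string
        "srsearch": f"insource:/{regex}/ prefix:{title_prefix}",
        "srlimit": "50",
        "format": "json",
    }
    if offset is not None:
        params["sroffset"] = str(offset)  # resume where the last batch ended
    return API + "?" + urlencode(params)

def page_matches(title, regex, fetch_json):
    """Return True if `title` is among the regex hits for pages starting with it.

    fetch_json(url) -> dict is a hypothetical helper that performs the
    HTTP GET and decodes the JSON response.
    """
    offset = None
    while True:
        data = fetch_json(build_search_url(regex, title, offset))
        for hit in data.get("query", {}).get("search", []):
            if hit["title"] == title:
                return True  # exact title found; no need to continue
        cont = data.get("continue")
        if not cont:
            return False  # no more batches, so the page did not match
        offset = cont["sroffset"]
```

If "Car" does turn up first, as suspected above, this costs a single request per title.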
This means: I want to search only the "Car" page and check whether it contains the regex (a+b+c). (So this search should run on the server side.) If it does, the query should return the title, status = found, etc. Then I will download the raw wikitext with a second query.
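For that second query, one option is MediaWiki's `action=raw`, which returns a page's current wikitext as plain text. A hedged sketch: the function names and the User-Agent string are hypothetical placeholders, and you should set a real, descriptive User-Agent when talking to Wikimedia servers:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INDEX = "https://en.wikipedia.org/w/index.php"

def raw_wikitext_url(title):
    """URL that returns the current wikitext of `title` via action=raw."""
    return INDEX + "?" + urlencode({"title": title, "action": "raw"})

def download_wikitext(title, user_agent="regex-check-sketch/0.1 (hypothetical contact)"):
    """Fetch the raw wikitext of one page (only for pages that matched)."""
    req = Request(raw_wikitext_url(title), headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```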
Summary: I will search 1,000,000 pages one by one, and I don't want to download each page to my computer and then search it locally (that would waste too much bandwidth).
That's kinda weird. Perhaps, if your regex only matches a small number of pages, you can search for it and then intersect the set of pages returned with your list of a million (!!?) pages.
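A rough Python sketch of that intersection idea, assuming the regex alone (no `prefix:`) matches few enough pages to page through with `sroffset`. `fetch_json` is again a hypothetical caller-supplied HTTP helper. One search replaces a million per-page checks, and the million-title list is only used as an in-memory set:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def regex_hits_in_list(regex, wanted_titles, fetch_json):
    """Return the titles from wanted_titles that an insource:/regex/ search hits.

    Runs one search, follows continuation, and intersects the hits with
    the caller's list in memory. fetch_json(url) -> dict is hypothetical.
    """
    wanted = set(wanted_titles)  # set gives fast membership for a large list
    found = set()
    offset = None
    while True:
        params = {
            "action": "query",
            "list": "search",
            "srsearch": f"insource:/{regex}/",
            "srlimit": "50",
            "format": "json",
        }
        if offset is not None:
            params["sroffset"] = str(offset)
        data = fetch_json(API + "?" + urlencode(params))
        for hit in data.get("query", {}).get("search", []):
            if hit["title"] in wanted:
                found.add(hit["title"])
        cont = data.get("continue")
        if not cont:
            return found  # all result batches processed
        offset = cont["sroffset"]
```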