Hi, for example, I want to search for the regex "a+b+c" in the content of 1,000,000 pages. I can get all the page content from the API (which means downloading it to my PC) and search it there.
But I don't want to download 1,000,000 pages to my PC (it makes for a very slow search and uses too much bandwidth, both mine and Wikipedia's); I just want to search for a word on the server side.
Is it possible to search page content on the server side and, if a page contains my words, get its title?
I want a query like this: https://en.wikipedia.org/w/api.php?action=query&type=iscontainword&regexword=a+b+c&limit=10
See https://www.mediawiki.org/wiki/Help:CirrusSearch#insource:
On 14/08/2015 23:30, ArtGiray . wrote:
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
I wrote my own Java API client using https://en.wikipedia.org/w/api.php. Isn't it possible to do a regex search with a single query to the current api.php? (Could you send a working link, if one exists?)
2015-08-15 0:35 GMT+03:00 Ricordisamoa ricordisamoa@openmailbox.org:
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=insource%3A%2Fa%2Bb%2Bc%2F (very slow, use sparingly)
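A minimal Python sketch of the same request, assuming Python 3 and only the standard library. `build_search_url` rebuilds the URL above from its parts, so the `:`, `/` and `+` inside the regex are percent-encoded correctly, and `titles_from_search` pulls the page titles out of a decoded JSON `list=search` response. The function names, the placeholder User-Agent, and passing `format=json` are illustrative choices, not part of the reply above.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_search_url(regex: str, **extra: str) -> str:
    """Build a list=search URL whose srsearch runs a CirrusSearch insource: regex."""
    params = {
        "action": "query",
        "list": "search",
        "srwhat": "text",
        "srsearch": f"insource:/{regex}/",
    }
    params.update(extra)  # e.g. format="json", srlimit="50"
    # urlencode percent-encodes ':', '/' and '+' so the regex reaches the server intact
    return API + "?" + urllib.parse.urlencode(params)


def titles_from_search(response: dict) -> list:
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (needs network access)."""
    # Placeholder User-Agent; real clients should identify themselves descriptively
    req = urllib.request.Request(url, headers={"User-Agent": "regex-search-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `titles_from_search(fetch_json(build_search_url("a+b+c", format="json")))` returns the titles of the first page of hits.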
On 14/08/2015 23:47, ArtGiray . wrote:
Why don't "insource" and the other special query keywords show up in the API docs? Anyway, thank you; it's slow but perfect.
2015-08-15 0:59 GMT+03:00 Ricordisamoa ricordisamoa@openmailbox.org:
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bsearch "You can use the search string to invoke special search features, depending on what the wiki's search backend implements."
On 15/08/2015 00:11, ArtGiray . wrote:
Thank you, but my last question is: how can I search a specific title?
I can't add a "&title=" parameter,
like: https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
So if the page titled Car contains "a+b+c", I want a success response.
2015-08-15 1:16 GMT+03:00 Ricordisamoa ricordisamoa@openmailbox.org:
On Fri, Aug 14, 2015 at 3:16 PM, Ricordisamoa <ricordisamoa@openmailbox.org> responded:
> On 15/08/2015 00:11, ArtGiray . wrote:
>> Why don't "insource" and the other special query keywords show up in the API docs? Anyway, thank you; it's slow but perfect.
> https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bsearch "You can use the search string to invoke special search features, depending on what the wiki's search backend implements."
Yes. The generated API search documentation only knows about the srsearch parameter; what you can do within it depends on the search backend. https://www.mediawiki.org/wiki/API:Search_and_discovery also mentions this and presents various options.
> My last question is: how can I search a specific title?
> I can't add a "&title=" parameter,
> like: https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
> So if the page titled Car contains "a+b+c", I want a success response.
If you want to find pages whose titles contain "Car", see "intitle" in https://www.mediawiki.org/wiki/Help:CirrusSearch. You can combine intitle: and insource:, thus intitle:Car insource:/a+b+c/. (Be careful escaping the space and the '+' symbols.)
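To illustrate the escaping caution, a small Python sketch (standard library only; `search_query_string` is a hypothetical helper). `urllib.parse.urlencode` turns the space into `+` and the regex's literal `+` into `%2B`. If the raw search text is pasted into a URL unencoded, a form-style decoder (Python's `parse_qs` here; the server decodes query strings the same way) reads each `+` as a space, silently turning the regex into `a b c`.

```python
from urllib.parse import parse_qs, urlencode


def search_query_string(title_word: str, regex: str) -> str:
    """Encode an srsearch value that combines intitle: with an insource: regex."""
    srsearch = f"intitle:{title_word} insource:/{regex}/"
    return urlencode({"action": "query", "list": "search", "srsearch": srsearch})


encoded = search_query_string("Car", "a+b+c")
# Decoding the properly encoded string gives back the exact search text
decoded = parse_qs(encoded)["srsearch"][0]

# A naive, unencoded query string: each '+' is decoded as a space
naive = parse_qs("srsearch=intitle:Car insource:/a+b+c/")["srsearch"][0]
```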
If you know the exact page title and only want to search that, then ask that title for its content and do your own pattern match. You need to figure out whether you want to match in the raw wikitext, or with expanded templates, or the resulting HTML. https://www.mediawiki.org/wiki/API:Parsing_wikitext has some guidance. (There's the new https://www.mediawiki.org/wiki/RESTBase API if you want to match in the generated page HTML and can do some light DOM parsing.)
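For the known-title case, a minimal Python sketch of "ask that title for its content and do your own pattern match", assuming the raw wikitext is what you want to match. It uses `action=query&prop=revisions&rvprop=content` with `rvslots=main` and `formatversion=2` (the form newer MediaWiki versions expect) and also accepts the older response shape. `wikitext_url`, `wikitext_from_response` and `matches` are hypothetical helpers, and Python's `re` dialect is not identical to the regex syntax CirrusSearch uses for `insource:/.../`.

```python
import re
import urllib.parse
from typing import Optional

API = "https://en.wikipedia.org/w/api.php"


def wikitext_url(title: str) -> str:
    """URL that returns the current wikitext of one page as JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urllib.parse.urlencode(params)


def wikitext_from_response(response: dict) -> Optional[str]:
    """Return the page's wikitext, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    rev = pages[0]["revisions"][0]
    # With rvslots the text sits under slots.main; older responses put it on the revision
    if "slots" in rev:
        return rev["slots"]["main"]["content"]
    return rev.get("content")


def matches(wikitext: Optional[str], pattern: str) -> bool:
    """Client-side regex match on the downloaded wikitext."""
    return wikitext is not None and re.search(pattern, wikitext) is not None
```

This still downloads each page, so it only makes sense for a handful of known titles.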
First, thanks for the information.
By "raw wikitext" you mean that the client downloads the page content and then searches it.
But my main question is not "how can I search wikitext with a regex (client side)"; it is "how can I do a regex search on the server side" (without downloading the page content).
So my query should look like:
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
That is, *I want to search only the "Car" page: does it contain the regex a+b+c?* (so this search *should run on the server side*). If it does, the query returns the title and a status such as "found", and then I will download the raw wikitext with a second query.
Summary: I will search 1,000,000 pages one by one, and I don't want to download each page to my computer and search it there (that is too heavy on bandwidth).
I want to search for my regex on the server side (that is, without downloading the page content), and only if the result is true do I want to download the page content.
(Sorry for my poor English and any mistakes.)
2015-08-15 5:10 GMT+03:00 S Page spage@wikimedia.org:
-- =S Page WMF Tech writer
On Sun, Aug 16, 2015 at 1:25 PM, ArtGiray . giraybal@gmail.com wrote:
> My question is "how can I do a regex search on the server side" (without downloading the page content).
> So my query should look like:
> https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
As you see, that doesn't work. As https://www.mediawiki.org/wiki/Help:CirrusSearch#insource: suggests, the closest you can come is
https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=t...
The prefix search narrows the search to pages starting with "Car", but you'll get Carnivàle, Carbon dioxide, etc. too. I _think_ "Car" would be the first page returned in the set of matching pages; otherwise you might have to handle continue processing before conclusively determining that the page titled "Car" doesn't match the regex.
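A Python sketch of that check, including the continuation loop. The URL above is truncated, so the query string here is an assumption: it combines `insource:` with CirrusSearch's `prefix:` keyword, which has to come last in srsearch. `title_matches`, `http_fetch` and the injectable `fetch` argument are hypothetical names; passing `fetch` in keeps the loop testable without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

API = "https://en.wikipedia.org/w/api.php"


def http_fetch(params: dict) -> dict:
    """Default fetcher: GET api.php with the given params and decode the JSON."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "regex-search-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def title_matches(title: str, regex: str, fetch: Callable[[dict], dict] = http_fetch) -> bool:
    """True if the page named exactly `title` is among the insource: regex hits."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"insource:/{regex}/ prefix:{title}",  # assumed query shape
        "srlimit": "50",
        "format": "json",
    }
    while True:
        data = fetch(params)
        hits = [hit["title"] for hit in data.get("query", {}).get("search", [])]
        if title in hits:
            return True
        if "continue" not in data:
            # No more result pages: the exact title was never returned
            return False
        # API continuation: merge the returned values into the next request
        params = {**params, **data["continue"]}
```

Only a hit on the exact title counts, so pages such as Carnivàle that share the prefix are skipped rather than treated as a match.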
> That is, *I want to search only the "Car" page: does it contain the regex a+b+c?* (so this search *should run on the server side*). If it does, the query returns the title and a status such as "found", and then I will download the raw wikitext with a second query.
> Summary: I will search 1,000,000 pages one by one, and I don't want to download each page to my computer and search it there (that is too heavy on bandwidth).
That's kinda weird. Perhaps, if your regex only matches a small number of pages, you can search for it and then intersect the set of pages returned with your list of a million (!!?) pages.
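A sketch of that intersection idea in Python, assuming the regex has few enough hits to page through: run one `insource:` search, collect every title across continuation, and intersect with the local list. `all_regex_hits`, `wanted_matches` and `fetch` are hypothetical names; `fetch` can be any function that GETs api.php with the given parameters and returns the decoded JSON.

```python
from typing import Callable, Iterable, Set


def all_regex_hits(regex: str, fetch: Callable[[dict], dict]) -> Set[str]:
    """Collect every title returned for an insource: regex search, following continuation."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"insource:/{regex}/",
        "srlimit": "50",
        "format": "json",
    }
    titles: Set[str] = set()
    while True:
        data = fetch(params)
        titles.update(hit["title"] for hit in data.get("query", {}).get("search", []))
        if "continue" not in data:
            return titles
        # Merge the continuation values into the next request
        params = {**params, **data["continue"]}


def wanted_matches(my_titles: Iterable[str], regex: str, fetch: Callable[[dict], dict]) -> Set[str]:
    """Titles from the local list that the server-side regex search also returned."""
    return set(my_titles) & all_regex_hits(regex, fetch)
```

This trades a million per-page requests for one paged search, then downloads wikitext only for the titles in the intersection.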