Give me a few minutes; I can get you a database dump of what you need.
On Sat, May 6, 2017 at 5:25 PM, Abdulfattah Safa <fattah.safa(a)gmail.com>
wrote:
1. I'm using max as the limit parameter.
2. I'm not sure the dumps have the data I need. I need to get the titles of
all Articles (namespace = 0), with no redirects, and also the titles of all
Categories (namespace = 14), without redirects.
On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal <eranroz89(a)gmail.com>
wrote:
1. You can use the limit parameter to get more titles in each request.
2. For getting many entries it is recommended to extract them from the dumps
or from the database using Quarry.
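For the Quarry route, this is a sketch of the queries one might run; the column names follow the standard MediaWiki `page` table schema (`page_namespace`, `page_title`, `page_is_redirect`), but check the replica's schema before relying on it:

```python
# Hypothetical Quarry queries against the database replicas, matching the
# two lists requested above. Column names follow the standard MediaWiki
# `page` table schema.
ARTICLE_TITLES = """
SELECT page_title
FROM page
WHERE page_namespace = 0      -- articles
  AND page_is_redirect = 0;   -- skip redirects
"""

CATEGORY_TITLES = """
SELECT page_title
FROM page
WHERE page_namespace = 14     -- categories
  AND page_is_redirect = 0;   -- skip redirects
"""
```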
On May 6, 2017 22:36, "Abdulfattah Safa" <fattah.safa(a)gmail.com> wrote:
> Regarding the & in $Continue=-||: it's a typo. It doesn't exist in the code.
>
> On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <fattah.safa(a)gmail.com>
> wrote:
>
> > I'm trying to get all the page titles in Wikipedia in namespace 0 using
> > the API as follows:
> >
> > https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
> >
> > I keep requesting this URL and checking whether the response contains a
> > continue tag. If it does, I issue the same request but change
> > *BASE_PAGE_TITLE* to the value of the apcontinue attribute in the
> > response.
> > My application has been running for 3 days and the number of retrieved
> > titles exceeds 30M, whereas it is about 13M in the dumps.
> > Any ideas?
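For reference, the continuation loop described above can be sketched as follows, using only the Python standard library. The key detail is that `continue` and `apcontinue` must be sent as ordinary `&`-joined query parameters (no literal `$` signs), and the server-returned `continue` block should be merged verbatim into the next request:

```python
# A minimal sketch of the allpages continuation loop. Parameter names
# follow the MediaWiki Action API; the helper `next_params` is an
# illustrative name, not part of any library.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def next_params(params, response):
    """Merge the server-supplied continuation block into the next request."""
    merged = dict(params)
    merged.update(response.get("continue", {}))
    return merged

def all_page_titles(namespace=0):
    """Yield every non-redirect page title in the given namespace."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": namespace,
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }
    while True:
        url = API + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            return  # no continuation block means we have everything
        params = next_params(params, data)
```

Because the continuation values are merged rather than the URL being rebuilt by hand, each batch picks up exactly where the previous one left off, which avoids re-fetching (and over-counting) pages.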
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l