Those are official; I ran the report from Tool Labs, Wikimedia's
developer platform, which includes a copy of en.Wikipedia's database (with
sensitive fields removed). Without looking at your code and doing some
testing, which unfortunately I don't have the time for, I cannot help
debug why your code isn't working. Those two files were created by
running

sql enwiki_p "select page_title from page where page_is_redirect = 0 and page_namespace = 0;" > ns_0.txt

and then compressing the resulting text file with 7zip. For the category
namespace I just changed page_namespace = 0 to page_namespace = 14.
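(For reference, a minimal scripted version of the same report, assuming a
Tool Labs account with the standard ~/replica.my.cnf credential file and the
pymysql library installed; the replica host name "enwiki.labsdb" is the
convention at the time of writing and may differ in your environment:)

    # Sketch: dump non-redirect article titles from the enwiki replica.
    # Host name and credential file are Tool Labs conventions, not fixed.
    import os
    import pymysql

    conn = pymysql.connect(
        host="enwiki.labsdb",
        db="enwiki_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
        charset="utf8mb4",
    )

    # Stream rows instead of buffering all ~13M titles in memory.
    with conn.cursor(pymysql.cursors.SSCursor) as cur:
        cur.execute(
            "SELECT page_title FROM page "
            "WHERE page_is_redirect = 0 AND page_namespace = 0"
        )
        with open("ns_0.txt", "w", encoding="utf-8") as out:
            for (title,) in cur:
                # page_title is VARBINARY, so pymysql returns bytes.
                out.write(title.decode("utf-8") + "\n")
    conn.close()

For namespace 14, the only change is the page_namespace value, as above.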
On Sun, May 7, 2017 at 3:41 AM, Abdulfattah Safa <fattah.safa(a)gmail.com>
wrote:
Hello John,
Thanks for your effort. Actually I need official dumps, as I need to use
them in my thesis.
Could you please tell me how you generated these ones?
Also, any idea why the API doesn't work properly for en.Wikipedia? I used
the same code for other languages and it worked.
Thanks,
Abed
On Sun, May 7, 2017 at 1:45 AM John <phoenixoverride(a)gmail.com> wrote:
Here you go
ns_0.7z <http://tools.wmflabs.org/betacommand-dev/reports/ns_0.7z>
ns_14.7z <http://tools.wmflabs.org/betacommand-dev/reports/ns_14.7z>
On Sat, May 6, 2017 at 5:27 PM, John <phoenixoverride(a)gmail.com> wrote:
> Give me a few minutes I can get you a database dump of what you need.
>
> On Sat, May 6, 2017 at 5:25 PM, Abdulfattah Safa <fattah.safa(a)gmail.com>
> wrote:
>
>> 1. I'm using max as the limit parameter.
>> 2. I'm not sure the dumps have the data I need. I need to get the
>> titles of all articles (namespace = 0), with no redirects, and also the
>> titles of all categories (namespace = 14), without redirects.
>>
>> On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal <eranroz89(a)gmail.com>
>> wrote:
>>
>> > 1. You can use the limit parameter to get more titles in each request.
>> > 2. For getting many entries it is recommended to extract from the
>> > dumps or from the database using Quarry.
> >
> > On May 6, 2017 22:36, "Abdulfattah Safa" <fattah.safa(a)gmail.com>
> > wrote:
> >
> > > For the & in $continue=-||, it's a typo. It doesn't exist in the
> > > code.
> > >
> > > On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <fattah.safa(a)gmail.com>
> > > wrote:
> > >
> > > > I'm trying to get all the page titles in Wikipedia in namespace 0
> > > > using the API as follows:
> > > >
> > > > https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
> > > >
> > > > I keep requesting this URL and checking whether the response
> > > > contains a continue tag. If yes, I send the same request but change
> > > > *BASE_PAGE_TITLE* to the value of the apcontinue attribute in the
> > > > response.
> > > My application has been running for 3 days and the number of
> > > retrieved titles exceeds 30M, whereas it is about 13M in the dumps.
> > > Any idea?
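(For comparison, a minimal sketch of that loop, assuming Python 3 with the
requests library and using format=json rather than xml for easier parsing.
It opts in to the current continuation format with an empty continue
parameter on the first request, then echoes back the entire continue block
the API returns instead of rebuilding apcontinue by hand:)

    # Sketch: list all non-redirect article titles via the allpages API.
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    session = requests.Session()

    base = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": 0,
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }

    titles = set()
    # An empty continue parameter opts in to the current continuation format.
    cont = {"continue": ""}
    while True:
        data = session.get(API, params={**base, **cont}).json()
        for page in data["query"]["allpages"]:
            titles.add(page["title"])
        if "continue" not in data:
            break
        # Echo the whole continue block back verbatim; values such as
        # continue=-|| and apcontinue=... come from the response itself.
        cont = data["continue"]

    print(len(titles))

Collecting into a set also makes overlap visible: if the raw count climbs
well past the set size, the continuation is restarting or overlapping
somewhere, which would explain 30M results where the dumps hold about 13M.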
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l