Those are official. I ran the report from Tool Labs, Wikimedia's developer platform, which includes a copy of en.Wikipedia's database (with sensitive fields removed). Without looking at your code and doing some testing, which unfortunately I don't have the time for, I can't help debug why your code isn't working. Those two files were created by running

    sql enwiki_p "select page_title from page where page_is_redirect = 0 and page_namespace = 0;" > ns_0.txt

and then compressing the resulting text file with 7zip. For the category namespace I just changed page_namespace = 0 to page_namespace = 14.
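The namespace-14 variant would then look roughly like this (a sketch only; the output file name and the exact 7zip invocation are assumptions, since only the namespace value actually changes):

    # Same query, restricted to the Category namespace (14); file name assumed.
    sql enwiki_p "select page_title from page where page_is_redirect = 0 and page_namespace = 14;" > ns_14.txt
    # Compress the result, as was done for ns_0.txt (exact 7zip invocation assumed).
    7z a ns_14.7z ns_14.txt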
On Sun, May 7, 2017 at 3:41 AM, Abdulfattah Safa fattah.safa@gmail.com wrote:
Hello John, thanks for your effort. Actually I need official dumps, as I need to use them in my thesis. Could you please point me to how you got these? Also, any idea why the API doesn't work properly for en.Wikipedia? I used the same code for other languages and it worked.
Thanks, Abed,
On Sun, May 7, 2017 at 1:45 AM John phoenixoverride@gmail.com wrote:
Here you go:
ns_0.7z: http://tools.wmflabs.org/betacommand-dev/reports/ns_0.7z
ns_14.7z: http://tools.wmflabs.org/betacommand-dev/reports/ns_14.7z
On Sat, May 6, 2017 at 5:27 PM, John phoenixoverride@gmail.com wrote:
Give me a few minutes and I can get you a database dump of what you need.
On Sat, May 6, 2017 at 5:25 PM, Abdulfattah Safa <fattah.safa@gmail.com> wrote:
- I'm using max as the limit parameter
- I'm not sure if the dumps have the data I need. I need the titles of all articles (namespace = 0), excluding redirects, and also the titles of all categories (namespace = 14), excluding redirects.
On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal eranroz89@gmail.com wrote:
- You can use the limit parameter to get more titles in each request
- For getting many entries it is recommended to extract from the dumps or from the database using Quarry
On May 6, 2017 22:36, "Abdulfattah Safa" <fattah.safa@gmail.com> wrote:
For the & in $Continue=-||, that's a typo; it doesn't exist in the code.
On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <fattah.safa@gmail.com> wrote:
> I'm trying to get all the page titles in Wikipedia in namespace 0 using the API as follows:
>
> https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
>
> I keep requesting this URL and checking whether the response contains a continue tag. If yes, I use the same request but change BASE_PAGE_TITLE to the value of the apcontinue attribute in the response.
> My application has been running for 3 days and the number of retrieved titles exceeds 30M, whereas it is about 13M in the dumps.
> Any idea?
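For reference, a minimal sketch of the continuation loop the allpages module expects, using curl and jq with format=json rather than xml (the output file name and loop structure here are illustrative, not taken from the code in question):

    # Fetch all non-redirect article titles via list=allpages, following apcontinue.
    apcontinue=""
    cont=""
    : > titles_ns0.txt
    while : ; do
        # Build the request; apcontinue is only sent once the API has returned one.
        args=( -sG "https://en.wikipedia.org/w/api.php"
               --data-urlencode "action=query"
               --data-urlencode "format=json"
               --data-urlencode "list=allpages"
               --data-urlencode "apnamespace=0"
               --data-urlencode "apfilterredir=nonredirects"
               --data-urlencode "aplimit=max"
               --data-urlencode "continue=$cont" )
        [ -n "$apcontinue" ] && args+=( --data-urlencode "apcontinue=$apcontinue" )
        resp=$(curl "${args[@]}")
        # Append this batch of titles.
        printf '%s\n' "$resp" | jq -r '.query.allpages[].title' >> titles_ns0.txt
        # Echo back the continue values the server returned; stop when there are none.
        cont=$(printf '%s\n' "$resp" | jq -r '.continue.continue // empty')
        apcontinue=$(printf '%s\n' "$resp" | jq -r '.continue.apcontinue // empty')
        [ -z "$apcontinue" ] && break
    done

The key point is that each request repeats the same parameters plus the continue/apcontinue values returned by the previous response; restarting from a stale or hand-built value makes the walk revisit ranges, which would explain retrieving far more than the roughly 13M titles seen in the dumps.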
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l