Those are official. I ran the report from Tool Labs, Wikimedia's developer platform, which includes a copy of en.Wikipedia's database (with sensitive fields removed). Without looking at your code and doing some testing, which unfortunately I don't have the time for, I can't help debug why your code isn't working. Those two files were created by running

    sql enwiki_p "select page_title from page where page_is_redirect = 0 and page_namespace = 0;" > ns_0.txt

and then compressing the resulting text file with 7zip. For the category namespace I just changed page_namespace = 0 to page_namespace = 14.
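The namespace-14 variant would then look roughly like this (a sketch only; the output file name and the exact 7zip invocation are assumptions, since only the namespace value actually changes):

    # Same query, restricted to the Category namespace (14); file name assumed.
    sql enwiki_p "select page_title from page where page_is_redirect = 0 and page_namespace = 14;" > ns_14.txt
    # Compress the result, as was done for ns_0.txt (exact 7zip invocation assumed).
    7z a ns_14.7z ns_14.txt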
On Sun, May 7, 2017 at 3:41 AM, Abdulfattah Safa fattah.safa@gmail.com wrote:
Hello John, thanks for your effort. Actually I need official dumps, as I need to use them in my thesis. Could you please point me to how you got these? Also, any idea why the API doesn't work properly for en.Wikipedia? I used the same code for other languages and it worked.
Thanks, Abed,
On Sun, May 7, 2017 at 1:45 AM John phoenixoverride@gmail.com wrote:
Here you go:
ns_0.7z: http://tools.wmflabs.org/betacommand-dev/reports/ns_0.7z
ns_14.7z: http://tools.wmflabs.org/betacommand-dev/reports/ns_14.7z
On Sat, May 6, 2017 at 5:27 PM, John phoenixoverride@gmail.com wrote:
Give me a few minutes and I can get you a database dump of what you need.
On Sat, May 6, 2017 at 5:25 PM, Abdulfattah Safa <fattah.safa@gmail.com> wrote:
- I'm using max as the limit parameter
- I'm not sure if the dumps have the data I need. I need the titles of all articles (namespace = 0), excluding redirects, and also the titles of all categories (namespace = 14), excluding redirects.
On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal eranroz89@gmail.com wrote:
- You can use the limit parameter to get more titles in each request
- For getting many entries it is recommended to extract from the dumps or from the database using Quarry
On May 6, 2017 22:36, "Abdulfattah Safa" <fattah.safa@gmail.com> wrote:
For the & in $Continue=-||, that's a typo; it doesn't exist in the code.
On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <fattah.safa@gmail.com> wrote:
> I'm trying to get all the page titles in Wikipedia in namespace 0 using the API as follows:
>
> https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
>
> I keep requesting this URL and checking whether the response contains a continue tag. If yes, I use the same request but change BASE_PAGE_TITLE to the value of the apcontinue attribute in the response.
> My application has been running for 3 days and the number of retrieved titles exceeds 30M, whereas it is about 13M in the dumps.
> Any idea?
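For reference, a minimal sketch of the continuation loop the allpages module expects, using curl and jq with format=json rather than xml (the output file name and loop structure here are illustrative, not taken from the code in question):

    # Fetch all non-redirect article titles via list=allpages, following apcontinue.
    apcontinue=""
    cont=""
    : > titles_ns0.txt
    while : ; do
        # Build the request; apcontinue is only sent once the API has returned one.
        args=( -sG "https://en.wikipedia.org/w/api.php"
               --data-urlencode "action=query"
               --data-urlencode "format=json"
               --data-urlencode "list=allpages"
               --data-urlencode "apnamespace=0"
               --data-urlencode "apfilterredir=nonredirects"
               --data-urlencode "aplimit=max"
               --data-urlencode "continue=$cont" )
        [ -n "$apcontinue" ] && args+=( --data-urlencode "apcontinue=$apcontinue" )
        resp=$(curl "${args[@]}")
        # Append this batch of titles.
        printf '%s\n' "$resp" | jq -r '.query.allpages[].title' >> titles_ns0.txt
        # Echo back the continue values the server returned; stop when there are none.
        cont=$(printf '%s\n' "$resp" | jq -r '.continue.continue // empty')
        apcontinue=$(printf '%s\n' "$resp" | jq -r '.continue.apcontinue // empty')
        [ -z "$apcontinue" ] && break
    done

The key point is that each request repeats the same parameters plus the continue/apcontinue values returned by the previous response; restarting from a stale or hand-built value makes the walk revisit ranges, which would explain retrieving far more than the roughly 13M titles seen in the dumps.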
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l