I would second the recommendation of using the dumps for such a large graphing project. If it's more than a couple hundred pages, the API/database queries can get bulky.
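For what it's worth, counting links straight from the pagelinks dump can be sketched roughly like this. This assumes the 2018 pagelinks schema, where each row is (pl_from, pl_namespace, pl_title, pl_from_namespace); the function names are illustrative, not an existing tool, and as Brian notes below you'd still need the page dump to turn pl_from ids into titles:

```python
# Sketch: tally outgoing links per source page id and incoming links per
# target title by streaming enwiki-*-pagelinks.sql.gz line by line.
# Assumption: 2018 pagelinks row layout (pl_from, pl_namespace,
# 'pl_title', pl_from_namespace). Helper names are mine.
import gzip
import re
from collections import Counter

# Matches one tuple like (12,0,'Anarchism',0), allowing escaped quotes
# such as 'O\'Brien' inside the title.
ROW_RE = re.compile(r"\((\d+),(\d+),'((?:[^'\\]|\\.)*)',(\d+)\)")

def count_links(lines):
    """Count outgoing links per page id and incoming links per title."""
    outgoing, incoming = Counter(), Counter()
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip DDL, comments, etc.
        for pl_from, ns, title, _from_ns in ROW_RE.findall(line):
            if ns == "0":  # keep links to main-namespace targets only
                outgoing[int(pl_from)] += 1
                incoming[title] += 1
    return outgoing, incoming

def count_links_in_dump(path):
    """Stream a .sql.gz dump without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return count_links(f)
```

On the full English Wikipedia dump this still takes a while (the file is several GB compressed), but it runs locally with no query time limit, which is the main advantage over Quarry here.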
On Sun, Mar 18, 2018 at 5:07 PM Brian Wolff bawolff@gmail.com wrote:
Hi,
You can run longer queries by getting access to Toolforge (https://wikitech.wikimedia.org/wiki/Portal:Toolforge) and running them from the command line.
However, the query in question might still take an excessively long time (if you are doing all of Wikipedia). I would expect that query to produce about 150 MB of data and perhaps take days to complete.
You can also break it down into parts by adding something like WHERE page_title >= 'a' AND page_title < 'b'.
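A minimal sketch of that partitioning idea: generate one query per initial-letter slice of the title space, so each piece stays under the 30-minute limit. The query text here is only a plausible template for "count outgoing links per page" (the actual query from the Quarry link isn't reproduced in this thread), and the column names assume the standard page/pagelinks schema:

```python
# Sketch: split one big link-count query into per-letter slices.
# QUERY_TEMPLATE is an illustrative stand-in, not the exact Quarry query.
import string

QUERY_TEMPLATE = (
    "SELECT page_title, COUNT(*) AS outgoing_links "
    "FROM page JOIN pagelinks ON pl_from = page_id "
    "WHERE page_namespace = 0 "
    "AND page_title >= '{lo}' AND page_title < '{hi}' "
    "GROUP BY page_title;"
)

def partitioned_queries(boundaries):
    """Yield one query per half-open [lo, hi) slice of the title range."""
    for lo, hi in zip(boundaries, boundaries[1:]):
        yield QUERY_TEMPLATE.format(lo=lo, hi=hi)

# Split on initial letters A..Z; '[' is the ASCII character just past 'Z',
# so the final slice ['Z', '[') covers every title starting with Z.
letters = list(string.ascii_uppercase) + ["["]
queries = list(partitioned_queries(letters))
```

Each generated query can then be run separately (e.g. as its own Quarry fork, or from the Toolforge command line) and the results concatenated afterwards.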
Note, also of interest: full dumps of all the links are available at
https://dumps.wikimedia.org/enwiki/20180301/enwiki-20180301-pagelinks.sql.gz (you would also need https://dumps.wikimedia.org/enwiki/20180301/enwiki-20180301-page.sql.gz to convert page ids to page names).
-- Brian
On Sunday, March 18, 2018, Nick Bell bhink03@gmail.com wrote:
Hi there,
I'm a final-year Mathematics student at the University of Bristol, and I'm studying Wikipedia as a graph for my project.
I'd like to get data regarding the number of outgoing links on each page, and the number of pages with links to each page. I have already inquired about this with the Analytics Team mailing list, who gave me a few suggestions.
One of these was to run the code at this link, query/25400, with these instructions:
"You will have to fork it and remove the "LIMIT 10" to get it to run on all the English Wikipedia articles. It may take too long or produce too much data, in which case please ask on this list for someone who can run it for you."
I ran the code as instructed, but the query was killed as it took longer than 30 minutes to run. I asked if anyone on the mailing list could run it for me, but no one replied saying they could. The person who wrote the code suggested I try this mailing list to see if anyone can help.
I'm a beginner in programming and coding etc., so any and all help you can give me would be greatly appreciated.
Many thanks,
Nick Bell
University of Bristol
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l