Re: [Toolserver-l] Extracting basic revision data

29 Nov 2010


Михајло Анђелковић wrote:
> ...
> I would like permission to run a query that could be resource-consuming
> if not properly scaled:
>
> SELECT page.page_title AS title, rev_user_text AS user,
>        rev_timestamp AS timestamp, rev_len AS len
> FROM revision JOIN page ON page.page_id = rev_page
> WHERE rev_id > 0 AND rev_id < [...] AND rev_deleted = 0;
>
> This is intended to extract basic data about all publicly visible
> revisions from 1 to [...]. The info for each revision would be a 4-tuple:
> title/user name/time/length. I need this data to start generating a
> timeline of editing on srwiki, so it is intended to be run only once
> for each revision.
>
> If this is generally allowed, my question is: how large a chunk of
> data can I take at once, and how long should I wait between two takes?
srwiki_p isn't very large (3665333 revisions and 413987 pages), so I
personally wouldn't worry about performance very much at all. If you were
going to run this query on enwiki_p or another larger database, it might be
more of a concern. Run the queries that you need to run.
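If you do want to take the data in pieces, one common approach is to walk the rev_id range in fixed-size batches, optionally pausing between them. The following is a minimal Python sketch, not an official Toolserver recipe: it uses an in-memory SQLite database holding a made-up, heavily simplified copy of the page and revision tables in place of the real srwiki_p replica. The names make_demo_db, extract_in_chunks, upper_bound, chunk_size and pause_seconds are hypothetical, and the default chunk size and pause are arbitrary illustrative values.

```python
import sqlite3
import time

def make_demo_db():
    """Hypothetical stand-in for srwiki_p: only the columns the query uses, made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                               rev_user_text TEXT, rev_timestamp TEXT,
                               rev_len INTEGER, rev_deleted INTEGER);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)",
                     [(1, 0, "Foo"), (2, 0, "Bar")])
    # Ten revisions alternating between the two pages; revision 3 is hidden.
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?, ?, ?)",
        [(i, 1 + i % 2, f"User{i}", f"20101129{i:06d}", 100 + i, int(i == 3))
         for i in range(1, 11)],
    )
    return conn

# The query from the question, with the single upper bound replaced by a range.
CHUNK_QUERY = """
    SELECT page.page_title, rev_user_text, rev_timestamp, rev_len
    FROM revision JOIN page ON page.page_id = rev_page
    WHERE rev_id >= ? AND rev_id < ? AND rev_deleted = 0
    ORDER BY rev_id
"""

def extract_in_chunks(conn, upper_bound, chunk_size=10000, pause_seconds=0.0):
    """Yield visible revisions with 1 <= rev_id < upper_bound, one rev_id range at a time."""
    start = 1
    while start < upper_bound:
        end = min(start + chunk_size, upper_bound)
        yield from conn.execute(CHUNK_QUERY, (start, end))
        start = end
        if pause_seconds and start < upper_bound:
            time.sleep(pause_seconds)  # optional pause between chunks
```

For example, list(extract_in_chunks(make_demo_db(), upper_bound=11, chunk_size=3)) runs four range queries and returns the nine visible revisions. On the real replica you would swap the SQLite connection for a MySQL one and use that driver's placeholder style; chunking on the primary key keeps each batch an index range scan rather than a growing OFFSET.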
The "Queries" page on the Toolserver wiki might be helpful to you.[1]
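Looking at your query, you should pull page.page_namespace or specify
page_namespace = 0. Page titles are stored without their namespace prefix
(both "Foo" and "Talk:Foo" have page_title 'Foo'), so pulling only
page.page_title without a namespace will output ambiguous, effectively
useless results. I'm also unclear why you'd need to specify rev_id > 0, since
revision IDs start at 1, though you might have your reasons for doing so.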
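To make the namespace point concrete, here is a small Python sketch using an in-memory SQLite database with two invented pages: an article "Foo" (namespace 0) and its talk page (namespace 1, likewise stored with page_title 'Foo'). The data and the variable names with_ns and articles_only are hypothetical; the sketch only illustrates the two fixes suggested above, selecting page_namespace alongside the title or filtering on page_namespace = 0.

```python
import sqlite3

# Made-up example: "Talk:Foo" is stored as page_namespace = 1, page_title = 'Foo'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                       page_namespace INTEGER, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_user_text TEXT, rev_timestamp TEXT,
                           rev_len INTEGER, rev_deleted INTEGER);
    INSERT INTO page VALUES (1, 0, 'Foo'), (2, 1, 'Foo');
    INSERT INTO revision VALUES
        (1, 1, 'Alice', '20101129120000', 500, 0),
        (2, 2, 'Bob',   '20101129130000', 40,  0);
""")

# Fix 1: keep the namespace in the output so each row identifies its page.
with_ns = conn.execute("""
    SELECT page.page_namespace, page.page_title, rev_user_text, rev_timestamp, rev_len
    FROM revision JOIN page ON page.page_id = rev_page
    WHERE rev_deleted = 0
    ORDER BY rev_id
""").fetchall()

# Fix 2: restrict the output to the main (article) namespace.
articles_only = conn.execute("""
    SELECT page.page_title, rev_user_text, rev_timestamp, rev_len
    FROM revision JOIN page ON page.page_id = rev_page
    WHERE rev_deleted = 0 AND page.page_namespace = 0
    ORDER BY rev_id
""").fetchall()

print(with_ns)        # [(0, 'Foo', 'Alice', '20101129120000', 500), (1, 'Foo', 'Bob', '20101129130000', 40)]
print(articles_only)  # [('Foo', 'Alice', '20101129120000', 500)]
```

Without either fix, both rows would come out with the title 'Foo' and there would be no way to tell the article's history from its talk page's.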
Your Toolserver account has a quota (viewable with 'quota -v') that you
might hit if you're outputting a lot of data to disk. You can always use
/mnt/user-store/ or file a ticket in JIRA if you need an increased quota.
MZMcBride
[1] https://wiki.toolserver.org/view/Queries