That's not the way I read Brion's proposal. It looks to me like there would only be records for each new revision and for those revisions and pages that were updated; no old data that had not been updated or created would be included. Either way, including only the changes is essential. I'm sure no one would disagree.

-Aaron
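Aaron's point is that an update dump should carry records only for new and changed data, including changes to old data such as page deletions and user renames. That can be illustrated with a minimal sketch of a consumer applying such a dump to a local snapshot. The event names (`new_revision`, `page_delete`, `user_rename`), their fields, and `apply_incremental` are hypothetical, chosen for illustration. They are not part of Brion's proposal or of any existing MediaWiki dump format.

```python
from typing import Any

# Hypothetical local snapshot: page_id -> [(rev_id, user_name), ...]
Snapshot = dict[int, list[tuple[int, str]]]


def apply_incremental(snapshot: Snapshot, events: list[dict[str, Any]]) -> Snapshot:
    """Apply hypothetical incremental-dump events to a snapshot, in order.

    The event types and field names are illustrative assumptions only.
    """
    for event in events:
        kind = event["type"]
        if kind == "new_revision":
            # A new revision only appends; no old data is repeated.
            snapshot.setdefault(event["page_id"], []).append(
                (event["rev_id"], event["user"])
            )
        elif kind == "page_delete":
            # A change to old data: drop the page and its history.
            snapshot.pop(event["page_id"], None)
        elif kind == "user_rename":
            # A change to old data: rewrite the user name on existing revisions.
            old, new = event["old_name"], event["new_name"]
            for revs in snapshot.values():
                revs[:] = [(rev_id, new if user == old else user) for rev_id, user in revs]
        else:
            raise ValueError(f"unknown event type: {kind!r}")
    return snapshot
```

Without the deletion and rename records, a consumer holding only new revisions would keep stale pages and old user names indefinitely.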

On Fri, Apr 1, 2011 at 12:06 PM, Luca de Alfaro <luca@dealfaro.com> wrote:
Not quite. If I am reading Brion's proposal correctly, it would list all the pages that changed in a specific interval. If the interval is large, like a month, this could be very large if the full history of each page is provided.
What I was suggesting is to include only the changes (the revisions) that occur in a specific time span. 

Luca
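Luca's suggestion is to keep only the revisions whose timestamps fall within a given time span. It could be prototyped as a streaming filter over an existing XML export. The sketch below assumes the usual export layout (`<page>` containing `<title>` and one or more `<revision>` elements, each with direct `<id>` and `<timestamp>` children, timestamps like `2011-03-31T17:33:00Z`). It also assumes tags carry a versioned export namespace, which the code strips. `revisions_in_window` and its helpers are hypothetical names, not an existing tool.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix that export files put on every tag.
    return tag.rsplit("}", 1)[-1]


def _child_text(elem: ET.Element, name: str):
    # Look only at direct children, so a revision's own <id> is found
    # rather than the <id> nested inside <contributor>.
    for child in elem:
        if _local(child.tag) == name:
            return child.text
    return None


def revisions_in_window(source, start: datetime, end: datetime):
    """Yield (page_title, rev_id, timestamp) for revisions with start <= timestamp < end.

    `source` is a filename or binary file object holding an XML export (assumed layout).
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = _local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            ts = datetime.strptime(
                _child_text(elem, "timestamp"), "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            if start <= ts < end:
                yield title, int(_child_text(elem, "id")), ts
            elem.clear()  # free the revision text once it has been examined
        elif name == "page":
            elem.clear()  # free whatever is left of the finished page
```

Because the parse is streaming and each revision is cleared after it is checked, memory stays roughly constant even on a full-history dump. This is also why a monthly "changes only" dump could be cut from an existing full dump without changing the format.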


On Thu, Mar 31, 2011 at 5:33 PM, Yuvi Panda <yuvipanda@gmail.com> wrote:
Would incremental dumps, as described by Brion a long time ago
(http://leuksman.com/log/2007/10/14/incremental-dumps/), be what you're
looking for?

On Fri, Apr 1, 2011 at 5:01 AM, Aaron Halfaker <aaron.halfaker@gmail.com> wrote:
> If periodic update dumps are being considered, information that describes
> changes to old data (page deletions, user renames, etc.) would be very useful
> to have along with new revisions.
>
> -Aaron
>
> On Mar 31, 2011 6:27 PM, "Luca de Alfaro" <luca@dealfaro.org> wrote:
>> I think I would be very interested in 3, or even in having a monthly
>> dump of that month's revisions. As I have built tools for the XML dumps,
>> keeping the format unchanged is good for me (and for WikiTrust).
>>
>> I would find incremental dumps (with occasional, yearly, full dumps) much
>> easier to manage than full dumps.
>>
>> Luca
>>
>> On Thu, Mar 31, 2011 at 2:27 PM, Yuvi Panda <yuvipanda@gmail.com> wrote:
>>
>>> Hi, I'm a student planning on doing GSoC this year on MediaWiki.
>>> Specifically, I'd like to work on data dumps.
>>>
>>> I'm writing this to gauge what would be useful to the research
>>> community. Several ideas that have been floated include:
>>> 1. JSON dumps
>>> 2. SQLite dumps
>>> 3. Daily dumps of revisions from the last 24 hours
>>> 4. Dumps optimized for very fast import into various external storage
>>> systems, and of smaller size (diffs)
>>> 5. JSON/CSV for Special:Import and Special:Export
>>>
>>> Would any of these be useful? Or is there anything else that I'm
>>> missing, that you would consider much more useful?
>>>
>>> Feedback would be invaluable :)
>>>
>>> Thanks :)
>>> --
>>> Yuvi Panda T
>>> http://yuvi.in/blog
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> Wiki-research-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>



--
Yuvi Panda T
http://yuvi.in/blog
