Hi Dan,
Thanks a lot. I think I bumped into that link at some point, but then I wasn't able to find it again. There is one point that is not entirely clear to me:
"Thus, note that incremental downloads of these dumps may generate inconsistent data. Consider using EventStreams for real time updates on MediaWiki changes (API docs)."
I am planning to retrieve updated versions of the metadata regularly. So I guess I have to use EventStreams to access the recent changes? AFAIU, these recent changes come from the RecentChanges table [1]. So what would be the proper sequence of actions? For example:
1. Download the mediawiki_history dump once and parse it
2. For every new update of my data pool, access recent changes through EventStreams as per [2]
Did I understand this correctly?
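To make sure I'm picturing step 2 right, here is a minimal sketch of how I'd filter the incoming events. The stream endpoint is the one documented in [2]; the exact field names ("wiki", "type") are my assumption based on the recentchange event schema, and the sample dict stands in for a message received over SSE:

```python
import json

# In practice, events would arrive from the EventStreams SSE endpoint
# (https://stream.wikimedia.org/v2/stream/recentchange, see [2]),
# e.g. via an SSE client library. Here a sample payload stands in
# for a received message.

def is_relevant(event, wiki="enwiki"):
    """Keep only edit and new-page events for the wiki being mirrored.
    Field names assumed from the mediawiki.recentchange schema."""
    return event.get("wiki") == wiki and event.get("type") in ("edit", "new")

# Sample payload, as it might be parsed from one SSE "data:" line.
sample = json.loads('{"wiki": "enwiki", "type": "edit", "title": "Example"}')
print(is_relevant(sample))
```

That is, each matching event would trigger an update of the corresponding record in my local copy of the dump.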
One last thing: in the pageview archive there are three types of files: automated, spider, and user. Am I right in understanding that "user" refers to pageviews made by real persons, while "automated" and "spider" refer to pageviews made by programs (though I'm not sure about the difference between those two)?
Cristina
[1] https://www.mediawiki.org/wiki/Manual:Recentchanges_table [2] https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams