This is a good discussion. The true dynamics won't be known until you've got live users on the system, but based on what I've seen with existing Wikipedia edits, the dynamics will be even more complex than predicted so far (which is already pretty complex!).
Some other things to consider:
- the focus of Wikipedia articles drifts over time (with good feedback loops built into the system, this should hopefully be self-correcting)
- label/description disagreement occurs: the title says one thing, the first few sentences (often all that people scan when working quickly) say something different, and the article taken as a whole is about a third thing
- you'll see different behavior depending on whether you track by article number (internal ID) or article title
- the granularity of Wikipedia articles depends on the length of the text, not just semantics. Concepts with lots of text get split across multiple articles (e.g. WW II), while concepts that don't have much written about them risk getting combined into composite articles covering multiple concepts.
- redirects are used for aliases, misspellings, "see instead" references to semantically different articles, and probably other things that I'm not aware of. Because a redirect doesn't say which of these it is, it's hard to do anything meaningful with them automatically.
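To make the ID-versus-title point a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the snapshot rows, the page IDs, and the helper names `track_by` and `changed` are my own assumptions, and nothing here reads real Wikipedia or Freebase data. It just shows that the same rename looks like "page 101 changed title" when you track by ID, and like "the title now points at a different page" when you track by title.

```python
from collections import defaultdict

# Hypothetical snapshots of (snapshot_date, page_id, title).
# Invented data: page 101 is renamed, and its old title is
# then reused by a new page 103 (for example, a redirect).
SNAPSHOTS = [
    ("2010-01", 101, "World War II"),
    ("2010-01", 102, "Cola"),
    ("2011-01", 101, "Second World War"),
    ("2011-01", 102, "Cola"),
    ("2011-01", 103, "World War II"),
]


def track_by(snapshots, key):
    """Group snapshots by 'id' or 'title'.

    Returns a dict mapping each key to the list of
    (date, other_field) pairs seen under that key.
    """
    history = defaultdict(list)
    for date, page_id, title in snapshots:
        if key == "id":
            history[page_id].append((date, title))
        elif key == "title":
            history[title].append((date, page_id))
        else:
            raise ValueError("key must be 'id' or 'title'")
    return dict(history)


def changed(history):
    """Return the keys whose tracked value differs between snapshots."""
    return sorted(
        k for k, seq in history.items()
        if len({value for _, value in seq}) > 1
    )


by_id = track_by(SNAPSHOTS, "id")
by_title = track_by(SNAPSHOTS, "title")

print(changed(by_id))     # keys (page IDs) whose title changed
print(changed(by_title))  # keys (titles) whose page ID changed
```

With this toy data, tracking by ID flags page 101, while tracking by title flags "World War II". Neither view is wrong, but they answer different questions, so it's worth deciding up front which one you mean.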
Another source of data on the current articles and their behavior is Freebase. Wikipedia-based topics which have been split or combined retain an audit trail that lets you figure out what happened. It only covers the last 5 years and only English Wikipedia, but within those limitations it could provide some interesting insights. I'm happy to help anyone who wants to work with this data.
Tom