Hey folks,
I was talking to ottomata today about developing a schema for processing revisions in Hadoop. We ran into a deep problem with field names that I'd like to discuss so that people are aware of it.
To explain this, I'll use an example. Let's say you want to get the namespace of this page: https://en.wikipedia.org/wiki/Biology
In JavaScript, this is represented as the variable *wgNamespaceNumber*.
In the database, this is represented as *page.page_namespace*.
In the XML database dump, this is represented as the value at *<page><ns>* or *<namespaces><namespace.key>*, depending on where you are.
Right now, ottomata and I are considering the more descriptive name *page_namespace_id*, since the value of all of these variables/fields is an identifier -- not a name. I think that this is a *good* name if we consider it in a vacuum, but if we choose it, we'll add yet another name for wiki devs & analysts to be aware of.
Given the context of this decision, my instinct is to choose the least surprising name. Since I mostly work with the database, that would mean I'd choose *page_namespace*.
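To make the "least surprising name" idea concrete, here is a minimal sketch in Python of a schema that keeps one canonical field name and still accepts the other spellings. Everything in it is an assumption for illustration: the `ALIASES` table, the `canonical_field` and `normalize` helpers, and the `page/ns` key (shorthand for the XML path) are hypothetical and not part of any existing tool.

```python
# Hypothetical alias table: every known name for the same value maps to
# the single name the schema would expose.
CANONICAL = "page_namespace"

ALIASES = {
    "wgNamespaceNumber": CANONICAL,    # JavaScript variable
    "page.page_namespace": CANONICAL,  # database column
    "page/ns": CANONICAL,              # shorthand for <page><ns> in the XML dump
    "page_namespace_id": CANONICAL,    # the more descriptive candidate name
}


def canonical_field(name):
    """Return the schema field name for a source-specific name.

    Names without an alias are returned unchanged.
    """
    return ALIASES.get(name, name)


def normalize(record):
    """Return a copy of `record` with keys renamed to canonical field names."""
    out = {}
    for key, value in record.items():
        target = canonical_field(key)
        # Two aliases for the same field must agree on the value.
        if target in out and out[target] != value:
            raise ValueError(f"conflicting values for {target!r}")
        out[target] = value
    return out
```

With this in place, `normalize({"wgNamespaceNumber": 0})` and `normalize({"page/ns": 0})` both produce a record keyed by *page_namespace*, so consumers only ever see one name regardless of where the data came from.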
This is just one example of such nonsense. The decisions we make in formats that we produce now can have immeasurable effects on the sanity of others. I hope that the decisions we make today will minimize such pain, but it's hard to know for sure.
-Aaron