Hey folks,
I was talking to ottomata today about developing a schema for processing revisions in Hadoop. We came across a deep problem with field names that I'd like to discuss so that people are aware of it.
To explain this, I'll use an example. Let's say you want to get the namespace of this page:
In JavaScript, this is represented as the variable wgNamespaceNumber.
In the database, this is represented as page.page_namespace.
In the XML database dump, this is represented as the value at <page><ns> or <namespaces><namespace.key>, depending on where you are.
Right now, ottomata and I are considering the more descriptive name page_namespace_id, since the value of all of these variables/fields is an identifier -- not a name. I think that this is a *good* name if we consider it in a vacuum, but if we choose it, we'll add yet another name for wiki devs & analysts to be aware of.
Given the context of this decision, my instinct is to choose the least surprising name. Since I mostly work with the database, that would mean I'd choose page_namespace.
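To make the cost of the naming mismatch concrete, here is a minimal sketch of the kind of alias table a consumer ends up maintaining when each source uses a different name for the same value. The names NAMESPACE_ALIASES and normalize_record are hypothetical, made up for illustration, and the canonical name is just the database-style choice from above, not a settled decision.

```python
# Hypothetical alias table: each source's name for the namespace identifier,
# mapped to a single canonical field name (an assumption, not a decided schema).
CANONICAL_FIELD = "page_namespace"

NAMESPACE_ALIASES = {
    "wgNamespaceNumber": CANONICAL_FIELD,  # JavaScript variable
    "page_namespace": CANONICAL_FIELD,     # database column page.page_namespace
    "ns": CANONICAL_FIELD,                 # <page><ns> in the XML dump
}


def normalize_record(record):
    """Return a copy of record with known aliases renamed to the canonical field."""
    normalized = {}
    for name, value in record.items():
        canonical = NAMESPACE_ALIASES.get(name, name)
        # Two aliases in one record must agree, otherwise the data is ambiguous.
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"conflicting values for {canonical!r}")
        normalized[canonical] = value
    return normalized
```

For example, normalize_record({"ns": 0, "title": "Foo"}) returns {"page_namespace": 0, "title": "Foo"}. Every new name we introduce is one more row in a table like this for everyone downstream.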
This is just one example of such nonsense. The decisions we make in formats that we produce now can have immeasurable effects on the sanity of others. I hope that the decisions we make today will minimize such pain, but it's hard to know for sure.
-Aaron