Hey folks,

I was talking to ottomata today about developing a schema for processing revisions in Hadoop. We ran into a deep problem with field names that I'd like to discuss so that people are aware of it.

To explain this, I'll use an example.  Let's say you want to get the namespace of this page:

In JavaScript, this is represented as the variable wgNamespaceNumber.

In the database, this is represented as page.page_namespace.

In the XML database dump, this is represented as the value at <page><ns> or <namespaces><namespace.key>, depending on where you are.

Right now, ottomata and I are considering the more descriptive name page_namespace_id, since the value of all of these variables/fields is an identifier -- not a name. I think that this is a *good* name if we consider it in a vacuum, but if we choose it, we'll add yet another name for wiki devs & analysts to be aware of.

Given the context of this decision, my instinct is to choose the least surprising name.  Since I mostly work with the database, that would mean I'd choose page_namespace.
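
To make the cost concrete, here is a minimal sketch of the lookup that anyone moving between these sources ends up carrying around, in their head or in code. The source names are the ones from the examples above; CANONICAL, ALIASES and canonical_field are hypothetical names I made up for illustration, and page_namespace is only my current preference, not a settled choice.

```python
# Hypothetical alias table. The keys are the names used in the examples above;
# the canonical name is the one under discussion, not a final decision.
CANONICAL = "page_namespace"

ALIASES = {
    "wgNamespaceNumber": CANONICAL,    # JavaScript variable
    "page.page_namespace": CANONICAL,  # database table.column
    "ns": CANONICAL,                   # <page><ns> in the XML dump
}


def canonical_field(name: str) -> str:
    """Return the schema field name for a source-specific name.

    Names without an alias are passed through unchanged.
    """
    return ALIASES.get(name, name)


print(canonical_field("wgNamespaceNumber"))
print(canonical_field("ns"))
```

Every new name we invent adds a row to a table like this one, and every consumer has to learn it.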

This is just one example of such nonsense.  The decisions we make in formats that we produce now can have immeasurable effects on the sanity of others.  I hope that the decisions we make today will minimize such pain, but it's hard to know for sure.  
