I'd like to put a placeholder in Phab or Trello for this work, but I'm
still new, so please help me out: could someone summarize the context
and what we are trying to solve?
Also, would this go into the Research, Eng, or Refinery backlog?
Thanks!
On Thu, Dec 11, 2014 at 1:52 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
> Bikeshed indeed -- this seems to be a project that could soak up a lot
> of time. I'm with Aaron -- let's be consistent with the principle of
> least surprise and use an existing identifier. The database seems as
> good a place to start as any.
I disagree that this is bikeshedding. The reason people look back at a
project after a year and say "yuck, wish we'd named those things
differently" is precisely that this type of effort gets incorrectly
labeled as bikeshedding. We are *not* talking about a bike shed. We're
talking about a schema that will hopefully serve hundreds or thousands of
researchers as well as our own growing team (I'm considering both Aaron's
revision schema and the data warehouse schema).
> So, I'm not sure that is necessary for the term "identifier", which I
> assume "id" abbreviates. Regardless, it seems clear that these numbers
> are thought of as primary identifiers of a namespace that can otherwise
> have many names. For example, see this snippet from the result of this
> query:
>
> http://es.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop…
>
>     "1": {
>         "id": 1,
>         "case": "first-letter",
>         "*": "Discusi\u00f3n",
>         "subpages": "",
>         "canonical": "Talk"
>     },
>
Fair enough; namespace_id then seems like a good name for a property of
a page entity.
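To illustrate treating the numeric id as the primary identifier while the names vary, here is a minimal sketch that parses a namespace entry shaped like the siteinfo snippet quoted above and keys it by id. The sample data is copied from that snippet; the function name `namespace_names` is hypothetical, and the sketch does not call the API.

```python
import json

# Sample copied from the siteinfo snippet quoted above (eswiki, namespace 1).
# A raw string keeps the \u00f3 escape intact for json.loads to decode.
SITEINFO_SNIPPET = r"""
{
  "1": {
    "id": 1,
    "case": "first-letter",
    "*": "Discusi\u00f3n",
    "subpages": "",
    "canonical": "Talk"
  }
}
"""


def namespace_names(namespaces: dict) -> dict:
    """Hypothetical helper: map each namespace's numeric id to its names.

    The numeric id is the key because it is the stable identifier; the
    local name ("*") differs by wiki, while "canonical" is the English name.
    """
    result = {}
    for entry in namespaces.values():
        result[entry["id"]] = {
            "local": entry["*"],
            # .get() in case an entry carries no canonical name.
            "canonical": entry.get("canonical"),
        }
    return result


names = namespace_names(json.loads(SITEINFO_SNIPPET))
print(names[1]["canonical"])  # Talk
```

A page entity would then only need to store `namespace_id` (here `1`) and look up whichever name a given consumer wants.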
> I don't see us getting rid of legacy naming right now. I don't see how
> adding a new name helps anyone -- veteran or newbie.
>
I disagree that we have to care at all about legacy names. I also
disagree that the principle of least surprise leads one to prefer
database names. To me, they are more surprising, because database
conventions have no place in JSON. If I were new to this world, database
names would seem more surprising too. If I were an existing user, I don't
think I would be surprised at all, as long as the names were clear and
the schemas well documented. This page_namespace_id question is a bit of
a red herring, because we have harder things to tackle, like
"restrictions".
> However, if we were to develop a mapping of canonical names and pursue
> that from here forward, we might be able to move beyond the old names for
> the most important data sources in a few years. That said, I'm skeptical
> that we'll ever be able to change any production DB field names.
>
We need not be tied to the production DB names. The data warehouse
effort is trying to transform a confusing schema riddled with
idiosyncrasies into a clean dimensional model that is easy to understand
and easy to work with. In the process, we are also trying to capture
changes to objects over time, so we are greatly expanding the usefulness
of the database. Good naming matters, and we should take our time.
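As a rough sketch of how a canonical-name mapping lets the warehouse use new names without renaming any production DB fields: the columns `page_id`, `page_namespace`, and `page_title` exist in the MediaWiki `page` table, but the target names other than `namespace_id` (the one proposed in this thread) are hypothetical placeholders, as are the function name and the sample row.

```python
import json

# Hypothetical mapping from production MediaWiki column names to warehouse
# field names. Only page_namespace -> namespace_id reflects a name proposed
# in this thread; the other targets are illustrative placeholders.
CANONICAL_NAMES = {
    "page_id": "page_id",
    "page_namespace": "namespace_id",
    "page_title": "title",
}


def to_warehouse(row: dict) -> dict:
    """Rename known production columns; pass unknown columns through unchanged."""
    return {CANONICAL_NAMES.get(col, col): value for col, value in row.items()}


# Hypothetical sample row, shaped like a row from the production page table.
row = {"page_id": 42, "page_namespace": 1, "page_title": "Example"}
print(json.dumps(to_warehouse(row)))
# {"page_id": 42, "namespace_id": 1, "title": "Example"}
```

Keeping the mapping in one table means the production schema stays untouched while downstream JSON consumers only ever see the new names.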
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics