Hello.
After a few days of pondering the issues, I would like to explain what I
suggested in my previous message, in more detail and (hopefully) more
clearly.
What I'm about to say is pretty abstract, so it's difficult to convey
the right meaning. Please forgive me if I say something you already
know, or just nonsense :-)
Jesús Quiroga wrote:
> I believe a better solution is to design a domain-specific language, an
> idea not very different from your first one.
> This DSL would model the interaction between the application and the DB
> as it is now, and would be designed to evolve. That's it.
The problem I discuss is how to best access the data store from an
application. I believe the right answer is different for each project,
but it's not difficult to evaluate the alternatives, one by one, in a
given context. I think it is worthwhile to do that in the context of
MediaWiki.
I will refer to wiki modules and databases as if they were 'hosts'
connected to a 'network', to highlight the role of languages in the
operation of the system at runtime.
The first way to access the data store is the 'direct' one:
[polyglot wiki] <--- mysDataL ---> [mysql]
[polyglot wiki] <--- posDataL ---> [postgresql]
[polyglot wiki] <--- db2DataL ---> [db2]
Here, the polyglot wiki module talks to every database using the proper
languages. 'mysDataL' means 'the data language understood by MySQL',
'posDataL' means 'the data language understood by PostgreSQL', etc.
The polyglot wiki promises to learn several languages and to speak them
correctly forever, so if a new database comes along, or any of those
data languages evolves, the polyglot wiki is forced to adapt, at a
potentially great cost. Besides, any change to a database schema can
trigger lots of updates to the wiki code, which can be very costly too.
The advantages of this way are well known: it is fast, requires no
up-front design, and is easy to understand.
The drawbacks are apparently few, but devastating: verbose, complex
code in multiple places in the wiki module, very costly to maintain and
even more costly to evolve. Every change costs a lot of time and effort.
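To make the 'direct' way concrete, here is a minimal sketch in Python
(MediaWiki itself is PHP, and the table and column names below are
invented, not the real schema). Note how every dialect difference leaks
straight into application code:

```python
def page_text_query(db_type: str) -> str:
    """Build the dialect-specific SQL for fetching a page's text.

    Hypothetical example: 'page' and 'page_text' are invented names.
    The point is that the wiki itself must know every dialect.
    """
    if db_type == "mysql":
        # MySQL uses '?' placeholders (with some drivers) and LIMIT.
        return "SELECT page_text FROM page WHERE page_title = ? LIMIT 1"
    elif db_type == "postgresql":
        # PostgreSQL drivers commonly use '%s' placeholders; LIMIT works.
        return "SELECT page_text FROM page WHERE page_title = %s LIMIT 1"
    elif db_type == "db2":
        # DB2 spells row limiting differently.
        return ("SELECT page_text FROM page WHERE page_title = ? "
                "FETCH FIRST 1 ROWS ONLY")
    raise ValueError(f"unsupported database: {db_type}")
```

Multiply this branching by every query in the wiki, and the maintenance
cost described above becomes obvious.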
The second way to access the data store that is usually considered is
the 'indirect' one:
[wiki] <--- wikiDataL ---> [polyglot translator]
[polyglot translator] <--- mysDataL ---> [mysql]
[polyglot translator] <--- posDataL ---> [postgresql]
[polyglot translator] <--- db2DataL ---> [db2]
Here, wikiDataL means 'some relational data definition and manipulation
language suitable for use by the wiki'.
The polyglot translator promises to learn wikiDataL and the other
dialects and to evolve with them, so it has all the problems the wiki
had in the direct way, but now the cost is lower, because a lot of
complexity is 'hidden' inside the translator and never reaches the
wiki. As a result, wiki code needs far fewer updates, and it is much
cleaner and less verbose.
The advantages of this way are: wiki module code is simpler, cost of
evolution is reduced.
The drawbacks are apparently many: it is slower, design work is needed,
the system is harder to understand, there is a new language (wikiDataL)
to maintain, and the translator can be very complex. However, the need
to reduce the cost of change is usually so great that these
inconveniences are minor in comparison.
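A minimal Python sketch of the 'indirect' way (again with invented
names, and a wikiDataL reduced to a single SELECT shape): the wiki
builds one dialect-neutral query object, and only the translator knows
how each database spells it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Select:
    """One tiny fragment of a hypothetical wikiDataL: a relational
    SELECT, still in the relational domain, but dialect-neutral."""
    table: str
    columns: List[str]
    limit: Optional[int] = None


def translate(query: Select, dialect: str) -> str:
    """Render a wikiDataL query into one database's data language.

    This is mostly a syntactic transformation, which is exactly what
    distinguishes the translator from the interpreter discussed later.
    """
    sql = f"SELECT {', '.join(query.columns)} FROM {query.table}"
    if query.limit is not None:
        if dialect == "db2":
            sql += f" FETCH FIRST {query.limit} ROWS ONLY"
        elif dialect in ("mysql", "postgresql"):
            sql += f" LIMIT {query.limit}"
        else:
            raise ValueError(f"unknown dialect: {dialect}")
    return sql
```

The wiki now writes `Select("page", ["page_text"], limit=1)` once, and
adding a fourth database touches only `translate`, not the wiki.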
Now the interesting bit begins. A third possible way to access the data
store, the 'interpreted' one:
[wiki] <--- wikiNeedL ---> [polyglot interpreter]
[polyglot interpreter] <--- mysDataL ---> [mysql]
[polyglot interpreter] <--- posDataL ---> [postgresql]
[polyglot interpreter] <--- db2DataL ---> [db2]
Here, wikiNeedL means 'some language adequate for the wiki to express
its data access needs and nothing else'.
wikiNeedL is the domain-specific language I wrote about in my previous
message.
The differences between wikiDataL and wikiNeedL are mainly these:
- wikiNeedL would contain just enough wiki concepts to express the
wiki's needs, so it's effectively confined to that domain. wikiDataL
belongs to the relational data model domain, which is quite different.
- in general, wikiNeedL would have different semantics from the
dialects understood by the databases, so the translation step becomes
more like interpretation, rather than mere syntactic transformation.
wikiDataL usually has the same semantics as the dialects.
- wikiNeedL would contain just enough concepts to satisfy current
needs, and would be open to extension. wikiDataL aims to be
general-purpose and to fulfill current and future needs.
The main reason to consider the 'interpreted' way is, of course, that
it reduces the cost of change even further.
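A sketch of the 'interpreted' way, in the same hypothetical Python
setting: the wiki speaks only in wiki concepts ('the text of page X'),
and the interpreter owns the mapping to storage entirely. The in-memory
backend below stands in for a real one that would speak mysDataL,
posDataL, etc.

```python
class MemoryBackend:
    """Stand-in storage backend. A real backend would translate each
    request into some database's data language."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value


class PageStore:
    """Interpreter for a tiny, invented wikiNeedL: callers state needs
    in wiki terms and never mention tables, columns, or SQL. How needs
    map to storage is the interpreter's private business, so it can
    change freely without touching wiki code."""

    def __init__(self, backend):
        self._backend = backend

    def page_text(self, title: str) -> str:
        # 'Give me the text of this page' -- a need, not a query.
        return self._backend.get(("page_text", title))

    def save_page(self, title: str, text: str) -> None:
        self._backend.put(("page_text", title), text)
```

Because the wiki never states *how* data is stored, the interpreter is
free to reshape the schema, change databases, or add caching, and the
wiki code does not change at all.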
So that's what I was talking about. I will say more about the
differences between the indirect and the interpreted ways in a future
message.
Thanks for your attention.