Hi. The call with the WMF today turned up a pretty big issue.
WikibaseQueryEngine (WQE) depends on two big 3rd party libraries:
* Symfony (used for implementing a command line client for the query engine, which we use to generate database tables)
* doctrine/dbal (a database abstraction layer we use to run queries and create database tables)
In order to deploy these, the WMF would require a line-by-line review of something around 50 thousand lines of code (or maybe it was 30 thousand, or 80 thousand; a lot, in any case). This is not feasible.
We (Katie, Jeroen, Chris, Nik, me, etc) have come up with a plan to get rid of these dependencies:
1) Split the command line interface into a separate component (WQE-CLI perhaps) that would not be deployed. Symfony is then out of the picture.
2) Go back to using MediaWiki's DB abstraction for running queries. This should be easy.
3) For generating DB tables (aka schema creation), we create a separate component (WQE schema generator or something) that would use dbal to generate static SQL files for the supported DB systems (most importantly, MySQL and SQLite). This would be part of our build step, but the code relying on dbal would not be deployed. Only the generated SQL files would be used for deployment (either through the update script, or, in the case of WMF, manually).
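To make the schema-generation step concrete, here is a rough sketch of how such a generator could use DBAL's schema API to dump static SQL per platform. The table and column names below are invented for illustration, not WQE's actual schema:

<?php
// Illustrative sketch only: build a schema via DBAL and write one static
// SQL file per supported platform. Table and column names are made up.

use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Platforms\MySqlPlatform;
use Doctrine\DBAL\Platforms\SqlitePlatform;

require __DIR__ . '/vendor/autoload.php';

$schema = new Schema();

$table = $schema->createTable( 'wqe_claims' );
$table->addColumn( 'id', 'integer', array( 'autoincrement' => true ) );
$table->addColumn( 'property_id', 'string', array( 'length' => 32 ) );
$table->addColumn( 'value_hash', 'string', array( 'length' => 40 ) );
$table->setPrimaryKey( array( 'id' ) );
$table->addIndex( array( 'property_id' ) );

$platforms = array( 'mysql' => new MySqlPlatform(), 'sqlite' => new SqlitePlatform() );

foreach ( $platforms as $name => $platform ) {
    // Schema::toSql() returns one CREATE statement per table/index for the given platform.
    $sql = implode( ";\n", $schema->toSql( $platform ) ) . ";\n";
    file_put_contents( __DIR__ . "/sql/$name.sql", $sql );
}

The generated sql/*.sql files would then be what the update script (or a manual deployment) actually runs.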
In the end, WQE could use either MW or dbal for running queries, so we could deploy it without dbal on the WMF cluster, but it could also be used outside MediaWiki.
If there are no objections, I propose to take this on for the next sprint. Perhaps we can start to split this into tasks on bugzilla already.
Cheers, Daniel
Hey,
- Symfony
The Symfony Console component, not the Symfony framework.
- Split the command line interface into a separate component (WQE-CLI
perhaps) that would not be deployed.
This is easy to do, though will be less convenient for users of those scripts.
- Go back to using MediaWiki's DB abstraction for running queries.
We cannot directly use it without making QueryEngine depend on MediaWiki, and without losing the ability to run the tests with an in-memory SQLite database. Luckily we can easily switch back to using the interface originally written for this task: https://github.com/wmde/WikibaseDatabase/blob/master/src/QueryInterface/Quer...
- For generating DB tables (aka schema creation), we create a separate
component
The schema definition code should not be moved into its own component. It belongs very close to the code using the schema. So I don't see a sane way of getting rid of all the code that depends on DBAL. We could create our own implementation of this functionality again, though I do not consider that to be a sane approach. The code in question will not be executed on the WMF cluster though. And if you want to be really paranoid about it, you can just delete dbal from the build before it goes onto the cluster, and be sure it is indeed not executed.
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
Hey,
I'm also curious as to whether the WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
On 04.09.2014 20:03, Jeroen De Dauw wrote:
Hey,
I'm also curious as to whether the WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
-- daniel
On Fri, Sep 5, 2014 at 11:01 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
On 04.09.2014 20:03, Jeroen De Dauw wrote:
Hey,
I'm also curious as to whether the WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.
99% sure it is not. (e.g. we don't run composer, which uses symfony console, in the cluster... only on labs instances)
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
Highly doubt we use stuff from pear.
Cheers, Katie
-- daniel
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
On 2014-09-05 11:01, Daniel Kinzler wrote:
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
Then we are nearly fine. Comparing debian testing with https://github.com/wmde/WikibaseQueryEngine/blob/894636508c0b590b02bdd325f42... :
$ aptitude show php-doctrine-dbal | grep Version
Version: 2.4.2-4
That fulfills "doctrine/dbal": "~2.4".
$ aptitude show php-symfony-console | grep Version
Version: 2.3.1+dfsg-1
This is a bit behind "symfony/console": "~2.4" (the ~2.4 constraint means >= 2.4 and < 3.0, so 2.3.1 does not satisfy it). No newer version is available in Debian yet, which can be changed.
Hi Chris, hi Rob!
During our discussion about using 3rd party software, you said that the rule of thumb is "if it has a debian package, it's fine". Jan now pointed out that both dbal and symfony do have debian packages.
Does that mean we can just use them after all? Or does this rule not apply to php code? If not, why not?
-- daniel
On 07.09.2014 20:14, Jan Zerebecki wrote:
On 2014-09-05 11:01, Daniel Kinzler wrote:
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
Then we are nearly fine. Comparing debian testing with https://github.com/wmde/WikibaseQueryEngine/blob/894636508c0b590b02bdd325f42... :
$ aptitude show php-doctrine-dbal | grep Version
Version: 2.4.2-4
That fulfills "doctrine/dbal": "~2.4".
$ aptitude show php-symfony-console | grep Version
Version: 2.3.1+dfsg-1
This is a bit behind "symfony/console": "~2.4". No newer version is available in Debian yet, which can be changed.
On Fri, Sep 5, 2014 at 2:01 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
On 04.09.2014 20:03, Jeroen De Dauw wrote:
I'm also curious as to whether the WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
I probably misspoke in that conversation.
There are two main review processes to get external dependencies installed on the Wikimedia cluster. One way is by checking it in somewhere in the source, and going through our code review process. The other way is to get it deployed as part of the base operating system.
If you're going to go the source control route, then it needs to go through code review.
If you're going to go the operating system route, then TechOps will make the call. I don't know everything that goes into their thought process, but having a Debian package is a necessary (but not always sufficient) means of getting it deployed. The value of relying on packaging goes way down if you aren't prepared to use the version that comes with the Ubuntu LTS versions. So, if you're thinking that "oh, there's a package, great, let's now get them to upgrade to the bleeding edge!", you're likely to be disappointed. Also, TechOps is pretty stingy about what they accept responsibility for.
TechOps tends to be skeptical of language specific tools such as PEAR, Composer, npm, pip, CPAN, etc. When we use those things, we tend to use them in conjunction with source control and the review process there.
Hope this helps.
Rob
Hi Rob, thanks for clarifying!
I guess I just oversimplified what was said in our discussion. I'll try to summarize what you now wrote:
If there is a package for dbal/symfony/whatever in Ubuntu LTS, we have a good chance, but no guarantee, that TechOps is fine with deploying it.
I understand that we are basically relying on the quality control and security vetting that (hopefully) goes into making LTS packages.
Is that about right?
daniel
On 09.09.2014 19:01, Rob Lanphier wrote:
On Fri, Sep 5, 2014 at 2:01 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
On 04.09.2014 20:03, Jeroen De Dauw wrote:
I'm also curious as to whether the WMF is indeed not running any CLI tools on the cluster which happen to use Symfony Console.
As far as I know, no unreviewed 3rd party php code is running on the public facing app servers. Anything that has a debian package is ok. Don't know about PEAR...
I probably misspoke in that conversation.
There are two main review processes to get external dependencies installed on the Wikimedia cluster. One way is by checking it in somewhere in the source, and going through our code review process. The other way is to get it deployed as part of the base operating system.
If you're going to go the source control route, then it needs to go through code review.
If you're going to go the operating system route, then TechOps will make the call. I don't know everything that goes into their thought process, but having a Debian package is a necessary (but not always sufficient) means of getting it deployed. The value of relying on packaging goes way down if you aren't prepared to use the version that comes with the Ubuntu LTS versions. So, if you're thinking that "oh, there's a package, great, let's now get them to upgrade to the bleeding edge!", you're likely to be disappointed. Also, TechOps is pretty stingy about what they accept responsibility for.
TechOps tends to be skeptical of language specific tools such as PEAR, Composer, npm, pip, CPAN, etc. When we use those things, we tend to use them in conjunction with source control and the review process there.
Hope this helps.
Rob
On 09.09.2014 19:20, Daniel Kinzler wrote:
Hi Rob, thanks for clarifying!
I guess I just oversimplified what was said in our discussion. I'll try to summarize what you now wrote:
If there is a package for dbal/symfony/whatever in Ubuntu LTS, we have a good chance, but no guarantee, that TechOps is fine with deploying it.
Quick update on that: If I understand correctly, the cluster is running Ubuntu 12.04, which doesn't have the packages in question, but an upgrade to 14.04 is in the pipeline.
So, there are two things we need to know in order to make an informed decision:
1) can we use the Ubuntu LTS packages for symfony and dbal?
2) when is 14.04 going to be rolled out?
Who can answer these questions? How do we poke TechOps?
-- daniel
On Tue, Sep 9, 2014 at 10:25 AM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
I guess I just oversimplified what was said in our discussion. I'll try to summarize what you now wrote:
If there is a package for dbal/symfony/whatever in Ubuntu LTS, we have a good chance, but no guarantee, that TechOps is fine with deploying it.
Hi Daniel,
"Good chance" is overstating it. Merely better chance than if there was no package. In the case of something that is clearly a PHP library, I'm guessing that Ops would ask Platform for a recommendation, and then you're kinda back where you started.
I'm kinda regretting bringing up the apt repo case, because it's basically a loophole in our review strategy, not something I'm sure we want to encourage or something I believe will get a great deal of traction. There's an outside chance that people shrug and say "sure, why not?", but I don't think it's going to be generally attractive.
That said, I'll go ahead and answer the stuff below, just in case it's still of interest....
Quick update on that: If I understand correctly, the cluster is running Ubuntu 12.04, which doesn't have the packages in question, but an upgrade to 14.04 is in the pipeline.
So, there are two things we need to know in order to make an informed decision:
- can we use the Ubuntu LTS packages for symfony and dbal?
Per above, probably not likely.
- when is 14.04 going to be rolled out?
Concurrently with the HHVM upgrade. In the coming weeks. We don't have a set end date for when all machines will be converted, but we should be well underway by the end of this month. There's probably going to be a stubborn service or two that will stick around quite a bit longer than that, though.
Who can answer these questions? How do we poke TechOps?
The Wikimedia Operations list is a good place to ask about this type of thing.
Rob
On 04.09.2014 20:01, Jeroen De Dauw wrote:
Hey,
- Symfony
The Symfony Console component, not the Symfony framework.
sloccount gives me: Total Physical Source Lines of Code (SLOC) = 10,570
That's a lot for a little bit of convenience when dealing with command line arguments...
- Split the command line interface into a separate component (WQE-CLI
perhaps) that would not be deployed.
This is easy to do, though will be less convenient for users of those scripts.
What exactly is the scope and purpose of the CLI, anyway? You can use it to re-generate the tables, and populate them, right? The two or three main use cases could easily be re-implemented without Symfony, don't you think?
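Just to illustrate, something as simple as this would cover it; the class names here are hypothetical stand-ins for whatever services WQE actually exposes:

<?php
// Hypothetical maintenance-style script, no Symfony Console involved.
// SchemaInstaller and StorePopulator are invented names for illustration.

require __DIR__ . '/vendor/autoload.php';

$command = isset( $argv[1] ) ? $argv[1] : '';

switch ( $command ) {
    case 'install':
        ( new SchemaInstaller() )->createTables();
        break;
    case 'populate':
        ( new StorePopulator() )->rebuildFromEntities();
        break;
    default:
        fwrite( STDERR, "Usage: php wqe.php (install|populate)\n" );
        exit( 1 );
}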
- Go back to using MediaWiki's DB abstraction for running queries.
We cannot directly use it without making QueryEngine depend on MediaWiki, and without losing the ability to run the tests with an in-memory SQLite database. Luckily we can easily switch back to using the interface originally written for this task: https://github.com/wmde/WikibaseDatabase/blob/master/src/QueryInterface/Quer...
Why would we not be able to use SQLite? For queries, MW's abstraction layer should handle that just fine...
The dependency on MediaWiki could be optional: we could have implementations of the query interface based on MW's DatabaseBase, and in addition, dbal. So you'd need to have one of the two, but neither would be a hard dependency.
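Roughly what I have in mind, as a sketch; the interface and class names are made up, and each adapter just wraps the real call of its backend (DatabaseBase::select() on the MW side, Connection::fetchAll() on the DBAL side):

<?php
// Illustrative sketch only: one small query interface, two optional adapters.

interface WQEQueryInterface {
    /**
     * @return array[] Rows matching the given field => value equality conditions.
     */
    public function select( $table, array $fields, array $conditions );
}

// Adapter using MediaWiki's DB abstraction (what we would deploy on the cluster).
class MediaWikiQueryInterface implements WQEQueryInterface {
    private $db;

    public function __construct( DatabaseBase $db ) {
        $this->db = $db;
    }

    public function select( $table, array $fields, array $conditions ) {
        $rows = array();
        foreach ( $this->db->select( $table, $fields, $conditions ) as $row ) {
            $rows[] = (array)$row;
        }
        return $rows;
    }
}

// Adapter using Doctrine DBAL (for use outside MediaWiki and in the tests).
class DoctrineQueryInterface implements WQEQueryInterface {
    private $connection;

    public function __construct( \Doctrine\DBAL\Connection $connection ) {
        $this->connection = $connection;
    }

    public function select( $table, array $fields, array $conditions ) {
        $where = array();
        foreach ( array_keys( $conditions ) as $field ) {
            $where[] = "$field = ?";
        }
        $sql = 'SELECT ' . implode( ', ', $fields ) . ' FROM ' . $table .
            ( $where !== array() ? ' WHERE ' . implode( ' AND ', $where ) : '' );
        return $this->connection->fetchAll( $sql, array_values( $conditions ) );
    }
}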
- For generating DB tables (aka schema creation), we create a separate component
The schema definition code should not be moved into its own component. It belongs very close to the code using the schema.
True. Schema *generation* might, though. But I agree that it's not nice to split this.
So I don't see a sane way of getting rid of all the code that depends on DBAL. We could create our own implementation of this functionality again, though I do not consider that to be a sane approach.
indeed
The code in question will not be executed on the WMF cluster though. And if you want to be really paranoid about it, you can just delete dbal from the build before it goes onto the cluster, and be sure it is indeed not executed.
How about making it an optional dependency: if you install dbal, you can generate new schema creation scripts based on your setup (which may include extra tables for additional value types or whatever). If you don't, you only get the pre-generated standard files. Would that work?
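On the Composer side this could simply be a "suggest" entry rather than a hard requirement, something along these lines (illustrative only, not WQE's actual composer.json):

{
    "require": {
        "php": ">=5.3.0"
    },
    "suggest": {
        "doctrine/dbal": "Only needed to regenerate the SQL schema files; pre-generated files ship with the component"
    }
}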
-- daniel
Hey,
What exactly is the scope and purpose of the CLI, anyway? You can use it to re-generate the tables, and populate them, right? The two or three main use cases could easily be re-implemented without Symfony, don't you think?
The implementation of the functionality accessible via the CLI does not depend on the CLI. So as I've already told you, it's trivial to use another CLI UI, or whatever type of UI you want for that matter. So I'm not sure why we are still discussing this; we've already established this is easy to resolve.
Why would we not be able to use SQLite? For queries, MW's abstraction layer should handle that just fine...
I never said we'd not be able to use SQLite. Please read again what I wrote.
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
Just had my longest face palm in a while.
Isn't Symfony a large framework backed by another huge open source community? A framework used by hundreds of programmers and tested heavily? Did the foundation ever code review Ubuntu or the PHP interpreter or any proprietary software which might be running on their servers?
This doesn't sound encouraging in terms of code reusability and cooperation with other open source projects at all. Please think about this again; it causes the Wikidata team more work for probably no good reason.
Cheers, Danwe
On 05.09.2014 00:43, Daniel Kinzler wrote:
Hi. The call with the WMF today turned up a pretty big issue.
WikibaseQueryEngine (WQE) depends on two big 3rd party libraries:
- Symfony (used for implementing a command line client for the query engine,
which we use to generate database tables)
- doctrine/dbal (a database abstraction layer we use to run queries and create
database tables)
In order to deploy these, the WMF would require a line-by-line review of something around 50 thousand lines of code (or maybe it was 30 thousand, or 80 thousand; a lot, in any case). This is not feasible.
We (Katie, Jeroen, Chris, Nik, me, etc) have come up with a plan to get rid of these dependencies:
- Split the command line interface into a separate component (WQE-CLI perhaps)
that would not be deployed. Symfony is then out of the picture.
- Go back to using MediaWiki's DB abstraction for running queries. This should
be easy.
- For generating DB tables (aka schema creation), we create a separate
component (WQE schema generator or something) that would use dbal to generate static SQL files for the supported DB systems (most importantly, MySQL and SQLite). This would be part of our build step, but the code relying on dbal would not be deployed. Only the generated SQL files would be used for deployment (either through the update script, or, in the case of WMF, manually).
In the end, WQE could use either MW or dbal for running queries, so we could deploy it without dbal on the WMF cluster, but it could also be used outside MediaWiki.
If there are no objections, I propose to take this on for the next sprint. Perhaps we can start to split this into tasks on bugzilla already.
Cheers, Daniel