Hi folks,
in our quest to simplify the creation of production-quality services we have recently made some modest progress with a service-runner module [1] and an early service-template-node prototype [2]. The latter deserves its own discussion (and is at a very early stage), so I'm focusing on service-runner in this thread.
Service-runner [1] is a small module that we moved out of restbase. It generalizes some simple start-up, monitoring and supervision facilities that we refined while building services like Parsoid, Mathoid or RESTBase:
- commandline option parsing & yaml config loading (with a standard format)
- worker pool management using the cluster module; graceful restarts
- logging via gelf to logstash, with option to configure other backends
- generic metric reporting (txstatsd, statsd or simple logging for development)
- general worker monitoring & debugging: V8 heap metrics & limiting, support for heap dumps
For small (third party) installs with limited memory, we also added the capability to cleanly run multiple services in a single process.
The uniform way to run and configure services provided by service-runner should let us create a shared puppet module to manage most of the per-service tasks [3]. Another possibility is to automatically build packages for these services [4], which can help to distribute these services to third-party users.
So, please have a look & let us know what you are missing / would like to see. There is a service-runner tag on phabricator [5] that we can use to track specific tasks.
Thanks,
Gabriel
[1]: https://github.com/wikimedia/service-runner
[2]: https://github.com/wikimedia/service-template-node
[3]: https://phabricator.wikimedia.org/T89901
[4]: https://phabricator.wikimedia.org/T89900
[5]: https://phabricator.wikimedia.org/tag/service-runner/
Refactoring out useful components ftw!
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Mon, Feb 23, 2015 at 1:48 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
Service-runner [1] is a small module that we moved out of restbase. It generalizes some simple start-up, monitoring and supervision facilities ...
I'm surprised there isn't something like this already in nodejs that you get "for free" when you use forever[6] to run a node command. Did you consider forever-service [7] ? It sounds similar:
1. Make an universal service installer across various Linux distros and other OS.
2. Automatically configure other useful things such as Logrotation scripts, port monitoring scripts etc.
3. Graceful shutdown of services as default behaviour.
It's _great_ that https://github.com/wikimedia/service-runner#see-also mentions similar packages. I made a pull request to add forever-service to the README, though I didn't compare its features.
For small (third party) installs with limited memory, we also added the capability to cleanly run multiple services in a single process.
Yay. Should MediaWiki-Vagrant use this, or does it only benefit when you're running more -oids than just Parsoid?
I'm so excitoid
[6] https://github.com/foreverjs/forever
[7] https://github.com/zapty/forever-service

--
=S Page WMF Tech writer
On 23 February 2015 at 16:46, S Page spage@wikimedia.org wrote:
I'm so excitoid
S, you just made my day!
Dan
S,
On Mon, Feb 23, 2015 at 4:46 PM, S Page spage@wikimedia.org wrote:
I'm surprised there isn't something like this already in nodejs that you get "for free"
the closest in terms of feature set that I'm aware of is https://github.com/strongloop/strong-agent. It has slightly more features than service-runner even, and we would likely have used it a while ago if there wasn't this catch about it being commercial software that also requires a subscription.
when you use forever[6] to run a node command. Did you consider forever-service [7] ? It sounds similar:
I knew about forever, but not forever-service. Thanks for the link!
Both solve a different problem than service-runner: forever basically tries to be a replacement for daemon managers like start-stop-daemon, while forever-service hooks it up with the actual init system. Both could potentially come in handy as part of a distribution solution on less common platforms, especially where there isn't a good init system. On Linux we now have a very modern init system with systemd though, and I think just using it makes a lot of sense.
While forever (like an init system or start-stop-daemon) is all about managing a subprocess, service-runner actually becomes part of the node service process. It parallelizes web services by managing workers all listening on the same socket, sets up non-blocking remote logging and automatically collects heap metrics in each worker it spawns. It is complementary to init systems or forever in many ways. For example, forever or init systems typically send a SIGTERM to ask for a graceful restart, while service-runner has a handler for SIGTERM that triggers the actual graceful restart by instructing its workers to stop accepting new requests and to exit after ongoing requests have finished.
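The SIGTERM flow described above can be sketched with nothing but Node's core `cluster` and `http` modules. This is a minimal illustration, not service-runner's actual code: the function names `startMaster`, `startWorker` and `main`, as well as the port, are hypothetical and chosen only for this example.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');

// Hypothetical helper: fork the workers and turn SIGTERM into a graceful stop.
function startMaster(numWorkers) {
    for (let i = 0; i < numWorkers; i++) {
        cluster.fork();
    }
    process.on('SIGTERM', () => {
        // cluster.disconnect() asks every worker to close its servers;
        // the callback runs once all workers have disconnected.
        cluster.disconnect(() => process.exit(0));
    });
}

// Hypothetical helper: each worker listens on the same port; the cluster
// module shares the underlying socket between them.
function startWorker(port) {
    const server = http.createServer((req, res) => res.end('ok\n'));
    server.listen(port);
    return server;
}

// cluster.isMaster is named cluster.isPrimary in newer Node releases.
function main(numWorkers, port) {
    if (cluster.isMaster) {
        startMaster(numWorkers);
    } else {
        startWorker(port);
    }
}

// Usage (not run automatically): main(4, 8080);
```

An init system then only needs to send SIGTERM to the master process; the decision about when each worker may exit stays inside the node process.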
It's _great_ that https://github.com/wikimedia/service-runner#see-also mentions similar packages, I made a pull request to add forever-service to the README though I didn't compare its features.
Thank you! I merged & tweaked it a little in a follow-up.
For small (third party) installs with limited memory, we also added the capability to cleanly run multiple services in a single process.
Yay. Should MediaWiki-Vagrant use this, or does it only benefit when you're running more -oids than just Parsoid?
The memory saving comes from only starting one process (which Parsoid already supports). Service-runner additionally lets us run multiple services in a single process, which can save memory over having one process per service. You lose isolation and parallelism of course, but that's an okay price to pay if you are set on running all those services in 100 MB of RAM. Also, Parsoid does not use service-runner yet. First commit was only last weekend.
I'm so excitoid
Ha! And I thought we had just freed ourselves from the *oids ;)
Gabriel
Hi Gabriel,
you asked for my feedback, so here it is:
I think the general idea of creating such a standardized "service template" is good, and we probably would need to have something similar for other languages we use as well. I have taken a look at the code and I think the general direction you're taking (conventions, declarative config, etc.) is very good and sensible. This kind of scaffolding/standardization is what I asked for at the dev summit, so it's all welcome - we'd be able to add features/plugins to existing services just by updating the service-runner.
The only critique I feel is needed here is of the whole idea of having a node process acting as a supervisor. Can we really resolve what is in general a delicate problem with 10 lines of javascript? I think there are better ways to spin up node processes and deliver HTTP requests to them - I think using this supervisor is perfectly correct and good in a dev environment or when prototyping your service, but I'd like to have the option of thinking about alternatives when we run it in production. So, is there a way to run a single worker without forking out? If not, I guess it would be easy to add this option ('run as a single worker on port XXX') to the service-runner.
Thanks a lot for working on this!
Cheers,
Giuseppe
P.S. Why can't I find this project on git.wikimedia.org or gerrit, but only on a closed-source, proprietary platform like github?
I haven't looked into feature sets and/or requirements at all, but has anyone looked into PM2? https://github.com/Unitech/pm2
I know we use it internally at my company and that folks are reasonably happy with it (compared to the other stuff that is out there).
DJ
DJ,
On Tue, Feb 24, 2015 at 3:04 AM, Derk-Jan Hartman <d.j.hartman+wmf_ml@gmail.com> wrote:
I haven't looked into feature sets and/or requirements at all, but has anyone looked into PM2 ? https://github.com/Unitech/pm2
I know we use it internally at my company and that folks are reasonably happy with it (compared to the other stuff that is out there).
I looked at pm2 a week ago, and found it interesting (it's listed in the 'see also' section). It does offer clustering as well, so in that respect it's closer to service-runner. It also has startup scripts for different environments, similar to forever-service.
There are a few things I dislike about it:
- It tries to replace init for node services, and puts a lot of effort into the interactive UI. I'm not convinced that this is warranted or useful. I think services should normally be started and stopped just like any other system service, without the need to learn a tool that's specific to nodejs. It is also quite a bit larger than the 380 lines or so in service-runner, with most of that code spent on interactive things we don't necessarily want / need.
- Logging is using stdout and stderr, which are both blocking. This means that a full disk can bring down a service (happened before with Parsoid). We have since moved to structured JSON logging with logstash over UDP/gelf, and are careful not to block on logging or metrics.
- There seems to be no metrics reporting apart from the interactive shell. We want to systematically monitor services and encourage developers to further instrument their services internally, so this is important to us.
- Its heap limiting feature simply sets v8's old space limit, which means that processes will have high latency as they spend most of their time in GC when approaching the limit. We avoid the latency penalty by periodically monitoring v8's internal memory stats and gracefully restarting the worker if the limit has been breached several times in a row (and complain loudly about it).
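The "limit breached several times in a row" idea from the last point can be sketched as follows. This is an illustration under assumptions, not service-runner's implementation: `createHeapWatcher` and `watchHeap` are hypothetical names, and the sketch samples `process.memoryUsage().heapUsed` where a real monitor might use more detailed V8 statistics.

```javascript
'use strict';

// Hypothetical helper: returns a function that is fed one heap sample
// (in bytes) at a time. It returns true once the limit has been exceeded
// `maxStrikes` times in a row; any sample under the limit resets the count.
function createHeapWatcher(limitBytes, maxStrikes) {
    let strikes = 0;
    return function check(heapUsedBytes) {
        strikes = heapUsedBytes > limitBytes ? strikes + 1 : 0;
        return strikes >= maxStrikes;
    };
}

// Hypothetical wiring inside a worker: sample periodically and call
// onBreach (for example, a graceful shutdown routine) on a sustained breach.
function watchHeap(limitBytes, maxStrikes, intervalMs, onBreach) {
    const check = createHeapWatcher(limitBytes, maxStrikes);
    const timer = setInterval(() => {
        if (check(process.memoryUsage().heapUsed)) {
            clearInterval(timer);
            onBreach();
        }
    }, intervalMs);
    timer.unref(); // monitoring alone should not keep the process alive
    return timer;
}
```

Requiring several consecutive breaches avoids restarting a worker because of a single short-lived allocation spike.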
Gabriel
Giuseppe,
thanks for having a look.
Regarding 10 lines of JS: The node cluster module (http://nodejs.org/api/cluster.html) is part of nodejs core and runs a bit longer than that. It's actually a fairly elegant way to implement prefork style servers with support for graceful restarts, sane signal handling etc without requiring changes in the individual services. It is also not specific to HTTP, but works with arbitrary socket servers.
On Tue, Feb 24, 2015 at 12:02 AM, Giuseppe Lavagetto <glavagetto@wikimedia.org> wrote:
So, is there a way to run a single worker without forking out? If not, I guess it would be easy to add this option ('run as a single worker on port XXX') to the service-runner.
Yes, you can either set num_workers to 0 in the config, or pass in -n 0 on the commandline. This is especially useful for small installs and development / profiling.
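As a rough sketch of what a worker count of 0 could mean in practice, the dispatcher below runs the service directly in the current process when no workers are requested, and only forks otherwise. The names `chooseMode` and `run` are hypothetical and only illustrate the idea; they are not service-runner's API.

```javascript
'use strict';
const cluster = require('cluster');

// Hypothetical helper: decide how this process should behave for a given
// num_workers-style setting.
function chooseMode(numWorkers, isMaster) {
    if (numWorkers === 0) {
        return 'in-process'; // no forking: the service runs in this process
    }
    return isMaster ? 'supervisor' : 'worker';
}

// Hypothetical entry point: startService is whatever function boots the
// actual service (for example, one that creates an HTTP server).
function run(numWorkers, startService) {
    const mode = chooseMode(numWorkers, cluster.isMaster);
    if (mode === 'supervisor') {
        for (let i = 0; i < numWorkers; i++) {
            cluster.fork();
        }
    } else {
        startService();
    }
    return mode;
}
```

With a single process there is nothing to supervise, which is also what makes this mode convenient for profiling.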
What you seem to be hinting at though is a preference for running each worker on a different port & then using iptables or LVS to distribute requests across the workers. This model can be supported with service-runner as well (with -n 0 or 1), but would involve a lot more moving parts and require solutions for coordinated graceful restarts. Which compelling benefit do you see in going down that route?
Gabriel