Hi!
I have opened a new task [1] to decide on setting (or not) an upgrade cadence for our Ceph cluster [2].
Your input is more than welcome on the task itself or on this email thread.
There's no deadline, but if there's not a lot of discussion this could be decided right after the holidays.
You can find this one and other ongoing proposals here [3].
Thanks!
[1] https://phabricator.wikimedia.org/T325223
[2] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Ceph
[3] https://phabricator.wikimedia.org/project/board/5263/
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
Hi!
As we have been working on gathering and defining user stories for the Toolforge Build Service and Toolforge itself, I
have been thinking about the next steps for both of them and their surroundings. I wanted to share those thoughts and
start a discussion, to try to give a bit more direction to our work in those areas.
== Tl;Dr;
Let's think without constraints about what we want Toolforge to become.
My opinion:
* Move towards full Platform as a Service
** this means users only interface with our platform
** this might mean offering k8s as a service on top of CloudVPS if needed
* Simple thin client
* Simple thin UI (for people that can't/don't want to use the client)
* API that supports both the above
== Long description
I think that this is a somewhat popular idea: I would like Toolforge to be as easy to use as DigitalOcean and Heroku,
that is, a PaaS.
This means:
* No need for ssh
* Very simple cli (from the user's computer)
* Simple web UI (same capabilities as the cli, for anyone that can't install the cli)
This also means:
* No k8s as a service within Toolforge itself (discussed later)
* Detaching the users from the underlying implementation
I know that this might require lots of changes, and those are not easy, but let's focus on the features we want, not the
design underneath yet.
What I would like is to have some set of "components" that I can use and combine to create my tool:
Storage:
* Store structured data somewhere (db)
* Store unstructured data somewhere (storage/file-like?/s3?)
Compute:
* Something that runs periodically (cron-like)
* Something that runs once (one-off)
* Something that runs continuously (daemon)
Network:
* Create a public entry point for a web service
* Connect between my components
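To make the component list a bit more concrete, here is a minimal Python sketch of how a tool could be described as a
combination of those components. All class and field names (`Tool`, `Compute`, `Storage`, `Endpoint`, `schedule`, ...)
are hypothetical and only for illustration; nothing here reflects an existing Toolforge API or a chosen design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ComputeKind(Enum):
    PERIODIC = "periodic"      # runs periodically (cron-like)
    ONE_OFF = "one-off"        # runs once
    CONTINUOUS = "continuous"  # runs continuously (daemon)


@dataclass
class Compute:
    name: str
    kind: ComputeKind
    command: str
    schedule: Optional[str] = None  # hypothetical field, only used by PERIODIC

    def __post_init__(self) -> None:
        # A schedule is required for periodic components and rejected otherwise.
        if (self.kind is ComputeKind.PERIODIC) != (self.schedule is not None):
            raise ValueError(
                f"{self.name}: schedule must be set only for periodic components"
            )


@dataclass
class Storage:
    name: str
    structured: bool  # True: db-like, False: file/object-like


@dataclass
class Endpoint:
    target: str   # name of the compute component receiving the traffic
    port: int
    public: bool  # True: public web entry point, False: only between my components


@dataclass
class Tool:
    name: str
    compute: List[Compute] = field(default_factory=list)
    storage: List[Storage] = field(default_factory=list)
    endpoints: List[Endpoint] = field(default_factory=list)

    def validate(self) -> None:
        # Every endpoint must point at a compute component of this tool.
        known = {component.name for component in self.compute}
        for endpoint in self.endpoints:
            if endpoint.target not in known:
                raise ValueError(
                    f"endpoint targets unknown component {endpoint.target!r}"
                )


# Example: a web service with a database and an hourly cleanup job.
example_tool = Tool(
    name="my-tool",
    compute=[
        Compute("web", ComputeKind.CONTINUOUS, "./serve.sh"),
        Compute("cleanup", ComputeKind.PERIODIC, "./cleanup.sh", schedule="@hourly"),
    ],
    storage=[Storage("main-db", structured=True)],
    endpoints=[Endpoint(target="web", port=8000, public=True)],
)
example_tool.validate()
```

The point of the sketch is only that a handful of component types is enough to describe a typical tool, without
mentioning anything about the platform underneath.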
So, inspired by the DigitalOcean [1] and Heroku [2] CLIs, the toolforge CLI could just offer:
* toolforge run
* toolforge run-once
* toolforge run-every
* toolforge db
* toolforge storage
* toolforge expose-port (--public|--local)
Some side-commands could be:
* toolforge tool -> to manage the tools themselves (create/add-maintainer/remove-maintainer/...)
* toolforge get-all -> to list all my components
* toolforge logs -> get the logs for a component
* toolforge shell -> start a shell inside a component container (similar to heroku bash), for debugging
* toolforge edit-config -> to allow doing all of the above through some kind of structured spec
This is not an exhaustive list, but it should cover most of the use cases.
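As a rough illustration of how small such a client could be, here is a sketch of the command layout using Python's
standard `argparse` module. Only the subcommand names and the `--public`/`--local` flags come from the lists above; the
positional arguments (`cmd`, `schedule`, `port`) and the help texts are assumptions made up for the example, and the
subcommands do nothing beyond parsing.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="toolforge")
    sub = parser.add_subparsers(dest="subcommand", required=True)

    # Compute components; the positional arguments are assumptions.
    run = sub.add_parser("run", help="run something continuously (daemon)")
    run.add_argument("cmd")

    run_once = sub.add_parser("run-once", help="run something once (one-off)")
    run_once.add_argument("cmd")

    run_every = sub.add_parser("run-every", help="run something periodically (cron-like)")
    run_every.add_argument("schedule")
    run_every.add_argument("cmd")

    # Network: exactly one of --public / --local must be given.
    expose = sub.add_parser("expose-port", help="expose a port of a component")
    expose.add_argument("port", type=int)
    visibility = expose.add_mutually_exclusive_group(required=True)
    visibility.add_argument(
        "--public", dest="visibility", action="store_const", const="public"
    )
    visibility.add_argument(
        "--local", dest="visibility", action="store_const", const="local"
    )

    # Remaining subcommands, registered without arguments to keep the sketch short.
    for name, description in [
        ("db", "store structured data"),
        ("storage", "store unstructured data"),
        ("tool", "manage the tools themselves"),
        ("get-all", "list all my components"),
        ("logs", "get the logs for a component"),
        ("shell", "start a shell inside a component container"),
        ("edit-config", "manage everything through a structured spec"),
    ]:
        sub.add_parser(name, help=description)

    return parser


args = build_parser().parse_args(["expose-port", "8000", "--public"])
print(args.subcommand, args.port, args.visibility)  # expose-port 8000 public
```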
You might be asking now: what about people who need some extra features from k8s?
For those, we can offer k8s as a service (through CloudVPS + terraform for example), so they have full control of their
k8s instances.
Note that I have tried to refrain from adding any implementation details yet, as I think that we should do the
exercise of thinking about what we want without limiting ourselves by how we think it could be done.
The limitations will come later :)
== Some random stats for current k8s toolforge usage
Total number of namespaces:
3163
Of which, namespaces that are empty:
1496
That means that only 1667 have something in them. For those, number of k8s webservices:
1276
Number of grid webservices:
307
Number of tools with cronjobs:
71
Number of tools with >1 cronjobs:
47
Number of tools with >10 cronjobs:
6
Number of tools with manually defined resources:
51
I checked a few of those, and they could be sorted out with "continuous jobs" (as in daemons), though I have not
reviewed all of them in detail.
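For a quick sense of proportion, the figures above can be turned into percentages with a few lines of Python. The
variable names and the `share` helper are made up for this example, and using the non-empty namespaces as the
denominator for the cronjob and manual-resources figures is an assumption (it treats one namespace as one tool).

```python
# Figures copied from the stats above.
total_namespaces = 3163
empty_namespaces = 1496
k8s_webservices = 1276
tools_with_cronjobs = 71
tools_with_manual_resources = 51

non_empty_namespaces = total_namespaces - empty_namespaces


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "empty namespaces (of all namespaces)": share(empty_namespaces, total_namespaces),
    "k8s webservices (of non-empty namespaces)": share(k8s_webservices, non_empty_namespaces),
    "tools with cronjobs (of non-empty namespaces)": share(tools_with_cronjobs, non_empty_namespaces),
    "tools with manual resources (of non-empty namespaces)": share(
        tools_with_manual_resources, non_empty_namespaces
    ),
}

for label, percentage in summary.items():
    print(f"{label}: {percentage}%")
```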
[1] https://docs.digitalocean.com/reference/doctl/reference/apps
[2] https://devcenter.heroku.com/categories/command-line