Cloud-admin December 2022

cloud-admin@lists.wikimedia.org

2 participants
2 discussions

Decision request - ceph upgrade cadence
by David Caro 17 Dec '22


Hi!

I have opened a new task [1] to decide on setting (or not) an upgrade cadence for our Ceph cluster [2].

Your input is more than welcome on the task itself or on this email thread.

There's no deadline, but if there's not a lot of discussion this could be decided right after the holidays.

You can find this one and other ongoing proposals here [3].

Thanks!

[1] https://phabricator.wikimedia.org/T325223
[2] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Ceph
[3] https://phabricator.wikimedia.org/project/board/5263/

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment."


About making toolforge a platform
by David Caro 13 Dec '22


Hi!

As we have been working on gathering and defining user stories for the Toolforge Build Service and Toolforge itself, I have been thinking about the next steps for both of them and for the areas around them. I wanted to share those thoughts and have some discussion, to try to give a bit more direction to our work in those areas.

== TL;DR

Let's think without constraints about what we want Toolforge to become.

My opinion:
* Move towards a full Platform as a Service
** this means users only interface with our platform
** this might mean offering k8s as a service on top of CloudVPS if needed
* Simple thin client
* Simple thin UI (for people that can't/don't want to use the client)
* API that supports both of the above

== Long description

I think this is a somewhat popular idea: I would like Toolforge to be as easy to use as DigitalOcean and Heroku, that is, a PaaS.

This means:
* No need for ssh
* Very simple cli (from the user's computer)
* Simple web UI (same capabilities as the cli, for anyone that can't install the cli)

This also means:
* No k8s as a service (discussed later)
* Detaching the users from the underlying implementation

I know that this might require lots of changes, and those are not easy, but let's focus on the features we want, not on the design underneath yet.

What I would like is to have some set of "components" that I can use and combine to create my tool:

Storage:
* Store structured data somewhere (db)
* Store unstructured data somewhere (storage/file-like?/s3?)

Compute:
* Something that runs periodically (cron-like)
* Something that runs once (one-off)
* Something that runs continuously (daemon)

Network:
* Create a public entry point for a web service
* Connect between my components

So, inspired by the DigitalOcean [1] and Heroku [2] clis, the toolforge cli could just do:
* toolforge run
* toolforge run-once
* toolforge run-every
* toolforge db
* toolforge storage
* toolforge expose-port (--public|--local)

Some side-commands could be:
* toolforge tool -> to manage tools themselves (create/add-maintainer/remove-maintainer/...)
* toolforge get-all -> to list all my components
* toolforge logs -> to get the logs for a component
* toolforge shell -> to start a shell inside a component container (similar to heroku bash), for debugging
* toolforge edit-config -> to allow doing the above through some kind of structured spec

This is not an exhaustive list, but it should cover most of the use cases.

You might be asking now: what about people who need some extra features from k8s? For those, we can offer k8s as a service (through CloudVPS + terraform, for example), so they have full control of their k8s instances.

Note that I have tried to refrain from adding any implementation details yet, as I think that we should do the exercise of thinking about what we want without limiting ourselves by how we think it could be done. The limitations will come later :)

== Some random stats for current k8s toolforge usage

Total number of namespaces: 3163
Of which, namespaces that are empty: 1496

That means that only 1667 have something. For those:
* Number of k8s webservices: 1276
* Number of grid webservices: 307
* Number of tools with cronjobs: 71
* Number of tools with >1 cronjob: 47
* Number of tools with >10 cronjobs: 6
* Number of tools with manually defined resources: 51

I checked a few of the tools with manually defined resources, and they could be sorted out with "continuous jobs" (as in daemons), though I have not reviewed all of them in detail.

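To make the component idea a bit more concrete, here is a minimal Python sketch of what a structured spec (the kind of thing `toolforge edit-config` hints at) might look like. Everything in it is an assumption made for illustration: the `Component` and `ToolSpec` classes, the field names, the kind strings and the example tool are hypothetical and do not correspond to any existing Toolforge API or file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical kind names, derived from the Storage/Compute lists in the email.
COMPUTE_KINDS = {"continuous", "one-off", "periodic"}
STORAGE_KINDS = {"db", "storage"}


@dataclass
class Component:
    name: str
    kind: str
    command: Optional[str] = None   # what to run (compute components)
    schedule: Optional[str] = None  # cron-like expression (periodic only)
    port: Optional[int] = None      # port the component listens on, if any
    public: bool = False            # public entry point vs local-only

    def validate(self) -> None:
        if self.kind not in COMPUTE_KINDS | STORAGE_KINDS:
            raise ValueError(f"{self.name}: unknown kind {self.kind!r}")
        if self.kind in COMPUTE_KINDS and not self.command:
            raise ValueError(f"{self.name}: compute components need a command")
        # A schedule is required for periodic components and rejected otherwise.
        if (self.kind == "periodic") != (self.schedule is not None):
            raise ValueError(f"{self.name}: schedule is for periodic components only")
        if self.public and self.port is None:
            raise ValueError(f"{self.name}: a public entry point needs a port")


@dataclass
class ToolSpec:
    tool: str
    components: List[Component] = field(default_factory=list)

    def validate(self) -> None:
        names = [c.name for c in self.components]
        if len(names) != len(set(names)):
            raise ValueError("component names must be unique within a tool")
        for component in self.components:
            component.validate()

    def public_entry_points(self) -> List[str]:
        return [c.name for c in self.components if c.public]


# Hypothetical tool: a public web service, a nightly job and a database.
example = ToolSpec(
    tool="my-tool",
    components=[
        Component("web", "continuous", command="./serve.sh", port=8000, public=True),
        Component("cleanup", "periodic", command="./cleanup.sh", schedule="0 3 * * *"),
        Component("maindb", "db"),
    ],
)
example.validate()
```

A spec like this would let the cli, the web UI and the API share one description of a tool, which is the point of detaching users from the underlying implementation; how it would map to k8s or anything else is deliberately left out.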
[1] https://docs.digitalocean.com/reference/doctl/reference/apps
[2] https://devcenter.heroku.com/categories/command-line

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment."

