Well, if we want better support for multimedia, we first need to give some attention to the existing system, which is basically falling apart. Let me give you some examples.
Thumbor, the software that generates the smaller sizes of images, runs on deprecated infrastructure and an EOL Python version (Python 2), uses an extremely old fork of the upstream code, and does not have an owner. And this is pretty critical software: if it goes down, virtually no image can be shown anywhere on Wikipedia (including all SVG files). Because nobody maintains it, we can't move it to newer infrastructure (Kubernetes), upgrade it to a more modern Python version or to the upstream code, or switch it to a more modern version of the SVG converter to fix the countless SVG bugs the current system has. That by itself is blocking new features across all of Wikipedia. For example, as a certified science nerd, I want to add support for chemical markup files (.bxr, etc.) that would enrich our chemistry articles, but well, it's blocked on Thumbor being unmaintained.
The old video player, Kaltura, is still in production and used quite heavily. The replacement media player exists but has some bugs that are rather easy to fix, and fixing them would unblock rolling it out further. But because no one is assigned to this task, it's basically a group of volunteers (including yours truly) struggling to find the time to work on it. Finishing the rollout would give a slightly more modern look to our media player.
This one is mostly fixed but worth mentioning: the image table in Commons was bigger than 300GB compressed (and 600GB uncompressed). It took 15 hours to take a simple backup, and it was basically a ticking bomb given how heavily the table is used. Commons went read-only and caused a big outage, so technically it is a bomb that has already exploded once. The problem was that the metadata of PDF and DjVu files was massive. The PDF files got fixed by Tim Starling and me (I did it in my volunteer capacity), which cut 200GB from the table. And now we are working on fixing DjVu, again in a volunteer capacity. This work is actually blocking a redesign of the image table to make it more useful, and practically any other change that would affect the size of tables in Commons.
The problems have gone past the point of merely blocking improvements and new features; they are reaching the point of actually bringing down our systems and bleeding into the rest of our infrastructure. And it all boils down to not having a dedicated team for multimedia. But in all fairness, it's not something you can fix overnight. You need to grow, hire, plan, etc.