Theoretically multitext could be replaced, but I would not like to do that. A property like "Tagline" for a movie or the motto of a country might sensibly be a multitext. Yes, you could make the tagline of a movie an item, but do we really want to require an intermediary item for it? The subtitle of a book? The ring name of a wrestler?
In one of the examples on the Wikidata notes, the role played by an actor is already assumed to be an item. But I think the other examples are good cases where an item is not fully convincing. The only caveat might be that such examples are pretty rare. It clearly is a trade-off between structural simplicity (no extra type) and optimality in terms of access and storage.
Monotext is irreplaceable, though, and it means a simple string without a language designation. Something like "Chemical symbol", I guess, would be a monotext, or an ISO 3166 code. An intermediary item could not do the job in that case.
I think this would be xsd:String in the wikidata model (which has 3 String types). Monotext was defined as having a language designation. (Note: I have not fully understood the use cases for Monotext, see my comment on the wiki; perhaps they can be elaborated on the data model page and contrasted with language-neutral/zxx String and multilingual text.)
With a language-neutral/zxx string, the main problem I see is that as soon as you want to provide an audio pronunciation, chemical symbols and ISO codes become language-dependent again.
So one may need either a String (lang:zxx) with audio nested within it:
* Audio (lang:en)
* Audio (lang:fr)
* Audio (lang:it)
Or a multilingual String-Audio combination:
* String + Audio (lang:en)
* String + Audio (lang:fr)
* etc.
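To make the two layouts easier to compare, here is a rough Python sketch. It is only an illustration, not a proposal for the actual data model: the class names (`Audio`, `NestedString`, `StringAudioPair`), the chemical symbol "Fe", and the audio file names are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Audio:
    lang: str  # language of the spoken pronunciation
    uri: str   # hypothetical placeholder for the audio file


@dataclass
class NestedString:
    """Option 1: one language-neutral string, audio nested per language."""
    text: str
    lang: str = "zxx"  # zxx = no linguistic content
    audio: Dict[str, Audio] = field(default_factory=dict)


@dataclass
class StringAudioPair:
    """Option 2: the string is repeated alongside each language's audio."""
    text: str
    audio: Audio


languages = ("en", "fr", "it")

# Option 1: the string is stored once; only the audio varies by language.
option1 = NestedString(
    text="Fe",
    audio={code: Audio(code, f"Fe-{code}.ogg") for code in languages},
)

# Option 2: one String + Audio pair per language, keyed by language code.
option2 = {
    code: StringAudioPair("Fe", Audio(code, f"Fe-{code}.ogg"))
    for code in languages
}
```

In this sketch the first layout keeps a single copy of the string but needs a nested structure, while the second stays flat but repeats the identical string once per language.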
At present neither seems optimal; I clearly don't have the solution.
Gregor