Great, Purodha, GerardM and Wikidatans,
I've gathered some "Language Code" standardization sources here, all potentially helpful for working out a good design:
Language Code
Ethnologue
(Ethnologue now uses ISO 639 codes)
ISO 639
(International Organization for Standardization)
ISO-639-3
(International Organization for Standardization)
ISO-639-6
(International Organization for Standardization)
(This aims to include any and all language variants, and is not much concerned with the political term that "language" has become.)
Language Subtag Lookup
(A nice tool maintained by W3C collaborator Richard Ishida for looking up current IANA-defined language tags and their constituents, the subtags.)
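As a rough illustration of what "language tags and their constituents (subtags)" means, here is a minimal Python sketch that splits a tag such as zh-Hant-TW into its parts. The function name split_language_tag and the simplified pattern are my own assumptions for illustration only. This is not how the lookup tool itself works, it does not check subtags against the IANA registry, and it deliberately ignores extension, private-use and grandfathered tags.

```python
import re

# Simplified shape of a language tag: language[-script][-region][-variant...]
# Assumption: extension, private-use and grandfathered forms are not handled,
# and no subtag is validated against the IANA registry.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"                 # e.g. "de", "ksh"
    r"(?:-(?P<script>[A-Za-z]{4}))?"                # e.g. "Hant"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"       # e.g. "TW", "419"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_language_tag(tag):
    """Return the subtags of `tag` as a dict, or None if it does not fit the pattern."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    variants = [v.lower() for v in m.group("variants").split("-") if v]
    return {
        "language": m.group("language").lower(),
        "script": m.group("script").title() if m.group("script") else None,
        "region": m.group("region").upper() if m.group("region") else None,
        "variants": variants,
    }


if __name__ == "__main__":
    for example in ("zh-Hant-TW", "de-CH-1901", "ksh"):
        print(example, split_language_tag(example))
```

The three-letter primary subtag is where ISO 639-3 codes would appear; whether and how ISO-639-6 codes could be carried in such a tag is a separate question that this sketch does not address.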
I've also added these, as a start, to some CC wiki WUaS "Language" pages (see below), whose plans for 7,106+ MIT OCW-centric wiki schools will allow for many more language additions with time.
As one Wikidata focus (probably already explored), it seems to make sense to engage with the ISO 639 codes and standards, since ISO-639-3 and ISO-639-6 seem to address some of the concerns you have both raised.
Does anyone know how ISO-639-6, for example, allows for, or encodes, invented languages, "dead" languages, and animal/species communication (or even computer languages as "human languages")?
Cheers,
Scott