> It's not clear how the validation part works.
We use the schema to validate incoming data before accepting and persisting it. We have various disparate producers of data (client-side JavaScript, internal PHP, other services and languages), and we need to ensure that the data we receive is consistent and usable in both loosely and strongly typed systems. Since we are using JSONSchema for event streams, we have a separate service (EventGate) that receives the events over HTTP and validates them before sending them downstream (to Kafka).
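To make the validate-then-produce flow concrete, here is a rough sketch. It is not EventGate's code: `validate`, `accept`, and the `produce` callback are hypothetical names, and the validator handles only a tiny subset of JSONSchema (`type`, `required`, `properties`). A real service would use a full JSONSchema library.

```python
# Toy validator for a small subset of JSONSchema (type, required, properties).
# Hypothetical sketch only: all names here are made up for illustration.

_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(event, schema, path="$"):
    """Return a list of error strings; an empty list means the event is valid."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]  # only the simple type names above are handled
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(event, bool) and expected in ("integer", "number")
        if not isinstance(event, py_type) or bool_as_number:
            return [f"{path}: expected {expected}"]
    if isinstance(event, dict):
        for name in schema.get("required", []):
            if name not in event:
                errors.append(f"{path}.{name}: required field missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in event:
                errors.extend(validate(event[name], subschema, f"{path}.{name}"))
    return errors


def accept(event, schema, produce):
    """Validate first; call produce(event) (e.g. a Kafka producer) only if valid."""
    errors = validate(event, schema)
    if errors:
        return False, errors
    produce(event)
    return True, []
```

The point of the shape is that `produce` is never called for an invalid event, so nothing downstream of it ever sees one.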
> Beyond that, what else do you do with JS? Is there some sort of code generation aspect to it, as with the proto compiler?
We also use the schema to do downstream integration between data stores. The JSONSchemas are used to create RDBMS tables into which we can parse and insert the JSON event data. We also use the JSONSchema for language integration: if not code generation directly, then automatic deserializers that turn a JSON event into Java (or whatever) objects, e.g. by mapping from JSONSchema to Spark's schema format, or (one day?) from JSONSchema to Kafka Connect's schema format.
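As an illustration of the table-creation idea, a schema's top-level properties can be mapped to column definitions. This is only a sketch under my own assumptions: `create_table_sql`, the type mapping, and the `page_edit` example are hypothetical, and nested objects or arrays simply fall back to a text column here.

```python
# Hypothetical mapping from simple JSONSchema types to SQL column types.
_SQL_TYPES = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
}


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from a flat JSONSchema object schema."""
    required = set(schema.get("required", []))
    columns = []
    for name, prop in schema.get("properties", {}).items():
        # Unknown or nested types fall back to TEXT (e.g. to hold serialized JSON).
        sql_type = _SQL_TYPES.get(prop.get("type"), "TEXT")
        not_null = " NOT NULL" if name in required else ""
        columns.append(f"  {name} {sql_type}{not_null}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n)"


example_schema = {
    "type": "object",
    "required": ["page_id"],
    "properties": {
        "page_id": {"type": "integer"},
        "title": {"type": "string"},
        "is_bot": {"type": "boolean"},
    },
}
print(create_table_sql("page_edit", example_schema))
```

The same walk over `properties` could emit a Spark or Kafka Connect schema instead of SQL; only the type mapping changes.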
> Does a production data consumer validate every incoming JSON object it receives?
It could, but usually it does not. In our usage, we assume that data has been validated before it enters the system, so consumers can rely on the data being valid.
> The biggest advantage I can see to protos (outside of the immersive google infrastructure) is efficiency.
I'm not sure how PB works here, but an advantage of Avro was its schema evolution features. These make it easier for consumers and producers to work with different versions of the same schema without having to upgrade their code. We get the same effect with JSONSchema by allowing only one very strict type of change: optional field additions (no renames, no field removals, no type changes, etc.).
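The "only optional field additions" rule can be checked mechanically by comparing the old and new versions of a schema. Below is a minimal sketch of such a check, not our actual tooling: `compatibility_errors` is a hypothetical name and it only looks at `type`, `properties`, and `required`.

```python
def compatibility_errors(old, new, path="$"):
    """List the ways `new` breaks the rule; an empty list means it is compatible."""
    errors = []
    if old.get("type") != new.get("type"):
        return [f"{path}: type changed"]
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, old_prop in old_props.items():
        if name not in new_props:
            # A rename shows up here too: the old name disappears.
            errors.append(f"{path}.{name}: field removed")
        else:
            errors.extend(
                compatibility_errors(old_prop, new_props[name], f"{path}.{name}")
            )
    # Any change to the set of required fields is rejected, which also
    # covers a newly added field that is marked as required.
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))
    for name in sorted(old_required ^ new_required):
        errors.append(f"{path}.{name}: required status changed")
    return errors
```

Fields that exist only in the new schema and are not listed in `required` pass through without an error, which is exactly the one change that is allowed.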
> Is mongo used inside of WMF?
Not that I know of. I could be wrong, but I think most application state for WMF services is either in MariaDB (MediaWiki uses this) or in Redis or Cassandra (which are really derivative caches of data canonically stored in MariaDB).