I saw in the patch for https://phabricator.wikimedia.org/T266200 that a strategy was devised to base64-encode page prop values that aren't strictly UTF-8. If I understand correctly, this means the TemplateData extension code and the page props interfaces require no change while the JSONification of Parser Cache output proceeds. Is that right? It's a clever solution.
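
To check my own understanding, here is a rough sketch of that kind of fallback. It is not the code from the patch: the helper names and the wrapper shape are made up for illustration, and it's in Python rather than PHP just to keep it short.

```python
import base64
import json


def encode_prop_value(value):
    """Wrap a page prop value so it can be passed to json.dumps().

    Hypothetical helper: the {"encoding", "value"} wrapper is an assumption
    for illustration, not the format used by the actual patch.
    """
    if isinstance(value, bytes):
        try:
            # Bytes that are valid UTF-8 can be stored as ordinary text.
            return {"encoding": "utf-8", "value": value.decode("utf-8")}
        except UnicodeDecodeError:
            # Binary data (e.g. compressed JSON) falls back to base64.
            encoded = base64.b64encode(value).decode("ascii")
            return {"encoding": "base64", "value": encoded}
    return {"encoding": "utf-8", "value": value}


def decode_prop_value(wrapped):
    """Reverse encode_prop_value(); base64 payloads come back as bytes."""
    if wrapped["encoding"] == "base64":
        return base64.b64decode(wrapped["value"])
    return wrapped["value"]


# A gzip-style header is not valid UTF-8, so it takes the base64 path.
blob = b"\x1f\x8b\x08\x00\xff\xfe"
wrapped = encode_prop_value(blob)
serialized = json.dumps(wrapped)
restored = decode_prop_value(json.loads(serialized))
```

The point being that callers keep handing over the same values as before, and only the serialization layer needs to know about the wrapper.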

Now, one thing I've been wondering about: might there be ways to query the database component of Parser Cache with relatively fresh results at the command line without deployer rights? And will it be possible, if not encouraged, to drop stringified JSON into the Parser Cache values?

The page props table tends to be useful for content analysis for UX interventions, and part of its usefulness has stemmed from being able to run simple MySQL queries against it. When the payload is JSON, even if it is compress()'d, it can also be trivial to use the MySQL JSON built-ins. I'm told the more, shall we say, creative uses of page props aren't great for scaling, but I'm wondering: how can we get some of the same capability to query derived data via another straightforward SQL mechanism, on a replicated persistence store off the serving code path?
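
For a sense of how little is needed on the analysis side today, here is a minimal client-side sketch. It assumes the value is either plain JSON or gzip-compressed JSON (my understanding of what TemplateData stores, but treat that as an assumption), and load_prop_json is a made-up name.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def load_prop_json(raw):
    """Parse a page prop payload that is plain JSON or gzip-compressed JSON.

    Hypothetical helper for illustration; other encodings are not handled.
    """
    if isinstance(raw, str):
        raw = raw.encode("utf-8")
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Example with a made-up payload, as it might come back from a query.
payload = gzip.compress(json.dumps({"params": {"title": {}}}).encode("utf-8"))
doc = load_prop_json(payload)
param_names = sorted(doc["params"])
```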

I hope those questions made sense! Maybe something exists already in Hadoop or the replicas, but I couldn't quite figure it out. I do look forward to the other application layer and firehose mechanisms in the works from different teams, although I am most interested right now in the content analysis use case for some of our forthcoming Wikifunctions / Wikilambda and Abstract Wikipedia work.

Thanks!

-Adam

On Fri, Nov 6, 2020 at 3:24 PM Dan Andreescu <dandreescu@wikimedia.org> wrote:

I don't know enough about the parser cache to give Daniel good advice on his question:

That's another issue I wanted to raise: Platform Engineering is working on switching ParserCache to JSON. For that, we have to make sure extensions only put JSON-Serializable data into ParserOutput objects, via setProperty() and setExtensionData(). We are currently trying to figure out how to best do that for TemplateData.

TemplateData already uses JSON serialization, but then compresses the JSON output, to make the data fit into the page_props table. This results in binary data in ParserOutput, which we can't directly put into JSON. There are several solutions under discussion, e.g.: [...(see Daniel's original message for the list of ideas or propose your own)...]

But I see some people hiding in the back who might have some good ideas :) This is just a bump to invite them to respond.