I saw that in the patch for
https://phabricator.wikimedia.org/T266200 a strategy was devised to
base64-encode page prop values that aren't strictly UTF-8. If I understand
correctly, this means TemplateData extension code and page props interfaces
require no change while the JSONification of Parser Cache output proceeds.
Is that right? It's a clever solution.
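
To check my own understanding of the base64 idea, here is a minimal Python sketch. It is not the actual patch (MediaWiki is PHP); the helper names encode_prop/decode_prop and the sample TemplateData payload are made up for illustration, and gzip stands in for whatever compression TemplateData really applies.

```python
import base64
import gzip
import json


def encode_prop(raw: bytes) -> str:
    """Hypothetical helper: wrap binary data as ASCII text so it is JSON-safe."""
    return base64.b64encode(raw).decode("ascii")


def decode_prop(text: str) -> bytes:
    """Hypothetical helper: recover the original bytes from the base64 text."""
    return base64.b64decode(text)


# Made-up stand-in for a TemplateData blob: JSON text, then compressed.
template_data = {"description": "Example template", "params": {"1": {"type": "string"}}}
compressed = gzip.compress(json.dumps(template_data).encode("utf-8"))

# Raw bytes cannot be placed directly into a JSON document.
try:
    json.dumps({"templatedata": compressed})
    direct_ok = True
except TypeError:
    direct_ok = False

# Base64 text can, and the round trip recovers the original structure.
cache_entry = json.dumps({"templatedata": encode_prop(compressed)})
stored = json.loads(cache_entry)["templatedata"]
restored = json.loads(gzip.decompress(decode_prop(stored)))
```

The cost is that the stored value is no longer directly readable by a plain SQL query; a consumer has to base64-decode and decompress it first.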
Now, one thing I've been wondering about: might there be ways to query the
database component of Parser Cache with relatively fresh results at the
command line without deployer rights? And will it be possible, if not
encouraged, to drop stringified JSON into the Parser Cache values?
The page props table tends to be useful for content analysis for UX
interventions, and part of its usefulness has stemmed from being able to do
simple MySQL queries (when the payload is encoded as JSON, even if it has
been compress()'d, it can also be trivial to use MySQL JSON built-ins). I'm
told the more, shall we say, creative uses of page props aren't great for
scaling, but I'm wondering: how can we get some of the capabilities of
querying derived data via another straightforward SQL mechanism on a
replicated persistence store off the serving code path?
I hope those questions made sense! Maybe something exists already in Hadoop
or the replicas, but I couldn't quite figure it out. I do look forward to
other application layer and firehose mechanisms in the works from
different teams, although I am most interested right now in the content
analysis use case for some of our forthcoming Wikifunctions / Wikilambda
and Abstract Wikipedia work.
Thanks!
-Adam
On Fri, Nov 6, 2020 at 3:24 PM Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
I don't know enough about the parser cache to give Daniel good advice on
his question:
That's another issue I wanted to raise: Platform Engineering is working on
switching ParserCache to JSON. For that, we have to make sure extensions
only put JSON-Serializable data into ParserOutput objects, via
setProperty() and setExtensionData(). We are currently trying to figure out
how to best do that for TemplateData.
TemplateData already uses JSON serialization, but then compresses the
JSON output, to make the data fit into the page_props table. This results
in binary data in ParserOutput, which we can't directly put into JSON.
There are several solutions under discussion, e.g.: [...(see Daniel's
original message for the list of ideas or propose your own)...]
But I see some people hiding in the back who might have some good ideas
:) This is just a bump to invite them to respond.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l