Otherwise, I'm sure we can do as suggested earlier: pull the data from Hive directly and stuff it into a temporary structure we can query while building the completion indices.
Do you think that temporary structure might be useful to others? If so, we could add that as a data source, and add an endpoint to query it. Either way, happy to help with the query / temp structure.
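
To make the temp structure idea a bit more concrete, here is a rough sketch of what I have in mind, not a proposal for the final implementation. It assumes the rows have already been exported from Hive (that step is not shown) and uses an in-memory SQLite table as the temporary structure. The `(title, score)` schema and the names `build_temp_store` and `lookup` are hypothetical, picked only for illustration; the real columns depend on what the completion index build actually needs.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def build_temp_store(rows: Iterable[Tuple[str, float]]) -> sqlite3.Connection:
    """Load (title, score) rows into a throwaway in-memory SQLite table.

    The schema is an assumption for illustration; the rows are expected
    to come from a Hive export done elsewhere.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_data (title TEXT PRIMARY KEY, score REAL NOT NULL)"
    )
    # INSERT OR REPLACE keeps the last row seen if the export has duplicate titles.
    conn.executemany(
        "INSERT OR REPLACE INTO page_data (title, score) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, title: str) -> Optional[float]:
    """Return the score for a title, or None if the title is not in the store."""
    row = conn.execute(
        "SELECT score FROM page_data WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None
```

The index builder would call `lookup` once per document while building. If others want the same data, an endpoint could wrap the same query, though at that point something more durable than an in-memory table would probably be needed.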