Otherwise, I'm sure we can do as suggested earlier: pull the data from Hive directly and stuff it into a temporary structure we can query while building the completion indices.

Do you think that temporary structure might be useful to others?  If so, we could add that as a data source, and add an endpoint to query it.  Either way, happy to help with the query / temp structure.