Wikimedia, 

 

Hello. After receiving and listening to the feedback from our previous discussion, I have revised the Wikianswers proposal: https://meta.wikimedia.org/wiki/Wikianswers . I would also like to call your attention to its technical discussion section: https://meta.wikimedia.org/wiki/Wikianswers#Technical_discussion . A current version of this section is available below.

Per the feedback, the revised proposal includes, in addition to an option for a sister project at a new domain, e.g., https://en.wikianswers.org , an option for integration into the search systems of Wikipedia, Wikidata, and Commons. With respect to this latter option, AI systems' (LLMs') responses to end-users' questions would still be URL-addressed, human-editable content, e.g.: https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9

Thank you for checking out the revised proposal and for any feedback. 


Technical discussion

Overview 

Relevant artificial intelligence topics include retrieval-augmented generation, retrieval-augmented generation with guardrails, and agent-based approaches. 

As presently considered, those parts of the question-and-answer data which could be human-editable include: (1) the template of the prompts, (2) the task, (3) the retrieved context data, (4) the questions, and (5) the answers. 

The template is the overall structure of the prompts to the LLM. It includes some natural language and slots where the other parts will be placed. The template should be locked so as to be editable only by administrators. Editing it would invalidate every cached, unlocked answer, meaning that every unlocked answer would be updated, refreshed, or regenerated.

The task is an instruction, e.g., "You are a helpful system which will answer the user's question using the following information". The task should be locked so as to be editable only by administrators. Editing it would invalidate every dependent cached, unlocked answer, meaning that each of those answers would be updated, refreshed, or regenerated.
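As a rough illustration of how a template with slots might combine the task, the retrieved context, and the question, here is a minimal Python sketch. The template wording, the slot names ($task, $context, $question), and the build_prompt helper are assumptions made for illustration; only the task sentence is taken from the text above.

```python
from string import Template

# Hypothetical administrator-locked template; the slot names are assumptions.
PROMPT_TEMPLATE = Template(
    "$task\n\nContext:\n$context\n\nQuestion: $question\nAnswer:"
)

# Task instruction quoted from the proposal text.
TASK = (
    "You are a helpful system which will answer the user's question "
    "using the following information"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Fill the template's slots with the task, retrieved chunks, and question."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return PROMPT_TEMPLATE.substitute(task=TASK, context=context, question=question)


prompt = build_prompt(
    "Which teams are in the Super Bowl?",
    ["(excerpt retrieved from a Wikipedia article)"],
)
```

Because the template and the task sit apart from the per-question parts, an edit to either one can be detected separately from edits to chunks, questions, or answers.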

The retrieved context data are chunks or excerpts, e.g., of Wikipedia articles, which enhance the answering of a particular question. Users could edit them, and such edits would cascade, invalidating the dependent cached, unlocked answers. With respect to user experience, editors might click on a displayed chunk or excerpt to navigate to where it occurs in its source page and edit it there. Updates to the underlying page would then update the chunk and its dependent unlocked answers.

The questions would be unusual to edit, except in the cases of typographical errors. 

The answers, abstractly, result from processing the other ingredients. These could be edited by humans but, as described above, they could be subsequently revised by the system per cascading updates, refreshes, or regenerations. In some cases, editors might want to edit an answer and then lock it against subsequent revisions by the system.
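The cascading behavior described above could be sketched as follows. This is only an in-memory illustration under stated assumptions: the Answer record, its fields (chunk_ids, locked, stale), and both functions are hypothetical names, and "stale" here simply means the answer is queued for updating, refreshing, or regeneration.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    answer_id: str
    text: str
    chunk_ids: set = field(default_factory=set)  # chunks this answer depends on
    locked: bool = False  # an editor has frozen this answer
    stale: bool = False   # the system should regenerate this answer


def invalidate_for_chunk(answers, chunk_id):
    """Mark unlocked answers that depend on an edited chunk as stale."""
    affected = []
    for answer in answers:
        if chunk_id in answer.chunk_ids and not answer.locked:
            answer.stale = True
            affected.append(answer.answer_id)
    return affected


def invalidate_all(answers):
    """A template edit invalidates every unlocked answer, whatever its chunks."""
    affected = []
    for answer in answers:
        if not answer.locked:
            answer.stale = True
            affected.append(answer.answer_id)
    return affected
```

In this sketch a locked answer is never marked stale, which matches the case of an editor revising an answer and then protecting it from later system revisions.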

In conclusion, as presently considered, users would ordinarily tend to edit the retrieved chunks of content drawn from Wikipedia pages. These chunks augment the prompts to the LLMs, and revisions to the underlying pages would cascade to update dependent unlocked answers automatically.

Database schemas  

Wikianswers database schemas would include one or more tables with vector columns for embedding vectors. A project goal, then, would be to efficiently combine into a database schema the existing concepts of revision tables, page tables, and text tables with the newer concepts of embedding vectors and vector databases. Relevant tools include pgvector, a database extension which provides open-source vector-similarity search to PostgreSQL.  
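To make the combination of page and revision concepts with embedding vectors more concrete, here is a minimal in-memory Python sketch. The ChunkRow record and its field names are assumptions, not an actual MediaWiki or Wikianswers schema, and the pure-Python cosine ranking merely stands in for the similarity search that a vector extension such as pgvector would perform inside the database.

```python
import math
from dataclasses import dataclass


@dataclass
class ChunkRow:
    chunk_id: int
    page_id: int      # page the chunk was drawn from
    rev_id: int       # revision of that page when the chunk was made
    text: str
    embedding: list   # would be a vector column in the database


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_chunks(rows, query_embedding, k=2):
    """Return the k rows whose embeddings are most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(row.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]
```

Keeping page_id and rev_id on each chunk row is one possible way to tell which chunks, and hence which answers, are affected when a page receives a new revision.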

URL-addressability  

Instead of requiring a new domain, e.g., https://en.wikianswers.org/ , Wikianswers features could be integrated into the search systems of Wikipedia, Wikidata, and Commons. In this case, human-editable responses could still be URL-addressable, e.g.: https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 .  

Datetime encoding  

Some questions have impermanent answers and others are volatile, meaning that their answers could vary each time that the question is asked. For such questions, date and time data could be encoded into URLs in a human-readable manner, e.g., https://en.wikipedia.org/qa/2023/09/21/21/29/00/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 . Some questions and answers might involve different granularities of time. For example, a natural-language question "Which teams are in the Super Bowl?" might have a number of URLs, one for each year, e.g., https://en.wikipedia.org/qa/2022/40a7338d-fe75-4897-aee6-ec87141020a6 and https://en.wikipedia.org/qa/2021/40a7338d-fe75-4897-aee6-ec87141020a6 .
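The URL patterns above could be produced and read back along the following lines. This is a sketch only: the qa_url and parse_qa_url helpers and the granularity parameter are hypothetical, and the base path is copied from the example URLs rather than from any existing endpoint.

```python
import uuid
from datetime import datetime
from typing import Optional

BASE_URL = "https://en.wikipedia.org/qa"  # path taken from the examples above


def qa_url(answer_id: uuid.UUID, when: Optional[datetime] = None,
           granularity: int = 6) -> str:
    """Build a Q&A URL; granularity counts date parts (1 = year ... 6 = second)."""
    parts = []
    if when is not None:
        fields = [
            f"{when.year:04d}", f"{when.month:02d}", f"{when.day:02d}",
            f"{when.hour:02d}", f"{when.minute:02d}", f"{when.second:02d}",
        ]
        parts = fields[:granularity]
    return "/".join([BASE_URL, *parts, str(answer_id)])


def parse_qa_url(url: str):
    """Split a Q&A URL into its numeric date parts and its identifier."""
    segments = url[len(BASE_URL):].strip("/").split("/")
    return [int(s) for s in segments[:-1]], uuid.UUID(segments[-1])
```

For instance, calling qa_url with a granularity of 1 would keep only the year, as in the Super Bowl example, while omitting the datetime would yield the plain identifier-only form of the URL.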

User experience  

In the approach where Wikianswers features are integrated into Wikipedia, Wikidata, and Commons search, user experiences could utilize the existing text search boxes atop pages. Perhaps the "magnifying glass" icon in those search boxes could be accompanied by a "question mark" icon. End-users would select, or activate, one of these two icons. The activated icon would determine whether the existing keyword-based content search or the described Wikianswers human-editable question-answering subsystem is used. Still under consideration is whether and how end-users could specify that their current page, or selections thereof, should be the focus when their question is answered.


 

Best regards, 

Adam Sobieski