Beyond the two extremes of all content being available verbatim or none of it being available verbatim, developers might want some content to be available verbatim while other content is available only indirectly.
While AI systems can automatically determine which content is useful to store verbatim, if we want content authors to be able to provide hints, we could consider new HTML elements, clever uses of
existing elements and attributes, or schema.org Web schemas.
Consider the following example, in which an HTML document author wants to hint that the topic sentence of a paragraph may be quoted verbatim while the remainder of the paragraph should be available
only indirectly. The markup might resemble the following rough-draft sketch:
<p><span id="anchor123" role="quoteable">This is some text, a topic sentence.</span> This is a secondary sentence in the paragraph.</p>
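This sketch combines several overlapping approaches. Perhaps all elements with IDs, i.e., URL-addressable content (for example, a fragment URL ending in #anchor123), should be considered quotable verbatim. Alternatively, an HTML attribute such as role could carry the hint, although role values are currently defined by WAI-ARIA, so a custom data-* attribute might conflict less. Schema.org
Web schemas could also be of use.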
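As a rough illustration of how a consuming system might honor such hints, here is a minimal Python sketch. It assumes the hypothetical convention from the rough draft above: text inside an element whose role includes "quoteable" is treated as verbatim-quotable, and all other text is treated as indirect-only. Treating every element with an ID as quotable is an opt-in flag, since that is only one of the possibilities mentioned. The names QuotableSplitter and split_quotable are hypothetical; this is not an existing standard or API.

```python
from html.parser import HTMLParser

# Hypothetical convention from the rough-draft sketch: text inside an element
# whose role includes "quoteable" may be quoted verbatim; other text is indirect-only.
QUOTABLE_ROLE = "quoteable"

# Void elements never have end tags, so they are not pushed onto the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class QuotableSplitter(HTMLParser):
    """Hypothetical helper that separates verbatim-quotable text from other text."""

    def __init__(self, treat_ids_as_quotable: bool = False) -> None:
        super().__init__()
        self.treat_ids_as_quotable = treat_ids_as_quotable
        # One (tag, marks_quotable) entry per currently open element.
        self.stack: list[tuple[str, bool]] = []
        self.quotable: list[str] = []
        self.indirect: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attributes = dict(attrs)
        roles = (attributes.get("role") or "").split()
        marks = QUOTABLE_ROLE in roles or (
            self.treat_ids_as_quotable and "id" in attributes)
        self.stack.append((tag, marks))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates some unclosed tags).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text is quotable if any enclosing element carries the hint.
        if any(marks for _, marks in self.stack):
            self.quotable.append(text)
        else:
            self.indirect.append(text)


def split_quotable(html: str, treat_ids_as_quotable: bool = False):
    """Return (quotable_texts, indirect_texts) for an HTML fragment."""
    parser = QuotableSplitter(treat_ids_as_quotable)
    parser.feed(html)
    parser.close()
    return parser.quotable, parser.indirect


if __name__ == "__main__":
    sketch = ('<p><span id="anchor123" role="quoteable">This is some text, '
              'a topic sentence.</span> This is a secondary sentence in the paragraph.</p>')
    quotable, indirect = split_quotable(sketch)
    print(quotable)  # ['This is some text, a topic sentence.']
    print(indirect)  # ['This is a secondary sentence in the paragraph.']
```

The end-tag handling is deliberately simple and does not implement the full HTML tree-construction rules for implicitly closed elements; a production crawler would more likely work on a parsed DOM.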
Also, you may find the following discussion thread interesting:
https://github.com/microsoft/semantic-kernel/discussions/108 about
Educational Applications of AI in Web Browsers. There, I ask questions about modern LLMs and APIs, about
referring to documents by URLs in prompts, about prioritizing some documents over others when answering questions, and so forth. A “Web browser Copilot” would have educational applications. It could allow students to ask questions pertinent
to the specific HTML, PDF, and EPUB documents that they are browsing, and AI components could perhaps navigate to pages, scroll to content, and highlight document content for end-users while responding.
Best regards,
Adam Sobieski