On Sat, Apr 1, 2023 at 10:18 PM rupert THURNER <rupert.thurner(a)gmail.com> wrote:
> On Sat, Apr 1, 2023 at 11:36 PM Erik Moeller
> <eloquence(a)gmail.com> wrote:
>> ... I am confident (based on, e.g., the recent
>> results with Alpaca:
>> https://crfm.stanford.edu/2023/03/13/alpaca.html)
>> that the performance of smaller models will continue to increase as we
>> find better ways to train, steer, align, modularize and extend them.
>
> To host open models like the above would be really
> cool for multiple reasons, the most important one being to bring
> back the openness into the training....
Wow! While Alpaca is English-only and released under CC BY-NC, it does
seem like it's very easily replicated with a wide context window, and
could probably be made broadly multilingual, beyond the performance of
GPT-3.5, for less than it would cost to merely host BLOOM for a few
months. This shocked me, and of course I take back what I said about
requiring several million dollars.
https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-…
https://huggingface.co/databricks/dolly-v1-6b
https://github.com/tatsu-lab/stanford_alpaca
What kind of hardware should WMCS buy to support such a project?
On Sat, Apr 1, 2023 at 2:36 PM Erik Moeller <eloquence(a)gmail.com> wrote:
> ... I'm not sure if the "hallucination" problem is tractable when
> all you have is an LLM
I disagree, which is why I have been pushing RARR and ROME. RARR seeks
to apply the same principles as WP:V to eliminate hallucination,
requiring confirmation from verifiable sources, which can be limited
to e.g. those approved by WP:RSP, and cited in a way that readers can
independently verify. I've been posting links to the RARR paper, which
doesn't go very deep on some of those points, but here's an hour-long
presentation by one of the authors which is a lot meatier on such
topics:
https://www.youtube.com/watch?v=d45Ms8LmF5k
And here's a Twitter thread which is more accessible to those less
familiar with similar literature:
https://twitter.com/kelvin_guu/status/1582714222080688133
Once an attribution and verification system like RARR has identified
inaccuracies and hallucinations, the ROME/MEMIT method of editing the
models directly can eliminate them completely, in a way that also
removes similar generalized mistakes; please see "Rank-One Editing
of Encoder-Decoder Models":
https://arxiv.org/abs/2211.13317
I can't believe that the large AI labs aren't working harder on these
efforts than they've been letting on. Either they aren't, or they are
doing so in an uncharacteristically secretive fashion, which would
suggest they want to exploit such advances as proprietary trade
secrets. In either case, it's vital that fully open organizations like
the Foundation get involved quickly. There is reason to suspect the
latter, because Google Bard uses a much less rigorous form of
attribution and verification (probably based on Sparrow,
https://arxiv.org/abs/2209.14375), which actually makes its
hallucinations worse, e.g. in
https://i.redd.it/f30u9n0gn9pa1.png
If you watch the RARR video towards the end, Dr. Lao indicates they
encountered similar issues but were able to eliminate almost all of
them.
-LW