Reproducing exactly what was in the training data is not overfitting. Overfitting is when a model fails to capture the underlying pattern in the training data, which leads to inaccurate or false results on new data. Matching the outputs in the training set is in fact what training aims for: the model is fit to minimize the difference between its predictions and the training targets. Supervised ML is, at its core, a regression function.

Andrew Ng uses the example of a model that predicts the price of a house from the number of rooms and similar features. Overfitting is when a completely new house gets a wildly inaccurate price. Returning the exact price of a house that is already in the training set is neither overfitting nor a bug.
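
To make the distinction concrete, here is a rough sketch in plain Python. Everything in it is my own assumption for illustration: the room counts and prices are invented, and the two models (a least-squares line and a polynomial forced through every training point) are stand-ins, not anything taken from Ng's course or from a production system. The point is only that overfitting shows up in the error on a new house, not in whether training prices are reproduced.

```python
# Toy illustration only: every number below is invented, not real housing data.
# Each pair is (number of rooms, price in thousands).
train = [(1, 110.0), (2, 135.0), (3, 190.0), (4, 205.0), (5, 260.0)]
new_house = (7, 330.0)  # hypothetical house that neither model has seen


def fit_line(points):
    """Ordinary least-squares fit of price = a * rooms + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_interpolating_polynomial(points):
    """Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_abs_error(model, points):
    """Average absolute difference between predicted and actual price."""
    return sum(abs(model(x) - y) for x, y in points) / len(points)


line = fit_line(train)
poly = fit_interpolating_polynomial(train)
rooms, price = new_house

print("training error, line:       ", mean_abs_error(line, train))
print("training error, polynomial: ", mean_abs_error(poly, train))
print("new-house error, line:      ", abs(line(rooms) - price))
print("new-house error, polynomial:", abs(poly(rooms) - price))

# Noise-free case: the line returns every training price exactly
# and still prices the unseen 7-room house sensibly.
clean = [(x, 37.0 * x + 69.0) for x in range(1, 6)]
clean_line = fit_line(clean)
print("noise-free training error:  ", mean_abs_error(clean_line, clean))
print("noise-free 7-room estimate: ", clean_line(7))
```

On these made-up numbers the polynomial returns every training price exactly but is off by more than a thousand (in thousands) on the new house, while the line is off by about 2. In the noise-free case the line also returns every training price exactly and still generalizes, so exact reproduction by itself tells you nothing about overfitting.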

Speaking as someone who has built many ML models for Wikimedia.

On Thu, 23 Mar 2023 at 11:23, Kimmo Virtanen <kimmo.virtanen@wikimedia.fi> wrote:
Hi,

I just noticed that OpenAI has fixed the Wikidata property and item mappings so now it can generate working SPARQL. Example:


Prompt : Search finnish female journaists using SPARQL from Wikidata?

GPT-3.5 (default)
To search for Finnish female journalists using SPARQL from Wikidata, you can use the following query:
SELECT ?journalistLabel
WHERE
{
  ?journalist wdt:P106 wd:Q1930187 .
  ?journalist wdt:P27 wd:Q33 .
  ?journalist wdt:P21 wd:Q6581072 .
  ?journalist wdt:P569 ?birthdate .
  ?journalist wdt:P21 ?gender .
  FILTER(YEAR(?birthdate) < 2005) .
  FILTER(?gender = wd:Q6581072) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}
This query will retrieve the labels (names) of all female journalists who were born before the year 2005, are Finnish citizens, and have a profession (P106) of journalist (Q1930187). You can execute this query on the Wikidata Query Service (https://query.wikidata.org/) by pasting the query into the query editor and clicking the "Run" button. The results will be displayed in a table format.

GPT-4

To search for Finnish female journalists using SPARQL on Wikidata, you can use the following query:
 
SELECT ?journalist ?journalistLabel WHERE {
  ?journalist wdt:P31 wd:Q5;         # instance of human
              wdt:P106 wd:Q1930187;  # occupation: journalist
              wdt:P21 wd:Q6581072;   # gender: female
              wdt:P27 wd:Q33.        # country of citizenship: Finland
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fi,en". }
}
This query searches for entities that are instances of human (Q5), have an occupation as a journalist (Q1930187), are female (Q6581072), and have Finnish citizenship (Q33). The query results will include the journalist's Wikidata ID and their label (name) in Finnish or English. You can run this query using the Wikidata Query Service (https://query.wikidata.org/).
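
For anyone who would rather run the query from a script than from the web editor, here is a minimal sketch using only the Python standard library. It assumes the public endpoint at https://query.wikidata.org/sparql with the `query` and `format=json` parameters. The helper names (`build_request`, `extract_labels`, `run_query`) and the User-Agent string are placeholders I made up for illustration, so treat it as a starting point rather than a recommended client.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# The GPT-4 query quoted above, with its inline comments dropped.
QUERY = """
SELECT ?journalist ?journalistLabel WHERE {
  ?journalist wdt:P31 wd:Q5;
              wdt:P106 wd:Q1930187;
              wdt:P21 wd:Q6581072;
              wdt:P27 wd:Q33.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fi,en". }
}
"""


def build_request(query):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        # Hypothetical identifier; replace it with your own tool name and contact.
        headers={"User-Agent": "sparql-example/0.1 (placeholder contact)"},
    )


def extract_labels(payload):
    """Pull the ?journalistLabel values out of a SPARQL JSON result."""
    return [row["journalistLabel"]["value"]
            for row in payload["results"]["bindings"]]


def run_query(query):
    """Send the query over the network and return the decoded JSON payload."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)
```

Calling `extract_labels(run_query(QUERY))` should give the list of names. Nothing is sent over the network until `run_query` is called.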

Br,
-- Kimmo Virtanen, Zache 

On Thu, Mar 23, 2023 at 8:07 AM Erik Moeller <eloquence@gmail.com> wrote:
On Wed, Mar 22, 2023 at 11:53 AM Lauren Worden <laurenworden89@gmail.com> wrote:
> BARD also produces lengthy passages from its training data verbatim
> without elicitation:
> https://old.reddit.com/r/Bard/comments/11xxaxj/bard_copied_user_text_from_a_forum_word_for_word/jd58764/

Very true. I tested the "Mr. Ripley" example with Bard, and it
instantly reproduced the exact language of the Wikipedia article's
lede section; only the remainder of the generation was substantially
different. [1]

Warmly,
Erik

[1] https://social.coop/@eloquence/110071078132245483
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/V55NYC4RY5O53TABK4O625AJ4HYAAFJA/
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org


--
Amir (he/him)