This is so wrong. The article is right, but it explains things in a way that can be misinterpreted, and almost solely because of that one word ("lie") and what it conveys. AIs are not minds that think or reason, nor do they have a self or consciousness. That is why AIs do not have intent, and why they cannot "lie": lying implies an intent to deceive, which AIs cannot have [only the system programmer may have baked some of their own intent into the AI model, but we'll get to that later]. The better term is the colloquial "hallucinate" (borrowed from human psychology but now used for a very different phenomenon that only superficially resembles it, so not the best of terms unless the context is clear): the AI gives false or incorrect statements in response to the inquiry, while the statement itself may look coherent and logical.

This happens because GPT/LLM type AI models are statistics-based answering machines. They formulate statements from huge databases of all kinds of (text) data, where quantity has mattered more than quality (and even if it hadn't, the sheer diversity of texts means there are arguments from various viewpoints, synonyms, homonyms, translation incompatibilities etc. which the algorithms are not that good at recognizing), and they build an answer based on the likelihood of which word should come next, one word at a time, given the words in the inquiry. So it's natural for these machines to spew out just about anything that is statistically plausible; it's just that the algorithms are now so good that the answers are very often right enough, or close enough to what we need.
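To make that "one word at a time" point concrete, here is a toy sketch of next-token sampling. The probability table is completely made up for illustration; real models compute these probabilities with neural networks over enormous vocabularies, but the principle of sampling the next word from learned statistics, with no notion of truth, is the same:

```python
import random

# Toy next-word prediction: the "answer" is built word by word by sampling
# from learned probabilities. There is no concept of truth or intent here,
# only likelihood. The table below is invented for demonstration.
next_word_probs = {
    ("the", "capital"): {"of": 0.9, "city": 0.1},
    ("capital", "of"):  {"France": 0.6, "Finland": 0.3, "Mars": 0.1},
}

def continue_text(words, steps=2):
    for _ in range(steps):
        context = tuple(words[-2:])            # look at the last two words
        probs = next_word_probs.get(context)
        if not probs:                          # no statistics for this context
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_text(["the", "capital"]))  # can happily print "the capital of Mars"
```

The last line is the whole "hallucination" story in miniature: a statistically plausible but false continuation is a perfectly normal output of this kind of machine.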
The reported test is interesting in how the different models compare and how they have developed, but the main point to notice is that one of the prime methods of AI learning was intentionally broken by limiting the use of "I don't know". Btw. being able to get AIs to reliably say "I don't know" (or something similar) is a huge thing, a very good result, because it means the statistical limits and error-correction methods are able to draw a line where statistical uncertainty is an issue and the statement would probably be false (kinda like the guesses that uneducated humans make; there's a rough sketch of that idea a bit further below). So…
I would. Everyone should. It would be amazing, because it would mean that half the time you get near certainty and good answers you can trust. That - being able to fully trust the output - is more important at the moment, or so I argue (there may be some applications where getting any output at all is more desirable; consider generation of fantastical images, which are not true or possible according to physics etc.).
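As a rough sketch of what "drawing the line" for "I don't know" could look like in the simplest possible form (the threshold value and the candidate probabilities are invented purely for illustration, not taken from any real system):

```python
def answer_or_abstain(candidate_probs, threshold=0.5):
    """Return the most likely answer, or abstain when confidence is too low.

    candidate_probs: dict mapping candidate answers to model probabilities.
    threshold: a hypothetical cut-off; where to put it is exactly the
    "where to draw the line" question discussed above.
    """
    best, p = max(candidate_probs.items(), key=lambda kv: kv[1])
    if p < threshold:
        return "I don't know"
    return best

print(answer_or_abstain({"Paris": 0.92, "Lyon": 0.08}))               # -> Paris
print(answer_or_abstain({"Paris": 0.40, "Lyon": 0.35, "Nice": 0.25})) # -> I don't know
```

Real systems estimate uncertainty in far more elaborate ways, but the trade-off is the same: raise the threshold and you get fewer answers that you can trust more.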
Coming back to that test, it's fascinating that the algorithms took the reinforcement down this route. It's very logical though. The AIs are simply applying their programming of trying to do better, but just as they have no capacity to understand, they have no capacity to discern right from wrong or other moral questions related to intent, and so they did whatever produced acceptable feedback in the simplest way. The comparison to human behaviour here is apt, at an abstract level. But this is the programmers' doing. They are the ones who created the algorithm, and - this has to be stressed - it's unlikely that at the base-model level there would be any intent to add a "lie about these things" feature, simply because it would be so hard to include it and still make the whole thing work (analogy: think how hard it is for humans to keep up a convincing lie about one area of life that connects to all others, while constantly being questioned and prodded). Research has shown (and I apologise for not including links, I don't have them at hand right now) that in complex systems all the human biases, flaws and cultural ideals can be transferred from the coders, unintentionally and in ways that are hard to spot (for example: which features are considered prominent or desirable in facial recognition, or how language structure is processed depending on what your mother tongue is and how well you understand different languages).
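A tiny worked example of why guessing becomes the "simplest way" to get acceptable feedback, assuming a grading scheme where abstaining scores nothing (the numbers and the scheme are illustrative, not taken from the cited study):

```python
# Why penalizing or zero-scoring "I don't know" pushes a model toward guessing.
# Assumed grading: 1 point for a correct answer, 0 for a wrong answer,
# 0 for abstaining.
def expected_score(p_correct, abstain):
    return 0.0 if abstain else p_correct * 1.0 + (1 - p_correct) * 0.0

p = 0.2  # the model is only 20% sure of its best guess
print(expected_score(p, abstain=True))   # 0.0 -> abstaining never earns anything
print(expected_score(p, abstain=False))  # 0.2 -> guessing always scores at least as much
# Under this kind of feedback, "always guess" is the optimal policy - no intent required.
```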
So, AIs (as in: AI models) do not "lie" but make mistakes, because of imperfect processing of what is wanted, influenced by these unknown, complex statistical biases in the algorithms and by the less-than-perfect data (which is history-based [the model does not know new things]). But as can be seen, these are some pretty good systems, since they are able to correct themselves through algorithmic learning such as reinforcement from feedback.
But there is another level to these systems, which is probably more interesting if you want to pinpoint where the dragons may lie. A modern AI system consists of the model plus the rest of the system, which has many separate parts dedicated to risk management of inputs and outputs, system security and so on. For instance, the Copilot dashboard has several simplistic sliders that allow an admin to deploy the AI and select some of its characteristics, in addition to being able to define a "personality" via a text prompt. These sliders and that text are interpreted by the system and connect to a whole bunch of subsystems and algorithms (which are not open code). In addition, there are some restrictions that are not admin-selectable but are more or less hardcoded (changeable only by the system provider, MS etc.). Although a bit specific, allowing the user to bypass "I don't know" is a feature that, in the modern large systems the big companies offer for public use, should not be left uncontrolled, but that's a separate issue [forcing the threshold high would potentially make AIs more worth trusting in the long term, IMHO].
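As a purely hypothetical sketch of how such admin-facing settings might be folded into what the model actually sees (every name and field here is invented for illustration; the real Copilot plumbing is closed and not public):

```python
# Hypothetical: admin "sliders" and a personality prompt combined with
# provider-level hardcoded rules into the request sent to the model.
admin_settings = {
    "creativity": 0.3,                                  # slider mapped to e.g. sampling temperature
    "personality": "Answer formally and concisely.",    # admin-written text prompt
}
HARDCODED_RULES = "Never give medical or legal advice."  # provider-level, not admin-editable

def build_request(user_prompt):
    return {
        "system": HARDCODED_RULES + " " + admin_settings["personality"],
        "temperature": admin_settings["creativity"],
        "prompt": user_prompt,
    }

print(build_request("Summarise this report."))
```

The point of the sketch is only that several layers of instructions and settings sit between the admin, the user and the model, and most of them are invisible to the end user.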
Anyway, coming to the more important point after the long setup: at this system level there are separate controls, and among those there could theoretically be (theoretically, because no evidence has been presented, and some cases have actually shown the opposite) controls that system programmers could use to make the AI give outputs that are intentionally not what the model itself would spew out. Such controls are already used to curtail swearing and to avoid harmful topics (like self-harm). Filters for certain content have existed at the system level for some time and have been deemed acceptable and good, but they have nothing to do with the AI model as such. [There is more censoring with public models because some users are just there to break things or be lewd, whereas in internal/private medical applications, for instance, there is obviously a need to use anatomical references, so the limits are different.]
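A minimal sketch of what such a system-level output filter could look like, sitting entirely outside the model (the blocklist, the stand-in model call and the replacement text are all hypothetical; real products use far more elaborate, closed-source moderation pipelines):

```python
# Hypothetical post-processing filter: the system, not the model, decides
# what the user finally sees.
BLOCKED_TOPICS = {"self-harm", "explicit"}

def moderated_reply(user_prompt, model_fn):
    raw = model_fn(user_prompt)                   # whatever the model produced
    if any(topic in raw.lower() for topic in BLOCKED_TOPICS):
        return "I can't help with that."          # system-level override of the model output
    return raw

# Usage with a stand-in "model":
print(moderated_reply("hello", lambda p: "A reply mentioning self-harm resources"))
```

This is where intent could in principle live: not in the model's statistics, but in whoever configures the layer that rewrites or withholds its output.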
So, when people say "AIs lie", I see that as silly: AIs make unintentional errors, which in a sense are not even errors, because producing a statistically likely answer is exactly what they were coded to do, and the final output is in any case controlled by someone else. @TiX0's conclusion that we should not trust our displays is mostly correct, in that we should always have healthy skepticism online (AI or not, just to expand on it), but understanding why is also important.
The article is a bit misleading in its choice of wording and about the point of such a study, and the selected passages in the previous post reinforce that. A whole separate argument could also be made about how identifying a false statement differs from identifying a lie, and how differently we interpret information when communicating face-to-face (all the micro-signals we read from people when they speak/lie), which are not present in AI text output. And there is something to wonder about in just how well the test subjects understood the areas where they were being "lied" to (as they were supposed to spot the falsehoods); the research even mentions this limitation. The original research paper is more specific. It more or less makes the point that large AI systems kinda try too hard to answer something, which gets them into trouble. Its whole final conclusion is about how the threshold for saying "I don't know" should be optimized. What is forgotten, though, is that for many applications these GPT/LLM type AIs and their language/text-based statistical answers should not be used at all, even though they are popular at the moment. There are other AI types that may be more suitable to the problem and task.