This is so wrong. The article is right, but it explains things in a way that can be misinterpreted, and almost solely because of that one word ("lie") and what it conveys. AIs are not minds that think or reason, nor do they have a self or consciousness. That is why AIs do not have intent, and why they cannot "lie": lying implies an intent to deceive, which AIs cannot have [only the system programmer may have baked some of their own intent into the AI model, but we'll get to that later]. The better term is the colloquial "hallucinate" (borrowed from human psychology but now used for a very different phenomenon that only superficially resembles it, so not the best of terms unless the context is clear): the AI gives false or incorrect statements in response to the inquiry, while the statement itself may look coherent and logical.

This happens because GPT/LLM type AI models are statistics-based answering machines. They formulate statements from huge databases of all kinds of (text) data, where quantity has mattered more than quality (and even if it hadn't, the sheer diversity of texts means there are arguments from various viewpoints, synonyms, homonyms, translation incompatibilities etc. which the algorithms are not that good at recognizing), and they build an answer based on the likelihood of which word should come next, one word at a time, given the words in the inquiry. So it's natural for these machines to spew out just about anything that is statistically plausible; it's just that the algorithms are now so good that the answers are very often right enough, or close enough to what we need.
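To make that "one word at a time" point concrete, here is a toy sketch of next-token sampling. The probability table is completely made up for illustration; real models compute these probabilities with neural networks over enormous vocabularies, but the principle of sampling the next word from learned statistics, with no notion of truth, is the same:

```python
import random

# Toy next-word prediction: the "answer" is built word by word by sampling
# from learned probabilities. There is no concept of truth or intent here,
# only likelihood. The table below is invented for demonstration.
next_word_probs = {
    ("the", "capital"): {"of": 0.9, "city": 0.1},
    ("capital", "of"):  {"France": 0.6, "Finland": 0.3, "Mars": 0.1},
}

def continue_text(words, steps=2):
    for _ in range(steps):
        context = tuple(words[-2:])            # look at the last two words
        probs = next_word_probs.get(context)
        if not probs:                          # no statistics for this context
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_text(["the", "capital"]))  # can happily print "the capital of Mars"
```

The last line is the whole "hallucination" story in miniature: a statistically plausible but false continuation is a perfectly normal output of this kind of machine.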
The reported test is interesting in how the different models compare and how they have developed, but the main point to notice is that one of the prime methods of AI learning was intentionally broken by limiting the use of "I don't know". Btw. being able to get AIs to reliably say "I don't know" (or something similar) is a huge thing, a very good result, because it means the statistical limits and error-correction methods are able to draw a line where statistical uncertainty is an issue and the statement would probably be false (kinda like the guesses that uneducated humans make; there's a rough sketch of that idea a bit further below). So…
I would. Everyone should. It would be amazing, because it would mean that half the time you get near certainty and good answers you can trust. That - being able to fully trust the output - is more important at the moment, or so I argue (there may be some applications where getting any output at all is more desirable; consider generation of fantastical images, which are not true or possible according to physics etc.).
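As a rough sketch of what "drawing the line" for "I don't know" could look like in the simplest possible form (the threshold value and the candidate probabilities are invented purely for illustration, not taken from any real system):

```python
def answer_or_abstain(candidate_probs, threshold=0.5):
    """Return the most likely answer, or abstain when confidence is too low.

    candidate_probs: dict mapping candidate answers to model probabilities.
    threshold: a hypothetical cut-off; where to put it is exactly the
    "where to draw the line" question discussed above.
    """
    best, p = max(candidate_probs.items(), key=lambda kv: kv[1])
    if p < threshold:
        return "I don't know"
    return best

print(answer_or_abstain({"Paris": 0.92, "Lyon": 0.08}))               # -> Paris
print(answer_or_abstain({"Paris": 0.40, "Lyon": 0.35, "Nice": 0.25})) # -> I don't know
```

Real systems estimate uncertainty in far more elaborate ways, but the trade-off is the same: raise the threshold and you get fewer answers that you can trust more.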
Coming back to that test, it's fascinating that the algorithms took the reinforcement down this route. It's very logical though. The AIs are simply applying their programming of trying to do better, but just as they have no capacity to understand, they have no capacity to discern right from wrong or other moral questions related to intent, and so they did whatever produced acceptable feedback in the simplest way. The comparison to human behaviour here is apt, at an abstract level. But this is the programmers' doing. They are the ones who created the algorithm, and - this has to be stressed - it's unlikely that at the base-model level there would be any intent to add a "lie about these things" feature, simply because it would be so hard to include it and still make the whole thing work (analogy: think how hard it is for humans to keep up a convincing lie about one area of life that connects to all others, while constantly being questioned and prodded). Research has shown (and I apologise for not including links, I don't have them at hand right now) that in complex systems all the human biases, flaws and cultural ideals can be transferred from the coders, unintentionally and in ways that are hard to spot (for example: which features are considered prominent or desirable in facial recognition, or how language structure is processed depending on what your mother tongue is and how well you understand different languages).
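A tiny worked example of why guessing becomes the "simplest way" to get acceptable feedback, assuming a grading scheme where abstaining scores nothing (the numbers and the scheme are illustrative, not taken from the cited study):

```python
# Why penalizing or zero-scoring "I don't know" pushes a model toward guessing.
# Assumed grading: 1 point for a correct answer, 0 for a wrong answer,
# 0 for abstaining.
def expected_score(p_correct, abstain):
    return 0.0 if abstain else p_correct * 1.0 + (1 - p_correct) * 0.0

p = 0.2  # the model is only 20% sure of its best guess
print(expected_score(p, abstain=True))   # 0.0 -> abstaining never earns anything
print(expected_score(p, abstain=False))  # 0.2 -> guessing always scores at least as much
# Under this kind of feedback, "always guess" is the optimal policy - no intent required.
```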
So, AIs (as in: AI models) do not "lie" but make mistakes, because of imperfect processing of what is wanted, influenced by these unknown, complex statistical biases in the algorithms and by the less-than-perfect data (which is history-based [the model does not know new things]). But as can be seen, these are some pretty good systems, since they are able to correct themselves through algorithmic learning such as reinforcement from feedback.
But there is another level to these systems, which is probably more interesting if you want to pinpoint where the dragons may lie. A modern AI system consists of the model plus the rest of the system, which has many separate parts dedicated to risk management of inputs and outputs, system security and so on. For instance, the Copilot dashboard has several simplistic sliders that allow an admin to deploy the AI and select some of its characteristics, in addition to being able to define a "personality" via a text prompt. These sliders and that text are interpreted by the system and connect to a whole bunch of subsystems and algorithms (which are not open code). In addition, there are some restrictions that are not admin-selectable but are more or less hardcoded (changeable only by the system provider, MS etc.). Although a bit specific, allowing the user to bypass "I don't know" is a feature that, in the modern large systems the big companies offer for public use, should not be left uncontrolled, but that's a separate issue [forcing the threshold high would potentially make AIs more worth trusting in the long term, IMHO].
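As a purely hypothetical sketch of how such admin-facing settings might be folded into what the model actually sees (every name and field here is invented for illustration; the real Copilot plumbing is closed and not public):

```python
# Hypothetical: admin "sliders" and a personality prompt combined with
# provider-level hardcoded rules into the request sent to the model.
admin_settings = {
    "creativity": 0.3,                                  # slider mapped to e.g. sampling temperature
    "personality": "Answer formally and concisely.",    # admin-written text prompt
}
HARDCODED_RULES = "Never give medical or legal advice."  # provider-level, not admin-editable

def build_request(user_prompt):
    return {
        "system": HARDCODED_RULES + " " + admin_settings["personality"],
        "temperature": admin_settings["creativity"],
        "prompt": user_prompt,
    }

print(build_request("Summarise this report."))
```

The point of the sketch is only that several layers of instructions and settings sit between the admin, the user and the model, and most of them are invisible to the end user.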
Anyway, coming to the more important point after the long setup: at this system level there are separate controls, and among those there could theoretically be (theoretically, because no evidence has been presented, and some cases have actually shown the opposite) controls that system programmers could use to make the AI give outputs that are intentionally not what the model itself would spew out. Such controls are already used to curtail swearing and to avoid harmful topics (like self-harm). Filters for certain content have existed at the system level for some time and have been deemed acceptable and good, but they have nothing to do with the AI model as such. [There is more censoring with public models because some users are just there to break things or be lewd, whereas in internal/private medical applications, for instance, there is obviously a need to use anatomical references, so the limits are different.]
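A minimal sketch of what such a system-level output filter could look like, sitting entirely outside the model (the blocklist, the stand-in model call and the replacement text are all hypothetical; real products use far more elaborate, closed-source moderation pipelines):

```python
# Hypothetical post-processing filter: the system, not the model, decides
# what the user finally sees.
BLOCKED_TOPICS = {"self-harm", "explicit"}

def moderated_reply(user_prompt, model_fn):
    raw = model_fn(user_prompt)                   # whatever the model produced
    if any(topic in raw.lower() for topic in BLOCKED_TOPICS):
        return "I can't help with that."          # system-level override of the model output
    return raw

# Usage with a stand-in "model":
print(moderated_reply("hello", lambda p: "A reply mentioning self-harm resources"))
```

This is where intent could in principle live: not in the model's statistics, but in whoever configures the layer that rewrites or withholds its output.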
So, when people say "AIs lie", I see that as silly: AIs make unintentional errors, which in a sense are not even errors, because producing a statistically likely answer is exactly what they were coded to do, and the final output is in any case controlled by someone else. @TiX0's conclusion that we should not trust our displays is mostly correct, in that we should always have healthy skepticism online (AI or not, just to expand on it), but understanding why is also important.
The article is a bit misleading in its choice of wording and about the point of such a study, and the selected passages in the previous post reinforce that. A whole separate argument could also be made about how identifying a false statement differs from identifying a lie, and how differently we interpret information when communicating face-to-face (all the micro-signals we read from people when they speak/lie), which are not present in AI text output. And there is something to wonder about in just how well the test subjects understood the areas where they were being "lied" to (as they were supposed to spot the falsehoods); the research even mentions this limitation. The original research paper is more specific. It more or less makes the point that large AI systems kinda try too hard to answer something, which gets them into trouble. Its whole final conclusion is about how the threshold for saying "I don't know" should be optimized. What is forgotten, though, is that for many applications these GPT/LLM type AIs and their language/text-based statistical answers should not be used at all, even though they are popular at the moment. There are other AI types that may be more suitable to the problem and task.