Sacrifice HyperThreading to smash Spectre?

Above mine too.

As a side note, on skill levels, I highly suggest reading Sandworm. This timely book by Andy Greenberg of Wired magazine will enlighten you as to who’s bad, and how bad they are. We read it this spring, and I was glued to it from beginning to end.

@whistler, All very interesting.

I think this last idea can be broken too easily, simply by finding another way to generate the machine code.

There is an old saw that the most secure data storage was a chalkboard in a windowless locked room.

Update: With today’s technology, it has to be a lead-lined, temperature-controlled, vibration-damped room.

I see that lord Amazon has reared his head again … lol

Depends on what software you want to run. It’s quite easy to prevent programs from writing to executable pages or executing writable pages, and easy enough to restrict changing the writable/executable flags on pages. The only problem is that you don’t get any JIT compiling, which means, at best, terrible performance from Java, JavaScript, C#, and any other language using a JIT. Also no virtualization (qemu/kvm), and no instruction-set translation (qemu user mode). At least some implementations of these languages won’t handle a JIT failure gracefully, so you’ll get program crashes or other poor behaviour.

Still, for a kiosk type machine, it has some use.
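To make the JIT problem concrete, here’s a minimal sketch (assuming a Linux/x86-64 target and 4 KiB pages) of the write-then-execute transition every JIT depends on - exactly the transition a strict W^X policy blocks:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Minimal JIT-style pattern: write machine code into a page, then flip
 * the page from writable to executable.  A strict W^X policy forbids
 * exactly this transition, which is why JITs break under one. */
int main(void) {
    /* x86-64 encoding of: mov eax, 42; ret */
    unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};

    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    memcpy(page, code, sizeof code);

    /* Under W^X enforcement (e.g. PaX MPROTECT or an SELinux execmem
     * denial), this call fails and the "JIT" is dead in the water. */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect (blocked by W^X policy?)");
        return 1;
    }

    int (*fn)(void) = (int (*)(void))page;
    printf("JIT-ed function returned %d\n", fn());
    return 0;
}
```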

@reC, Actually I found it on display at the local library here. (Here’s another summary without the zon).


@reC You don’t necessarily have only a brief window in which to execute the exploit. If the data is long-lived (like a password that doesn’t change often) and you know how to evade address-space randomization, you can potentially gather statistics over repeated sessions, such as web logins. Most mitigations to date focus on thwarting narrowly defined threats which yield ample statistical evidence. It’s much harder to defend against attacks which extract fractions of a bit at a time over a period of days or weeks.
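To make “fractions of a bit” concrete: the raw signal is usually just a load timing. Here’s a minimal flush+reload timing sketch (assuming x86-64 with GCC or Clang); repeat probes like this across many sessions and average, and a noisy channel eventually yields whole secret bits:

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

/* Illustrative flush+reload timer.  A single measurement is mostly
 * noise; a patient attacker repeats probes like this across many
 * sessions and averages, recovering secret bits statistically. */

static uint8_t target[4096];

static uint64_t timed_load(volatile uint8_t *p) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);   /* timestamp before the access */
    (void)*p;                       /* the load being measured */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void) {
    _mm_clflush(target);                 /* evict the cache line */
    _mm_mfence();
    uint64_t cold = timed_load(target);  /* slow: goes to memory */
    uint64_t hot  = timed_load(target);  /* fast: now cached */
    printf("cold: %llu cycles, hot: %llu cycles\n",
           (unsigned long long)cold, (unsigned long long)hot);
    return 0;
}
```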

It probably wouldn’t require AI to execute a Spectre exploit. At most, one might need disassembly and fuzzing tools to identify attractive branches to attack. But your point underscores how deep the rabbit hole goes; this is why mitigations aren’t solutions, and why the industry actually needs a definitive fix. The problem isn’t going to go away just because the patches get complex enough that we finally do need AI to find a way through them (and still can). Give me a hammer. I’ll fix it for good.

For us users, the bottom line is that we’re sitting ducks. The best we can probably do, apart from stacking up a bunch of half-baked mitigations, is to disable HyperThreading. This isn’t a silver bullet, but it does decrease the side-channel bitrate from the attacked program to the attacking program. Kudos again to @lperkins2 for posting the code to do that, above.
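For anyone who missed that post: on a reasonably recent Linux kernel (4.19 or later), the sysfs SMT knob does the job. A minimal sketch - not necessarily the same approach @lperkins2 used:

```c
#include <stdio.h>

/* Sketch: turn off SMT (HyperThreading) at runtime via the kernel's
 * sysfs knob.  Requires root and Linux >= 4.19; the setting lasts
 * until reboot unless made permanent with the nosmt boot parameter. */
int main(void) {
    FILE *f = fopen("/sys/devices/system/cpu/smt/control", "w");
    if (!f) { perror("open smt control"); return 1; }
    if (fputs("off", f) == EOF || fclose(f) == EOF) {
        perror("write smt control");
        return 1;
    }
    return 0;
}
```

The one-line equivalent is simply writing “off” to that file from a root shell.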

One concern I have is that the CPU vendors may have concluded that it’s best to treat Spectre the way the pharma giants treat HIV: ameliorate, but don’t cure, because only the former provides a long-term revenue stream. It’s not a conspiracy. It just makes economic sense, absent any class-action lawsuit (which AFAIK isn’t happening). It also makes engineering sense, because it’s a lot easier to slap on a band-aid than to rip out all the speculation machinery added since the Pentium Pro.

I get the sense that we should all just move to ARM, for various reasons, but most importantly Spectre. It’s not fixed there either, but my understanding is that it’s harder to exploit.

Extensive, complex and buggy logic for speculative execution. FTFY

Even so, this is, I think, where we might have to defer to Intel. They know far better than we do what the opportunity cost in silicon is. Presumably they think, or at least thought, that speculation is a net win.

I could argue, alternatively, that I would rather have the silicon devoted to larger caches - but my opinion would be relatively poorly informed as far as use of silicon is concerned, because the number of CPU chips I have designed is exactly zero. :slight_smile:

Once again, though, your trade-off only makes sense where adequate parallelism exists. I would suggest it would be a multi-year project to refine, say, all core Linux distro code to work well when executing in parallel on lots of cores.


Hey, I would be happy with a power-on kill switch for speculation that would require no inter-processor synchronization at all. (That means that when the latest speculation vulnerability is announced, I have to reboot to disable it - but that’s OK for me. It might not be OK for 24x7 sites.)

Another power-on option would be for speculation to be off by default and enabled by boot code, if chosen. That would likely require some inter-processor communication, although I don’t expect it would be too difficult - since if you want speculation at all, it shouldn’t be a problem if some cores are doing it and some aren’t.

My assumption is that this is a temporary mitigation. It can optionally be reversed once enough hardware is out there in which all these problems are fixed. Maybe.
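No such global kill bit exists today; the closest things are kernel boot parameters like mitigations= and nospectre_v2, which toggle the software mitigations at boot (though not speculation itself). You can at least inspect what the kernel thinks it’s mitigating right now - a sketch, assuming Linux 4.15 or later:

```c
#include <dirent.h>
#include <stdio.h>

/* Sketch: dump the kernel's per-vulnerability mitigation status from
 * /sys/devices/system/cpu/vulnerabilities (present since Linux 4.15). */
int main(void) {
    const char *dirpath = "/sys/devices/system/cpu/vulnerabilities";
    DIR *d = opendir(dirpath);
    if (!d) { perror(dirpath); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.') continue;   /* skip . and .. */
        char path[512], line[256];
        snprintf(path, sizeof path, "%s/%s", dirpath, e->d_name);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(line, sizeof line, f))
                printf("%-28s %s", e->d_name, line);  /* keeps its \n */
            fclose(f);
        }
    }
    closedir(d);
    return 0;
}
```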

I think you might be wrong about that. Branch mis-predicts are expensive.

Not necessarily.

It can be that there are static hint bits (in the instruction stream) that provide a default prediction when the branch is not in the branch-prediction cache.

It can also be that forward conditional branches are predicted not-taken by default, while backward conditional branches are predicted taken by default. (The logic here is that loops typically execute several times, so their backward branches are usually taken.)

However, neither of those possibilities undermines your point, since a hypothetical attacker will know which of the above, if any, applies.
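Software can feed the static side of this, too. A small C example using GCC/Clang’s __builtin_expect, which makes the compiler lay out code so the hinted-likely path is the straight-line fall-through - cooperating with the forward-not-taken default described above:

```c
#include <stddef.h>

/* Static branch hint from the programmer: __builtin_expect (GCC/Clang)
 * tells the compiler the condition is almost always true, so it emits
 * the likely path as the fall-through and the rare path as a forward
 * (default not-taken) branch. */
long sum_nonzero(const long *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect(v[i] != 0, 1))   /* hinted likely true */
            s += v[i];
    }
    return s;
}
```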

See also https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)#Retpoline

The GCC team is hard at work, and did that work many months ago. An ongoing battle, of course.
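For reference, the thunk GCC emits under -mindirect-branch=thunk looks roughly like this (a sketch of the %rax variant, written as top-level inline assembly): the call pushes a safe return address, any speculation spins in the pause/lfence trap, and the mov plants the real target over the return address so the architectural ret lands correctly.

```c
/* Roughly the retpoline thunk for an indirect jump through %rax.
 * The return predictor speculates into the pause/lfence trap instead
 * of an attacker-trained BTB target; architecturally, the ret jumps
 * to the real target that the mov wrote over the return address. */
__asm__(
    ".text\n"
    ".globl __x86_indirect_thunk_rax\n"
    "__x86_indirect_thunk_rax:\n"
    "  call 1f\n"            /* push a safe return address */
    "0:\n"
    "  pause\n"              /* speculation spins harmlessly here */
    "  lfence\n"
    "  jmp 0b\n"
    "1:\n"
    "  mov %rax, (%rsp)\n"   /* replace return address with real target */
    "  ret\n"                /* architecturally jumps to *%rax */
);
```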

This isn’t accurate. It’s a low-level attack in the sense of requiring someone to understand how CPUs operate at and below the instruction level, but it definitely doesn’t need AI to carry out. Also, no matter what level of sophistication is required initially, eventually it becomes available in a toolkit for script kiddies, requiring no greater sophistication than it takes to download and run the toolkit.

Not necessarily. For example, if I had a computer used concurrently by multiple users who were not all at the same security level, and the entire computer was air-gapped, these kinds of exploits would still be useful for a user at a low security level to get an “upgrade”. :wink:


@kieran Yeah, a default-enabled speculation kill switch requiring a reboot would be fine. Perhaps not the optimal solution, but certainly one which is conceptually simple and actually solves the problem. I would pay a fat premium for that improvement alone.

The more I think about it, the alpha version of this could just implement all branch instructions in microcode, forcing them to do the same serialization as a retpoline (but without pushing the instruction pointer to the stack). (I knew about retpolines long ago but somehow forgot; thanks for the reminder. I wonder if (SETcc CL) followed by LOOP(N)Z would be faster and equally safe, without memory side effects, at the cost of locking up RCX for the purpose.)

In any event, your kill bit would allow code to run as-is, with zero software-engineering effort. Then, eventually, this “branch synchronization” could be done in hardware, stalling the instruction pipe at each branch until the correct behavior was known, while otherwise allowing speculation. And, no, enabling the control bit for this should not require any interprocessor communication since, as you said, it’s perfectly fine if some cores enable it and others don’t.

Branch misprediction was only truly expensive on the Pentium 4 architecture, because it was designed on the assumption that most intensive future tasks would be deeply pipelined and not branchy. As far as the CPU is concerned, that never really happened outside of audio and video decoding, so Intel reverted to the Pentium III architecture in this respect, which evolved into the grotesque monsters you see today. At least they have relatively short pipes, which can recover rapidly from misspeculation. And all this is increasingly irrelevant as more cores compete for essentially the same memory bandwidth - so really, misprediction is cheap.

And as you said, the overriding issue is the stupid BTB, which in any case induces deterministic behavior for an as-yet-unseen branch. To be clear, though, I don’t think BTB randomization is any more than a mitigation, because you might end up in a situation like: if the coin flip is heads, I don’t get any information, but if it’s tails, I get a little.

Back to your kill bit: speculation definitely doesn’t take up much die space, especially next to all the frankengarbage (https://www.extremetech.com/computing/312673-linus-torvalds-i-hope-avx512-dies-a-painful-death) in today’s CPUs, so removing it isn’t going to buy you reams of cache or a free brawny core. But it might give you, say, a bigger TLB, which could provide ~5% speedups under heavy load. That would seem to make the economic case in an industry where competitors fight over breadcrumbs.

Well, yeah - like everything, it’s a compromise. There was a time, not long ago, when attacks like Spectre were hardly relevant, at least on consumer hardware. But the need for security/isolation even on general-purpose computing hardware is tilting that balance. So it’s not really clear where this will go.