Hardware-supported Instruction Set Randomization

dzulla · April 18, 2022, 10:45am

Hi.

There was some discussion around RISC V vs. POWER9 as the future direction of privacy- & security focused compute products (by Purism).

I would like to raise attention to developments in the field of Instruction Set Randomization, protecting against Memory Corruption / Shellcode Execution, which lead to malicious actors taking control over userland processes or the kernel. By doing such successfully, they can pursue further privilege escalation & even escape virtualization containment attacking the CPU, which we have seen before, as to compartmentalization.

Since Purism products are meant for people who are very risk- and security conscious, there may be an oppertunity here to lead, by keeping attackers out & protecting the CPU at any privilege level in virtualization context, by using a mathematically fool-proof approach to render assembly level exploitation meaningless.

If there is a way to embed this feature, turning ISA CPU opcodes known to the attacker into randomized nonsense, preventing a breach, by adding it to any chips you know of, already? The microcode layer of RISC V or Power9, or even a replacement of Intel’s current microcode, if that’s possible, may be a way to support this, besides a modified PureOS kernel. Please let me know if you have any ideas, other than going from FPGA emulated ISA PoC to manufacturing a new 14nm chip.

The follwing proof of concept runs a modified kernel with randomized instructions, also protects each userland process being randomized, adds a new register to hold the crypto secret for decoding, and embeds that into a modified SPARC processor run on an FPGA for demonstrating purposes and lacking the performance impact of software-level ISA by emulation/binary translation. The work was done with very little additional VHDL code extending SPARC, and can also be done entirely in software, as to current PureOS hardening, but slower.

Cheers,
Dan

dcz · April 18, 2022, 12:14pm

Crazy. I like it!

irvinewade · April 19, 2022, 1:32am

Is this really security through obscurity though?

As a practical measure it may reduce the chances of an RCE exploit working - and that’s good but …

I didn’t see any detailed explanation of how this works but I would assume that rather than describe this as “instruction set randomization” it may be more accurate to describe it as “instruction stream decryption”.

I have some reservations about this general idea because it is a small step from this to … you can’t run open source code at all any more because the manufacturer of the CPU is in possession of the decryption key and that key is only shared with blackbox vendors. (That might only apply to the early boot code / kernel code but it is still a serious problem.)

Likewise, it is a small step to … you can’t disassemble / reverse engineer / audit the code any more.

(It is unstated in the general case whether we are talking about symmetric encryption or asymmetric encryption. I think the above link is mostly focussed on the former but perhaps the latter is not ruled out.)

Intel tried something like this with SGX.

irvinewade · April 19, 2022, 2:30am

One complication with this is likely to be code sharing. If each user process, for example, has an encrypted instruction stream with a unique key then two such user processes cannot share code. That could have the effect that overall memory usage increases.

If this kind of idea ever became mainstream, it might be that you would want to associate an encryption key with a memory segment, rather than with a user process.

Alternatively, there could be a single encryption key, generated at boot-time, and applying to all instruction segments i.e. kernel and all user processes. Even that should defend fairly well against RCE exploits, given a strong key and a strong encryption algorithm.

PS None of this would seemingly defend against an RCE exploit that seeks to conduct a DoS, particularly an RCE exploit against kernel code. Randomising the instruction stream doesn’t change the fact that the system is going to crash very quickly.

irvinewade · April 19, 2022, 5:10am

Another consideration is how the kernel is handled.

In the actual research paper they say that they statically encrypt the kernel. That is fine for research but completely useless for a distro, since “every” computer would end up with the same encryption key for the kernel, which is no good at all. Consequently, in real world use you would need

the initial installer software to generate a new random key and reencrypt the kernel, and
on each kernel upgrade the same thing happens again.

(This is the same basic problem as flashing a Librem 5 with the LUKS variant.)

It goes without saying that the kernel file ought to be protected from read access.

The alternative, that they did not apparently actually use, but which they discuss, is to encrypt the kernel dynamically at boot-time. The only real challenge there is that there may not be sufficient entropy available to generate the new random kernel encryption key. (I think kernels partially mitigate this by saving the entropy pool at shutdown but that requires a little bit of handling in the initial installer software.)

dcz · April 19, 2022, 5:52am

I haven’t read the full paper, but all that’s needed is to apply some key to the process when copying code from storage to working memory. Then, if the process gets universal code injected, the injection will trigger an error instead of executing. No need for embedded keys.

irvinewade · April 19, 2022, 6:04am

Not sure precisely which of my comments you were responding to but, from the paper itself:

We decided to statically encrypt the kernel’s code so as to not add extra delay to the boot process.
Due to this, the key is decided at the time the kernel image is built and encrypted, and it cannot
change without re-encryption.

The kernel’s key is stored in the executable.

dcz · April 19, 2022, 6:26am

I think that applies to the kernel executable only. Note that nothing here says the key is impossible to retrieve. I am seeing the threat actor addressed by this as something remote without access to the executable executing on the given machine, but able to read/write to its process memory. So no access to keys. This matches a network-based attack, or a sandbox break-out.

irvinewade · April 19, 2022, 6:43am

Yes, I think that is the basic threat that this purports to contend with.

For user executables, they are probably leaning towards dynamic encryption - so the key is not in the executable anyway.

For the kernel, I was more concerned about blended attacks where file contents might leak (from a poorly written user-level server) so the kernel file ought to be protected rather than must be.

One other comment:

This doesn’t fully deal with run-time generated code e.g. bytecode interpreter doing JIT compilation. The paper does discuss this and they provide an API to convert a buffer of plaintext newly generated code into a buffer of encrypted code (using the user’s key). This however raises some points for the above-quoted comment:

Not all code comes from storage.
If run-time generated code gets swapped out subsequently then it does then come from storage - so it needs handling, presumably to make sure that such code gets written encrypted to the swap area and, more importantly, not encrypted when it comes back from swap.
It would be as well not to swap normal code to the swap area. Maybe Linux never does this anyway?? i.e. code areas assumed to be read-only and identical to the original that came from the original executable - so to swap code out, you just throw it away and read it back from the original executable if needed. But if code is swapped to the swap area then see previous point.

The paper also talks about self-modifying code (which run-time generated code is sort of a special case of) but in my opinion if you actually use self-modifying code then you get what you deserve.

dzulla · April 19, 2022, 9:09am

Hi.
The only concern I have too, also as to RISC V and Power9, is JIT. Mostly by v8 (Chrome).

If I remember correctly, Google blacklisted certain instructions in the context of Meltdown/Spectre and Rowhammer.

And getting JIT to work, with or without LLVM, in an ISR (Instruction Set Randomization) context, is good, but there may be suprises as to the specifics of the ISAs.

Best,
Dan

irvinewade · April 19, 2022, 11:40am

PS I doubt this can be added by microcode to an existing x86 CPU.

PPS One thing I couldn’t tell is what cipher mode they are using for AES. I half suspect ECB mode but that leaves me a little edgy for this application.

dzulla · April 19, 2022, 12:11pm

It wouldn’t be a strong protection to add that to a blackbox chip nobody really knows what it is doing internally anyways, while it is suspected to be government backdoored. Even if such a hack could be pulled off by researchers who are reversing Intel microcode, it would be a much better approach to target RISC V and Power9, which can be formally verified from Firmware to Manufacturing.
I’m unhappy about the need for ‘encryption’ as opposed to ‘randomization’ as you have pointed out in your first post, anyways. When I refer to cryptography in this context, I am suggesting a XOR bitshift unguessable secret/seed. I do not like the AES approach. I was thinking about a (software only) design that generates a randomized/XORed ISA decoder / emulator (QEMU backend, Valgrind) and binary translates / randomizes the binary at the same time, lacks any remaining seed or key anywhere in memory or register, and just works for that executable. I intended to also generate a LLVM JIT for this, spawning a browser/v8 doing so. but I don’t know how to do that built into the CPU, in terms of design. It may be impossible to create a native hardware performance zero-knowledge randomized architecture.

FYI:

dcz · April 19, 2022, 1:54pm

You can’t XOR data in an unguessable way unless you have a one-time pad as long as the data. Encryption has the advantage that it’s practical to generate a new key periodically, and is not limited by the one-time pad (JITs would need that).

dzulla · April 19, 2022, 1:59pm

Let me rephrase: Perform arithmetic changes on the hexadecimal opcodes of a binary executeable and generate the decoder (equally modified ISA emulator) at the same time, without storing the mulitiplier/bitshift/arithmetic operation, which was used on the opcodes, ) (which was randomly chosen) anywhere.

irvinewade · April 19, 2022, 11:19pm

The paper acknowledges that weakness.

The simplest, and probably the fastest, encryption algorithm is to XOR each bit of the code with the respective bit of the key. Since code is much larger than a typical key, the bits of the key are reused.

(my emphasis)

they can be easily exploited when the attacker has knowledge of both the plain-text and cipher-text (e.g., due to a memory leak)

My translation: XOR with a short repeating key (e.g. 128 bits) is little better than not encrypting the instruction stream at all.

Random access is required to the instruction stream. As such, I can’t see any easy, efficient way of using a one-time pad with XOR in this use case.

Just my opinion but … if someone is going to do this then it needs to be done properly and that would mean AES or something comparably solid.

irvinewade · April 19, 2022, 11:38pm

One feature from that paper that hasn’t been discussed here yet is Return Address Encryption i.e. if an exploit purports to clobber the stack, corrupt the return address and cause control to transfer to somewhere unexpected then … at least the “unexpected” will be unexpected both to hacker and process alike i.e. the process is very likely to crash rather than allow the hacker to compromise it.

This is an important aspect because otherwise instruction stream encryption only prevents controlled use of instructions provided by the hacker but does not prevent the hacker maliciously (ab)using existing, validly encrypted code. (ASLR is an obscurity feature designed to frustrate this kind of attack.)

However the downside is that Return Address Encryption is platform-specific. Whereas instruction stream encryption with a decrypter between memory and the I-cache is basically agnostic as far as the ISA goes, Return Address Encryption depends on the calling conventions in use, which in turn depends on both the ISA and the runtime environment (e.g. typically the operating system but could be worse).

dzulla · April 20, 2022, 8:10am

I think Return Address handling is interesting to important, considering the many ROP (“Return Oriented Programming”) cases that were seen. It would be good to consult with someone who is known for successful ROP exploitation, and also corrupting Browser memory.

As to financials, the last chip I witnessed which was built privately, by a couple of dudes with an idea, going from a draft, to hardware design, writing of compiler infrastructure and finally silicon manufacturing (SoC), cost $USD 2.5 million and multiple years of work. Maybe more money, but that’s all I know of. And that was in Silicon Valley, a 4 people team, including an experienced ASIC / Electrical Engineer. It never turned into a product, but was operational.

I would be thrilled to see browser vulnerabilities like Heap Spraying and Sandbox escapes go away, and running Linux on a chip that can be trusted from ISA specification to fabrication and supply chain. The ISR should be built into an operating system like PureOS that, in its final state, doesn’t just virtualize (QubesOS) or harden (GrapheneOS) but also integrates the Instruction Set Randomization, as a final defense.

Can someone estimate the effort of adding this to an existent, mobile-ready, Chip design like Titan.M2 (RISC V, Google) or upcoming Power9 (Raptor Computing), including patching PureOS and making Firefox/v8 JIT work?

irvinewade · April 20, 2022, 10:03am

Not me! Why not contact the authors of that paper? They have at least done most of that.

(I don’t think they did any work on the JIT producer side. They just provided a simple API for the JIT producer to use. It’s not just a browser or other Javascript user that needs to accommodate such a change. It’s also any bytecode environment that does JIT native code generation e.g. Java, C#.)

On the CPU side (existent, mobile-ready), you could throw in ARM too. If you don’t trust an existing ARM CPU manufacturer, you can license an existing design and fab your own ARM CPUs, implementing the necessary additional functionality as described in the paper.