ARM64 vs ARM32: A primer for Linux programmers
I had originally planned to call this article "What's NEW in ARMv8 for Linux Programmers?" However, I think "what's different" is much more apt. And, just for the record, by "ARMv8-A" I mean AArch64, with the A64 instruction set, also known as arm64 or ARM64. I've used AArch64 registers in the examples, but many of the issues I've described also happen in the ARMv8-A 32-bit execution state.
To help frame the problems discussed here, let me start with a little background on the sort of codebase we have here at Undo. Our core technology is a record and replay engine, which works by recording all non-deterministic input to a program and using just-in-time compilation (JIT) to keep track of the program state. Our technology started on x86 (32 and 64-bit) and had progressed to fairly complete, maturing support on ARM 32-bit when we began adapting it to work on AArch64. I joined the company after almost all of the low-hanging fruit had been picked (as well as plenty from rather higher up the tree, to be fair), leaving us with some tricky problems to tackle when it came to moving to ARMv8.
This leads me to my first simple, but possibly helpful, observation: ARM64 is much more similar to ARM 32-bit (aka AArch32) than it is to x86. ARM64 is still quite RISC (though the cryptographic acceleration instructions do raise eyebrows in a RISC architecture). So I don't intend to try to cover the many differences between x86 and either ARM version. Nor do I want to rehash the general differences between AArch32 and AArch64; there are already good resources that explore those.
Also, a lot of ARM versus ARM64 resources focus on the instruction set and architectural differences. These differences are not really relevant to most Linux user space application developers, beyond the very obvious, such as "your pointers are bigger." But, as we discovered, there are differences that matter to Linux user space developers, four of which I'll discuss here. These differences fall into the following categories, and some fall into more than one:
Differences due to migrating to use a fairly new kernel version.
Differences due to the architecture and instruction set (where this is relevant to user space programmers).
Ptrace differences. We use ptrace a lot, so this was very important to us.
I will try to use the following format in the next sections:
A brief explanation of the area.
What is the difference? Why is this different? (Sometimes it is easier to understand a change in behaviour by looking at a few assembly instructions than it is from a wordy description, so I'll provide that code.)
How did we encounter it?
How did we overcome it?
Where to find out more information.
Changes to ptrace
ptrace provides process tracing capabilities to user space programs.
There have been a number of changes to the requests accepted by ptrace(). These changes produce the most pleasant of all incompatibilities to analyse: compilation errors. Our error reports were for undefined symbols PTRACE_GETREGS (for general registers), PTRACE_GETFPREGS (for floating point and SIMD registers), and PTRACE_GETHBPREGS (for hardware breakpoint registers), as well as the SET versions of these requests.
The man page for ptrace was no help at all in resolving these errors, so we dug deeper. We had a look at the kernel source, and it turns out that there is usually an architecture-independent ptrace code path (ptrace_request() in kernel/ptrace.c) and separate architecture-dependent paths (e.g. arch_ptrace() in arch/arm/kernel/ptrace.c). Although the arm64 version has a compat_arch_ptrace() for AArch32 applications, the arm64 arch_ptrace() directly calls ptrace_request() and does not add any additional ptrace request types. In other words, a native AArch64 tracer can only use the requests that the generic path implements.
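To make that concrete, here is a minimal sketch of reading a tracee's general registers through a request the generic path does handle, PTRACE_GETREGSET with the NT_PRSTATUS register set. This is an illustration under assumptions, not necessarily the replacement we ended up using: the helper names read_regset() and read_gp_regs() are hypothetical, and the code assumes a Linux system with glibc headers and a tracee that is already attached and stopped.

```c
#include <elf.h>         /* NT_PRSTATUS */
#include <stddef.h>      /* size_t */
#include <stdint.h>      /* uintptr_t */
#include <sys/ptrace.h>  /* ptrace(), PTRACE_GETREGSET */
#include <sys/types.h>   /* pid_t */
#include <sys/uio.h>     /* struct iovec */
#include <sys/user.h>    /* struct user_regs_struct */

/* Hypothetical helper: read one register set from a stopped tracee.
 * note_type selects the set (for example NT_PRSTATUS for the general
 * registers). Returns the number of bytes the kernel filled in, or -1
 * with errno set by ptrace(). */
static long read_regset(pid_t pid, int note_type, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    if (ptrace(PTRACE_GETREGSET, pid,
               (void *)(uintptr_t)note_type, &iov) == -1)
        return -1;

    /* The kernel updates iov_len to the amount of data actually written. */
    return (long)iov.iov_len;
}

/* Hypothetical wrapper: fetch the general-purpose registers. */
static long read_gp_regs(pid_t pid, struct user_regs_struct *regs)
{
    return read_regset(pid, NT_PRSTATUS, regs, sizeof *regs);
}
```

Because the register set is described by an iovec rather than a fixed structure, the same helper should work for other register sets by passing a different note type and buffer. If the target process is not a stopped tracee of the caller, ptrace() fails and the helper returns -1.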