The Nitty Gritty of “Hello World” on macOS

Published November 8, 2014 By Joe Savage

Recently I decided that I knew too little about how executables interface with the operating system. I write some C code, it gets compiled, assembled, and statically linked, and then some magic happens and the stuff I wrote gets loaded and run somehow. This post is about somewhat demystifying that magic by dissecting the Mach-O file format used on macOS.

I started this exploration process by writing a simplified "Hello World" program which I figured might produce an easy to interpret output file. Sure, I could learn about all this stuff by doing an enormous amount of reading, but that's no fun. I'd much prefer to explore things myself and see where it takes me, doing research when I get stuck. The two lines are as follows:

#include <stdio.h>
int main() { fwrite("Hello, world!\n", 1, 14, stdout); return 0; }

Next up I ran `gcc hello-world.c -o hello.out` (where gcc, if I remember correctly, is actually clang in disguise on macOS) on my Late 2013 MacBook running Yosemite, and opened up the result in a hex editor to start analysis. Honestly, I never expected two lines of C code to consume so much of my time. I don't want to explain each of the 8548 output bytes in detail here, as it would take far too long and wouldn't be very interesting to read. Instead, I'm going to attempt to give a relatively brief outline of my findings. Feel free to play along at home with your own binary if you have a similar environment to mine.

The first four bytes of the generated file are cf fa ed fe — no doubt some kind of standard file header. Running `file hello.out` quickly reveals that this is indeed the header for a little endian 64-bit Mach-O binary. That's a good start! It turns out that the Mach-O format is the standard file format to store programs and libraries on macOS (and iOS) — there's even an official reference document which should be of great use here.
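To make the byte-order point concrete, here is a minimal sketch of the check that `file` is presumably doing for us. The two magic values are the ones defined in Apple's loader.h, copied in so the code compiles anywhere; the helper names `read_le32` and `classify_magic` and the enum are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from <mach-o/loader.h>, repeated so this compiles off macOS. */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O in the reader's byte order */
#define MH_CIGAM_64 0xcffaedfeu /* 64-bit Mach-O in the opposite byte order */

/* Hypothetical result type for this sketch. */
enum magic_kind { NOT_MACHO_64, MACHO_64_LITTLE_ENDIAN, MACHO_64_BIG_ENDIAN };

/* Assemble four bytes into a 32-bit word the way a little-endian CPU
   (such as x86_64) would when loading them from memory. */
uint32_t read_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify a file by its first four bytes. */
enum magic_kind classify_magic(const unsigned char *first4) {
    uint32_t magic = read_le32(first4);
    if (magic == MH_MAGIC_64) return MACHO_64_LITTLE_ENDIAN;
    if (magic == MH_CIGAM_64) return MACHO_64_BIG_ENDIAN;
    return NOT_MACHO_64;
}
```

Feeding it the bytes cf fa ed fe yields the word 0xfeedfacf, which is why the file is reported as little-endian: the bytes on disk are the magic constant stored least significant byte first.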

So with a better idea of what we're dealing with, let's step back for a second and get an overview of the layout of the file. Here's a Cortesi-style visualisation of the resulting binary (generated by a Mac app I wrote). If you aren't familiar, each byte is plotted on a space-filling curve and coloured according to its value such that bytes with similar locations display in visually related regions, and bytes with similar values have related colours.
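For the curious, the mapping from byte offset to pixel position can be sketched in a few lines. This assumes the space-filling curve is a Hilbert curve, which is one common choice for this kind of picture and not necessarily what my app or Cortesi's tools use in every mode; the function names are hypothetical, and the index-to-coordinate routine is the standard iterative one.

```c
#include <assert.h>

/* Rotate or flip a quadrant so that the sub-curve inside it lines up
   with its neighbours. */
static void rotate_quadrant(int n, int *x, int *y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int tmp = *x; /* swap x and y */
        *x = *y;
        *y = tmp;
    }
}

/* Map a byte offset d onto (x, y) in an n-by-n grid, where n is a power
   of two and 0 <= d < n*n. */
void hilbert_d2xy(int n, int d, int *x, int *y) {
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(d >= 0 && d < n * n);
    int t = d;
    *x = 0;
    *y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        rotate_quadrant(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}
```

The useful property is that consecutive offsets always land on neighbouring cells, so a contiguous run of bytes shows up as a compact blob rather than a thin stripe. A file of 8548 bytes would need a 128 by 128 grid under this scheme, since 64 by 64 only holds 4096 cells.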

We can see from this that there seem to be a few distinctly separated regions along with a lot of black space (bytes of value zero). Taking a look at the file format reference, Apple provides the following diagram to describe the layout of the Mach-O format:

With little knowledge of the format, you can begin to draw parallels between the above two diagrams. The data at the beginning of the file is the Mach-O file header and load commands, and then we have some data in 'segments' (usually in sections within segments) within a sea of 0x00 bytes which pad the segments out to page boundaries (in this case, 4096 bytes).
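The padding arithmetic is simple enough to sketch. The 4096-byte page size comes from the paragraph above; the constant and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Page size observed for this binary; other platforms may differ. */
#define SEGMENT_PAGE_SIZE 4096u

/* Round a size up to the next multiple of the page size. */
uint64_t round_up_to_page(uint64_t size) {
    assert(size <= UINT64_MAX - (SEGMENT_PAGE_SIZE - 1)); /* avoid overflow */
    return (size + SEGMENT_PAGE_SIZE - 1) / SEGMENT_PAGE_SIZE * SEGMENT_PAGE_SIZE;
}

/* Number of zero bytes needed to pad `size` bytes out to a page boundary. */
uint64_t padding_needed(uint64_t size) {
    return round_up_to_page(size) - size;
}
```

A segment holding only a few hundred bytes of real content still occupies a whole page once padded, which goes a long way towards explaining the sea of zeros in the picture.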

In particular, the yellow region above is the header, the red contains the load commands, and the green, blue, and purple regions are (portions of) segments. Let's talk about each of these in some detail.

Mach-O header

The Mach-O header is relatively simple to understand — it consists of the first 32 bytes of the file here, and can be understood byte-by-byte by looking at the 'mach_header_64' structure. The otool utility does a good job of automating this for us, however. Running otool -h hello.out reveals all the important information about the file's header:

$ otool -h hello.out
hello.out:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80          2    16       1376 0x00200085

Apple's open source code (including otool's source) provides a lot of detail about all this stuff, and was useful to me throughout this process, but to summarise the output here:

  • 0xfeedfacf (reordered from its little-endian representation in the file) is the 'magic' constant for the 64-bit header (MH_MAGIC_64/MH_CIGAM_64 in loader.h).
  • The 'cputype' is the x86_64 CPU type value (CPU_TYPE_X86_64 in machine.h).
  • The 'cpusubtype' is the value for all x86_64 processors (CPU_SUBTYPE_X86_64_ALL), plus the 'capability bits' requiring compatibility with 64-bit libraries (CPU_SUBTYPE_LIB64 in machine.h).
  • The 'filetype' is that of a "demand paged executable file" (MH_EXECUTE in loader.h).
  • The 'ncmds' and 'sizeofcmds' fields indicate that 16 load commands follow, of total size 1376 bytes.
  • The 'flags' field sets a bunch of flags about our file: our file has no undefined references (MH_NOUNDEFS), is meant for the dynamic linker (MH_DYLDLINK), uses two-level name bindings (MH_TWOLEVEL), and should be loaded at a random address (MH_PIE).
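To tie the otool output back to the raw bytes, here is a rough sketch of the header layout and the bit-twiddling behind the list above. The struct mirrors the fields of 'mach_header_64' (using unsigned fields throughout for simplicity, which is an assumption of this sketch) and the constants are the loader.h and machine.h values; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the mach_header_64 layout: eight 32-bit fields. */
struct mach_header_64_sketch {
    uint32_t magic;
    uint32_t cputype;
    uint32_t cpusubtype; /* top byte holds the capability bits */
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};
_Static_assert(sizeof(struct mach_header_64_sketch) == 32,
               "the 64-bit header is 32 bytes");

#define CPU_ARCH_ABI64   0x01000000u
#define CPU_TYPE_X86     7u
#define CPU_SUBTYPE_MASK 0xff000000u
#define MH_NOUNDEFS 0x1u
#define MH_DYLDLINK 0x4u
#define MH_TWOLEVEL 0x80u
#define MH_PIE      0x200000u

/* x86_64 is the 32-bit x86 CPU type with the 64-bit ABI bit set. */
int is_x86_64(const struct mach_header_64_sketch *h) {
    assert(h != NULL);
    return h->cputype == (CPU_ARCH_ABI64 | CPU_TYPE_X86);
}

/* The 'caps' column in otool's output: the top byte of cpusubtype. */
uint32_t capability_bits(const struct mach_header_64_sketch *h) {
    assert(h != NULL);
    return (h->cpusubtype & CPU_SUBTYPE_MASK) >> 24;
}

/* Any flag bits not accounted for by the four flags discussed above. */
uint32_t unexplained_flags(uint32_t flags) {
    return flags & ~(MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE);
}
```

With the values from our binary, 16777223 is exactly 0x01000007, and 0x00200085 breaks down into the four flags listed with nothing left over.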

Load Commands

According to the file format reference, load commands "specify both the logical structure of the file and the layout of the file in virtual memory". They're central to the Mach-O file format, and our header says we have 16 of them coming right up, so let's take a look at them.

Once again, you could read through the bytes in accordance with the file format reference (this time looking at the 'load_command' structure and its close friends), but otool makes things easy. Running otool -l hello.out provides detailed information about all the load commands in the file — I won't go through all these details in this post, however. The Mach-O format reference provides summaries of a few load command types, but not all of the ones present in our file, so I'll provide an overview of the types present myself.

  • LC_SEGMENT_64: Defines a (64-bit) segment which will get mapped into address space when the file is loaded. Includes definitions of sections contained within the segment.
  • LC_SYMTAB: Defines the symbol table ('stabs' style) and string table for this file. These are used by linkers and debuggers to map certain symbols (e.g. in the original source files) to regions of the compiled binary. In particular the symbol table defines local symbols which are used only for debugging, as well as both defined and undefined external symbols.
  • LC_DYSYMTAB: Provides additional symbol information to the dynamic linker about symbols present in the symbol table that the dynamic linker should handle. Includes definition of an indirect symbol table specifically for this purpose.
  • LC_DYLD_INFO_ONLY: Defines an additional compressed dynamic linker information section which contains metadata for symbols and opcodes for dynamic binding amongst other things. The stub binder ('dyld_stub_binder') which deals with dynamic indirect linking makes use of this to do its linking. The '_ONLY' extension to the name indicates that this load command is required for the program to run, so older linkers that don't understand this load command should stop here.
  • LC_LOAD_DYLINKER: Load a dynamic linker. Usually "/usr/lib/dyld" on macOS.
  • LC_LOAD_DYLIB: Load a dynamically linked shared library. For example "/usr/lib/libSystem.B.dylib" which is an implementation of the C standard library plus a bunch of other things (syscalls and kernel services, other system libraries, etc.). Each library is loaded by the dynamic linker and contains a symbol table linking symbol names to addresses, which is searched for matching symbols.
  • LC_MAIN: Specifies the entry point of the program. In our case, this is the location of the main() function.
  • LC_UUID: Provides a unique random UUID, usually produced by the static linker.
  • LC_VERSION_MIN_MACOSX: The minimum OS version on which this binary was built to run.
  • LC_SOURCE_VERSION: The versions of the sources used to build the binary.
  • LC_FUNCTION_STARTS: Defines a table of function start addresses, for debuggers and other programs to easily see if an address lies within a function.
  • LC_DATA_IN_CODE: Defines a table of non-instructions in code segments.
  • LC_DYLIB_CODE_SIGN_DRS: Defines code signing designated requirements for linked dynamic libraries.

Wow, that got technical really fast! We haven't even looked in depth at the load commands in our executable here, just the types of load commands present! Don't worry if you don't quite get all of the theory here. Essentially the load commands just provide a bunch of varied information either about data in the rest of the file (defining/referencing chunks of data which occur), or about the executable directly. This information pretty much underpins the entirety of the rest of our file.
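If you'd rather walk the load commands by hand than lean on otool, the traversal is roughly as follows. Every command starts with a type and a size, so you hop from one to the next by adding the size. This sketch assumes the command bytes are already in memory and in the host's byte order (true for our file on an x86_64 machine); the command constants are the loader.h values, while the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_REQ_DYLD       0x80000000u /* "must be understood to run" bit */
#define LC_SEGMENT_64     0x19u
#define LC_DYLD_INFO      0x22u
#define LC_DYLD_INFO_ONLY (0x22u | LC_REQ_DYLD)
#define LC_MAIN           (0x28u | LC_REQ_DYLD)

/* The two fields every load command begins with. */
struct load_command_sketch {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Does this command type carry the "required" bit (as the _ONLY variant does)? */
int requires_dyld_support(uint32_t cmd) {
    return (cmd & LC_REQ_DYLD) != 0;
}

/* Walk `ncmds` commands stored in `sizeofcmds` bytes and count those of
   type `wanted`. Returns -1 if the commands overrun the buffer. */
int count_commands(const unsigned char *cmds, size_t sizeofcmds,
                   uint32_t ncmds, uint32_t wanted) {
    assert(cmds != NULL);
    size_t offset = 0;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command_sketch lc;
        if (sizeofcmds - offset < sizeof lc) return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset) return -1;
        if (lc.cmd == wanted) found++;
        offset += lc.cmdsize; /* hop to the next command */
    }
    return found;
}
```

For our binary you would pass the bytes immediately following the 32-byte header, with `ncmds` and `sizeofcmds` taken from the header fields.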

Sections & Segments

Looking at the load commands in more detail, a certain segment/section structure is defined via the 'LC_SEGMENT_64' commands and is referenced by many other load commands. The remainder of the file essentially populates this structure with meaningful data. All the segments and sections defined in our file are described as follows:

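  • __PAGEZERO: A segment that reserves the lowest region of the address space with no access permissions, so that NULL pointer dereferences fault instead of reading or writing real data. It occupies no space on disk (its file size is zero) and needs no physical memory at runtime, as nothing in it can be accessed. As an aside, this segment can be a good place to hide malicious code.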
  • __TEXT: A segment for executable code and other read-only data.
    • __text: A section for executable machine code.
    • __stubs: Indirect symbol stubs. These jump to the value of (writable) locations (e.g. entries in '__la_symbol_ptr' which we'll see shortly) for non-lazy ("load with the executable") and lazy ("load when first used") indirect references. For lazy references, the address being jumped to will first point to a resolution procedure, but after the initial resolution will point to the resolved address. For non-lazy references, the address being jumped to will always point to the resolved address as the dynamic linker will fix the address as it loads the executable.
    • __stub_helper: Provides helpers to resolve lazy loaded symbols. As described above, lazy loaded indirect symbol pointers will point inside here before they are resolved.
    • __cstring: A section for constant (read-only) C-style strings (like "Hello, world!\n\0"). The linker removes duplicates on building the final product.
    • __unwind_info: A compact format for storing stack unwind information for exception handling. This section is generated by the linker from information in '__eh_frame' for exception handling on macOS.
    • __eh_frame: A standard section used for exception handling which provides stack unwind information (and sometimes additional debug information) in the DWARF debugging data format.
  • __DATA: A segment for readable and writable data.
    • __nl_symbol_ptr: A table of pointers to non-lazy imported symbols.
    • __la_symbol_ptr: A table of pointers to lazy imported symbols. This section starts out with its pointers pointing to resolution helpers, as previously discussed.
    • __got: The global offset table — a table of pointers to (non-lazy) imported globals.
  • __LINKEDIT: A segment containing raw data for the linker ('link editor'), in this case including symbol and string tables (the contents of which can be revealed via `nm`), compressed dynamic linking information, code signing DRs, and the indirect symbol table — all of which occupy regions as specified by the load commands.
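The relationship between a segment and its sections shows up directly in the load command sizes: an 'LC_SEGMENT_64' command is a fixed-size segment record followed by one fixed-size record per section. The structs below are a sketch mirroring the 'segment_command_64' and 'section_64' layouts (with unsigned protection fields for simplicity); the names with a `_sketch` suffix and the helper function are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of segment_command_64: 72 bytes. */
struct segment_command_64_sketch {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    uint32_t maxprot;
    uint32_t initprot;
    uint32_t nsects;
    uint32_t flags;
};

/* Mirror of section_64: 80 bytes. */
struct section_64_sketch {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

_Static_assert(sizeof(struct segment_command_64_sketch) == 72, "segment record");
_Static_assert(sizeof(struct section_64_sketch) == 80, "section record");

/* Expected cmdsize of an LC_SEGMENT_64 command with `nsects` sections. */
uint32_t segment_cmdsize(uint32_t nsects) {
    assert(nsects <= (UINT32_MAX - 72u) / 80u); /* avoid overflow */
    return (uint32_t)sizeof(struct segment_command_64_sketch) +
           nsects * (uint32_t)sizeof(struct section_64_sketch);
}
```

So a section-less segment like __PAGEZERO needs a 72-byte command, while __TEXT with the six sections listed above needs 552 bytes and __DATA with three needs 312.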

The Bigger Picture

With knowledge of the load commands, segments, and sections along with all their purposes, it shouldn't be too difficult to see the bigger picture of what's going on when the binary is being run. We've already discussed many of the processes that happen in dynamically linking and running the binary. Essentially:

  1. Our Mach-O output from building and static linking becomes input for the dynamic linker, which uses data in our file specified by load commands to link dependencies in various ways.
  2. Segments of the executable are mapped into memory as specified in the load commands.
  3. Execution begins at the point specified by 'LC_MAIN', which in this case is the start of __TEXT.__text.

To go into a little more detail (with the help of this document), here's a rough outline of the entire process specifically for our 'Hello World' binary:

  1. The user indicates that they wish to run the binary.
  2. It is determined that the file is a valid Mach-O file, so the kernel creates a process for the program (fork) and begins the program execution process (execve).
  3. The kernel examines the Mach-O header and loads the program, along with the specified dynamic linker ("/usr/lib/dyld"), into some allocated address space as specified in the load commands. Segment virtual memory protection flags are also applied as specified (e.g. __TEXT is readable and executable but not writable).
  4. The kernel executes the dynamic linker, which loads any referenced libraries — in this case, "/usr/lib/libSystem.B.dylib" — and performs the symbol binding necessary to start the program (i.e. non-lazy references), searching the loaded libraries for matching symbols.
  5. Assuming the symbols have been correctly resolved, the dynamic linker places the resulting addresses into the sections which take ownership (as specified in their load command entries) over the corresponding entries in the indirect symbol table (defined by 'LC_DYSYMTAB'). In this case, the resolved addresses are placed into '__nl_symbol_ptr' and '__got'.
  6. Some initialization code is executed to set up the runtime state, after which the entry point as specified by LC_MAIN is called!
  7. When a lazy-bound reference is used for the first time (via '__stubs'), the '__la_symbol_ptr' entry should point to a resolution routine (due to preparation by the static linker in the build process) in '__stub_helper', which invokes the 'dyld_stub_binder' (which was linked dynamically when our program was loaded) to perform the resolution and update the address in '__la_symbol_ptr'.
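The lazy binding in the final step is easier to picture as code. What follows is a toy simulation in plain C, not dyld's actual implementation: a single function pointer stands in for an '__la_symbol_ptr' entry, one function plays the role of the '__stub_helper' routine plus 'dyld_stub_binder', and another plays the '__stubs' entry. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(int);

/* How many times the (simulated) resolution routine has run. */
static int resolver_calls = 0;

/* The "library" function the program ultimately wants to reach. */
static int real_double(int x) { return 2 * x; }

static int lazy_helper(int x);

/* Stand-in for one __la_symbol_ptr entry. It starts out pointing at the
   helper, much as the static linker prepares the real table. */
static func_ptr la_symbol_ptr = lazy_helper;

/* Stand-in for __stub_helper and dyld_stub_binder: resolve the symbol,
   patch the pointer, then carry on to the real function. */
static int lazy_helper(int x) {
    resolver_calls++;
    la_symbol_ptr = real_double; /* later calls skip the helper entirely */
    return la_symbol_ptr(x);
}

/* Stand-in for a __stubs entry: an indirect jump through the pointer. */
int call_via_stub(int x) {
    assert(la_symbol_ptr != NULL);
    return la_symbol_ptr(x);
}

/* Expose the counter so the behaviour can be observed. */
int resolver_call_count(void) { return resolver_calls; }
```

Calling `call_via_stub` twice runs the helper only on the first call; the second goes straight to the resolved function, which is the whole point of binding lazily.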

Of course, there are many more specific details in this process if you wish to look into them. For Mach-O exploring I recommend the use of otool and MachOView, along with a healthy dose of open source code, specifications, and obscure online resources. There are a bunch of parts to Mach-O and dynamic linking that we haven't even touched on. Weak binding, for example, is a different type of symbol binding which links symbols only if they are available on the system.

If you're interested in seeing the assembly code in __TEXT.__text for the 'Hello World' program, objdump makes easy work of the disassembly — though of course we can also get this output first-hand from the compiler: gcc -S hello-world.c -o hello.s. There are already a number of resources going through the specifics of C 'Hello World' assembly, so I'm not interested in covering that part of things here. What we have covered here, however, is some specifics of a seemingly oft forgotten layer of binary execution on modern systems.