This post is part of a series on generating basic x86 Mach-O files
with Ruby. The
first post introduced CStruct, a Ruby class used to serialize
simple struct-like objects.
Please note that the best way to learn about Mach-O properly is to
read Apple's
documentation on Mach-O, which is pretty good when combined with the
comments in /usr/include/mach-o/*.h. These posts only cover
the basics necessary to generate a simple object file for linking with
ld or gcc, and are not meant to be comprehensive.
Mach-O File Format Overview
A Mach-O file consists of 2 main pieces: the header and
the data. The header is basically a map of the file describing
what it contains and the position of everything contained in it. The
data comes directly after the header and consists of a number of
binary blobs of data, one after the other.
The header contains 3 types of records: the Mach header,
segments, and sections. Each binary blob is described
by a named section in the header. Sections are grouped into one or
more named segments. The Mach header is just one part of the header
and should not be confused with the entire header. It contains
information about the file as a whole, and specifies the number of
segments as well.
Take a quick look at Figure 1 in
Apple's Mach-O overview, which illustrates this quite nicely.
A very basic Mach object file consists of a header followed by a single
blob of machine code. That blob could be described by a single
section named __text, inside a single nameless segment. Here's a
diagram showing the layout of such a file:
            ,---------------------------,
    Header  |  Mach header              |
            |    Segment 1              |
            |      Section 1 (__text)   | --,
            |---------------------------|   |
    Data    |           blob            | <-'
            '---------------------------'
The Mach Header
The Mach header contains the architecture (cpu type), the type of
file (object in our case), and the number of segments. There is more
to it, but that's about all we care about. To see exactly what's in a
Mach header, fire up a shell and type otool -h /bin/zsh (on a
Mac).
Using
CStruct we define the Mach header like so:
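As a rough stand-in that does not use CStruct, here is a minimal sketch of the same 28-byte header built with Ruby's Array#pack. The field order and constant values are taken from mach-o/loader.h and mach/machine.h; the mach_header helper and its keyword arguments are hypothetical names chosen for illustration.

```ruby
# Sketch only: plain Array#pack instead of CStruct.
MH_MAGIC             = 0xfeedface # magic number for 32-bit Mach-O
CPU_TYPE_X86         = 7          # a.k.a. CPU_TYPE_I386
CPU_SUBTYPE_I386_ALL = 3
MH_OBJECT            = 0x1        # filetype: relocatable object file

# Hypothetical helper: returns the Mach header as a little-endian
# binary string of seven 32-bit fields.
def mach_header(ncmds:, sizeofcmds:, flags: 0)
  [
    MH_MAGIC,
    CPU_TYPE_X86,
    CPU_SUBTYPE_I386_ALL,
    MH_OBJECT,
    ncmds,       # number of load commands (segments, for our purposes)
    sizeofcmds,  # total size in bytes of everything between this header and the data
    flags
  ].pack("L<7")
end

header = mach_header(ncmds: 1, sizeofcmds: 56 + 68)
puts header.bytesize
puts header.unpack("L<7").map { |v| format("0x%x", v) }.join(" ")
```

The 56 and 68 passed as sizeofcmds are the sizes of one 32-bit segment command and one section header, which is all the simplest object file needs.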
Segments
Segments, or segment commands, specify where in memory the
segment should be loaded by the OS, and the number of bytes to
allocate for that segment. They also specify which bytes inside the
file are part of that segment, and how many sections it contains.
One benefit to generating an object file rather than an executable is
that we let the linker worry about some details. One of those details
is where in memory segments will ultimately end up.
Names are optional and can be arbitrary, but the convention is to
name segments with uppercase letters preceded by two underscores,
e.g. __DATA or __TEXT.
The code exposes some more details about segment commands, but should
be easy enough to follow.
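For readers without the CStruct listing at hand, the following is a minimal sketch of a 32-bit segment command using Array#pack. The layout follows struct segment_command in mach-o/loader.h; the segment_command helper, its keyword arguments, and the choice to allow all protections are assumptions made for illustration.

```ruby
# Sketch only: a 32-bit segment command via Array#pack, not CStruct.
LC_SEGMENT   = 0x1 # load command type for a 32-bit segment
VM_PROT_ALL  = 0x7 # read | write | execute
SEGMENT_SIZE = 56  # bytes in a 32-bit segment command
SECTION_SIZE = 68  # bytes in each 32-bit section header that follows it

# Hypothetical helper: packs one segment command as a binary string.
def segment_command(nsects:, fileoff:, filesize:, name: "", vmaddr: 0)
  [
    LC_SEGMENT,
    SEGMENT_SIZE + nsects * SECTION_SIZE, # cmdsize counts the section headers too
    name,        # segname: 16 bytes, null padded; may be empty
    vmaddr,      # 0 here, since the linker picks the final address
    filesize,    # vmsize: number of bytes to allocate in memory
    fileoff,     # where this segment's bytes start in the file
    filesize,    # how many bytes of the file belong to the segment
    VM_PROT_ALL, # maxprot
    VM_PROT_ALL, # initprot
    nsects,      # number of section headers that follow
    0            # flags
  ].pack("L<2a16L<8")
end

seg = segment_command(nsects: 1, fileoff: 28 + 56 + 68, filesize: 1)
puts seg.bytesize
```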
Sections
All sections within a segment are described one after the other
directly after each segment command. Sections define their name,
address in memory, size, offset of section data within the file, and
segment name. The segment name might seem redundant but in the next
post we'll see why this is useful information to have in the section
header.
Sections can optionally specify a map to addresses within their
binary blob, called a relocation table. This is used by the
linker: since we're letting the linker work out where to place
everything in memory, the addresses inside our machine code will need
to be updated.
By convention sections are named with lowercase letters preceded by
two underscores, e.g. __bss or __text.
Finally, the Ruby code describing section structs:
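As a hedged stand-in for the CStruct version, this sketch packs a 32-bit section header with Array#pack. The layout follows struct section in mach-o/loader.h; the section helper and its keyword arguments are hypothetical, and relocation fields are simply left at zero.

```ruby
# Sketch only: a 32-bit section header via Array#pack, not CStruct.
S_ATTR_PURE_INSTRUCTIONS = 0x80000000 # section contains only machine code

# Hypothetical helper: packs one section header as a binary string.
def section(sectname:, segname:, size:, offset:, addr: 0, align: 0,
            reloff: 0, nreloc: 0, flags: 0)
  [
    sectname, # 16 bytes, null padded, e.g. "__text"
    segname,  # 16 bytes, null padded, e.g. "__TEXT"
    addr,     # address in memory (0 for now, the linker fixes it up)
    size,     # size of the section data in bytes
    offset,   # offset of the section data within the file
    align,    # alignment as a power of two
    reloff,   # file offset of the relocation table, if any
    nreloc,   # number of relocation entries
    flags,
    0,        # reserved1
    0         # reserved2
  ].pack("a16a16L<9")
end

text = section(sectname: "__text", segname: "__TEXT", size: 1,
               offset: 28 + 56 + 68, flags: S_ATTR_PURE_INSTRUCTIONS)
puts text.bytesize
```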
macho.rb
As much of the Mach-O format as we need is defined in
asm/macho.rb. The Mach header, segment commands, sections,
relocation tables, and symbol table structs are all there, along with
a few constants.
I'll cover symbol tables and relocation tables in my next post.
Looking at real Mach-O files
To see the segments and sections of an object file, run
otool -l /usr/lib/crt1.o (the -l flag is for load commands).
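To make the structure otool walks a little more concrete, here is a minimal sketch that lists load commands from a byte string in Ruby. It assumes a thin, 32-bit, little-endian Mach-O image and raises on anything else; load_commands is a hypothetical helper, and the byte offsets come from the struct layouts in mach-o/loader.h.

```ruby
# Sketch only: handles thin 32-bit little-endian Mach-O images.
MH_MAGIC   = 0xfeedface
LC_SEGMENT = 0x1

# Hypothetical helper: returns one hash per load command.
def load_commands(bytes)
  magic, _cputype, _cpusubtype, _filetype, ncmds, = bytes.unpack("L<7")
  raise ArgumentError, "not a 32-bit little-endian Mach-O file" unless magic == MH_MAGIC

  offset = 28 # load commands start right after the Mach header
  Array.new(ncmds) do
    cmd, cmdsize = bytes.byteslice(offset, 8).unpack("L<2")
    entry = { cmd: cmd, cmdsize: cmdsize }
    if cmd == LC_SEGMENT
      entry[:segname] = bytes.byteslice(offset + 8, 16).unpack1("Z16")
      entry[:nsects]  = bytes.byteslice(offset + 48, 4).unpack1("L<")
      # Section headers (68 bytes each) follow the 56-byte segment command.
      entry[:sections] = Array.new(entry[:nsects]) do |i|
        bytes.byteslice(offset + 56 + i * 68, 32).unpack("Z16Z16")
      end
    end
    offset += cmdsize
    entry
  end
end

# Usage: ruby this_script.rb path/to/file.o
load_commands(File.binread(ARGV[0])).each { |lc| p lc } if ARGV[0]
```

Each section is reported as a [section name, segment name] pair; other load command types are listed with only their type and size.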
If you want to see why we stick to generating object files instead of
executables, run otool -l /bin/zsh. Executables are complicated
beasts.
If you want to see the actual data for a section, otool provides a
couple of ways to do this. The first is to use
otool -s <segment> <section> <file> for an arbitrary
section. To see the contents of a well-known section, such as __text
in the __TEXT segment, use otool -t /usr/bin/true. You can
also disassemble the __text section with
otool -tv /usr/bin/true.
You'll get to know otool quite well if you work with Mach-O.
Take a break!
That was probably a lot to digest, and to make real sense of it you
might need to read some of the
official documentation.
We're close to being able to describe a minimal Mach object file
that can be linked, and the resulting binary executed. By the end of
the next post we'll be there.
(You can almost do that with what we know now. If you
create a Mach file with a Mach header (ncmds=1), a single unnamed
segment (nsects=1), and then a section named __text with a segment
name of __TEXT, and some x86 machine code as the section data, you
would almost have a useful Mach object file.)
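That recipe can be sketched in a few lines of Ruby. This is only an illustration using Array#pack rather than CStruct: minimal_object is a hypothetical helper, the single ret instruction is an arbitrary choice of machine code, and without a symbol table the result is still only "almost" useful to the linker.

```ruby
# Sketch only: Mach header + one unnamed segment + one __text section + code.
MACH_HEADER_SIZE = 28
SEGMENT_SIZE     = 56
SECTION_SIZE     = 68

# Hypothetical helper: returns the bytes of a bare-bones 32-bit x86 object file.
def minimal_object(code)
  code = code.b                                   # treat as raw bytes
  data_offset = MACH_HEADER_SIZE + SEGMENT_SIZE + SECTION_SIZE
  cmds_size   = SEGMENT_SIZE + SECTION_SIZE

  # magic, cputype (x86), cpusubtype (all), filetype (object), ncmds, sizeofcmds, flags
  mach = [0xfeedface, 7, 3, 1, 1, cmds_size, 0].pack("L<7")

  # cmd (LC_SEGMENT), cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
  # maxprot, initprot, nsects, flags
  seg = [1, cmds_size, "", 0, code.bytesize, data_offset, code.bytesize,
         7, 7, 1, 0].pack("L<2a16L<8")

  # sectname, segname, addr, size, offset, align, reloff, nreloc,
  # flags (pure instructions), reserved1, reserved2
  sect = ["__text", "__TEXT", 0, code.bytesize, data_offset, 0, 0, 0,
          0x80000000, 0, 0].pack("a16a16L<9")

  mach + seg + sect + code
end

if __FILE__ == $PROGRAM_NAME
  # 0xC3 is the x86 `ret` instruction.
  File.binwrite("minimal.o", minimal_object("\xC3"))
end
```

On a Mac, running otool -h and otool -l on the resulting minimal.o should show the header and the single segment and section described above.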
Till next time, happy hacking!