JOS - Compile and Link


JOS contains two binaries, the bootloader and the kernel. Having previously taken a high-level overview of JOS's Makefile, we now explore in more detail how each binary is compiled, and how the two are combined into an image that can be written to a disk.

Compilation Overview

To fully understand how the bootloader is created, it is important to be clear about the roles of source files, object files and ELF binaries, and of the compiler, assembler and linker that produce them. Converting source files into an executable binary generally takes four steps (preprocessing, compiling, assembling and linking). Let's take a look at the artifacts involved:

  • Source files: files containing high-level code, usually written and read by humans;
  • Assembly files: files with instructions that resemble machine code. Compilers (and preprocessors) convert high-level source files into lower-level assembly code;
  • Object files: files with relocatable machine code. Assemblers convert assembly files into object files, which contain machine code plus information for debugging and linking, such as sections, symbols and relocation entries. Object files can be inspected with the objdump utility;
  • ELF binary: the executable. Linkers combine the object files into a single binary in a format the OS recognizes. The kernel binary uses the same format, so the bootloader, much like an OS running a program, needs to recognize it, load its sections into main memory as the ELF headers instruct, and transfer control to the entry point the ELF binary specifies.
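As a concrete illustration of what "recognizing" an ELF binary involves, here is a minimal C sketch of the first check a loader performs: verify the magic bytes, class, byte order and machine type, then read the entry point. It parses the standard 52-byte ELF32 header from raw bytes using the field offsets of the ELF specification; the helper names (rd16, rd32, elf32_entry) are hypothetical and not taken from JOS, whose boot/main.c instead casts the buffer to its own struct Elf and compares e_magic with ELF_MAGIC.

```c
#include <stdint.h>
#include <stddef.h>

/* Read little-endian integers from a byte buffer (ELF32 on i386 is little-endian). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores e_entry in *entry if buf starts with a 32-bit,
   little-endian, i386 ELF header; returns 0 otherwise. */
static int elf32_entry(const uint8_t *buf, size_t len, uint32_t *entry) {
    if (len < 52) return 0;                   /* an ELF32 header is 52 bytes */
    if (rd32(buf) != 0x464C457Fu) return 0;   /* "\x7F" "ELF" read as a little-endian word */
    if (buf[4] != 1 || buf[5] != 1) return 0; /* EI_CLASS == ELFCLASS32, EI_DATA == ELFDATA2LSB */
    if (rd16(buf + 18) != 3) return 0;        /* e_machine == EM_386 */
    *entry = rd32(buf + 24);                  /* e_entry lives at offset 24 */
    return 1;
}
```

Reading fields byte by byte avoids depending on the compiler's struct layout; a real loader would also validate the program header table before trusting it.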

JOS Image

Now let’s take a look at how the final image is generated:

# kern/Makefrag
$(OBJDIR)/kern/kernel.img: $(OBJDIR)/kern/kernel $(OBJDIR)/boot/boot
	$(V)dd if=/dev/zero of=$(TMPIMG) count=10000 2>/dev/null
	$(V)dd if=$(OBJDIR)/boot/boot of=$(TMPIMG) conv=notrunc 2>/dev/null
	$(V)dd if=$(OBJDIR)/kern/kernel of=$(TMPIMG) seek=1 conv=notrunc 2>/dev/null
	$(V)mv $(TMPIMG) $(OBJDIR)/kern/kernel.img

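The image is generated with dd in three steps: a zero-filled image of 10000 sectors is created first (dd's default block size is 512 bytes), then boot is written to its first sector and kernel to the sectors that follow (seek=1 skips the first 512-byte block). conv=notrunc keeps dd from truncating the image after each write.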
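The three dd invocations can be modeled in a few lines of C. This is a sketch assuming dd's default 512-byte block size, with a hypothetical build_image function that works on in-memory buffers instead of files; unlike dd with conv=notrunc, it refuses to grow the image when the kernel does not fit.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

#define SECTOR_SIZE 512   /* dd's default block size (bs) */
#define IMG_SECTORS 10000 /* count=10000 in kern/Makefrag */

/* Mimics the dd steps on memory buffers. Returns a calloc'd image of
   IMG_SECTORS * SECTOR_SIZE bytes (caller frees it), or NULL on failure. */
static unsigned char *build_image(const unsigned char *boot, size_t boot_len,
                                  const unsigned char *kern, size_t kern_len) {
    size_t img_len = (size_t)IMG_SECTORS * SECTOR_SIZE;
    if (boot_len > SECTOR_SIZE || kern_len > img_len - SECTOR_SIZE)
        return NULL;                              /* this sketch never grows the image */
    unsigned char *img = calloc(img_len, 1);      /* dd if=/dev/zero count=10000 */
    if (img == NULL) return NULL;
    memcpy(img, boot, boot_len);                  /* dd if=boot conv=notrunc */
    memcpy(img + SECTOR_SIZE, kern, kern_len);    /* dd if=kernel seek=1 conv=notrunc */
    return img;
}
```

The key point the model makes explicit: the kernel's ELF header starts at byte 512 of the disk, which is exactly where the bootloader expects to find it.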

The former, boot, is generated by linking the two object files boot.o and main.o:

# boot/Makefrag
$(OBJDIR)/boot/boot: $(BOOT_OBJS)
	$(V)$(LD) $(LDFLAGS) -N -e start -Ttext 0x7c00 -o $@.out $^
	$(V)$(OBJDUMP) -S $@.out >$@.asm
	$(V)$(OBJCOPY) -S -O binary -j .text $@.out $@
	$(V)perl boot/sign.pl $(OBJDIR)/boot/boot
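The last step, boot/sign.pl, turns the raw .text bytes into a valid boot sector. In the JOS sources the script rejects a boot block larger than 510 bytes, pads it with zeros, and appends the two signature bytes 0x55 0xAA that the BIOS checks at the end of the sector. The same logic in C, as a sketch with a hypothetical function name:

```c
#include <string.h>
#include <stddef.h>

/* Pad a boot block to one 512-byte sector and append the 0x55 0xAA signature.
   Returns 0 on success, -1 if the code does not fit in the first 510 bytes. */
static int sign_boot_sector(const unsigned char *code, size_t len,
                            unsigned char sector[512]) {
    if (len > 510) return -1;   /* sign.pl reports "boot block too large" */
    memset(sector, 0, 512);     /* zero padding after the code */
    memcpy(sector, code, len);
    sector[510] = 0x55;         /* boot signature checked by the BIOS */
    sector[511] = 0xAA;
    return 0;
}
```

This 510-byte limit is why the bootloader must stay tiny, and why objcopy keeps only the .text section (-j .text) before signing.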

Notice that the linker command places the text (a.k.a. code) section at 0x7c00 (-Ttext 0x7c00), which is reflected in the addresses of the resulting binary.

Disassembling the boot binary linked with its text section at two different addresses (0x0000 vs 0x7c00) shows the difference: the linker adjusts the addresses in the instructions according to the text section address.
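What "adjusting the addresses" means can be sketched as an absolute 32-bit relocation: the object file leaves a placeholder in the instruction and records where it is and what it refers to, and the linker writes in the final address once the section base is known. The struct and function below are hypothetical and simplified (real i386 object files use R_386_32 entries that point at symbols and keep the addend in the field itself); instructions that use PC-relative addressing, like most jumps and calls, need no such patching.

```c
#include <stdint.h>
#include <stddef.h>

/* One simplified absolute relocation: the 4 bytes at `where` must hold
   text_base + addend once the link address of .text is known. */
struct reloc32 {
    size_t where;    /* offset of the 32-bit field inside .text */
    uint32_t addend; /* referenced label's offset from the start of .text */
};

/* Patch .text in place for a given link address (e.g. 0x7c00 for -Ttext 0x7c00).
   Values are stored little-endian, as on i386. */
static void apply_relocs(uint8_t *text, const struct reloc32 *r, size_t n,
                         uint32_t text_base) {
    for (size_t i = 0; i < n; i++) {
        uint32_t v = text_base + r[i].addend;
        text[r[i].where + 0] = (uint8_t)(v);
        text[r[i].where + 1] = (uint8_t)(v >> 8);
        text[r[i].where + 2] = (uint8_t)(v >> 16);
        text[r[i].where + 3] = (uint8_t)(v >> 24);
    }
}
```

For example, a movl $label, %eax (opcode 0xB8 followed by a 32-bit immediate) whose label sits 0x20 bytes into .text gets the immediate 0x7c20 when linked at 0x7c00, and 0x20 when linked at 0x0000.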

Now for the kernel binary: not only is it linked to a specific address, it is also loaded at an address different from that link address:

# kern/Makefrag
KERN_LDFLAGS := $(LDFLAGS) -T kern/kernel.ld -nostdlib
$(OBJDIR)/kern/kernel: $(KERN_OBJFILES) $(KERN_BINFILES) kern/kernel.ld
	...
	$(V)$(LD) -o $@ $(KERN_LDFLAGS) $(KERN_OBJFILES) $(GCC_LIB) -b binary $(KERN_BINFILES)

/* kern/kernel.ld */
ENTRY(_start)
SECTIONS {
	/* Link the kernel at this address: "." means the current address */
	. = 0xF0100000;
	/* AT(...) gives the load address of this section, which tells
	   the boot loader where to load the kernel in physical memory */
	.text : AT(0x100000) {
...

# kern/entry.S
.text
# ... headers skipped ...

.globl		_start
# RELOC(x) maps a symbol x from its link address to its actual
# location in physical memory (its load address).	 
_start = RELOC(entry)

# The entry symbol.
.globl entry
entry:
	movw	$0x1234,0x472			# warm boot
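The arithmetic behind _start can be checked in C. In JOS, RELOC(x) is defined in kern/entry.S as ((x) - KERNBASE) and KERNBASE is 0xF0000000 (inc/memlayout.h); the 0xc offset of entry matches the readelf output below and corresponds to the skipped multiboot header (three 32-bit words). A sketch, with entry_vma and start_address as illustrative names:

```c
#include <stdint.h>

/* Values as in JOS: KERNBASE from inc/memlayout.h, RELOC from kern/entry.S. */
#define KERNBASE 0xF0000000u
#define RELOC(x) ((uint32_t)(x) - KERNBASE)

/* Link address (VMA) of `entry`: .text starts at 0xF0100000 and `entry`
   follows the 12-byte multiboot header. */
static const uint32_t entry_vma = 0xF0100000u + 0xC;

/* _start = RELOC(entry): the physical (load) address of the entry code. */
static uint32_t start_address(void) { return RELOC(entry_vma); }
```

start_address() evaluates to 0x0010000C, the physical address the bootloader jumps to.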

This can be further confirmed by inspecting the kernel binary itself:

$ objdump -h obj/kern/kernel
obj/kern/kernel:     file format elf32-i386
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001bc9  f0100000  00100000  00001000  2**4
...
$ readelf -a obj/kern/kernel
ELF Header:
  ...
  Entry point address:               0x10000c
  ...
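Reading the two outputs together: the entry point 0x10000c is a load (physical) address that falls inside .text, whose LMA is 0x00100000 and size 0x1bc9, and its link-time counterpart is found by shifting it by VMA - LMA. The following C sketch does this translation with the numbers from the objdump row; struct section and lma_to_vma are hypothetical names, not part of binutils.

```c
#include <stdint.h>

/* The fields of one `objdump -h` row that matter here. */
struct section {
    uint32_t size, vma, lma;
};

/* If lma_addr lies inside the section's load range, store the matching
   link-time (virtual) address in *vma_addr and return 1; else return 0. */
static int lma_to_vma(const struct section *s, uint32_t lma_addr, uint32_t *vma_addr) {
    if (lma_addr < s->lma || lma_addr - s->lma >= s->size) return 0;
    *vma_addr = s->vma + (lma_addr - s->lma);
    return 1;
}
```

With .text = {0x1bc9, 0xf0100000, 0x00100000}, the entry point 0x0010000c maps to 0xf010000c, the link address of entry.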

All of the above leads to a few conclusions about the kernel's entry location:

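  1. _start denotes the kernel's entry code in assembly (ENTRY(_start) in kernel.ld makes it the ELF entry point). It is defined as RELOC(entry), the physical address of the entry code: LMA + entry_offset = 0x00100000 + 0xc = 0x0010000c, which matches the entry point address reported by readelf;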
  2. the kernel's code is linked at 0xf0100000, which shows up as the VMA (Virtual Memory Address) of .text. This means:
    • the kernel code is relocated to this address (the same way boot is relocated to 0x7c00: the addresses in the instructions are adjusted accordingly);
    • the program counter (PC) must hold VMA values whenever relocated code is executed. The kernel starts executing with LMA values in the PC, but switches to VMA values before it reaches any instruction that depends on its link address;
  3. the kernel code is loaded at physical address 0x00100000, which is its LMA (Load Memory Address). In most cases LMA and VMA are identical. Exceptions include:
    • XIP (execute-in-place) mode in embedded systems, where the program runs from ROM: the data section's initial values are stored in ROM but must be modifiable during execution, so the section is copied from its LMA (ROM) to its VMA (RAM);
    • the kernel, which leaves the lower part of the virtual address space to user programs and therefore uses a high VMA. Because the page mapping from high virtual addresses to low physical addresses is not yet set up when the kernel is loaded, it needs a low LMA, hence the discrepancy between the two.
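One way to see how the PC can move from LMA to VMA values is to model the temporary mapping JOS enables in kern/entry.S before jumping to high addresses: the page directory in kern/entrypgdir.c maps both virtual [0, 4MB) and [KERNBASE, KERNBASE+4MB) to physical [0, 4MB), so the same instruction bytes are reachable through either address. The C sketch below models only that address translation, not the real page-directory structures, and the function name is hypothetical.

```c
#include <stdint.h>

#define KERNBASE 0xF0000000u
#define REGION_SIZE 0x00400000u /* 4 MB mapped by the early page directory */

/* Translate a virtual address under the two early mappings:
   VA [0, 4MB) -> PA [0, 4MB) and VA [KERNBASE, KERNBASE+4MB) -> PA [0, 4MB).
   Returns 1 and stores the physical address, or 0 if the VA is unmapped. */
static int early_translate(uint32_t va, uint32_t *pa) {
    if (va < REGION_SIZE) { *pa = va; return 1; }
    if (va >= KERNBASE && va - KERNBASE < REGION_SIZE) { *pa = va - KERNBASE; return 1; }
    return 0;
}
```

Because 0x0010000c and 0xf010000c translate to the same physical address, the kernel can keep executing with low addresses right after paging is turned on, and then jump to a label at its high link address.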

Ending notes:

  • Assembly files are not aware of absolute addresses. An ORG directive (.org in GNU as) in an assembly file specifies an address relative to the start of the section;
  • Object files don't know about absolute addresses either. However, they carry relocation information that makes it easy for the linker to relocate the sections;
  • ELF binaries carry all the information about the locations of their sections. A binary alone, however, cannot be executed by the processor: it needs an ELF-aware loader that loads its sections according to the ELF headers, copies code around if necessary when LMA != VMA, and finally transfers control to the entry point. That loader can be the OS when executing user programs, or the bootloader when executing the OS kernel.
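The load step described in the last note can be sketched as a loop over the ELF program headers, roughly what JOS's boot/main.c does before jumping to e_entry. This is a simplified model under stated assumptions: a hypothetical struct phdr holds only the four fields the loop needs, "physical memory" is a plain byte array, and each segment is placed at its physical (load) address with the memsz - filesz tail zero-filled, as the ELF specification requires for segments such as .bss.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Simplified ELF32 program header: only the fields a loader needs. */
struct phdr {
    uint32_t offset; /* p_offset: where the segment's bytes start in the file */
    uint32_t paddr;  /* p_paddr: physical (load) address, i.e. the LMA */
    uint32_t filesz; /* p_filesz: bytes present in the file */
    uint32_t memsz;  /* p_memsz: bytes occupied in memory (>= filesz) */
};

/* Copy each segment from `file` into `mem` (a stand-in for physical memory)
   at its load address and zero-fill the rest of memsz.
   Returns 0 on success, -1 if a segment falls outside either buffer. */
static int load_segments(const uint8_t *file, size_t file_len,
                         const struct phdr *ph, size_t nph,
                         uint8_t *mem, size_t mem_len) {
    for (size_t i = 0; i < nph; i++) {
        if (ph[i].filesz > ph[i].memsz) return -1;
        if ((size_t)ph[i].offset + ph[i].filesz > file_len) return -1;
        if ((size_t)ph[i].paddr + ph[i].memsz > mem_len) return -1;
        memcpy(mem + ph[i].paddr, file + ph[i].offset, ph[i].filesz);
        memset(mem + ph[i].paddr + ph[i].filesz, 0, ph[i].memsz - ph[i].filesz);
    }
    return 0;
}
```

After the loop, a bootloader would read e_entry from the ELF header and jump there; in the kernel's case that is the physical address 0x0010000c computed earlier.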