x86 Segmentation


Segmentation is the first and most fundamental memory management system in x86 processors, and it is the first level of x86 address mapping. Although obsolete in modern systems, it is still important to understand in order to fully comprehend x86 memory management.

Segmentation is the main source of x86 addressing complexity. Modern OSes have largely phased it out and set up only the bare minimum that the processor requires.

Intel 8086

The Memory Segmentation Mechanism has existed since the Intel 8086 (released in 1978), although its successors employ a drastically different implementation and target different problems.

The Intel 8086 operates in a single address mode, later named Real Address Mode, which uses segment:offset pairs held in two 16-bit registers to generate 20-bit addresses. The 20-bit address is generated by multiplying the segment by 16 and adding the offset to it.
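The calculation can be sketched in a few lines of Python. This is only an illustration: the function name real_mode_address is made up for this example, and the final mask models the 8086 having just 20 address lines, so sums past 1 MiB wrap around.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for an 8086 segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps the result within the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
```

Because many segment:offset pairs produce the same sum, different pairs can name the same physical byte.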

In retrospect, this design offers no protection between segments, e.g. 0x0000:0xf000 and 0x0f00:0x0000 point to the exact same location in memory. It is also unintuitive to spend a total of 32 bits to describe a 20-bit address. To understand the design choice, it is important to identify the problems it was tackling:

  1. Processors at the time were limited in address width. The 16-bit registers were required for backwards compatibility, and because the data bus was 16 bits wide, reading a 20-bit value would have taken two memory cycles;
  2. It was designed to support modular software design (8086 Manual Chapter 2.3). In this design, segments of code or data are stored separately, and programs can use different segments by simply updating the segment registers;
  3. Memory was considered a scarce resource. This led to the design choice of using two 16-bit registers to locate a memory address. The design allows variable-length segments (up to 64 KiB, the range of a 16-bit offset) and overlapping segments. In the manual’s example, if Segment B is only 32 KiB, Segment C can be placed starting 32 KiB after it to fully utilize the memory. Because the segment register is shifted by 4 bits, segments are aligned to 16 bytes, so fewer than 16 bytes are wasted per segment due to alignment;
  4. The real-world story also involves business decisions.

Overall, it was a carefully thought-out design that made sense for its time, but traded off scalability.
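The packing argument from item 3 above can be checked with a short Python sketch. The helper next_segment and the example segment values are hypothetical; the only facts used are that a segment base is the segment register value times 16, and that a new segment therefore has to start on a 16-byte boundary.

```python
from typing import Tuple

PARAGRAPH = 16  # segment bases are always multiples of 16 bytes


def next_segment(prev_segment: int, prev_size: int) -> Tuple[int, int]:
    """Place a new segment right after an existing one.

    Returns (segment register value, bytes wasted by alignment).
    """
    end = (prev_segment << 4) + prev_size             # first free physical byte
    base = (end + PARAGRAPH - 1) & ~(PARAGRAPH - 1)   # round up to a 16-byte boundary
    return base >> 4, base - end


# A 32 KiB segment at 0x1000 is followed immediately by the next one.
print(next_segment(0x1000, 32 * 1024))
# A segment one byte longer pushes the next base to the following boundary.
print(next_segment(0x1000, 32 * 1024 + 1))
```

In the worst case the previous segment ends one byte past a boundary, which wastes 15 bytes.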

Lastly, a quick overview of Intel 8086 Segmentation:

  1. It provides four 16-bit segment registers: CS (code), DS (data), SS (stack) and ES (extra, usually also used for data);
  2. The segment registers are generally used implicitly, e.g. cs:ip for the program counter, ss:sp for the stack pointer, ds:offset for data, es:offset for string operations. However, the defaults can be overridden:

    It is possible for a programmer to explicitly direct the BIU to access a variable in any of the current addressable segments. This is done by preceding an one-byte instruction with a segment override prefix.

  3. Segment registers are generally updated implicitly by control-transfer instructions, e.g. cs is updated by far jmp, far call, etc. However, the Intel 8086, unlike its successors, does little to restrict user code from updating the segment registers, and even allows ordinary instructions like pop cs to do so.

Intel 80286

Intel 80286 (also called iAPX 286 or Intel 286, released in 1982) introduced Protected Mode and brought a major overhaul to the Segmentation Mechanism, which is now commonly recognized as the de facto version of it.

At its core, 80286 is still largely similar to its predecessor:

  • It is still a 16-bit processor;
  • The Segmentation Mechanism preserves the four segments: code, stack, data and extra.

The differences are

  • Protected Mode was introduced, with the original mode in 8086 renamed to Real Mode. This introduced a completely different addressing mode;
  • With the increase in memory availability, the address bus was widened from 20 bits (1 MiB) to 24 bits (16 MiB).

The 80286 introduces

  • A protection mechanism that guards segments against unauthorized access. The 8086 relies on software behaving faithfully to prevent issues;
  • Processor-level enforcement of segment size. 8086 segments have no notion of segment length: every segment has access to the full 64 KiB maximum size, and software must honor the contract when segments overlap.

80286 Manual Chapter 5 documents the disadvantages of Real Address Mode.

Segment Selectors and Segment Registers

In the 80286, segment selectors take the place of segment bases in the segment registers. The visible part of a segment register remains 16 bits long and holds a segment selector; the new hidden part, holding 48 bits, serves as a cache for a descriptor, which we discuss later.

They are still the same four registers (CS, DS, SS and ES) with the same visible length, but repurposed to support the new Segmentation Mechanism.

Segment selectors contain an index used to find the segment information, plus some additional information for the Protection Mechanism.

The segment offset has the same meaning as in the 8086 and is still 16 bits wide, so the maximum segment size is also capped at 64 KiB in the 80286.

A segment selector differs from an 8086 segment base in that it is no longer the actual segment base address, but a combination of

  1. An index into a Segment Descriptor Table, which contains the segment base address and some other information about the segment;
  2. Table Indicator used to select between different descriptor tables;
  3. Requested Privilege Level used for the protection mechanism.
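The three fields can be pulled apart with simple bit operations. The sketch below assumes the standard selector layout (RPL in bits 0-1, Table Indicator in bit 2, index in bits 3-15); the Selector type and decode_selector function are names invented for this example.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # entry number within the descriptor table (13 bits)
    table: str   # "GDT" when the Table Indicator bit is 0, "LDT" when it is 1
    rpl: int     # Requested Privilege Level (2 bits)


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    return Selector(
        index=selector >> 3,
        table="LDT" if selector & 0x4 else "GDT",
        rpl=selector & 0x3,
    )


print(decode_selector(0x002B))  # entry 5 of the GDT, requested at level 3
```

Since the low three bits are not part of the index, the selector value with those bits cleared is also the byte offset of the 8-byte descriptor within its table.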

Descriptors and Descriptor Tables

A segment descriptor is a fixed-size data structure that contains key information about a segment:

  • Base is the starting address of the segment. It is 24 bits wide, matching the address bus width of the 80286;
  • Limit is a new concept that specifies the size of a segment. It is 16 bits wide, matching the maximum size of a segment in the 80286 (64 KiB). The processor checks the segment offset against the limit;
  • DPL, short for Descriptor Privilege Level, supports the protection mechanism in 80286.
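As a rough illustration, the fields above can be extracted from the raw 8 bytes of an 80286 descriptor. The sketch assumes the documented 80286 layout (16-bit limit, 24-bit base, one access byte with the present bit in bit 7 and DPL in bits 5-6, and a final reserved word); the names Descriptor286 and decode_286_descriptor, and the sample bytes, are made up for this example.

```python
import struct
from typing import NamedTuple


class Descriptor286(NamedTuple):
    base: int      # 24-bit starting address of the segment
    limit: int     # 16-bit offset of the last valid byte
    dpl: int       # Descriptor Privilege Level (0-3)
    present: bool  # whether the segment is marked present in memory


def decode_286_descriptor(raw: bytes) -> Descriptor286:
    """Decode an 8-byte 80286 segment descriptor (little-endian)."""
    if len(raw) != 8:
        raise ValueError("a descriptor is exactly 8 bytes")
    # limit (2 bytes), base bits 0-15 (2), base bits 16-23 (1), access (1), reserved (2)
    limit, base_low, base_high, access, _reserved = struct.unpack("<HHBBH", raw)
    return Descriptor286(
        base=(base_high << 16) | base_low,
        limit=limit,
        dpl=(access >> 5) & 0x3,
        present=bool(access & 0x80),
    )


# Hypothetical descriptor: a 32 KiB segment at 1 MiB, usable from level 3.
raw = bytes([0xFF, 0x7F, 0x00, 0x00, 0x10, 0xF2, 0x00, 0x00])
print(decode_286_descriptor(raw))
```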

A Descriptor Table is an array in which each entry holds the descriptor of one segment. Together, the tables record all available segments in the system.

There are three main kinds of tables: GDT (Global Descriptor Table), LDT (Local Descriptor Table) and IDT (Interrupt Descriptor Table), located through the special registers GDTR, LDTR and IDTR, respectively.

GDT, as its name suggests, is generally not switched, so GDTR always points to the same table. GDT represents a global address space shared by all processes. Intel mandates that its first descriptor has to be a Null Descriptor. The rest of the table stores system segments, as well as descriptors for LDTs among other things.

LDT, on the other hand, is swapped every time the processor switches to a different process. This allows processes to have their private address space.

IDT is also a global table, and as its name suggests, it holds descriptors for interrupts. It replaces the IVT (Interrupt Vector Table) used in the Real Mode.
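Putting selectors, descriptors and tables together, a protected-mode address lookup can be modelled roughly as follows. This is a simplified sketch that ignores privilege and type checks; the tables are plain Python lists, and Descriptor, ProtectionFault and translate are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


class ProtectionFault(Exception):
    """Stands in for the exception the processor would raise."""


@dataclass
class Descriptor:
    base: int   # segment start address
    limit: int  # offset of the last valid byte
    dpl: int    # Descriptor Privilege Level


Table = List[Optional[Descriptor]]


def translate(selector: int, offset: int, gdt: Table, ldt: Table) -> int:
    """Map selector:offset to a physical address via the GDT or LDT."""
    index = selector >> 3
    use_ldt = bool(selector & 0x4)          # Table Indicator bit
    if index == 0 and not use_ldt:
        raise ProtectionFault("null selector")
    table = ldt if use_ldt else gdt
    if index >= len(table) or table[index] is None:
        raise ProtectionFault("selector is outside the descriptor table")
    descriptor = table[index]
    if offset > descriptor.limit:           # limit check on every access
        raise ProtectionFault("offset exceeds the segment limit")
    return descriptor.base + offset


gdt = [None, Descriptor(base=0x100000, limit=0x7FFF, dpl=0)]
print(hex(translate(0x0008, 0x0010, gdt, [])))  # 0x100010
```

Swapping the ldt argument per process while keeping gdt fixed mirrors how each process gets a private set of segments on top of the shared global ones.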

Privilege Levels and Protection Mechanism

Segment selectors and segment descriptors provide the backbone of a complex Protection Mechanism. For example,

  • TYPE is checked to make sure the segment selector/descriptor is compatible with the segment register that it is loaded into. Another example of a TYPE check is making sure the segment has the correct read/write attributes for the operation being performed on it;
  • LIMIT is automatically checked against the segment offset on every access to the segment;
  • Instruction set protection. Certain instructions are only allowed to execute at privileged levels.

Of all its protections (see more in Intel Manual Chapter 5), the privilege level check is likely the most complex one. A privilege level is a 2-bit value, where the numerically lowest value (0) is the highest privilege. A privilege level check happens when a segment selector is loaded into a segment register. Before going into details, the terms used for privilege levels are

  • CPL (Current Privilege Level) is the lower two bits of the visible part of CS register, which is the privilege level of the selector currently loaded for the code segment. It represents the processor’s current executing privilege;
  • RPL (Requested Privilege Level) is the lower two bits of the selector to be loaded into a segment register. It represents the requester’s privilege level. RPL is normally the same as CPL, except when a procedure uses a segment on behalf of some other procedure, in which case RPL represents the privilege level of the original requester. A typical scenario is a user process (CPL=3) requesting a system procedure (CPL=0) to perform an operation that requires accessing a segment. In this case, the user procedure (CPL=3) calls the system procedure (CPL=0) with a selector (RPL=3) to its target segment. For safety, the system procedure also uses the ARPL (Adjust RPL) instruction to ensure the user’s selector is properly set;
  • DPL (Descriptor Privilege Level) is the privilege level bits in the descriptor of the target segment. It declares the minimum privilege level required to use the segment.

The exact rules depend on the segment register type (and other factors). Below is a simplified version:

Code Segment: loading the CS register means a control transfer (typically the ‘far’ version of call, jmp, int, iret instructions and exceptions/interrupts).

  • Without Call Gate: if the target code segment is CONFORMING, CPL >= DPL is required (the caller may be equally or less privileged than the target); otherwise CPL == DPL is required;
  • With Call Gate: The actual check varies depending on CONFORMING and whether it’s JMP or CALL, but generally speaking CPL <= call gate DPL (DPL specifies the privilege level required by the caller), RPL <= call gate DPL (RPL is the privilege level of the call gate selector), target code segment DPL <= CPL (the direction must be towards a more privileged level). After the callee finishes, RET pops the registers from the stack and performs validation checks, e.g. make sure values are valid and that code segment CPL <= data segment DPL.

Data Segment: loading the DS/ES register is subject to max(CPL, RPL) <= DPL. RPL is usually the same as CPL, except in circumstances where a procedure loads a segment register on behalf of its caller. In these cases, it issues the ARPL instruction to adjust the data segment selector to its caller’s privilege level before loading the data segment register.

Stack Segment: the SS register must reflect the CPL, so CPL == DPL is required.
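The simplified data and stack segment rules can be written down as small predicates, together with a model of what ARPL does to a selector. This is a sketch of the rules as stated in this section, not of the full hardware behaviour; all function names are invented for the example.

```python
def can_load_data_segment(cpl: int, rpl: int, dpl: int) -> bool:
    """Data segment rule: the weaker of CPL and RPL must still satisfy DPL."""
    return max(cpl, rpl) <= dpl


def can_load_stack_segment(cpl: int, dpl: int) -> bool:
    """Stack segment rule as simplified above: DPL must equal CPL."""
    return cpl == dpl


def arpl(selector: int, caller_selector: int) -> int:
    """Model of ARPL: raise the selector's RPL to at least the caller's RPL."""
    caller_rpl = caller_selector & 0x3
    if (selector & 0x3) < caller_rpl:
        return (selector & ~0x3) | caller_rpl
    return selector


# A user procedure (its CS selector has RPL=3) hands a system procedure a
# selector that claims RPL=0 for a segment with DPL=0.
adjusted = arpl(0x0008, 0x0023)
print(hex(adjusted))                                   # 0xb
print(can_load_data_segment(cpl=0, rpl=adjusted & 0x3, dpl=0))  # False
```

Without the adjustment the system procedure would pass the check with its own CPL=0 and access the privileged segment on the user's behalf.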

A failed protection check described in this section generally raises a General Protection exception (#GP).

i386 and x86

The Segmentation Mechanism remained largely unchanged in the Intel i386 (officially known as Intel 80386, released in 1985), with its biggest changes being adjustments for 32-bit support. Succeeding 32-bit Intel processors (Intel x86) mostly reuse this version of the Segmentation Mechanism.

Also introduced in the Intel i386 is the Paging Mechanism, which became the preferred choice for memory management and led to the eventual demise of the Segmentation Mechanism. Major operating systems shifted to Paging, and Segmentation is still set up mainly for compatibility reasons, as it cannot be turned off.

Changes in the Segmentation Mechanism since the Intel 80286:

  • Support for 32-bit addressing. The previously unused portion of the descriptor extends BASE from 24 bits to 32 bits and LIMIT from 16 bits to 20 bits. With the Granularity flag set, LIMIT is counted in units of 4 KiB, which effectively shifts it left by 12 bits and allows segments of up to 4 GiB (versus 64 KiB in the Intel 80286);
  • Two additional general-purpose segment registers are added: FS and GS. These are commonly used by OSes for thread-specific memory.
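The effect of the Granularity flag on LIMIT can be shown with a small worked example. The function name effective_limit is hypothetical; the scaling rule (shift left by 12 and fill the low 12 bits with ones) reflects that the limit names the last valid byte, so a whole final 4 KiB page is included.

```python
def effective_limit(limit20: int, granularity: bool) -> int:
    """Return the byte offset of the last valid byte in an i386 segment."""
    if not 0 <= limit20 <= 0xFFFFF:
        raise ValueError("LIMIT is a 20-bit field")
    if granularity:
        # Units of 4 KiB: the low 12 bits of the offset are not checked.
        return (limit20 << 12) | 0xFFF
    return limit20


print(hex(effective_limit(0xFFFFF, granularity=False)))  # 0xfffff (1 MiB segment)
print(hex(effective_limit(0xFFFFF, granularity=True)))   # 0xffffffff (4 GiB segment)
```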

In the real world, most x86 OSes use what Intel calls the “Flat Model” (Section 2.2.1). In this model, the code segment and the data segment both have BASE=0 and LIMIT=4 GiB, essentially disabling most of the segmentation checks.
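To make the flat model concrete, the sketch below decodes two 64-bit descriptor values of the kind commonly seen in flat-model GDT setups. The two constants are illustrative values, not taken from this article, and decode_descriptor is a made-up helper that assumes the standard i386 descriptor bit layout.

```python
def decode_descriptor(value: int) -> dict:
    """Decode base, limit and granularity from a 64-bit i386 segment descriptor."""
    base = ((value >> 16) & 0xFFFFFF) | (((value >> 56) & 0xFF) << 24)
    limit = (value & 0xFFFF) | (((value >> 48) & 0xF) << 16)
    granularity = bool((value >> 55) & 1)
    last_byte = ((limit << 12) | 0xFFF) if granularity else limit
    return {
        "base": base,
        "limit": limit,
        "granularity": granularity,
        "size": last_byte + 1,          # segment size in bytes
        "dpl": (value >> 45) & 0x3,
    }


FLAT_CODE = 0x00CF9A000000FFFF  # illustrative ring-0 code segment descriptor
FLAT_DATA = 0x00CF92000000FFFF  # illustrative ring-0 data segment descriptor

for name, value in (("code", FLAT_CODE), ("data", FLAT_DATA)):
    print(name, decode_descriptor(value))
```

Both descriptors decode to a base of 0 and a size of 4 GiB, so every 32-bit offset passes the limit check and the logical address equals the linear address.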

x64

Segmentation has its place in the history books, and served its purpose well when memory was a scarce resource. As memory became abundant, the Segmentation Mechanism’s disadvantages were magnified:

  • With its flexibility in segment sizes comes the side effect of external fragmentation, which makes memory allocation costly. In contrast, Paging simply allocates pages from its free list;
  • Similarly, swapping segments is also costly because segments are not of a fixed size. Paging simplifies swapping by using fixed-size pages, at the cost of some memory lost to internal fragmentation.

On x64 processors, segmentation support is limited at the processor level:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. Note that the processor does not perform segment limit checks at runtime in 64-bit mode.
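The quoted behaviour can be modelled in a few lines. This is a deliberately simplified sketch: linear_address_64 is a hypothetical function, the FS and GS bases are passed in as plain arguments, and checks such as canonical-address validation are left out.

```python
MASK64 = (1 << 64) - 1


def linear_address_64(segment: str, offset: int, fs_base: int = 0, gs_base: int = 0) -> int:
    """Compute a linear address in 64-bit mode for a given segment register."""
    segment = segment.upper()
    if segment not in ("CS", "DS", "ES", "SS", "FS", "GS"):
        raise ValueError("unknown segment register: " + segment)
    # CS, DS, ES and SS are treated as having base 0; only FS and GS keep a base.
    base = {"FS": fs_base, "GS": gs_base}.get(segment, 0)
    # No limit check is performed; the sum is truncated to 64 bits.
    return (base + offset) & MASK64


print(hex(linear_address_64("DS", 0x1000)))                            # 0x1000
print(hex(linear_address_64("FS", 0x28, fs_base=0x7F0000001000)))      # 0x7f0000001028
```

The non-zero FS and GS bases are what make these two registers useful for the thread-specific memory mentioned earlier.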

References

Intel manuals:

Other references: