Booting into our kernel

In the last post, we managed to create a Rust binary that doesn’t depend on any operating system.

We now need to figure out how we will take that binary and convert it to a bootable disk image.

What happens when a computer turns on?

When a computer powers on, it does not know what an executable is.

All it knows is the following:

The CPU itself has its instruction pointer set to a hardwired address called the reset vector.

The Intel® 64 and IA-32 Architectures Software Developer’s Manual states (on Page 3455):

📚 First Instruction Executed (Volume 3A. Section 11.1.4)

The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0H.

This address is 16 bytes below the processor’s uppermost physical address.

The EPROM containing the software-initialization code must be located at this address.

That’s a lot of jibber jabber but all we need to focus on right now is that the CPU starts execution at that address.
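
To make the “16 bytes below the uppermost physical address” part concrete, here is a minimal sketch of the arithmetic. It assumes a 32-bit physical address space (4 GiB), and the constant and function names are made up for illustration; they don’t come from the manual.

```rust
/// Size of a 32-bit physical address space: 4 GiB (assumption for this sketch).
const ADDRESS_SPACE_32: u64 = 1 << 32;

/// The reset vector sits 16 bytes below the top of that space.
const RESET_VECTOR: u64 = ADDRESS_SPACE_32 - 16;

/// How many bytes lie between `addr` and the top of the 32-bit space.
fn bytes_below_top(addr: u64) -> u64 {
    ADDRESS_SPACE_32 - addr
}
```

Here `RESET_VECTOR` works out to 0xFFFFFFF0, the same address the manual quotes.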

The motherboard has a fixed set of instructions present at that address called the motherboard firmware.

What does the motherboard firmware do?

That firmware first makes sure that there isn’t anything in your computer that can blow up, using a process called the Power-On Self-Test (or POST for short). Then it initializes a bunch of stuff in your computer, such as RAM and PCIe devices, and after all that, it figures out which of the disks can be used to boot, i.e. which ones are “bootable”.

What happens after this depends on which of two kinds of firmware is in charge: legacy BIOS or UEFI.

Modern systems usually ship with UEFI support only, along with a separate “Compatibility Support Module” (CSM) that allows the firmware to pretend to be a legacy BIOS-based firmware.

We’ll first talk about legacy BIOS and then talk about UEFI.

In the case of BIOS, when all the startup routines are done, the firmware goes to the selected bootable disk and loads a single sector from LBA-0 into RAM at address 0x7c00.

Sidebar: Address vs LBA

To refer to a location in memory, we usually use the term “address”. To refer to a location on a disk, we use the term “Logical Block Address” (LBA).

The LBA is a number that represents the sector number/index on the disk. For example, LBA-0 refers to the first sector on the disk.
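
As a rough sketch of how an LBA relates to a position on the disk, the conversion is a single multiplication. This assumes 512-byte sectors, which is common but not universal, and the function names are hypothetical.

```rust
/// Sector size assumed for this sketch; some disks use larger sectors.
const SECTOR_SIZE: u64 = 512;

/// Byte offset on the disk where the sector with this LBA begins.
fn lba_to_byte_offset(lba: u64) -> u64 {
    lba * SECTOR_SIZE
}

/// LBA of the sector that contains the given byte offset.
fn byte_offset_to_lba(offset: u64) -> u64 {
    offset / SECTOR_SIZE
}
```

So LBA-0 starts at byte 0 of the disk, LBA-1 at byte 512, and so on.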

Circling back now.

That sector we have read has a special name. It is called the Master Boot Record and it is exactly 512 bytes (disks are usually divided into 512-byte sectors).

The Master Boot Record (MBR) is laid out like this:

MBR Layout | Figure 3.A

The firmware first checks whether the last 2 bytes of the MBR are 0x55 followed by 0xAA (at addresses 0x7c00 + 0x1fe, i.e. 0x7dfe, and 0x7c00 + 0x1ff, i.e. 0x7dff). Because x86 is little-endian, those two bytes read as the 16-bit value 0xAA55.

This byte pair is a magic number that tells the BIOS that the sector is a valid boot sector.

Then, it jumps back to the start (to address 0x7c00, i.e. “Section A” in Figure 3.A), where our bootloader code can start executing.
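
The signature check is easy to get backwards, so here is a small sketch of it. On disk the signature is the byte 0x55 at offset 0x1fe followed by 0xAA at offset 0x1ff. The function names are hypothetical, and this only models the check. It is not actual firmware code.

```rust
const SECTOR_SIZE: usize = 512;
/// Address where the BIOS loads the boot sector.
const LOAD_ADDRESS: usize = 0x7c00;
/// Offset of the first signature byte within the sector.
const SIGNATURE_OFFSET: usize = 0x1fe;

/// True if the sector ends with the bytes 0x55, 0xAA (in that order).
fn has_boot_signature(sector: &[u8; SECTOR_SIZE]) -> bool {
    sector[SIGNATURE_OFFSET] == 0x55 && sector[SIGNATURE_OFFSET + 1] == 0xAA
}

/// The same two bytes read as a little-endian 16-bit value.
fn signature_as_u16(sector: &[u8; SECTOR_SIZE]) -> u16 {
    u16::from_le_bytes([sector[SIGNATURE_OFFSET], sector[SIGNATURE_OFFSET + 1]])
}

/// Memory addresses of the two signature bytes once the sector is loaded.
fn signature_addresses() -> (usize, usize) {
    (LOAD_ADDRESS + SIGNATURE_OFFSET, LOAD_ADDRESS + SIGNATURE_OFFSET + 1)
}
```

Reading the pair as a little-endian 16-bit value gives 0xAA55, which is why the assembly further down ends with `dw 0xAA55` rather than `dw 0x55AA`.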


Alright, that was a *LOT* of information. So let’s take a breather and recap what we’ve learned so far. Here’s what happens when you press the power button on your computer:

  1. CPU instruction pointer is reset to 0xFFFFFFF0
  2. Motherboard contains fixed instructions at that address (firmware)
  3. Control is handed over to the firmware

Then, the firmware does the following:

  1. Performs the Power-On Self-Test
  2. Initializes a bunch of stuff like RAM, PCIe devices, disks etc.
  3. In case we’re using BIOS firmware, reads 512 bytes from disk at LBA-0 and loads them into memory at 0x7c00
  4. Checks if the last 2 bytes of that sector are 0x55 and 0xAA
  5. If yes, jumps back to the start at address 0x7c00
  6. Starts executing code there.

Cool.

Now we can finally start writing some actual code and run our operating system (or at least some semblance of one)!


A bootloader

We’ll start with a basic bootloader of our own to at least see **some** output on our screen. Then, we’ll move to a more concrete and permanent bootloader in a bit.

However, we won’t be rolling a full bootloader of our own. In the long run, we’ll follow Linux’s lead and use a battle-tested bootloader instead. Otherwise, it would distract us from our main goal of writing an OS without providing any meaningful insight besides “This is how the manufacturers set things up, so write this magic number to this address, write these instructions verbatim” etc.

Our basic bootloader will be written in Assembly, so you’ll need an assembler to assemble our code.

I’m using nasm, which uses the Intel syntax, so if you’re using some other assembler which uses another syntax (e.g. AT&T syntax), you need to make changes to your code accordingly.

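Why assembly? Because C and Rust toolchains normally emit code for 32-bit protected mode or 64-bit long mode, and when we boot, the CPU is in 16-bit real mode, which means we can’t just drop our Rust code there.

Getting these languages to produce real mode code takes custom targets and workarounds, so for a boot sector this small, assembly is the path of least resistance.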

If you’re not too familiar with Assembly, that’s fine. We won’t be using it too extensively throughout this series. And whenever we *do* use it, I’ll be sure to explain every bit of it, so we’re all on the same page.

We also need an emulator to actually run it (I’m using qemu-system-x86_64). We will pass our assembled binary to it and it’ll treat it as a disk image.

You can either copy this code as is, or head on over to this post 👉 Minimal MBR in Assembly for more details.

mbr.asm
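[org 0x7c00]                ; The BIOS loads this sector at address 0x7c00

bits 16                     ; The CPU is in 16-bit real mode at this point

FOREGROUND equ 0x2 ; Green
BACKGROUND equ 0x0 ; Black

NORMALIZED_BACKGROUND equ (BACKGROUND << 4) ; Background lives in the high nibble

ATTRIBUTE equ FOREGROUND | NORMALIZED_BACKGROUND

VGA_BUFFER_WIDTH equ 80
VGA_BUFFER_HEIGHT equ 25

CELL_COUNT equ VGA_BUFFER_WIDTH * VGA_BUFFER_HEIGHT

.set_initial_memory_addresses:
    mov ax, cs              ; DS = CS (assumes the BIOS jumped here with CS = 0)
    mov ds, ax

    mov ax, 0xb800          ; ES = VGA text buffer segment (physical 0xb8000)
    mov es, ax

    xor di, di              ; ES:DI points at the first cell on screen

    cld                     ; String instructions move forward

.initialize_data_source:
    mov si, message         ; DS:SI points at the text to print

.setup_write_text:
    mov cx, message_len     ; Loop once per character
    mov ah, ATTRIBUTE       ; High byte of every cell is the color attribute

.perform_write_text:
    lodsb                   ; AL = next character, SI += 1
    stosw                   ; Write AX (character + attribute), DI += 2
    loop .perform_write_text

.setup_clear:
    mov cx, (CELL_COUNT - message_len) ; Remaining cells on screen
    mov al, ' '

.perform_clear:
    stosw                   ; Fill the rest of the screen with spaces
    loop .perform_clear

.hang:
    hlt                     ; Sleep until the next interrupt
    jmp .hang               ; An interrupt wakes the CPU, so halt again

message db "Hello world!"
message_len equ $ - message

times 510 - ($ - $$) db 0   ; Pad with zeros up to byte 510

dw 0xAA55                   ; Boot signature, stored as bytes 0x55, 0xAA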
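
If the constant arithmetic at the top of mbr.asm looks opaque, here is the same idea sketched in Rust. Each cell of the VGA text buffer is 2 bytes: the character in the low byte and the color attribute in the high byte, which is exactly what `stosw` writes from `AX`. The function names are made up for illustration, and the masking of the foreground is an extra safety step that the assembly doesn’t perform.

```rust
const FOREGROUND: u8 = 0x2; // Green
const BACKGROUND: u8 = 0x0; // Black

const VGA_BUFFER_WIDTH: usize = 80;
/// Physical address of the text buffer: segment 0xb800 shifted left by 4.
const VGA_BUFFER_ADDRESS: usize = 0xb800 << 4;

/// Attribute byte: background in the high nibble, foreground in the low nibble.
const fn attribute(fg: u8, bg: u8) -> u8 {
    (bg << 4) | (fg & 0x0f)
}

/// One 16-bit cell: attribute in the high byte, character in the low byte.
const fn vga_cell(ch: u8, attr: u8) -> u16 {
    ((attr as u16) << 8) | ch as u16
}

/// Physical address of the cell at (row, col); every cell takes 2 bytes.
const fn cell_physical_address(row: usize, col: usize) -> usize {
    VGA_BUFFER_ADDRESS + 2 * (row * VGA_BUFFER_WIDTH + col)
}
```

With green on black, the attribute is 0x02, so the letter `H` ends up in the buffer as the 16-bit value 0x0248.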

Now assemble it with

# Tell nasm to assemble it in a raw binary format
nasm -f bin mbr.asm -o mbr.bin
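
The resulting mbr.bin should be exactly 512 bytes, thanks to the `times 510 - ($ - $$) db 0` padding and the trailing `dw 0xAA55`. As a rough sketch of what those two lines do, here is the same layout built by hand in Rust. `build_boot_sector` is a hypothetical helper for illustration, not something nasm provides.

```rust
const SECTOR_SIZE: usize = 512;
/// Bytes available for code and data before the 2-byte signature.
const MAX_CODE_SIZE: usize = SECTOR_SIZE - 2;

/// Lay out a boot sector: code first, zero padding, then the signature.
/// Returns `None` if the code doesn't fit in the first 510 bytes.
fn build_boot_sector(code: &[u8]) -> Option<[u8; SECTOR_SIZE]> {
    if code.len() > MAX_CODE_SIZE {
        return None;
    }

    // Start from all zeros, which doubles as the padding.
    let mut sector = [0u8; SECTOR_SIZE];
    sector[..code.len()].copy_from_slice(code);

    // 0xAA55 in little-endian order is the byte 0x55 followed by 0xAA.
    sector[MAX_CODE_SIZE..].copy_from_slice(&0xAA55u16.to_le_bytes());

    Some(sector)
}
```

The `None` case mirrors what happens in the assembly version: if the code grows past 510 bytes, the `times` count goes negative and nasm refuses to assemble the file.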

And finally, run it using QEMU

# We'll be using qemu to emulate a computer
qemu-system-x86_64 mbr.bin

If all goes well, you should see the following screen

Successful First Boot


🎉 Woohooo!! Our first kernel code. *IT RUNS!*

It’s not much, but whatever it is, it is ours. We made it. From scratch (unless you’re one of the Carl Sagan followers 😉)

This is a huge step in the journey towards building our own operating system.

In the next post, we’ll move over to a much more battle-tested bootloader instead of our primitive one and also try to integrate our Rust code into it.

Our goal will be to use the bootloader to load our Rust code and leave assembly behind so that we don’t have to constantly fiddle with registers, memory addresses, and instruction flow.

Next: Integrating a concrete, battle tested bootloader