Crumb, Getting Linux to Start
Getting Linux running on Crumb is a major part of the development as categorized here. Doing so is not so easy, and as one part of the long processes, is at least getting Linux to somewhat run on the emulator.
This involves fixing many bugs, and so for a start my first aim was to get Linux to be able to log out to the console. This all takes place during early printk and the Linux kernel should eventually reach a road block as there is not interrupt controller on the system.
Despite how early in the boot process this is, there wany many hard bug that had to be sorted out. This took about 3 days of effort once the debugger was built.
Here I will list out the issues, and how I fixed them.
Issue 0 - set_cpu_online in cpu.c causes access error.
Source:
CRUMB Software Emulator cpu_core.c
Cause:
When CRUMB emulator does arithmetic shift right on 32 bit words, it would incorrectly sign extend the value when the MSB was not set, unlike the 64 bit versions that used a conditional and mask to check for this.
The code for SRAW and SRAIW had to be changed to include said conditional.
The result that was during taking the CPUs online, the address of __num_online_cpus would be incorrectly calculated due to large negative numbers being added to the address causing a trap.
Commented Source
// If a cpu is not marked as online, marks it online and
// atomicly increments the number of online CPUs.
void set_cpu_online(unsigned int cpu, bool online)
{
// Bug Conditions
// cpu = 0
// online = true
if (online) {
// Set bit in mask if not set, continue.
// Returns the masked bits before it was set.
if (!cpumask_test_and_set_cpu(cpu, &__cpu_online_mask))
atomic_inc(&__num_online_cpus);
} else {
if (cpumask_test_and_clear_cpu(cpu, &__cpu_online_mask))
atomic_dec(&__num_online_cpus);
}
}
Commented Disassembly
set_cpu_online:
# Function Header, save stack frame
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
# A4 = (int)cpu
sext.w a4,a0
# A5 = 0x3F
li a5,63
# if(online)
beqz a1,0x8000f1b0 NOT_ONLINE
# Assert A0 > 63, then error. (No more than 63 CPUs)
bltu a5,a0,0x8000f1f8 ASSERT_0
# A5 = ((int)cpu) >> 57
sraiw a5,a4,0x1f
srliw a5,a5,0x1a
# A5 = A5 + A4
addw a5,a5,a4
# A5 = A5 >> 6
sraiw a5,a5,0x6
# A3 = &__cpu_online_mask
auipc a3,0x62a
addi a3,a3,54 # 0x806391b8 <__cpu_online_mask>
# A5 = A5 << 3
slli a5,a5,0x3
# A4 = (1 << CPU)
li a4,1
sll a4,a4,a0
# A5 = A5 + &__cpu_online_mask
add a5,a5,a3
amoor.d.aqrl a3,a4,(a5)
and a4,a4,a3
bnez a4,0x8000f1aa
auipc a5,0x62a
addi a5,a5,-12 # 0x80639190 <__num_online_cpus>
li a4,1
amoadd.w zero,a4,(a5)
ld s0,8(sp)
addi sp,sp,16
ret
NOT_ONLINE:
bltu a5,a0,0x8000f1f4
sraiw a5,a4,0x1f
srliw a5,a5,0x1a
addw a5,a5,a4
sraiw a5,a5,0x6
li a4,1
sll a4,a4,a0
auipc a3,0x62a
addi a3,a3,-16 # 0x806391b8 <__cpu_online_mask>
slli a5,a5,0x3
add a5,a5,a3
not a3,a4
amoand.d.aqrl a3,a3,(a5)
and a4,a4,a3
beqz a4,0x8000f1aa
auipc a5,0x62a
addi a5,a5,-80 # 0x80639190 <__num_online_cpus>
li a4,-1
amoadd.w zero,a4,(a5)
ld s0,8(sp)
addi sp,sp,16
ret
ASSERT_1:
ebreak
j 0x8000f1b4
ASSERT_0:
ebreak
j 0x8000f174
Hand Traced
Expected Program Flow:
set_cpu_online:
# A0 = 0
# A1 = 1
# We will skip the header
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
# A4 = 0
sext.w a4,a0
# A5 = 63
li a5,63
# Should not branch
beqz a1,0x8000f1b0 NOT_ONLINE
# Should not branch
bltu a5,a0,0x8000f1f8 ASSERT_0
# A5 = 0
sraiw a5,a4,0x1f
# A5 = 0
srliw a5,a5,0x1a
# A5 = 0
addw a5,a5,a4
# A5 = 0
sraiw a5,a5,0x6
# A3 = 0x806391b8
auipc a3,0x62a
addi a3,a3,54
# A5 = 0
slli a5,a5,0x3
# A4 = 1
li a4,1
# A4 = 1
sll a4,a4,a0
# A5 = &__cpu_online_mask
add a5,a5,a3
# This should work
amoor.d.aqrl a3,a4,(a5)
What actually happened:
# A4 = 0
# Set A5 to -2. Very sus
sraiw a5,a4,0x1f
The shift right sign extended, when there was no 1 in the MSB to begin with. The troubled code:
case 0b101:
resultant = rs1 >> (imm & 0xFF);
if(imm & 0x400) {
#ifdef DEBUG_TRACE
Log(TAG, "sraiw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
resultant |= 0xFFFFFFFF ^ (0xFFFFFFFF >> imm);
} else {
#ifdef DEBUG_TRACE
Log(TAG, "srliw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
}
break;
The fixed code:
case 0b101:
resultant = rs1 >> (imm & 0xFF);
if(imm & 0x400) {
#ifdef DEBUG_TRACE
Log(TAG, "sraiw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
if(resultant & 0x80000000)
resultant |= 0xFFFFFFFF ^ (0xFFFFFFFF >> imm);
} else {
#ifdef DEBUG_TRACE
Log(TAG, "srliw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
}
break;
Same needed to be fixed for similar arithmetic shift right instructions, though only the 32 bit word ones were effected.
WARNING: A second bug appears in this code. Look to Issue 3 - Messages in Kernel Log Missing Formatting
Issue 1 - memblock_alloc_try_nid in memblock.c causes access error.
Source:
CRUMB Software Emulator system.c
Cause:
When a memory is allocated during early boot for the creation of the flattened device tree, the kernel correctly allocates memory at the end of the kernels memory space, in this case 0xBFFFFFFF. With memory starting at 0x80000000.
Here the issue was that Crumb accidently only mapped 1/16 of the 16 MB pages causes anything past them to come up as unmapped throwing an exception.
This crash would occur after the block has been allocated by a child function, and then crash as it attempts to memset it to zeros.
Original Code:
for(int i = 0; i < 0x70; i+=0x10) {
inst->io_level_table[0x80+i].instance = &inst->rom;
inst->io_level_table[0x80+i].on_write = memory_write;
inst->io_level_table[0x80+i].on_read = memory_read;
}
Fixed Code:
for(int i = 0; i < 0x70; i++) {
inst->io_level_table[0x80+i].instance = &inst->rom;
inst->io_level_table[0x80+i].on_write = memory_write;
inst->io_level_table[0x80+i].on_read = memory_read;
}
Issue 2 - pcpu_alloc_first_chunk crashes with access error in percpu.c
Source:
CRUMB Software Emulator cpu_core.c
Cause:
Missing bit when decoding the intermediate value of the c.lui instruction resulting in a crash due to heap corruption from a undersized buffer due to an error when calculating its size.
This would corrupt a pointer to 0xFFFFFFFFFFFFFFFF causing the access error.
pcpu_alloc_first_chunk Crash Analysis
Commented Source
Cut to include the part causing the trouble.
...
if (chunk->start_offset) {
offset_bits = chunk->start_offset / PCPU_MIN_ALLOC_SIZE;
bitmap_set(chunk->alloc_map, 0, offset_bits);
set_bit(0, chunk->bound_map); // CRASH
set_bit(offset_bits, chunk->bound_map);
chunk->chunk_md.first_free = offset_bits;
pcpu_block_update_hint_alloc(chunk, 0, offset_bits);
}
...
Broken Down Disassembly
eqz s2,0x803f93e8 # If chunk->start_offset
li s6,4
divw s7,s2,s6
ld a0,72(s1)
li a1,0
mv a2,s7
auipc ra,0xffead
jalr -750(ra) # bitmap_set
ld a5,56(s1) # Get address from stack frame (a invalid address)
amoor.d zero,s5,(a5) # CRASH
li a5,256
divw s2,s2,a5
ld a5,56(s1)
sll s5,s5,s7
slli s2,s2,0x3
add a5,a5,s2
amoor.d zero,s5,(a5)li a1,0
sw s7,44(s1)
mv a2,s7
mv a0,s1
auipc ra,0xffd42
jalr -1392(ra) # 0x8013ae70
lw s2,104(s1)
# End of if statement.
After following the code around jalr -750(ra) # bitmap_set
we can observe that s1
is never corrupted, but *($s1 + 56)
before it is a valid memory address, and changes to 0xFFFFFFFFFFFFFFFF after.
bitmap_set
must be corrupting the values at the stack frame.
It is apparent that it is likely setting bits to 1, and bleeding over into neighboring values.
Up to calling
s1 = 0x8fbe5cc0 # Stack Frame
a0 = 0x8fbe5b40 # chunk->alloc_map
a1 = 0 # Start Bit
a2 = 33704 # Length Bits (4,213 bytes)
chunk->bound_map = 0x8FBE5CF8 # s1 + 56
Based on this we can see there is 440 bytes between the physical locations of chunk->alloc_map and the pointer of chunk->bound_map
. Since we will be settings 4213 bytes worth of bits, it is clear that it will be well overridden by bitmap_set
causing the error.
This could be caused by a bad placements of the buffers, or the len bits being too large.
pcpu_alloc_first_chunk start_offset Analysis
Start Offset Calculation
/* region calculations */
aligned_addr = tmp_addr & PAGE_MASK; // # #define PAGE_MASK 0xFFFFF
start_offset = tmp_addr - aligned_addr;
region_size = ALIGN(start_offset + map_size, PAGE_SIZE);
This shows start_offset
nor aligned_addr
can be greater than 0xFFFFF000
.
That puts start_offset at 134816 bytes, thus how do the allocations get so small.
Allocation Size Calculation
chunk->base_addr = (void *)aligned_addr;
chunk->start_offset = start_offset;
chunk->end_offset = region_size - chunk->start_offset - map_size;
chunk->nr_pages = region_size >> PAGE_SHIFT;
region_bits = pcpu_chunk_map_bits(chunk);
alloc_size = BITS_TO_LONGS(region_bits) * sizeof(chunk->alloc_map[0]);
chunk->alloc_map = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
if (!chunk->alloc_map)
panic("%s: Failed to allocate %zu bytes\n", __func__,
alloc_size);
Here it seems faulty that region_bits
is only calculated to be 3072 bits.
pcpu_chunk_map_bits
expands to chunk->nr_pages * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE
.
Disassembly
A lot has been optimized out by the compiler, making this arithmetic hard to follow.
So we will just follow it and make sure we got the expected values.
mv s9,s3 # s3,s9 = 3
slli s3,s3,0xa # s3 = 3072
addiw a4,s3,63 # a4 = 3135
srli a4,a4,0x6 # a4 = 48
sd a0,0(a0)
subw a5,s2,s4
slli s2,a4,0x3 # s2 = 384
# Push saved registers to the stack
mv s1,a0
sd a0,8(a0)
sd s8,64(a0)
sw s7,100(a0)
sw a5,104(a0)
sw s9,120(a0)
# Call function
li a4,-1
li a3,0
li a2,0
li a1,64
mv a0,s2 # a0, alloc_size, = 384 bytes
jal memblock_alloc_try_nid
This matches the emulator's output 384 bytes. Algebraically then, the initial value of s3
must be wrong if a bad allocation size is to be expected.
pcpu_alloc_first_chunk, Tracing S3
Here is the code, (at the beginning of the function), where s3 is set before reaching the code from before. We will equate this out to the C code as well working backwards.
C Code
struct pcpu_chunk *chunk;
unsigned long aligned_addr;
int start_offset, offset_bits, region_size, region_bits;
size_t alloc_size;
/* region calculations */
aligned_addr = tmp_addr & PAGE_MASK;
start_offset = tmp_addr - aligned_addr;
region_size = ALIGN(start_offset + map_size, PAGE_SIZE);
/* allocate chunk */
alloc_size = struct_size(chunk, populated,
BITS_TO_LONGS(region_size >> PAGE_SHIFT));
Dissassembly
Disassembly, and stepped through by hand.
# a0 = 0x8fbf3ea0 = tmp_addr
# a1 = 8192 = map_size
pcpu_alloc_first_chunk:
# Begining of function stack push.
addi sp,sp,-96
sd s8,16(sp)
# aligned_addr = tmp_addr & PAGE_MASK;
# PAGE_MASK = 0xFFFFF000
lui s8,0xfffff # s8 = 0xFFFFF000
and s8,a0,s8 # s8 = 0x8FBF3000
sd s7,24(sp)
# = 4096
lui a5,0x1 # a5 = 4096
# start_offset = tmp_addr - aligned_addr;
subw s7,a0,s8 # s7 = EA0
sd s4,48(sp)
# A5 = 4095
addiw a5,a5,-1 # 0xfff # a5 = 4095
# s3 = ((start_offset + map_size + 4095) & 0xfffff000) >> 12
addw s4,s7,a1 # s4 = 11936
lui a4,0xfffff # a4 = 0xFFFFF000
addw a5,s4,a5 # a5 = 16031
and a5,a5,a4 # a5 = 16000
sd s3,56(sp)
sraiw s3,a5,0xc # s3 = 16
...
Here we get the value of 16, not 3. We will step through the code in the debugger until there is a difference.
and s8,a0,s8
returns 0x8fbd3000 instead of 0x8fbf3000. The LUI instruction sets s8
to 0xfffffffffffdf000
instead of 0xfffffffffffff000
. A 1 bit difference.
Resolution
This is a compressed, c.lui instruction.
Here is the cpu_core.c
's code for it:
...
} else { // C.LUI
uint64_t imm = ((op & (1<<12)) ? 0xFFFFFFFFFFFC0000 : 0) |
((op << 10) & 0b11111000000000000);
#ifdef DEBUG_TRACE
Log(TAG, "c.lui x%d, %d", rd_i, imm);
#endif
v = imm;
}
...
Here we can see the intermediate value is calculated, that C needs to be an E:
...
} else { // C.LUI
uint64_t imm = ((op & (1<<12)) ? 0xFFFFFFFFFFFE0000 : 0) |
((op << 10) & 0b11111000000000000);
#ifdef DEBUG_TRACE
Log(TAG, "c.lui x%d, %d", rd_i, imm);
#endif
v = imm;
}
...
Issue 3 - Messages in Kernel Log Missing Formatting
Source:
CRUMB Software Emulator cpu_core.c
Cause:
A bug with how the intermediate value is used to sign extend during a sraiw instruction resulted in no sign extension at all.
This caused a miscalculation in buffer sizes returned from vsnprintf causing corrupted kernel log messages.
Investigation
The kernel logging function kprintf
uses vsnprintf
to calculate the size of the buffer needed for the message before calling vsnprintf
again to fill in the message.
When formatting is used through, the size is too small, and the message is clipping, often making it appear there was no formatting to begin with.
This miscalculation occurs in widen_string
where the iterated buffer is set back by a large negative number.
Commented Source
char *widen_string(char *buf, int n, char *end, struct printf_spec spec)
{
// field_width = -1
// n = some normal small posative number
unsigned spaces;
// Should exit here, but it does not and keeps going!
if (likely(n >= spec.field_width))
return buf;
/* we want to pad the sucker */
// Results in negative number, since spaces
// is unsigned, it becomes very large
spaces = spec.field_width - n;
if (!(spec.flags & LEFT)) { // Branch not taken.
// Undoes all the movement to buf and undoes all string operatinons added before.
move_right(buf - n, end, n, spaces);
return buf + spaces;
}
while (spaces--) {
if (buf < end)
*buf = ' ';
++buf;
}
return buf;
}
Commented Assembly
addi sp,sp,-64
sd s0,48(sp)
sd s1,40(sp)
addi s0,sp,64
sd ra,56(sp)
# a3 is negative 32 bit number
# After shift a4 is weird posative number
sraiw a4,a3,0x8
sd a3,-56(s0)
mv s1,a0
# n >= spec.field_width inverted
# a1 is normal at this point, our n value
# a4 though is a weird posative number
blt a1,a4,0x8050def4
sraiw
must be the culprit here.
Resolution
Troubled code from cpu_core.c
As we can see the XOR operation's bit shift used to sign extend the shifted value is enabled when an arithmetic shift is selected, simultaneously though, we forget to mask the imitate value.
if(imm & 0x400) {
#ifdef DEBUG_TRACE
Log(TAG, "sraiw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
if(resultant & 0x80000000)
resultant |= 0xFFFFFFFF ^ (0xFFFFFFFF >> imm);
}
The corrected code:
if(imm & 0x400) {
#ifdef DEBUG_TRACE
Log(TAG, "sraiw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
if(resultant & 0x80000000)
resultant |= 0xFFFFFFFF ^ (0xFFFFFFFF >> (imm & 0xFF));
}
The second bug here is that we don't want to check the resultant for bits, we want to check the input.
if(imm & 0x400) {
#ifdef DEBUG_TRACE
Log(TAG, "sraiw X%d, X%d, %d", rd_i, rs1_i, imm);
#endif
if(rs1 & 0x80000000)
resultant |= 0xFFFFFFFF ^ (0xFFFFFFFF >> (imm & 0xFF));
}
Our Pay Off
Now that all those issues have been fixed, the Linux kernel is now able to boot and actually print out an error through earlycon
that we have a missing interrupt controller.
Maybe now we can focus more on Linux and not CEMU, our emulator, for now.
Not for long though, soon Linux will likely reach issues with how interrupts and CSRs are handled.
[ 0.000000] Linux version 6.11.0-rc3 (rainbain@debian) (riscv64-unknown-linux-gnu-gcc (gc891d8dc23e) 13.2.0, GNU ld (GNU Binutils) 2.42) #12 Mon Sep 9 00:49:02 UTC 2024
[ 0.000000] Machine model: Crumb RISC-V SoC
[ 0.000000] Forcing kernel command line to: earlycon=uart8250,mmio32,0x10000000
[ 0.000000] earlycon: uart8250 at MMIO32 0x0000000010000000 (options '')
[ 0.000000] printk: legacy bootconsole [uart8250] enabled
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000bfffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000bfffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000bfffffff]
[ 0.000000] Falling back to deprecated "riscv,isa"
[ 0.000000] riscv: base ISA extensions acdfim
[ 0.000000] riscv: ELF capabilities acdfim
[ 0.000000] Kernel command line: earlycon=uart8250,mmio32,0x10000000
[ 0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 262144
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Invalid parameters for execmem allocator, module loading will fail
[ 0.000000] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] Kernel panic - not syncing: No interrupt controller found.