Windows Architecture

1. WinAPI

1.1. Overview

In Linux, the OS interface is exposed to programmers through system calls (usually called via glibc wrappers, largely following the POSIX standard) and through pseudo-filesystems such as /proc and /sys. In Windows, the OS interface is a huge set of dedicated functions called the WinAPI (also known as Win32). The WinAPI DLLs ship with every Windows installation. WinAPI runs in user space, but it plays a role similar to the Linux system call interface: the raw Windows syscalls are undocumented, so WinAPI is the supported way to talk to the OS. In general, WinAPI is very old and messy. There are many libraries (GUI libraries, for example) that provide layers of abstraction on top of it to make developers' lives easier; e.g. GUI applications are rarely written against the raw WinAPI because it's very low-level and C-based.

List of all Windows API subsystems.

1.2. Development

Windows.h

windows.h is a Windows-specific C/C++ header file that declares most of the core Windows API: functions, constants, types, structures, enums and macros. It includes a number of child headers (e.g. windef.h, winreg.h, winsvc.h, winuser.h); the documentation often names one of them as a function's header, but you normally include windows.h rather than including them separately. There is no single equivalent header in the Linux world.

Naming

It's common to see specific suffixes in the WinAPI functions' names.

  • A - 8-bit "ANSI" strings (doc), i.e. standard C char strings encoded in the current code page (LPCSTR).
  • W - 16-bit "wide" Unicode strings encoded as UTF-16 (LPCWSTR).
  • Ex - extended (better, newer) functionality

Also prefixes (full list):

  • Nt/Zw - native Windows kernel API (syscall wrappers basically).
  • Hal - Hardware Abstraction Layer functionality.

Data types

WinAPI uses a lot of custom-defined data types (e.g. DWORD, BOOL, HANDLE, LPSTR). If you interact with WinAPI, you should use them.

Error codes

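WinAPI and Native API functions report errors differently. Most WinAPI functions signal failure through their return value (e.g. FALSE, NULL or INVALID_HANDLE_VALUE, depending on the function) and store the actual error code, which can be retrieved with GetLastError. Native API functions return an NTSTATUS value directly: STATUS_SUCCESS (0) or another non-negative value on success, a negative value (a warning or an error) otherwise.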

NTSTATUS comes with a set of useful macros to check the returned status, e.g. NT_SUCCESS.
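The check macros only look at the sign and the top two "severity" bits of the value. The sketch below re-creates them in portable C so the bit logic is visible; the definitions are meant to mirror ntdef.h and the status values ntstatus.h from the Windows SDK, but they are re-typed here, so treat them as an illustration rather than the SDK headers themselves:

```c
#include <stdint.h>

/* NTSTATUS is a 32-bit signed value. Severity lives in the top two bits:
   00 = success, 01 = informational, 10 = warning, 11 = error. */
typedef int32_t NTSTATUS;

#define NT_SUCCESS(Status)     (((NTSTATUS)(Status)) >= 0)
#define NT_INFORMATION(Status) ((((uint32_t)(Status)) >> 30) == 1)
#define NT_WARNING(Status)     ((((uint32_t)(Status)) >> 30) == 2)
#define NT_ERROR(Status)       ((((uint32_t)(Status)) >> 30) == 3)

/* A few well-known status codes. */
#define STATUS_SUCCESS         ((NTSTATUS)0x00000000L)
#define STATUS_PENDING         ((NTSTATUS)0x00000103L)  /* success severity */
#define STATUS_BUFFER_OVERFLOW ((NTSTATUS)0x80000005L)  /* warning severity */
#define STATUS_ACCESS_DENIED   ((NTSTATUS)0xC0000022L)  /* error severity */
```

Note that warnings are negative, so NT_SUCCESS treats them as failures.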

1.3. WinAPI layers

  1. User process - it calls the functions exported from WinAPI, e.g. CreateFileA function from subsystem DLL kernel32.dll.
  2. Subsystem DLL (Windows API) - a set of useful functions for working with a specific subsystem. For example, kernel32.dll exports CreateFileA, which eventually calls the low-level NtCreateFile function exported from ntdll.dll.
  3. ntdll.dll (Native API) - the lowest layer available in user mode. The syscall stubs exported from this DLL have the Nt (or Zw) prefix. They are basically wrappers for syscalls (on x64, each one executes the syscall instruction) and should not be called directly because their API might change at any time without warning. These functions are often undocumented.
  4. Kernel (syscalls) - the kernel handles the syscall and performs the actual work.

2. Memory Management

2.1. Page states

A memory page (4 KB on x86/x64) can be in one of 3 states:

  1. Free - the page is not accessible to the process, but it's available to be reserved or committed.
  2. Reserved - the page's range of virtual addresses is set aside for future use. No physical storage is allocated for it yet and accessing it causes an access violation, but it can be committed later.
  3. Committed - physical storage (in RAM or in the paging file on disk) is charged for the page and the process can use it. Access to the page is controlled by one of the memory protection constants.

2.2. Page protection

Committed pages are protected using CPU features. Each committed page has a protection constant that defines its access rights. Full list of protection constants.

Some examples:

  • PAGE_READONLY - enables read-only access.
  • PAGE_EXECUTE - enables execute access.
  • PAGE_NOACCESS - disables all access.

2.3. Memory protection

Operating systems have built-in memory protections:

  • Data Execution Prevention (DEP) - a feature that marks data pages as non-executable. Hardware-enforced DEP relies on the NX (No-eXecute) bit of x86/x64 CPUs: a page whose protection constant doesn't include execute access (e.g. PAGE_READWRITE) can't be executed. On CPUs without NX, Windows only offers software-enforced DEP, which adds extra checks to exception handling but does not stop code from running in data pages.
  • Address Space Layout Randomization (ASLR) - an OS feature that randomizes the base addresses of images (EXEs and DLLs), stacks and heaps in a process's address space.
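With hardware DEP, "executable" boils down to the page protection constant. The sketch below (the helper is_executable_protection is made up for illustration) checks for one of the PAGE_EXECUTE* constants; on non-Windows hosts it falls back to re-typed copies of the winnt.h values so the logic can be compiled anywhere:

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Values of the page protection constants as defined in winnt.h. */
#define PAGE_NOACCESS          0x01
#define PAGE_READONLY          0x02
#define PAGE_READWRITE         0x04
#define PAGE_WRITECOPY         0x08
#define PAGE_EXECUTE           0x10
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80
#define PAGE_GUARD             0x100
typedef unsigned long DWORD;
#endif

/* With DEP enabled, only pages whose protection includes one of the
   PAGE_EXECUTE* constants may be executed. Modifiers such as PAGE_GUARD
   live above the low byte, so they are masked off first. */
int is_executable_protection(DWORD protect) {
    DWORD base = protect & 0xFF;
    return base == PAGE_EXECUTE || base == PAGE_EXECUTE_READ ||
           base == PAGE_EXECUTE_READWRITE || base == PAGE_EXECUTE_WRITECOPY;
}
```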

3. Portable Executable format

PE is the file format for executables on Windows. A few examples of PE file extensions are .exe, .dll, .sys, .scr and .efi.
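Every PE file starts with a DOS header ("MZ") whose e_lfanew field (at offset 0x3C) points to the NT headers, which begin with the "PE\0\0" signature followed by the file header's Machine field. A portable sketch that checks those markers in a byte buffer (the helper names are made up for illustration; real parsers validate much more):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PE fields are little-endian. */
static uint32_t read_u16(const unsigned char *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t read_u32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns IMAGE_FILE_HEADER.Machine (e.g. 0x8664 for x64, 0x14C for x86)
   of the PE image in buf, or -1 if buf does not look like a PE file. */
long pe_machine(const unsigned char *buf, size_t len) {
    /* IMAGE_DOS_HEADER: "MZ" magic, e_lfanew at offset 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return -1;
    uint32_t e_lfanew = read_u32(buf + 0x3C);

    /* NT headers: "PE\0\0" signature, then Machine as the first file header field. */
    if (e_lfanew > len || len - e_lfanew < 6) return -1;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0) return -1;
    return (long)read_u16(buf + e_lfanew + 4);
}
```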

3.1. Export Directory

A data structure that describes the functions exported by the executable (usually a DLL): their names, ordinals and addresses (stored as RVAs, i.e. offsets from the module's base address).

3.2. Import Address Table (IAT)

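A data structure that holds the addresses of functions imported from other DLLs; the loader fills it in with the resolved addresses when the module is loaded. Together with the import directory, which lists the imported DLLs and function names, it shows which DLLs and functions the executable uses (compile-time linked).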

4. Dynamic-Link Library (DLL)

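DLLs are used to export functions for use by a process. Unlike EXE files, DLL files cannot be started on their own: a DLL has to be loaded by another program, which then calls its exported functions (the loader also runs the DLL's optional DllMain entry point when the DLL is loaded or unloaded).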

There is a command to run an exported function without writing a program that loads the DLL: rundll32.exe <DLL_FILE_NAME>,<FUNCTION_NAME> (no space after the comma). The function must have the signature rundll32 expects.

4.1. Linking

Compile-time linking

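With compile-time (load-time) linking, the executable is linked against the DLL's import library, and the loader resolves the imported functions when the process starts. Some DLLs are automatically loaded into every process because their functions are necessary for the process itself to run properly: ntdll.dll, kernel32.dll, kernelbase.dll. These system DLLs are mapped at the same base address in every process (with ASLR, that base is randomized once per boot).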

Run-time linking

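Most of the benefits of using DLLs come from run-time linking: the program loads a DLL with LoadLibrary only when it needs it, looks up functions with GetProcAddress, and can handle a missing DLL or function gracefully.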

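#include <windows.h>
// Assumed signature of ExampleFunc: takes an int, returns nothing
typedef void (*ExampleFuncPtr)(int);

int main(void) {
    // Load example.dll into memory and retrieve a pointer to ExampleFunc
    HMODULE hModule = LoadLibraryA("example.dll");
    if (hModule == NULL) return 1;  // DLL not found
    ExampleFuncPtr pExampleFunc = (ExampleFuncPtr)GetProcAddress(hModule, "ExampleFunc");
    if (pExampleFunc != NULL) pExampleFunc(1);  // Call ExampleFunc
    FreeLibrary(hModule);
    return 0;
}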

4.2. Stripped DLL

When a DLL is stripped, unnecessary symbols and debugging information are removed, but symbols (functions, variables, etc.) that are essential for the DLL to function as intended are preserved. These essential symbols are what other executables or DLLs will link to. The stripped DLL retains the necessary header information that describes its exported functions and data.

5. Processes

Each process has a distinct PID assigned by the OS when the process is created. Each thread also has a unique ID (TID) that distinguishes it from the rest of the threads on the system.

Process memory types:

  • Private memory - dedicated to a single process and cannot be shared. It's used for process data such as heaps and stacks.
  • Mapped memory - can be shared between processes. It's used for memory-mapped files and shared memory sections.
  • Image memory - contains the code and data of executable images (EXEs and DLLs) mapped by the loader.

5.1. PEB

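Process Environment Block (PEB) is a data structure that holds information about a process used by the OS, e.g. the list of loaded modules, the process parameters (command line, environment) and the BeingDebugged flag. Every process has its own PEB. The PEB is stored in the process's user-mode memory, which makes it directly accessible to the process.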

5.2. TEB

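Thread Environment Block (TEB) is a data structure that stores information about a thread, such as its stack bounds, its thread-local storage slots and its last-error value (the one returned by GetLastError). Every thread has its own TEB, stored in the process's user-mode memory. It's used mostly by user-mode code (e.g. ntdll.dll and kernel32.dll); on x64 the GS segment register points to the current thread's TEB (FS on x86).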