Let’s recall the function that we want to implement:
void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
  if (IsLoad)
    fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
  else
    fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}
We will need to implement this in LLVM IR using llvm::IRBuilder - which, as we saw in the previous section, is an extremely powerful tool for building arbitrary LLVM IR sequences. Let’s write this function into a .c file and compile it with clang -emit-llvm to get a sense of the LLVM IR we should write (toggled away for brevity):
<aside> 💡 The decision to trace both memory stores and memory loads from the same function is not a trivial one. Since our LLVM Pass gives us full control, we could easily define two separate functions - traceMemoryLoad and traceMemoryStore - and call the appropriate one for load and store instructions (a sketch of that variant follows this aside). This would save us from having to branch in our memory trace function - we’d be “offloading” the branching to compile time, which is pretty cool. I chose to implement a shared function because I thought it would be a bit more interesting to implement a branched function in our pass - and saving on a branch seems like an unnecessary optimization given that we’re calling the very heavy fprintf from our function anyways.
</aside>
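For illustration only - this is not what we’ll build - the split variant might look something like this, with the pass (rather than a runtime branch) deciding which function to call for each instrumented instruction:

void traceMemoryLoad(void *Addr, uint64_t Value) {
  // Same format string as the IsLoad branch of traceMemory above.
  fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
}

void traceMemoryStore(void *Addr, uint64_t Value) {
  // Same format string as the store branch of traceMemory above.
  fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}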
We can see that our function centers on an icmp instruction, followed by a br - which will jump to either the load trace or the store trace based on the comparison result. Each branch then ends with an unconditional jump to a ret void.
Seems simple enough, let’s start coding this in our pass:
We’ve already seen usage of llvm::Module::getOrInsertFunction back when we wanted to make sure that we had access to fprintf. But whilst fprintf is a C standard library function - with the C standard library available to us at link time - this time we’ll be inserting our very own function! Once again, we’ll be inserting directly into our main module - and as our function will be externally linked, we’ll be able to access it from all compilation modules.
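To make the cross-module point concrete, here’s a rough sketch (the helper name is mine, not part of the pass) of all that any other module needs in order to call our function - calling getOrInsertFunction there only produces a declaration, and the linker later resolves it against the definition we emit into the module containing main:

// Hypothetical helper for illustration: in a module that merely calls
// _TraceMemory, getOrInsertFunction adds a declaration with a matching
// signature - no body - and the linker does the rest.
llvm::FunctionCallee getTraceMemoryDecl(llvm::Module &M) {
  auto &CTX = M.getContext();
  llvm::FunctionType *TraceMemoryTy = llvm::FunctionType::get(
      llvm::Type::getVoidTy(CTX),
      {llvm::PointerType::getUnqual(llvm::Type::getInt8Ty(CTX)),
       llvm::Type::getInt64Ty(CTX),
       llvm::Type::getInt32Ty(CTX)},
      false);
  return M.getOrInsertFunction("_TraceMemory", TraceMemoryTy);
}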
We’ll add a function call to our run method:
llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
  Function *main = M.getFunction("main");
  if (main) {
    addGlobalMemoryTraceFP(M);
    addMemoryTraceFPInitialization(M, *main);
    addTraceMemoryFunction(M);
    errs() << "Found main in module " << M.getName() << "\n";
    return llvm::PreservedAnalyses::none();
  } else {
    errs() << "Did not find main in " << M.getName() << "\n";
    return llvm::PreservedAnalyses::all();
  }
}
We’ll start by implementing an empty externally-linked function whose body is just a ret void:
const std::string TraceMemoryFunctionName = "_TraceMemory";

void addTraceMemoryFunction(llvm::Module &M) {
  auto &CTX = M.getContext();

  // void _TraceMemory(i8 *Addr, i64 Value, i32 IsLoad)
  std::vector<llvm::Type *> TraceMemoryArgs{
      PointerType::getUnqual(Type::getInt8Ty(CTX)),
      Type::getInt64Ty(CTX),
      Type::getInt32Ty(CTX)};
  FunctionType *TraceMemoryTy =
      FunctionType::get(Type::getVoidTy(CTX), TraceMemoryArgs, false);

  FunctionCallee TraceMemory =
      M.getOrInsertFunction(TraceMemoryFunctionName, TraceMemoryTy);
  llvm::Function *TraceMemoryFunction =
      dyn_cast<llvm::Function>(TraceMemory.getCallee());
  TraceMemoryFunction->setLinkage(GlobalValue::ExternalLinkage);

  // Give the function a body: a single basic block that just returns.
  llvm::BasicBlock *BB =
      llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
  IRBuilder<> Builder(BB);
  Builder.CreateRetVoid();
}
The first part of addTraceMemoryFunction is familiar - it’s similar to how we included fprintf: we just define the parameter types and the signature of the function we want, and then call getOrInsertFunction.
The second part is more interesting - we need to explicitly create a basic block we can add opcodes to; otherwise our function has no body, and LLVM will just emit a declaration:
declare void @_TraceMemory(i8*, i64, i32)
By explicitly creating a basic block and adding our ret void opcode, we get:
define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
ret void
}
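That gives us the scaffolding. To connect it back to the icmp/br shape we saw in the clang-generated IR, here is a rough sketch - the helper and block names are mine, and the fprintf calls are left out - of how the body could be filled in with IRBuilder, branching on the third (IsLoad) argument:

// Sketch only: mirror the icmp/br/ret structure described above. The actual
// fprintf calls would go into the two branch blocks.
void buildTraceMemoryBody(llvm::LLVMContext &CTX,
                          llvm::Function *TraceMemoryFunction) {
  llvm::BasicBlock *Entry =
      llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
  llvm::BasicBlock *LoadBB =
      llvm::BasicBlock::Create(CTX, "if.load", TraceMemoryFunction);
  llvm::BasicBlock *StoreBB =
      llvm::BasicBlock::Create(CTX, "if.store", TraceMemoryFunction);
  llvm::BasicBlock *RetBB =
      llvm::BasicBlock::Create(CTX, "exit", TraceMemoryFunction);

  llvm::IRBuilder<> Builder(Entry);
  // The third (i32) parameter plays the role of IsLoad; compare it to zero.
  llvm::Value *IsLoad = TraceMemoryFunction->getArg(2);
  llvm::Value *Cond =
      Builder.CreateICmpNE(IsLoad, Builder.getInt32(0), "is.load");
  Builder.CreateCondBr(Cond, LoadBB, StoreBB);

  // Each branch ends with an unconditional jump to the shared return block.
  Builder.SetInsertPoint(LoadBB);
  Builder.CreateBr(RetBB);

  Builder.SetInsertPoint(StoreBB);
  Builder.CreateBr(RetBB);

  Builder.SetInsertPoint(RetBB);
  Builder.CreateRetVoid();
}

The fprintf calls themselves are omitted here; the point is just how the comparison-and-branch skeleton from the clang output maps onto IRBuilder calls.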