Writing the Pass - Implementing the Memory Trace Function

Let’s recall the function we want to implement:

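#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Global trace file, opened by the initialization code our pass inserts into main. */
extern FILE *_MemoryTraceFP;

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    if (IsLoad)
        fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
    else
        fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}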

We need to implement this in LLVM IR using llvm::IRBuilder, which, as we saw in the previous section, is a powerful tool for building arbitrary sequences of LLVM IR instructions.

Let’s write this function into a .c file and compile it with clang -S -emit-llvm (which writes textual IR to a .ll file) to get a sense of the LLVM IR we should generate.


<aside> 💡 The decision to trace both memory stores and memory loads from the same function is not a trivial one.

Since our LLVM pass gives us full control, we could easily define two separate functions, traceMemoryLoad and traceMemoryStore, and call the appropriate one for each load or store instruction. This would save us from having to branch inside our memory trace function: we’d be “offloading” the branching to compile time, which is pretty cool.

I chose to implement a shared function because I thought it would be a bit more interesting to build a branched function in our pass, and saving a single branch seems like an unnecessary optimization given that our function calls the relatively expensive fprintf anyway.

</aside>
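For comparison, the two-function alternative from the aside might look like this on the C side. This is a sketch of the road not taken, not the approach used in the rest of this walkthrough: traceMemoryLoad and traceMemoryStore are the hypothetical names from the aside, and _MemoryTraceFP is assumed to be opened elsewhere, just as in the single-function version.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: opened elsewhere (by the initialization code the pass inserts
   into main); defined here only so the sketch compiles on its own. */
FILE *_MemoryTraceFP;

/* Hypothetical split variant: the pass would pick the callee at compile
   time (loads call one function, stores the other), so neither function
   branches on an IsLoad flag at run time. */
void traceMemoryLoad(void *Addr, uint64_t Value) {
    /* %lx assumes an LP64 target such as x86-64 Linux, where uint64_t is
       unsigned long, matching the original format string. */
    fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
}

void traceMemoryStore(void *Addr, uint64_t Value) {
    fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}
```

The trade-off is the one the aside describes: the pass has to emit calls to two different functions, but each trace call is branch-free.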

The IR that clang emits for traceMemory centers on an icmp instruction followed by a conditional br, which jumps to either the load-tracing block or the store-tracing block depending on the result of the comparison. Each of those blocks then ends with an unconditional br to the block holding the ret void.
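To make that control flow concrete before we build it with IRBuilder, here is the same function hand-lowered in C, with one label per basic block. It only illustrates the block structure and is not something the pass emits: traceMemoryCFG is a hypothetical name, and IsLoad is taken as an int32_t (an assumption chosen to match the i32 parameter the pass declares), so the entry block’s test corresponds to comparing an i32 against zero.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: opened elsewhere by the initialization code the pass inserts
   into main; defined here only so the sketch compiles on its own. */
FILE *_MemoryTraceFP;

/* Hypothetical hand-lowered traceMemory: each label is one basic block. */
void traceMemoryCFG(void *Addr, uint64_t Value, int32_t IsLoad) {
    /* entry block: compare the flag against zero (icmp), then branch (br). */
    if (IsLoad != 0)
        goto load;
    else
        goto store;

load: /* trace a read, then jump unconditionally to the exit block */
    fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
    goto done;

store: /* trace a write, then jump unconditionally to the exit block */
    fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
    goto done;

done: /* exit block: corresponds to ret void */
    return;
}
```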

Seems simple enough; let’s start coding this in our pass:

Step One - Creating the Function

We’ve already used llvm::Module::getOrInsertFunction when we needed access to fprintf. But whereas fprintf is a C standard library function (and the C standard library is available to us at link time), this time we’ll be inserting our very own function!

Once again, we’ll insert it directly into the module that contains main, and since our function will have external linkage, code in every other module linked into the program will be able to call it.

We’ll add a call to a new helper, addTraceMemoryFunction, in our run method:

llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
    Function *main = M.getFunction("main");
    if (main) {
        addGlobalMemoryTraceFP(M);
        addMemoryTraceFPInitialization(M, *main);
        // New: define the _TraceMemory helper in the same module as main.
        addTraceMemoryFunction(M);
        errs() << "Found main in module " << M.getName() << "\n";
        return llvm::PreservedAnalyses::none();
    } else {
        errs() << "Did not find main in " << M.getName() << "\n";
        return llvm::PreservedAnalyses::all();
    }
}

We’ll start by implementing an empty, externally linked function whose body is a single ret void:

const std::string TraceMemoryFunctionName = "_TraceMemory";

void addTraceMemoryFunction(llvm::Module &M) {
    auto &CTX = M.getContext();

    std::vector<llvm::Type*> TraceMemoryArgs{
        PointerType::getUnqual(Type::getInt8Ty(CTX)),
        Type::getInt64Ty(CTX),
        Type::getInt32Ty(CTX)
    };

    FunctionType *TraceMemoryTy = FunctionType::get(Type::getVoidTy(CTX),
                                                    TraceMemoryArgs,
                                                    false);

    FunctionCallee TraceMemory = M.getOrInsertFunction(TraceMemoryFunctionName, TraceMemoryTy);

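    // On typed-pointer LLVM versions, the callee may be a bitcast if a function
    // with this name but a different type already exists. Also skip functions
    // that already have a body, so we never add a second entry block.
    llvm::Function *TraceMemoryFunction = dyn_cast<llvm::Function>(TraceMemory.getCallee());
    if (!TraceMemoryFunction || !TraceMemoryFunction->empty())
        return;
    TraceMemoryFunction->setLinkage(GlobalValue::ExternalLinkage);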

    llvm::BasicBlock *BB = llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
    IRBuilder<> Builder(BB);

    Builder.CreateRetVoid();
}

The first part of addTraceMemoryFunction should look familiar, since it mirrors how we declared fprintf: we define the parameter types and the function type, and then call getOrInsertFunction.

The second part is more interesting: we need to explicitly create a basic block to add instructions to. A function with no basic blocks is only a declaration, so without this step LLVM would just emit:

declare void @_TraceMemory(i8*, i64, i32)

By explicitly creating an entry block and adding a ret void instruction to it, we get a definition:

define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
  ret void
}
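At the C level, the difference between these two outputs is roughly the difference between a prototype and a definition. The sketch below shows C declarations whose parameters lower to the same IR types: void* to i8* (printed as ptr on LLVM 15 and later, where pointers are opaque), uint64_t to i64, and int32_t to i32. It also highlights a detail worth keeping in mind: the pass declares the flag as an i32, whereas clang typically lowers the bool in our C reference function to an i1 marked zeroext, so a C function meant to match this exact signature would take an integer flag. Reusing the name _TraceMemory here is purely for illustration.

```c
#include <stdint.h>

/* Like `declare void @_TraceMemory(i8*, i64, i32)`: a prototype, no body.
   void* -> i8* (ptr with opaque pointers), uint64_t -> i64, int32_t -> i32. */
void _TraceMemory(void *Addr, uint64_t Value, int32_t IsLoad);

/* Like the `define` whose single `entry` block holds `ret void`:
   a definition whose body does nothing yet. */
void _TraceMemory(void *Addr, uint64_t Value, int32_t IsLoad) {
    (void)Addr; /* parameters are unused for now */
    (void)Value;
    (void)IsLoad;
}
```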