Encode Branches

The goal of these transformations is to make it harder for automatic analysis tools (such as disassemblers) to determine the target of branches.


Branch Functions

This transformation implements a simplistic version of Linn and Debray's branch functions. We don't use perfect hash tables, as suggested in Linn and Debray's paper, since this is hard to do as a source-to-source transformation. Rather, we simply pass the offset to jump to as an argument to the branch function.

The generated code looks like this, where the call to the branch function bf actually results in a direct jump to lab2:

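/* Branch function: adds offset to its own return address, which is
   stored 8 bytes above the saved frame pointer on x86_64. */
void bf(unsigned long offset) {
  __asm__ volatile ("addq  %0, 8(%%rbp)" : : "r" (offset));
}

int main() {
   /* The offset is lab2 - lab3, so bf returns to lab2 instead of lab3. */
   bf((unsigned long)(&& lab2) - (unsigned long)(&& lab3));
   lab3:
       /* Junk bytes placed right after the call. */
       __asm__ volatile (".byte 0x76,0x9b,0x8e,0x1b,0x4d":);
   ...
   lab2: ...;
}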
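
The offset arithmetic can be tried out without patching the stack. The following is a minimal sketch, not the code Tigress generates: it assumes GNU C's labels-as-values extension (gcc or clang), and the names bf_model and branch_via_bf_model are hypothetical, introduced only for illustration. The "return address" is passed in explicitly and the adjusted address is handed back, rather than being rewritten in place on the stack.

```c
#include <stdint.h>

/* Hypothetical stand-in for bf: instead of patching the return address
   on the stack, it receives that address and returns the adjusted one. */
static void *bf_model(void *return_address, intptr_t offset) {
    return (char *)return_address + offset;
}

/* Returns 2 if control resumed at lab2, 3 if it resumed at lab3. */
int branch_via_bf_model(void) {
    /* Same offset computation as in the generated code: lab2 - lab3. */
    intptr_t offset = (intptr_t)&&lab2 - (intptr_t)&&lab3;

    /* lab3 plays the role of the address that a real call would push. */
    void *target = bf_model(&&lab3, offset);
    goto *target;

lab3:
    return 3;   /* where an unmodified return would have landed */
lab2:
    return 2;   /* the intended branch target */
}
```

Calling branch_via_bf_model() returns 2, because adding lab2 - lab3 to the address of lab3 yields the address of lab2.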

By default, a function is flattened prior to direct jumps being replaced by calls to a branch function. This creates more direct jumps and hence more opportunities to apply the branch function transformation. Turn this off with --AntiBranchAnalysisBranchFunFlatten=false.

Before branches can be replaced by calls to a branch function, at least one such function needs to be constructed, using the --Transform=InitBranchFuns transformation.

The branch function is not obfuscated and hence trivial to find. It's therefore a good idea to merge it with other functions in the program.

Option   Arguments   Description
--Transform InitBranchFuns Initialize so that branch functions can be inserted at a later time.
--InitBranchFunsOpaqueStructs list, array, input, env, * Comma-separated list of the kinds of opaque constructs to use when obfuscating the branch function. Default=list,array.
  • list = Generate opaque expressions using linked lists
  • array = Generate opaque expressions using arrays
  • input = Generate opaque expressions that depend on input. Requires --Inputs to set invariants over input.
  • env = Generate opaque expressions from entropy. Requires --InitEntropy.
  • * = Same as list,array,input,env
--InitBranchFunsCount INTSPEC How many branch functions to add.
--InitBranchFunsObfuscate BOOLSPEC Whether to obfuscate the branch function. Default=false.

X86 Branch Obfuscations

We implement two standard branch obfuscations used by many packers:

      push target
      call lab
      ret
lab:
      ret

and

      push target
      ret
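
Both sequences rely on ret popping an address off the stack and jumping to it. The toy simulator below is a sketch for tracing that behavior. It is not Tigress output: instruction indices stand in for addresses, and all names (insn, run, goto2call_lands_at, goto2push_lands_at) are hypothetical.

```c
/* Toy machine with four instructions; addresses are instruction indices. */
enum op { PUSH, CALL, RET, HALT };
struct insn { enum op op; int arg; };

/* Runs prog from index 0. Returns the index of the HALT that was reached,
   or -1 on a stack error, a bad address, or an exhausted step budget. */
int run(const struct insn *prog, int len) {
    int stack[16], sp = 0, pc = 0;
    for (int steps = 0; steps < 64; steps++) {
        if (pc < 0 || pc >= len) return -1;
        struct insn i = prog[pc];
        switch (i.op) {
        case PUSH: if (sp == 16) return -1; stack[sp++] = i.arg;  pc++;       break;
        case CALL: if (sp == 16) return -1; stack[sp++] = pc + 1; pc = i.arg; break;
        case RET:  if (sp == 0)  return -1; pc = stack[--sp];                 break;
        case HALT: return pc;
        }
    }
    return -1;
}

/* push target; call lab; ret; lab: ret */
int goto2call_lands_at(void) {
    static const struct insn prog[] = {
        { PUSH, 5 },  /* 0: push target                        */
        { CALL, 3 },  /* 1: call lab  (pushes 2, jumps to 3)   */
        { RET,  0 },  /* 2: ret       (pops 5, the target)     */
        { RET,  0 },  /* 3: lab: ret  (pops 2, goes back to 2) */
        { HALT, 0 },  /* 4: fall-through code, never reached   */
        { HALT, 0 },  /* 5: target                             */
    };
    return run(prog, 6);
}

/* push target; ret */
int goto2push_lands_at(void) {
    static const struct insn prog[] = {
        { PUSH, 3 },  /* 0: push target                      */
        { RET,  0 },  /* 1: ret  (pops 3, the target)        */
        { HALT, 0 },  /* 2: fall-through code, never reached */
        { HALT, 0 },  /* 3: target                           */
    };
    return run(prog, 4);
}
```

In this model goto2call_lands_at() returns 5 and goto2push_lands_at() returns 3, the index marked as target in each program. Neither sequence contains an instruction that names the target as a jump destination.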


NOP sleds

The --AntiBranchAnalysisKinds=goto2nopSled switch turns this code

      goto L
      ...
   L:

into this code

      goto *(R+expression)
      ...
    R: 
      nop
      nop
      ...
      nop
    L:

The expression is opaque, and is constructed so that the branch always lands somewhere within the nop sled. The intention is to combine this transformation with input-dependent opaque predicates so that the actual jump address will be random and input-dependent:

tigress --Inputs=... \
        --Transform=InitOpaque \
           --InitOpaqueKind=Input \
        --Transform=AntiBranchAnalysis \
            --AntiBranchAnalysisKinds=goto2nopSled \
            --AntiBranchAnalysisOpaqueStructs=input
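
The goto *(R+expression) idea can be modeled in portable GNU C. This is a sketch under stated assumptions, not the code Tigress emits: it uses the labels-as-values extension (gcc or clang), it replaces raw address arithmetic with a table of slot labels, and opaque_index is a hypothetical, deliberately simple stand-in for an input-dependent opaque expression whose value is known to stay inside the sled.

```c
#define SLED_LEN 4

/* Hypothetical stand-in for an input-dependent opaque expression:
   whatever the input is, the result lies in [0, SLED_LEN). */
static unsigned opaque_index(unsigned input) {
    return input % SLED_LEN;
}

/* Jumps into the sled at an input-dependent slot and returns the
   number of sled slots executed before reaching the target. */
int jump_into_sled(unsigned input) {
    /* One entry per sled slot; this table models R+0, R+1, ... */
    static void *const sled[SLED_LEN] = { &&s0, &&s1, &&s2, &&s3 };
    int slots = 0;

    goto *sled[opaque_index(input)];

    /* Each slot has no effect besides counting, then falls through. */
s0: slots++;
s1: slots++;
s2: slots++;
s3: slots++;
    /* The original branch target L would be here. */
    return slots;
}
```

For example, jump_into_sled(0) enters at the first slot and returns 4, while jump_into_sled(7) enters at the last slot and returns 1. Every input reaches the same target; only the entry point varies.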

The current nop sled is trivial, consisting of a random sequence of x86 instructions that have no effect on the computation:

   cmc
   std
   cld
   nop
   stc
   cmc
   clc
   stc
   wait
   ...

Option   Arguments   Description
--Transform AntiBranchAnalysis Replace branches with other constructs.
--AntiBranchAnalysisKinds branchFuns, goto2call, goto2push, goto2nopSled, * Comma-separated list of the kinds of constructs branches can be replaced with. Default=branchFuns.
  • branchFuns = Generate calls to branch functions. --Transform=InitBranchFuns must be given prior to this transform
  • goto2call = Replace goto L with push L; call lab; ret; lab: ret
  • goto2push = Replace goto L with push L; ret
  • goto2nopSled = Replace goto L with goto *p where p is the address of a sequence of nops that eventually leads to L
  • * = Same as branchFuns,goto2call,goto2push
--AntiBranchAnalysisOpaqueStructs list, array, input, env, * Comma-separated list of the kinds of opaque constructs to use. Default=list,array.
  • list = Generate opaque expressions using linked lists
  • array = Generate opaque expressions using arrays
  • input = Generate opaque expressions that depend on input. Requires --Inputs to set invariants over input.
  • env = Generate opaque expressions from entropy. Requires --InitEntropy.
  • * = Same as list,array,input,env
--AntiBranchAnalysisObfuscateBranchFunCall BOOLSPEC Obfuscate the body of the branch function. Default=false.
--AntiBranchAnalysisBranchFunFlatten BOOLSPEC Flatten before replacing jumps. This opens up more opportunities for replacing unconditional branches. Default=false.
--AntiBranchAnalysisBranchFunAddressOffset integer The offset (in bytes) of the return address on the stack, for branch functions. May differ based on operating system, word size, and compiler. Default=8 on x86_64, 0 on Arm.
 

Issues

This transformation has many issues, and should only be used with great care:

  • Currently, only x86 assembly code is generated. If someone is interested in helping to port this to other platforms, let me know.
  • It appears that goto2push and goto2call will often cause clang to generate the wrong code.

    1. gcc 4.6 appears to do the right thing.

    2. gcc 4.8 appears to occasionally hang when compiling our generated code.

    The issue is that the generated inline assembly code contains jumps. Newer versions of gcc have an asm goto construct which ought to help with this. Older versions of clang lack this feature.
  • If you are going to use goto2push or goto2call, make sure you set the --Environment=... option appropriately, and test the generated code thoroughly. goto2push and goto2call are turned off by default.
  • Running this transformation on the same function twice seems to occasionally break.
  • The NOP sled currently uses only trivial instructions that do not modify registers. Eventually, we'll get around to implementing a nop generator (similar to Metasploit's) that also allows more complex instructions.
 

References

The Branch Function transformation implements a simplistic version of Linn and Debray's "Obfuscation of Executable Code to Improve Resistance to Static Disassembly". Linn and Debray's algorithm replaces direct jumps with calls to a special branch function, which sets the return address to the target of the original branch and then returns.

There are many published attacks on branch functions, including "Static Disassembly of Obfuscated Binaries" by Christopher Kruegel, William Robertson, Fredrik Valeur and Giovanni Vigna, and "Deobfuscation: Reverse Engineering Obfuscated Code" by Sharath Udupa, Saumya Debray, and Matias Madou.

Kevin A. Roundy and Barton P. Miller's survey paper "Binary-code obfuscations in prevalent packer tools" is a good source of information on techniques used by current obfuscation tools.