Random Function

free web templates

One of the uses of Tigress is as an educational tool. The --Transform=RandomFuns option will generate a random function that can subsequently be transformed using any combination of Tigress obfuscations, and then given to students as a cracking target.

free web templates

You can generate a program for the students to reverse engineer:

free web templates

or you can generate a program for them to crack:

free web templates

Depending on the sophistication of your students, you can vary the length of the transformation sequence, the difficulty of the transformations, the options to the transformations, the complexity of the generated challenge function, and either give them source to untangle (a good way to learn about particular transformations), or stripped compiled code (for a more real-world challenge).

free web templates

We typically give each student 3 programs, of different levels of difficulty, to attack. Thanks to the diversity of Tigress-generated programs, each challenge is unique, making it harder for students to cheat.

 

Examples

free web templates

Below is part of the script we use to generate take-home exams for our students. It contains two assets, a password check and an expired time check, and it's the students' job to disable these.

free web templates

Start by creating this file, empty.c:

#include "tigress.h"
#include <stdio.h> 
#include <time.h>
#include <pthread.h>
free web templates

Generate the cleartext challenge program. This is hidden from the students:

tigress --Verbosity=1 --Seed=$seed6 --Environment=x86_64:Darwin:Clang:5.1 \
      --Transform=RandomFuns \
         --RandomFunsName=SECRET \
         --Verbosity=5 \
         --RandomFunsFunctionCount=1 \
         --RandomFunsTrace=0 \
         --RandomFunsType=long \
         --RandomFunsInputSize=1 \
         --RandomFunsLocalStaticStateSize=1 \
         --RandomFunsGlobalStaticStateSize=0 \
         --RandomFunsLocalDynamicStateSize=0 \
         --RandomFunsGlobalDynamicStateSize=0 \
         --RandomFunsBoolSize=2 \
         --RandomFunsLoopSize=1000 \
         --RandomFunsCodeSize=100 \
         --RandomFunsOutputSize=1 \
         --RandomFunsTimeCheckCount=1 \
         --RandomFunsActivationCodeCheckCount=1 \
         --RandomFunsActivationCode=42 \
         --RandomFunsPasswordCheckCount=1 \
         --RandomFunsPassword=secret \
         --RandomFunsFailureKind=segv \
      --out=6-input.c empty.c
free web templates

Next, generate an empty program with the same interface as the challenge program for the students to fill out:

tigress --Verbosity=1 --Seed=$seed6 \
      --Transform=RandomFuns --RandomFunsName=SECRET \
         --RandomFunsType=long \
         --RandomFunsInputSize=1 \
         --RandomFunsLocalStaticStateSize=1 \
         --RandomFunsGlobalStaticStateSize=0 \
         --RandomFunsLocalDynamicStateSize=0 \
         --RandomFunsGlobalDynamicStateSize=0 \
         --RandomFunsOutputSize=1 \
         --RandomFunsCodeSize=0 \
      --out=6-answer.c empty.c
free web templates

Finally, obfuscate the challenge program:

tigress --Verbosity=1 --Seed=$seed6 --FilePrefix=obf \
      --Transform=InitEntropy \
         --Functions=main\
      --Transform=InitOpaque \
         --Functions=main --InitOpaqueCount=1 --InitOpaqueStructs=list,array\
      --Transform=InitBranchFuns \
         --InitBranchFunsCount=2\
      --Transform=EncodeLiterals \
         --Functions=SECRET --EncodeLiteralsKinds=string --EncodeLiteralsEncoderName=STRINGS\
      --Transform=Virtualize \
         --Functions=STRINGS --VirtualizeDispatch=switch --VirtualizeOperands=stack,registers \
         --VirtualizeMaxMergeLength=2 --VirtualizeSuperOpsRatio=1.0 \
      --Transform=AddOpaque \
         --Functions=SECRET --AddOpaqueKinds=call,bug,true --AddOpaqueCount=4\
      --Transform=Virtualize \
         --Functions=SECRET --VirtualizeDispatch=indirect --VirtualizeOperands=stack,registers \
         --VirtualizeMaxMergeLength=2 --VirtualizeSuperOpsRatio=1.0 \
      --Transform=Virtualize \
         --Functions=SECRET --VirtualizeDispatch=ifnest --VirtualizeOperands=stack,registers \
         --VirtualizeMaxMergeLength=2 --VirtualizeSuperOpsRatio=1.0 --VirtualizeNumberOfBogusFuns=1\
      --Transform=EncodeLiterals \
         --Functions=SECRET --EncodeLiteralsKinds=integer \
       --Transform=BranchFuns \
         --Functions=SECRET --BranchFunsFlatten=true \
      --Transform=CleanUp \
         --CleanUpKinds=annotations,constants,names \
      --out=6-challenge.c 6-input.c
 

Structure of Random Programs

free web templates

To fulfill our needs, a random program generator must have certain characteristics:

  • in order to make it easy for reverse engineers to automate attacks, the generated programs must a have simple I/O behavior;
  • the generator must be able to produce an "infinite" sequence of different programs;
  • generated programs should be small, but still have an "interesting" internal structure;
  • generate programs should always terminate;
  • generate programs should terminate within a reasonable timebound;
  • an adversary should not be able to guess the internals of a generated program by examining its input-output behavior;
  • generated programs should be "minimal," i.e. there should not be a smaller program that has the same I/O behavior (this is so that it is obvious to the reverse engineer when he has found the correct program).
free web templates

The generated challenge programs all take the following simple form:

#include ≤stdio.h> 
#include ≤stdlib.h>

void SECRET(unsigned long input[1] , unsigned long output[1] ) 
   ...
}
int main(int argc, char** argv) {
{ 
  unsigned long input[1] ;
  unsigned long output[1] ;
  int i5 ;
  unsigned long value6 ;
  int i7 ;
  }
  i5 = 0;
  while (i5 < 1) {
    value6 = strtoul(argv[i5 + 1], 0, 10);
    input[i5] = value6;
    i5 ++;
  }
  SECRET(input, output);
  i7 = 0;
  while (i7 < 1) {
    printf("%lu\n", output[i7]);
    i7 ++;
  }
} 
free web templates

That is, a challenge program reads one or more longs from standard in, and produces one or more longs on standard out.

free web templates

Internally, generated SECRET functions share the same basic structure: an expansion phase, a mixing phase, and a compression phase:

  • The expansion phase expands the input arguments into a (typically larger) state space. It is currently a sequence of assignment statements.
  • The mixing phase perform operations on the state space. This is where we add control structures.
  • The Compression phase compresses the state into the (typically smaller) output. It is currently a sequence of assignment statements.
free web templates

Here is an example:

void SECRET(unsigned long input[1] , unsigned long output[1] ) { 
  unsigned long state[4] ;
  unsigned long local1 ;

  /* Expansion phase */
  state[0UL] = input[0UL] + 762537946UL;
  state[1UL] = input[0UL] | ((16601096UL << (state[0UL] % 16UL | 1UL)) | (16601096UL >> (64 - (state[0UL] % 16UL | 1UL))));
  state[2UL] = (input[0UL] ^ 643136481UL) ^ (state[0UL] + 292656718UL);
  state[3UL] = (input[0UL] << (((state[1UL] >> 4UL) & 15UL) | 1UL)) | (input[0UL] >> (64 - (((state[1UL] >> 4UL) & 15UL) | 1UL)));

  /* Mixing phase */
  local1 = 0UL;
  while (local1 < 3UL) {
    state[1UL] |= (state[2UL] & 15UL) << 3UL;
    state[local1 + 1UL] = state[local1];
    local1 += 2UL;
  }
  if ((state[0UL] | state[1UL]) > (state[2UL] | state[3UL])) {
    state[3UL] |= (state[1UL] & 31UL) << 3UL;
  } else {
    state[2UL] = state[0UL];
    state[3UL] |= (state[2UL] & 15UL) << 3UL;
  }
  state[0UL] = state[2UL];

  /* Compression phase */  
  output[0UL] = (state[0UL] << (state[1UL] % 8UL | 1UL)) << ((((state[2UL] << (state[3UL] % 8UL | 1UL)) >> 1UL) & 7UL) | 1UL);
}
OptionArgumentsDescription
--Transform RandomFuns Generate a random function useful as an attack target.
--RandomFunsFunctionFunctionCount INTSPEC Number of functions to generate. Default=1.
--RandomFunsTrace INTSPEC Insert tracing in the generated program. 0=no tracing. 1=trace function entry and exit. 2=as 1, but also trace control points. 3=as 2, but also trace individual statements. 4=as 3, but also trace the values of the STATE variable at each statement. Default=0.
--RandomFunsInputSize INTSPEC Size of input. Default=1.
--RandomFunsLocalStaticStateSize INTSPEC Size of function-local static state. Default=1.
--RandomFunsLocalDynamicStateSize INTSPEC Size of function-local dynamic state, i.e. the number of linked structures (lists, trees, etc) that can be built. Default=0.
--RandomFunsGlobalStaticStateSize INTSPEC Size of global static state. Default=0.
--RandomFunsGlobalDynamicStateSize INTSPEC Size of global dynamic state, i.e. the number of linked structures (lists, trees, etc) that can be built. Default=0.
--RandomFunsOutputSize INTSPEC Size of output. Default=1.
--RandomFunsCodeSize INTSPEC Size of the generated code. This is the number of nodes in generated Abstract Syntax Tree. Default=10.
--RandomFunsLoopSize INTSPEC Maximum count of loop iterations. Used to control the length of execution. Default=None.
--RandomFunsBoolSize INTSPEC Size of boolean expressions, specifically the number of conjunctions and disjunctions (&& and ||) that are used to build up expressions. Default=1.
--RandomFunsType char, short, int, long, float, double Type of input/output/state. Default=long.
  • char = C int type
  • short = C int type
  • int = C int type
  • long = C long type
  • float = C float type
  • double = C double type
--RandomFunsName string The name of the generated function.
--RandomFunsFailureKind message, abort, segv The manner in which a triggered asset may fail. Comma-separated list. Default=segv.
  • message = Print a message.
  • abort = Call the abort function.
  • segv = Die with a segmentation fault.
--RandomFunsInputKind argv, stdin How inputs are read by the program, through the command line or stdin. Default=argv.
  • argv = Enter input to the program on the command line.
  • stdin = Enter input to the program through stdin.
--RandomFunsDummyFailure BOOLSPEC Generates excatly the same code whether true or false, except the failure code is rendered impotent. In other words, --RandomFunsDummyFailure=true will have the failure code inserted, but inactive. Default=false.
--RandomFunsActivationCode int The code the user has to enter (as the first command line arguments) to be allowed to run the program. Default=42.
--RandomFunsPassword string The password the user has to enter (read from standard input) to be allowed to run the program. Default="42".
--RandomFunsTimeCheckCount int The number of checks for expired time (gettimeofday() > someTimeInThePast) to be inserted in the program. Default=0.
--RandomFunsActivationCodeCheckCount int The number of checks for correct activation code to be inserted in the program. Default=0.
--RandomFunsPasswordCheckCount int The number of checks for correct password to be inserted in the program. Probably only 0 and 1 make sense here, since the user will be prompted for a password once for every check. Default=0.
--RandomFunsControlStructures S-Expression If set, will define the nested control structures of the generated function. Otherwise, a random structure will be generated. The argument is an S-Expression, where each subexpression has one of the forms (bb INTSPEC) (for a basic block consisting of a certain number of assignment statements), (for S-expression) (for a for-loop with a given body), or (if S-expression S-expression) (for an if-statement with a given then and else part), or (switch S-expression S-expression) (for a switch-statement with a given list of cases and a default case). For example, --RandomFunsControlStructures='(for ((bb 4)))' will generate a body consisting of a for-loop with 4 assignment statements in the body. (for ((bb 4) (if ((bb 1)) ((bb 2))))) also puts an if-statement inside the loop. (for ((bb 4) (if ((bb 1)) ((for (bb 2)))))) puts a for-loop inside the else-part of the if-statement. (switch ((bb 4) (if ((bb 1)) (bb 4))) (bb 4)) creates a switch-statment with two cases (a basic block and an if-statement), and a basic block as the default case. Default=none.
--RandomFunsBasicBlockSize INTSPEC The size of basic blocks, when control structures are not explicitly specified using RandomFunsControlStructures. Default=3.
--RandomFunsForBound constant, input, boundedInput, boundedAny The allowable upper bound in a for-statement. Comma-separated list. Default=constant.
  • constant = Literal integer.
  • input = Value from the input array, could cause index out of bounds.
  • boundedInput = Value from the input array, will not cause index out of bounds.
  • boundedAny = Value from the any source, will not cause index out of bounds.
--RandomFunsOperators PlusA, MinusA, Mult, Div, Mod, Shiftlt, Shiftrt, Lt, Gt, Le, Ge, Eq, Ne, BAnd, BXor, BOr, * The allowable operators in expressions. Comma-separated list. Default=all.
  • PlusA = +
  • MinusA = -
  • Mult = *
  • Div = /
  • Mod = %
  • Shiftlt = <<
  • Shiftrt = >>
  • Lt = <
  • Gt = >
  • Le = =<
  • Ge = >=
  • Eq = ==
  • Ne = !=
  • BAnd = &
  • BXor = ^
  • BOr = |
  • * = all operators
--RandomFunsPointTest BOOL Add if (output[0] == 4242424242U) printf("You win!\n"); after the call to the generated function. The idea is to replace (by hand) 4242424242U with one actual output of the function. This can be used as another reverse engineering challenge: "Find an input for which the program prints "You win!. Default=false.

 

Issues

  • Generating random functions is actually a really cool research problem! Generating a program that always terminates, finishes within a reasonable time, looks vaguely normal (i.e. could possibly have been written by a human), and has a unique and unpredicatable I/O behavior (i.e. the code can't be derived from blackbox access), is decidedly non-trival.
  • A good solution to this problem should be worthy of at least a master's thesis, for sure! Email me! (:
  • This implementation doesn't achieve any of the stated goals. In particular, it's hard to ensure that all programs generated from the same script runs in a reasonable time.
  • Various options can be used to tweak the runtime, in particular --RandomFunsLoopSize=1000 which bounds the number of loop iterations.
  • If you're willing to set the control structures yourself using --RandomFunsControlStructures="(if (bb 5) (bb 2))", then the problem goes away. When these are generated automatically, however, there's always some risk that you'll wind up with --RandomFunsControlStructures="(for (for (for (for (bb 4)))))" which is likely to run forever. I try to mitigate this by reducing the loop size (set by --RandomFunsLoopSize=1000) by a factor of 10 for every nesting level, but this doesn't help if looping is interprocedural (I don't currently check for this).
  • The random call graphs are actually just trees. They should be DAGs, at least.
 

References

free web templates

Random function generation is described in Predicting the Resilience of Obfuscated Code Against Symbolic Execution Attacks via Machine Learning by Sebastian Banescu, Christian Collberg, and Alexander Pretschner, USENIX Security 2017.