Implicit Flow

Several transformations, in particular AntiTaintAnalysis, use implicit flow as a basic building block. Before you can use these you need to call the --Transform=InitImplicitFlow and list which of the implicit flow variants you are going to use later. Here's an example of what it might look like:

tigress --Seed=0 --Verbosity=1 --Environment=x86_64:Darwin:Gcc:4.6 \
   --Transform=InitEntropy --Functions=main \
   --Transform=InitImplicitFlow --Functions=main \
      --InitImplicitFlowKinds=trivial_thread_1,trivial_counter,\
                              mem_cache_time,mem_cache_thread_1,\
                              file_cache_time,file_cache_thread_1,\
                              jit_time \
      --InitImplicitFlowHandlerCount=1 \
      --InitImplicitFlowJitCount=1 \
      --InitImplicitFlowJitFunctionBody="(for (if (bb 50) (bb 50)))" \
      --InitImplicitFlowTrace=false \
      --InitImplicitFlowTrain=false \
      --InitImplicitFlowTime=false \
      --InitImplicitFlowTrainingTimesClock=500 \
      --InitImplicitFlowTrainingTimesThread=500 \
      --InitImplicitFlowTrainingMinGap=90 \
      --InitImplicitFlowTrainingConfidenceLevel=0.99 \
      --InitImplicitFlowTrainingTargetErrorRate=0.00001 \
      --InitImplicitFlowTrainingKind=statistics \
   --Transform=AntiTaintAnalysis --Functions=main \
      --AntiTaintAnalysisKinds=vars \
      --LocalVariables=main:b \
      --AntiTaintAnalysisImplicitFlow="(repeat mem_cache_time 3)" \
   input.c --out=output.c

Here, we specify that we might want to use the implicit flow variants trivial_thread_1,trivial_counter,mem_cache_time,mem_cache_thread_1,file_cache_time,file_cache_thread_1,jit_time. We then insert implicit flow using (repeat mem_cache_time 3) after any assignment to the variable b in function main. The transformation of the assignment b = a would look something like this:

    exp_orig3 = a;
    exp_orig3_origPtr4 = (unsigned char *)(& exp_orig3);
    exp_orig3_copyPtr6 = (unsigned char *)(& exp_orig3_copy5);
    size_iter7 = 0;
    while (size_iter7 < 4) {
      maxCount34 = 0;
      while (maxCount34 < 1) {
        majority_iter9 = 0;
        while (majority_iter9 < 255) {
          counters8[majority_iter9] = 0;
          majority_iter9 ++;
        }
        repeat_iter32 = 0;
        while (repeat_iter32 < 3) {
          charCopy10 = 0;
          copy_iter11 = 0;
          while (copy_iter11 < 8) {
            if ((*exp_orig3_origPtr4 >> copy_iter11) & 1) {
              pagesize13 = getpagesize();
              memalignRet14 = posix_memalign(& __buffer_slow_1, pagesize13, 64);
              __asm__  volatile   ("cpuid\n"
                                   "rdtsc\n": "=a" (low16), "=d" (high17));
              sum20 = _1entropy ^ 1;
              iter21 = 0;
              while (iter21 < _2_time_implicitFlow_memCache_numberOfReads) {
                __asm__  volatile   ("mfence\n"
                                     "clflush (%0)\n": : "r" (__buffer_slow_1));
                sum20 += *((long *)__buffer_slow_1);
                *((long *)__buffer_slow_1) = sum20;
                iter21 ++;
              }
              _1entropy = sum20 - _1entropy;
              __asm__  volatile   ("cpuid\n"
                                   "rdtsc\n": "=a" (low18), "=d" (high19));
              free(__buffer_slow_1);
              time12 = (((unsigned long )high19 << 32) | (unsigned long )low18) -
                       (((unsigned long )high17 << 32) | (unsigned long )low16);
            } else {
              pagesize22 = getpagesize();
              memalignRet23 = posix_memalign(& __buffer_fast_1, pagesize22, 64);
              memalignRet23 = posix_memalign(& __buffer_fast_2, pagesize22, 64);
              __asm__  volatile   ("cpuid\n"
                                   "rdtsc\n": "=a" (low26), "=d" (high27));
              sum30 = _1entropy | 2;
              iter31 = 0;
              while (iter31 < _2_time_implicitFlow_memCache_numberOfReads) {
                __asm__  volatile   ("mfence\n"
                                     "clflush (%0)\n": : "r" (__buffer_fast_1));
                sum30 += *((long *)__buffer_fast_2);
                *((long *)__buffer_fast_2) = sum30;
                iter31 ++;
              }
              _1entropy = sum30 | _1entropy;
              __asm__  volatile   ("cpuid\n"
                                   "rdtsc\n": "=a" (low28), "=d" (high29));
              free(__buffer_fast_1);
              free(__buffer_fast_2);
              time12 = (((unsigned long )high29 << 32) | (unsigned long )low28) -
                        (((unsigned long )high27 << 32) | (unsigned long )low26);
            }
            if ((double )time12 > 
              _2_time_implicitFlowTimerClock_mem_cache_time_timingCounter_middleReferenceTime) {
              charCopy10 |= 1 << copy_iter11;
            } else {

            }
            copy_iter11 ++;
          }
          (counters8[charCopy10]) ++;
          repeat_iter32 ++;
        }
        maxCount34 = 0;
        max_iter35 = 0;
        while (max_iter35 < 255) {
          if (counters8[max_iter35] > maxCount34) {
            maxCount34 = counters8[max_iter35];
            winner33 = max_iter35;
          } else {

          }
          max_iter35 ++;
        }
      }
      *exp_orig3_copyPtr6 = winner33;
      exp_orig3_origPtr4 ++;
      exp_orig3_copyPtr6 ++;
      size_iter7 ++;
    }
    b = exp_orig3_copy5;
 

Algorithms

Some primitives are based on the timing differences between a slow and a fast process:

void P'() {
   value=0;   
   for(i in [0...(bits in b)-1]) {
      timeT start = time();
      if ((i:th bit of b)==1) 
         slow process(param);
      else 
         fast process(param);
      timeT time = time()-start;
      if (time  > threshold)
         value |= 1 << i;
   }
   a = value;
}

A variant of such timing-based flow primitives use threads to do the timing. This helps with stealth:

int result;
void slow(int v) {
   slow process(param);
   result = v;
}
void fast(int v) {
   fast process(param);
   result = v;
}
void P'() {
   a=0;   
   for(i in [0...(bits in b)-1]) {
      int bit = i:th bit of b;
      s=thread_create(slow, bit);
      f=thread_create(fast, !bit);
      thread_join(s);
      thread_join(f);
      if (result)
         a |= 1 << i;
   }
}

There are two implementations of these threading-based primitives, using either one or two threads. This allows you to embed threading code that matches the code that is already in the program.

 

Implicit Flow Kinds

  • counter_int: Copy a variable by counting up to its value. I.e., the assigment a=b is transformed into
    int i,a = 0;
    for(i=0;i<b;i++)
       a++;
    
  • counter_float: Same as counter_int, but using a float.
  • counter_signal: Copy a variable bit-by-bit using a signal handler.
  • bitcopy_unrolled: Copy a variable bit-by-bit, each bit tested by an if-statement
  • bitcopy_loop: Loop over the bits in a variable and copy each bit by testing in an if-statement.
  • bitcopy_signal: Loop over the bits in a variable and copy each bit in a signal handler:
    unsigned int value;
    int bitNo;
    void handler(int sig){
      value |= 1 << bitNo;
    }
    
    {
       value=0;   
       signal(31, handler);
       for(i in 0..(bits in b)-1]) {
          if ((i:th bit of b)==1) {
               bitNo=i;
               raise(31);
         } 
       }
       signal(31, 1);
      a = value;
    }
    
  • file_write: Copy a variable byte-by-byte by writing to a file and reading it back again.
  • trivial_clock: Copy a variable by timing a trivial loop.
  • trivial_thread_1: Same as trivial_clock, but using threads for timing.
  • trivial_thread_2: Same as trivial_thread_1, but using two threads instead of one.
  • file_cache_time : Copy a variable by timing a file being read with caching turned on or off:
    process<(param,nocache):
       posix_memalign(&buf,pagesize,
                      pagesize);
       fd=open("/tmp/file.txt", writing);
       fcntl(fd, F_NOCACHE, nocache);
       for(i=0; ireading);
       fcntl(fd, F_NOCACHE, nocache);
       start = time();
       for(i=0; islow process(param):
       process(param,1):
    fast process(param):
       process(param,0):
    
  • file_cache_thread_1: Same as file_cache_time , but using threads for timing.
  • file_cache_thread_2: Same as file_cache_thread_1, but using two threads instead of one.
  • mem_cache_time: Copy a variable by timing a CPU data cache being read with or without caching:
    posix_memalign(&buf1,pagesize,64);
    posix_memalign(&buf2,pagesize,64);
    slow process(param):
       for(i=0; i<  param;i++){
          asm ("mfence\n"
          "clflush (%0)\n"::"r"(buf1));
          sum += *((long *)buf1);
          *((long *)buf1) = sum;
       }
    fast process(param):
       for(i=0;i < param;i++){
          asm ("mfence\n"
          "clflush (%0)\n"::"r"(buf1));
          sum += *((long *)buf2);
          *((long *)buf2) = sum;
       }
    
  • mem_cache_thread_1: Same as mem_cache_time, but using threads for timing.
  • mem_cache_thread_2: Same as mem_cache_thread_1, but using two threads instead of one.
  • time_jit: Copy a variable by timing a jitted function:
    int freq=0;
    void foo(input,output) {
       static void (*foop)(...,...);
       if (freq==0) {
          foop = JIT(bytecodes)
          freq++;
       }
       (*foop)(input,output);
    }
    slow process(param):
       freq=0;
       start=time();
       foo(...,...);
       time=time()-start;
    
    fast process(param);
       freq=0;
       foo(...,...);
       start=time();
       foo(...,...);
       time=time()-start;
    
 

Training

All the implicit flow primitives that are based on timing need to be trained, i.e. we have to run tests to see what is a good threshold between slow and fast primitives. There are two major kinds of training, gap and resend. The gap training simply sends lots of "slow" and "fast" tests, and finds a gap between them. The resend training is similar, but uses more sophisticated statistical tests (Wilson's Score Rule) to figure out how many times a bit should be sent in order to achieve a particular confidence level and target error rate. The commands look like this:

--Transform=InitImplicitFlow \
   --Functions=... \
   --InitImplicitFlowKinds=... \
   --InitImplicitFlowTrain=true \
   --InitImplicitFlowTrainingTimesClock=50 \
   --InitImplicitFlowTrainingTimesThread=50 \
   --InitImplicitFlowTrainingGapMaxFailureRateClock=5 \
   --InitImplicitFlowTrainingGapMaxFailureRateThread=5 \
   --InitImplicitFlowTrainingGapMinGap=90 \
   --InitImplicitFlowTrainingKind=gap \

or like this:

--Transform=InitImplicitFlow \
   --Functions=... \
   --InitImplicitFlowKinds=... \
   --InitImplicitFlowTrain=true \
   --InitImplicitFlowTrainingResendConfidenceLevel=0.99 \
   --InitImplicitFlowTrainingResendTargetErrorRate=0.00001 \
   --InitImplicitFlowTrainingKind=resend \

The output would look like this (for the resend training kind):

TRAINING_DATA '(mem_cache_time 4496.000000 16 4)'

which means that for the mem_cache_time implicit flow primitive, a parameter value of "16" is optimal, that each bit needs to be sent "4" times, and that the threshold between slow and fast processes is "4496". You can then feed this into another run of tigress:

--Transform=InitImplicitFlow \
   --InitImplicitFlowTrain=false \
   --InitImplicitFlowTrainingData='(mem_cache_time 4496.000000 16 4)' \
--Transform=AntiTaintAnalysis \
   --Functions=... \

You can also run the training at startup time of your program, i.e., insert your training code at the beginning of main:

--Transform=InitImplicitFlow \
   --InitImplicitFlowTrain=true \
--Transform=AntiTaintAnalysis \
   --Functions=... \
 

Implicit Flow Combiners

Implicit flow primitives can be combined for improved correctness and higher levels of resistance to analysis. Combiners are expressed as S-expressions:

type combiner = 
   | (single primitive)
   | (compose combiner combiner ...)
   | (select combiner  combiner ...)
   | (majority combiner  combiner ...)
   | (repeat combiner int)
   | (until combiner int int)

Their behavior as as follows:

  • (single a) a single primitive, e.g. (single trivial_counter).
  • (compose a b c) chains together several combiners, i.e. a(b(c)).
  • (select a b c) picks one of combiners a,b,c at random, based on input.
  • (majority a b c) chooses the most frequently occurring value computed by a,b,c.
  • (repeat a n) is equivalent to (majority a a ... a) with n copies of .
  • (until a m n) continuously repeats combiner a m times until there is at least n agreements on the resulting value.

You would use these in the AntiTaintAnalysis command's --AntiTaintAnalysisImplicitFlow=... option:

   --Transform=AntiTaintAnalysis \
   --Functions=main \
   --AntiTaintAnalysisKinds=vars \
   --LocalVariables=main:b \
   --AntiTaintAnalysisImplicitFlow="(repeat mem_cache_thread_1 3)" \

Here are some examples of combiners:

  (single counter_int)
  (single counter_float)
  (single counter_signal)
  (single bitcopy_unrolled)
  (single bitcopy_loop)
  (single bitcopy_signal)
  (single file_write)
  (repeat trivial_thread_1 3)
  (repeat trivial_counter 3)
  (repeat mem_cache_time 3)
  (repeat mem_cache_thread_1 3)
  (repeat file_cache_time 3)
  (repeat file_cache_thread_1 3)
  (repeat jit_time 3)
  (compose counter_int counter_float file_write)
  (select counter_int counter_float file_write)
  (majority mem_cache_thread_1 bitcopy_loop counter_signal)
  (majority mem_cache_thread_1 file_cache_time bitcopy_loop counter_signal counter_float)
OptionArgumentsDescription
--Transform InitImplicitFlow Call this before --Transform=AntiTaintAnalysis, in case you want to use the implicit flow copy kinds counter_signal and bitcopy_signal. This transformation inserts the requisite signal handlers and jitted functions.
--InitImplicitFlowKinds counter_int, counter_float, counter_signal, bitcopy_unrolled, bitcopy_loop, bitcopy_signal, file_write, lookup_byteArray, trivial_clock, trivial_thread_1, trivial_thread_2, trivial_counter, file_cache_time, file_cache_thread_1, file_cache_thread_2, mem_cache_time, mem_cache_thread_1, mem_cache_thread_2, jit_time, branchPrediction_time, * Comma-separated list of the kinds of implicit flow that can be inserted. Default=all options.
  • counter_int = Copy a variable by counting up to its value.
  • counter_float = Copy a variable by counting up to its value.
  • counter_signal = Copy a variable by counting up to its value in a signal handler.
  • bitcopy_unrolled = Copy a variable bit-by-bit, each bit tested by an if-statement.
  • bitcopy_loop = Loop over the bits in a variable and copy each bit by testing in an if-statement.
  • bitcopy_signal = Loop over the bits in a variable and copy each bit in a signal handler.
  • file_write = Copy a variable byte-by-byte by writing to a file.
  • lookup_byteArray = Copy a variable byte-by-byte by looking up each byte in a table.
  • trivial_clock = Copy a variable by timing a trivial loop
  • trivial_thread_1 = Copy a variable by timing a trivial loop, using 1 thread for timing
  • trivial_thread_2 = Copy a variable by timing a trivial loop, using 2 threads for timing
  • trivial_counter = Copy a variable by timing (using performance counters) a trivial loop
  • file_cache_time = Copy a variable by timing a file being read with caching turned on or off.
  • file_cache_thread_1 = Copy a variable by timing a file being read with caching turned on or off, but using 1 thread for timing.
  • file_cache_thread_2 = Copy a variable by timing a file being read with caching turned on or off, but using 2 threads for timing.
  • mem_cache_time = Copy a variable by timing a CPU data cache being read with or without caching
  • mem_cache_thread_1 = Copy a variable by timing a CPU data cache being read with or without caching, but using 1 thread for timing.
  • mem_cache_thread_2 = Copy a variable by timing a CPU data cache being read with or without caching, but using 2 threads for timing.
  • jit_time = Copy a variable by timing a jitted function.
  • branchPrediction_time = Copy a variable by timing the CPU's branch prediction.
  • * = Same as all options turned on.
--InitImplicitFlowHandlerCount INTSPEC How many signal handlers to insert. Default=1.
--InitImplicitFlowJitCount INTSPEC How many jitted functions to insert. Default=0.
--InitImplicitFileCacheSize INTSPEC Size of the file (in 8k blocks) for a file cache implicit flow. Default=100.
--InitImplicitFlowJitFunctionBody S-Expression S-Expression describing the control structures of the function to be jitted. See the RandomFuns transformation for description of the syntax. Default="(for (for (for (for (if (bb 2) (bb 2))))))".
--InitImplicitFlowTrainingKind gap, resend How to do the training. Either by simply looking for a gap between slow and fast times, or to do a more sophisticated statistical test. Default=gap.
  • gap = Look for a gap between slow and fast times.
  • resend = Use a more sophisticated statistical test, resending each bit until a guaranteed confindence level and target error rate has been achieved. Set these with --InitImplicitFlowTrainingResendConfidenceLevel=... and --InitImplicitFlowTrainingResendTargetErrorRate=....
--InitImplicitFlowTrainingParameterRange S-Expression An S-Expression consisting of lists of elements of the form (implicit-flow-kind from to), indicating that we will start by training with the parameter set to from, then 2*from, 4*from, until to is reached. Default=(trivial_clock 1 10000) trivial_thread_1 1 10000) trivial_thread_2 1 10000) trivial_counter 1 10000) file_cache_time 1 1024) file_cache_thread_1 1 1024) file_cache_thread_2 1 1024) mem_cache_time 2 64) mem_cache_thread_1 2 64) mem_cache_thread_2 2 64) jit_time 0 0) jit_thread_1 0 0) jit_thread_2 0 0).
--InitImplicitFlowTrace BOOLSPEC Trace the execution of the initialization of implicit flow. Default=false.
--InitImplicitFlowTrain BOOLSPEC Generate a program that runs the training pass. The output will look something like this: T RAINING_DATA '(trivial_counter 779.000000 2) (trivial_thread 0.000000 8192) (file_cache_time 24618781.250000 512) (file_cache_thread 0.000000 2) (mem_cache_time 26890.750000 128) (mem_cache_thread 0.000000 2048) (jit_time 10020289.000000 0)'. This can then be input to a subsequent run of tigress, setting --InitImplicitFlowTrainingData= to this output. Default=false.
--InitImplicitFlowTime BOOLSPEC Generate a program that runs lots of timing tests. The output looks something like this TRAINING_MEASURED_TIME,trivial_counter,fast,example.com,1,208.000000, where the two last entries is the parameter of the implicit flow kind and the time (usually in clock cycles). Default=false.
--InitImplicitFlowTrainingTimesClock INTSPEC Number of training measurements to take for timing-based primitives. Default=0.
--InitImplicitFlowTrainingTimesThread INTSPEC Number of training measurements to take for threading-based primitives. Default=0.
--InitImplicitFlowTrainingGapMaxFailureRateClock INTSPEC Maximum acceptable failure rate for timing-based primitives, out of 100, for the gap training kind. Default=5.
--InitImplicitFlowTrainingGapMaxFailureRateThread INTSPEC Maximum acceptable failure rate for thread-based primitives, out of 100, for the gap training kind. Default=5.
--InitImplicitFlowTrainingGapMinGap INTSPEC Minimum gap between fast and midpoint and slow and midpoint, as a percentage, for the gap training kind. Default=80.
--InitImplicitFlowTrainingResendConfidenceLevel real Confidence level, a floating point number close to 1.0, for the resend training kind. Default=0.95.
--InitImplicitFlowTrainingResendTargetErrorRate real Acceptable error rate for the resend training kind. Should be a floating point number close to 0. Default=0.0001.
--InitImplicitFlowTrainingData SExpression Set training data from previous run of --InitImplicitFlowTrain. Default=none.
 

Issues

  • You may have to add -lm -lphtread to the Tigress command line.
 

References

The implicit flow library is described in Probabilistic Obfuscation through Covert Channels , by Jon Stephens, Babak Yadegari, Christian Collberg, Saumya Debray, and Carlos Scheidegger, presented at the 3rd IEEE European Symposium on Security and Privacy.