Several transformations, in particular
AntiTaintAnalysis,
use implicit flow as a basic building block. Before you can use these you need
to call the --Transform=InitImplicitFlow
and list which of the implicit
flow variants you are going to use later. Here's an example of what it might
look like:
tigress --Seed=0 --Verbosity=1 --Environment=x86_64:Darwin:Gcc:4.6 \
--Transform=InitEntropy --Functions=main \
--Transform=InitImplicitFlow --Functions=main \
--InitImplicitFlowKinds=trivial_thread_1,trivial_counter,\
mem_cache_time,mem_cache_thread_1,\
file_cache_time,file_cache_thread_1,\
jit_time \
--InitImplicitFlowHandlerCount=1 \
--InitImplicitFlowJitCount=1 \
--InitImplicitFlowJitFunctionBody="(for (if (bb 50) (bb 50)))" \
--InitImplicitFlowTrace=false \
--InitImplicitFlowTrain=false \
--InitImplicitFlowTime=false \
--InitImplicitFlowTrainingTimesClock=500 \
--InitImplicitFlowTrainingTimesThread=500 \
--InitImplicitFlowTrainingMinGap=90 \
--InitImplicitFlowTrainingConfidenceLevel=0.99 \
--InitImplicitFlowTrainingTargetErrorRate=0.00001 \
--InitImplicitFlowTrainingKind=statistics \
--Transform=AntiTaintAnalysis --Functions=main \
--AntiTaintAnalysisKinds=vars \
--LocalVariables=main:b \
--AntiTaintAnalysisImplicitFlow="(repeat mem_cache_time 3)" \
input.c --out=output.c
Here, we specify that we might want to use the implicit flow variants
trivial_thread_1,trivial_counter,mem_cache_time,mem_cache_thread_1,file_cache_time,file_cache_thread_1,jit_time
.
We then insert implicit flow using (repeat mem_cache_time 3)
after any assignment to the variable b
in
function main
. The transformation of the assignment b = a
would look something like this:
exp_orig3 = a;
exp_orig3_origPtr4 = (unsigned char *)(& exp_orig3);
exp_orig3_copyPtr6 = (unsigned char *)(& exp_orig3_copy5);
size_iter7 = 0;
while (size_iter7 < 4) {
maxCount34 = 0;
while (maxCount34 < 1) {
majority_iter9 = 0;
while (majority_iter9 < 255) {
counters8[majority_iter9] = 0;
majority_iter9 ++;
}
repeat_iter32 = 0;
while (repeat_iter32 < 3) {
charCopy10 = 0;
copy_iter11 = 0;
while (copy_iter11 < 8) {
if ((*exp_orig3_origPtr4 >> copy_iter11) & 1) {
pagesize13 = getpagesize();
memalignRet14 = posix_memalign(& __buffer_slow_1, pagesize13, 64);
__asm__ volatile ("cpuid\n"
"rdtsc\n": "=a" (low16), "=d" (high17));
sum20 = _1entropy ^ 1;
iter21 = 0;
while (iter21 < _2_time_implicitFlow_memCache_numberOfReads) {
__asm__ volatile ("mfence\n"
"clflush (%0)\n": : "r" (__buffer_slow_1));
sum20 += *((long *)__buffer_slow_1);
*((long *)__buffer_slow_1) = sum20;
iter21 ++;
}
_1entropy = sum20 - _1entropy;
__asm__ volatile ("cpuid\n"
"rdtsc\n": "=a" (low18), "=d" (high19));
free(__buffer_slow_1);
time12 = (((unsigned long )high19 << 32) | (unsigned long )low18) -
(((unsigned long )high17 << 32) | (unsigned long )low16);
} else {
pagesize22 = getpagesize();
memalignRet23 = posix_memalign(& __buffer_fast_1, pagesize22, 64);
memalignRet23 = posix_memalign(& __buffer_fast_2, pagesize22, 64);
__asm__ volatile ("cpuid\n"
"rdtsc\n": "=a" (low26), "=d" (high27));
sum30 = _1entropy | 2;
iter31 = 0;
while (iter31 < _2_time_implicitFlow_memCache_numberOfReads) {
__asm__ volatile ("mfence\n"
"clflush (%0)\n": : "r" (__buffer_fast_1));
sum30 += *((long *)__buffer_fast_2);
*((long *)__buffer_fast_2) = sum30;
iter31 ++;
}
_1entropy = sum30 | _1entropy;
__asm__ volatile ("cpuid\n"
"rdtsc\n": "=a" (low28), "=d" (high29));
free(__buffer_fast_1);
free(__buffer_fast_2);
time12 = (((unsigned long )high29 << 32) | (unsigned long )low28) -
(((unsigned long )high27 << 32) | (unsigned long )low26);
}
if ((double )time12 >
_2_time_implicitFlowTimerClock_mem_cache_time_timingCounter_middleReferenceTime) {
charCopy10 |= 1 << copy_iter11;
} else {
}
copy_iter11 ++;
}
(counters8[charCopy10]) ++;
repeat_iter32 ++;
}
maxCount34 = 0;
max_iter35 = 0;
while (max_iter35 < 255) {
if (counters8[max_iter35] > maxCount34) {
maxCount34 = counters8[max_iter35];
winner33 = max_iter35;
} else {
}
max_iter35 ++;
}
}
*exp_orig3_copyPtr6 = winner33;
exp_orig3_origPtr4 ++;
exp_orig3_copyPtr6 ++;
size_iter7 ++;
}
b = exp_orig3_copy5;
Some primitives are based on the timing differences between a slow and a fast process:
void P'() {
value=0;
for(i in [0...(bits in b)-1]) {
timeT start = time();
if ((i:th bit of b)==1)
slow process(param);
else
fast process(param);
timeT time = time()-start;
if (time > threshold)
value |= 1 << i;
}
a = value;
}
A variant of such timing-based flow primitives use threads to do the timing. This helps with stealth:
int result;
void slow(int v) {
slow process(param);
result = v;
}
void fast(int v) {
fast process(param);
result = v;
}
void P'() {
a=0;
for(i in [0...(bits in b)-1]) {
int bit = i:th bit of b;
s=thread_create(slow, bit);
f=thread_create(fast, !bit);
thread_join(s);
thread_join(f);
if (result)
a |= 1 << i;
}
}
There are two implementations of these threading-based primitives, using either one or two threads. This allows you to embed threading code that matches the code that is already in the program.
counter_int
:
Copy a variable by counting up to its value. I.e., the assigment
a=b
is transformed into
int i,a = 0;
for(i=0;i<b;i++)
a++;
counter_float
:
Same as counter_int
, but using a float.
counter_signal
:
Copy a variable bit-by-bit using a signal handler.
bitcopy_unrolled
:
Copy a variable bit-by-bit, each bit tested by an if-statement
bitcopy_loop
:
Loop over the bits in a variable and copy each bit by testing in an if-statement.
bitcopy_signal
:
Loop over the bits in a variable and copy each bit in a signal handler:
unsigned int value;
int bitNo;
void handler(int sig){
value |= 1 << bitNo;
}
{
value=0;
signal(31, handler);
for(i in 0..(bits in b)-1]) {
if ((i:th bit of b)==1) {
bitNo=i;
raise(31);
}
}
signal(31, 1);
a = value;
}
file_write
:
Copy a variable byte-by-byte by writing to a file and reading it back again.
trivial_clock
:
Copy a variable by timing a trivial loop.
trivial_thread_1
:
Same as trivial_clock
, but using threads for timing.
trivial_thread_2
:
Same as trivial_thread_1
, but using two threads instead of one.
file_cache_time
:
Copy a variable by timing a file being read with caching turned on or off:
process<(param,nocache):
posix_memalign(&buf,pagesize,
pagesize);
fd=open("/tmp/file.txt", writing);
fcntl(fd, F_NOCACHE, nocache);
for(i=0; ireading);
fcntl(fd, F_NOCACHE, nocache);
start = time();
for(i=0; islow process(param):
process(param,1):
fast process(param):
process(param,0):
file_cache_thread_1
:
Same as file_cache_time
, but using threads for timing.
file_cache_thread_2
:
Same as file_cache_thread_1
, but using two threads instead of one.
mem_cache_time
:
Copy a variable by timing a CPU data cache being read with or without caching:
posix_memalign(&buf1,pagesize,64);
posix_memalign(&buf2,pagesize,64);
slow process(param):
for(i=0; i< param;i++){
asm ("mfence\n"
"clflush (%0)\n"::"r"(buf1));
sum += *((long *)buf1);
*((long *)buf1) = sum;
}
fast process(param):
for(i=0;i < param;i++){
asm ("mfence\n"
"clflush (%0)\n"::"r"(buf1));
sum += *((long *)buf2);
*((long *)buf2) = sum;
}
mem_cache_thread_1
:
Same as mem_cache_time
, but using threads for timing.
mem_cache_thread_2
:
Same as mem_cache_thread_1
, but using two threads instead of one.
time_jit
:
Copy a variable by timing a jitted function:
int freq=0;
void foo(input,output) {
static void (*foop)(...,...);
if (freq==0) {
foop = JIT(bytecodes)
freq++;
}
(*foop)(input,output);
}
slow process(param):
freq=0;
start=time();
foo(...,...);
time=time()-start;
fast process(param);
freq=0;
foo(...,...);
start=time();
foo(...,...);
time=time()-start;
All the implicit flow primitives that are based on timing need to be trained,
i.e. we have to run tests to see what is a good threshold between slow and fast
primitives. There are two major kinds of training, gap
and resend
.
The gap
training simply sends lots of "slow" and "fast" tests, and finds a gap
between them. The resend
training is similar, but uses more sophisticated
statistical tests (Wilson's Score Rule) to figure out how many times a
bit should be sent in order to achieve a particular confidence level and target error
rate. The commands look like this:
--Transform=InitImplicitFlow \
--Functions=... \
--InitImplicitFlowKinds=... \
--InitImplicitFlowTrain=true \
--InitImplicitFlowTrainingTimesClock=50 \
--InitImplicitFlowTrainingTimesThread=50 \
--InitImplicitFlowTrainingGapMaxFailureRateClock=5 \
--InitImplicitFlowTrainingGapMaxFailureRateThread=5 \
--InitImplicitFlowTrainingGapMinGap=90 \
--InitImplicitFlowTrainingKind=gap \
or like this:
--Transform=InitImplicitFlow \
--Functions=... \
--InitImplicitFlowKinds=... \
--InitImplicitFlowTrain=true \
--InitImplicitFlowTrainingResendConfidenceLevel=0.99 \
--InitImplicitFlowTrainingResendTargetErrorRate=0.00001 \
--InitImplicitFlowTrainingKind=resend \
The output would look like this (for the resend
training kind):
TRAINING_DATA '(mem_cache_time 4496.000000 16 4)'
which means that for the mem_cache_time
implicit flow primitive, a parameter value of "16" is optimal,
that each bit needs to be sent "4" times, and that the threshold between slow and fast processes is
"4496". You can then feed this into another run of tigress:
--Transform=InitImplicitFlow \
--InitImplicitFlowTrain=false \
--InitImplicitFlowTrainingData='(mem_cache_time 4496.000000 16 4)' \
--Transform=AntiTaintAnalysis \
--Functions=... \
You can also run the training at startup time of your program, i.e., insert your
training code at the beginning of main
:
--Transform=InitImplicitFlow \
--InitImplicitFlowTrain=true \
--Transform=AntiTaintAnalysis \
--Functions=... \
Implicit flow primitives can be combined for improved correctness and higher levels of resistance to analysis. Combiners are expressed as S-expressions:
type combiner =
| (single primitive)
| (compose combiner combiner ...)
| (select combiner combiner ...)
| (majority combiner combiner ...)
| (repeat combiner int)
| (until combiner int int)
Their behavior as as follows:
(single a)
a single primitive, e.g. (single trivial_counter)
.
(compose a b c)
chains together several combiners, i.e. a(b(c))
.
(select a b c)
picks one of combiners a,b,c
at random, based on
input.
(majority a b c)
chooses the most frequently occurring value computed by a,b,c
.
(repeat a n)
is equivalent to (majority a a ... a)
with n copies of (until a m n)
continuously repeats combiner a
m times until there is at least n agreements on the resulting value.
You would use these in the AntiTaintAnalysis
command's
--AntiTaintAnalysisImplicitFlow=...
option:
--Transform=AntiTaintAnalysis \
--Functions=main \
--AntiTaintAnalysisKinds=vars \
--LocalVariables=main:b \
--AntiTaintAnalysisImplicitFlow="(repeat mem_cache_thread_1 3)" \
Here are some examples of combiners:
(single counter_int)
(single counter_float)
(single counter_signal)
(single bitcopy_unrolled)
(single bitcopy_loop)
(single bitcopy_signal)
(single file_write)
(repeat trivial_thread_1 3)
(repeat trivial_counter 3)
(repeat mem_cache_time 3)
(repeat mem_cache_thread_1 3)
(repeat file_cache_time 3)
(repeat file_cache_thread_1 3)
(repeat jit_time 3)
(compose counter_int counter_float file_write)
(select counter_int counter_float file_write)
(majority mem_cache_thread_1 bitcopy_loop counter_signal)
(majority mem_cache_thread_1 file_cache_time bitcopy_loop counter_signal counter_float)
Option | Arguments | Description |
---|---|---|
--Transform | InitImplicitFlow | Call this before --Transform=AntiTaintAnalysis, in case you want to use the implicit flow copy kinds counter_signal and bitcopy_signal. This transformation inserts the requisite signal handlers and jitted functions. |
--InitImplicitFlowKinds | counter_int, counter_float, counter_signal, bitcopy_unrolled, bitcopy_loop, bitcopy_signal, file_write, lookup_byteArray, trivial_clock, trivial_thread_1, trivial_thread_2, trivial_counter, file_cache_time, file_cache_thread_1, file_cache_thread_2, mem_cache_time, mem_cache_thread_1, mem_cache_thread_2, jit_time, branchPrediction_time, * | Comma-separated list of the kinds of implicit flow that can be inserted. Default=all options.
|
--InitImplicitFlowHandlerCount | INTSPEC | How many signal handlers to insert. Default=1. |
--InitImplicitFlowJitCount | INTSPEC | How many jitted functions to insert. Default=0. |
--InitImplicitFileCacheSize | INTSPEC | Size of the file (in 8k blocks) for a file cache implicit flow. Default=100. |
--InitImplicitFlowJitFunctionBody | S-Expression | S-Expression describing the control structures of the function to be jitted. See the RandomFuns transformation for description of the syntax. Default="(for (for (for (for (if (bb 2) (bb 2))))))". |
--InitImplicitFlowTrainingKind | gap, resend | How to do the training. Either by simply looking for a gap between slow and fast times, or to do a more sophisticated statistical test. Default=gap.
|
--InitImplicitFlowTrainingParameterRange | S-Expression | An S-Expression consisting of lists of elements of the form (implicit-flow-kind from to), indicating that we will start by training with the parameter set to from, then 2*from, 4*from, until to is reached. Default=(trivial_clock 1 10000) trivial_thread_1 1 10000) trivial_thread_2 1 10000) trivial_counter 1 10000) file_cache_time 1 1024) file_cache_thread_1 1 1024) file_cache_thread_2 1 1024) mem_cache_time 2 64) mem_cache_thread_1 2 64) mem_cache_thread_2 2 64) jit_time 0 0) jit_thread_1 0 0) jit_thread_2 0 0). |
--InitImplicitFlowTrace | BOOLSPEC | Trace the execution of the initialization of implicit flow. Default=false. |
--InitImplicitFlowTrain | BOOLSPEC | Generate a program that runs the training pass. The output will look something like this: T RAINING_DATA '(trivial_counter 779.000000 2) (trivial_thread 0.000000 8192) (file_cache_time 24618781.250000 512) (file_cache_thread 0.000000 2) (mem_cache_time 26890.750000 128) (mem_cache_thread 0.000000 2048) (jit_time 10020289.000000 0)'. This can then be input to a subsequent run of tigress, setting --InitImplicitFlowTrainingData= to this output. Default=false. |
--InitImplicitFlowTime | BOOLSPEC | Generate a program that runs lots of timing tests. The output looks something like this TRAINING_MEASURED_TIME,trivial_counter,fast,example.com,1,208.000000, where the two last entries is the parameter of the implicit flow kind and the time (usually in clock cycles). Default=false. |
--InitImplicitFlowTrainingTimesClock | INTSPEC | Number of training measurements to take for timing-based primitives. Default=0. |
--InitImplicitFlowTrainingTimesThread | INTSPEC | Number of training measurements to take for threading-based primitives. Default=0. |
--InitImplicitFlowTrainingGapMaxFailureRateClock | INTSPEC | Maximum acceptable failure rate for timing-based primitives, out of 100, for the gap training kind. Default=5. |
--InitImplicitFlowTrainingGapMaxFailureRateThread | INTSPEC | Maximum acceptable failure rate for thread-based primitives, out of 100, for the gap training kind. Default=5. |
--InitImplicitFlowTrainingGapMinGap | INTSPEC | Minimum gap between fast and midpoint and slow and midpoint, as a percentage, for the gap training kind. Default=80. |
--InitImplicitFlowTrainingResendConfidenceLevel | real | Confidence level, a floating point number close to 1.0, for the resend training kind. Default=0.95. |
--InitImplicitFlowTrainingResendTargetErrorRate | real | Acceptable error rate for the resend training kind. Should be a floating point number close to 0. Default=0.0001. |
--InitImplicitFlowTrainingData | SExpression | Set training data from previous run of --InitImplicitFlowTrain. Default=none. |
-lm -lphtread
to the Tigress command line.
The implicit flow library is described in Probabilistic Obfuscation through Covert Channels , by Jon Stephens, Babak Yadegari, Christian Collberg, Saumya Debray, and Carlos Scheidegger, presented at the 3rd IEEE European Symposium on Security and Privacy.