February 19, 2021. The opaque predicate is perhaps the best-known
code obfuscation primitive. It's an easy concept to explain, it
makes perfect sense as "something that makes code hard to understand,"
and it is supported by most code obfuscation tools (including
Tigress). In practice opaque predicates may be mostly useless in
thwarting real attacks but, at least to academics, they have instant
appeal: they're easy to formalize, they have a mathematical flavor,
and you can first write a paper on how to construct them and then
another one on how to defeat them!
But, where does the term "opaque
predicate" come from?
Well, for my PhD thesis (Flexible
Encapsulation, get your copy now - only $95 on Amazon!)
I designed a language I called Zuse, in which I
examined the concept of Opaque Types. (Side note: I wrote
to Konrad Zuse in my best high school German and
asked permission to use his name for my language. But, by the time he
responded with a "No, because I don't want people to confuse this
language with the Plankalkül!" the thesis was already in
print. Ooops... Unfortunately, I lost the letter in one of the
many intercontinental moves.) I had learned about opaque types from
Niklaus Wirth's Modula-2, my favorite imperative language at the
time. But, where did Wirth get the term? Well, during his sabbatical at Xerox PARC he had learned about
the Mesa language and brought the opaque type
concept back home with him. The joke at the time was that Modula-2 was
what Wirth remembered of Mesa after his transatlantic flight home to
Switzerland, with too much turbulence and one too many cocktails. Yes,
those were simpler times, with lamer jokes.
Anyway, when I started
thinking about code obfuscation (this was back at my first job at the
University of Auckland, New Zealand), I needed a term for an
"expression that always evaluates to the same value, is easy for a
defender to construct, but hard for an attacker to (statically)
determine." I'm not entirely sure that, at the time, I properly
understood the literal meaning of "opaque" (I was still working on my
English), but it somehow seemed to fit the bill - hence, Opaque
Predicates! This led first to TR148 and eventually to the first published
paper on the topic, Manufacturing cheap, resilient, and stealthy opaque
constructs.
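To make the definition above concrete, here is a minimal sketch in C of a textbook opaquely true predicate. It is not taken from TR148 or the paper, and the names `opaquely_true` and `guarded_increment` are hypothetical. It relies on the fact that the product of two consecutive integers is always even, which still holds when unsigned arithmetic wraps around.

```c
#include <stdint.h>

/* Opaquely true predicate (illustrative only).
   x * (x + 1) is a product of two consecutive integers, so it is even.
   Unsigned arithmetic wraps modulo 2^32, which preserves parity,
   so the result is 1 for every possible x. */
int opaquely_true(uint32_t x)
{
    return (x * (x + 1u)) % 2u == 0u;
}

/* Hypothetical use: guard the real computation with the predicate and
   put bogus code on the branch that can never be taken. */
int guarded_increment(int secret, uint32_t x)
{
    if (opaquely_true(x)) {
        return secret + 1;   /* real code, always executed */
    }
    return secret - 42;      /* bogus code, never executed */
}
```

A predicate this simple is cheap to construct, but an optimizing compiler or a simple static analysis would likely fold it away, so it should be read as an illustration of the idea rather than as a resilient construction.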
So, from Mesa, to Modula-2, to Zuse, to the
fundamental term of code obfuscation! But, maybe the term Opaque Type
predates Mesa? Was it used in the C community prior to the Mesa
design? Does anyone know?
February 13, 2020. Tigress version 3.1 is now available for download. It contains numerous bugfixes.
December 28, 2019. I sometimes get asked "which software protection tool/algorithm/product should I be using?" That is the wrong question to ask, or, at least, the wrong end from which to start asking the question. Software protection isn't a thing, it's a process. This process starts with a detailed attack model: what are your attackers after, how good are they, what tools are they likely to use, etc. Then you need to think about what overhead you are willing to absorb; the more protection you add, the more slowdown/space increase you can expect. And, even when you have settled on a particular set of protection tools, the process doesn't end there; you have to put a plan in place for when the attackers still break through your protections. A vendor who tells you to "set it and forget it, just apply our tool and you're done," is simply lying. A serious and trustworthy vendor is one who instead tells you that "all our techniques will eventually succumb to a serious adversary, but we monitor hacker groups to learn what techniques are about to become obsolete, and when they do, we have new tricks in our back pocket to roll out."
December 28, 2019. I would like to maintain an up-to-date list of all companies in the software protection space. This list has been moved here.
December 27, 2019. Getting started with Tigress can seem like a daunting task, given that there are some 250 different options to choose from! I added a page with "recipes" here. Hopefully this should give you some ideas on how to get started. If you use Tigress for something useful and would like to share your experiences, please send me your script and I will add it to the recipes page!
December 14, 2019. Because software protection in industry is mostly clouded in secrecy, it is hard for those of us in academia to get a grip on what are acceptable overheads and typical levels of protection afforded by industrial-strength tools. Here are two pieces of information, one from a Wikileaked document (a memo from Cloakware to Sony), and one from a presentation by my good friend Gu Yuan, formerly of Cloakware/IRDETO:
December 10, 2019. We managed to obfuscate and compile a small test program for Android NDK. No guarantees it will work well on a real platform for now, but we'd love to have some Android developers try it out! Have a look under Platforms/Android for more details.
December 2, 2019. In Problems in Cryptocurrency: Five Years Later,
Vitalik Buterin writes: "... we want to come up with a way to
'encrypt' a program so that the encrypted program would still give the
same outputs for the same inputs, but the 'internals' of the program
would be hidden. ... A solution to code obfuscation would be very
useful to blockchain protocols. ... Unfortunately this continues to be
a hard problem. ... these paths are still quite far from creating
something viable and known to be secure."
It is
interesting to compare the goals of cryptographically secure
obfuscation with those of language-based
obfuscation. Those of us who work in the latter are plagued by
ridiculous security requirements paired with ridiculous performance
requirements. For example, in A
compiler-based infrastructure for software-protection, my
friends Liem, Gu, and Johnson at IRDETO/Cloakware state that in one
typical case, their performance budget (CPU and memory) was
50% over baseline. Now, we get a lot of flak from
the crypto community (in the paper In Pursuit
of Clarity In Obfuscation cited by Buterin, for example, the
author writes: "Well, maybe those commercial products for program
obfuscation work in practice. Let me try to break one. ... ten
minutes later... Oh, nevermind."), but in practice it is
hard to protect a program for more than 10 minutes when you're only
allowed a minuscule reduction in performance.
I'm not
aware of any work in language-based obfuscation that examines what
happens when you substantially loosen the performance
requirements. This makes sense since most of the applications have
been in areas where performance matters, such as DRM. So, this is the
question: if you were allowed not 50% overhead, but, say, 5
orders of magnitude overhead, what could you accomplish?
Would you be able to supply useful levels of security for high-value
assets, such as smart contracts?
November 28, 2019. The old site tigress.cs.arizona.edu has now been deprecated in favor of our new site, tigress.wtf. At the same time, we have completely reorganized the code and fixed numerous problems. Tigress is now built on top of the latest version of CIL, CIL 1.7.3 goblint, by Gabriel Kerneis. Tigress should now be easier to install - there is now just one (fat) package to download, containing binaries for all platforms.
November 20, 2019. We have added a transformation, SelfModify, which transforms a function into one that modifies itself at runtime. This transformation is currently only available for X86/64 targets. It can, of course, be combined with other transformations, but it is probably best run at the very end of the transformation chain. One interesting aspect of this transformation is that when combined with virtualization with a direct or indirect threaded dispatch, the indirect jumps to the instruction handlers are replaced with (self-modified) direct jumps! This should confuse analyses that rely on indirect branches to locate the instruction handlers. For dynamic analyses, this transformation shouldn't do very much; the instructions that are executed will be the same, except for the ones that modify the code. Analyses that assume that instruction addresses uniquely identify unique code will, of course, also be confused when a code location is reused for different instructions.
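For readers who have not seen threaded dispatch, the following is a minimal sketch in C of an indirect threaded interpreter. It is not Tigress output: the three-instruction bytecode and the name `run` are hypothetical, and the code assumes the GCC/Clang "labels as values" extension. The point is only to show the indirect jumps (`goto *...`) to the instruction handlers that the entry above describes as being replaced by self-modified direct jumps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_SIZE 16

/* Interprets `code` and returns the value on top of the stack at OP_HALT.
   Requires the GCC/Clang "labels as values" extension (&&label, goto *p). */
long run(const long *code)
{
    /* Handler table: opcode -> address of that opcode's handler. */
    static void *const handlers[] = { &&do_push, &&do_add, &&do_halt };
    long stack[STACK_SIZE];
    size_t sp = 0;
    const long *pc = code;

    /* Each dispatch is an indirect jump through the handler table.
       This is the pattern an analysis can search for to find handlers. */
#define DISPATCH() goto *handlers[*pc++]

    DISPATCH();

do_push:
    assert(sp < STACK_SIZE);
    stack[sp++] = *pc++;          /* operand follows the opcode */
    DISPATCH();

do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];   /* pop two values, push their sum */
    DISPATCH();

do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Running the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT}` through `run` returns 5. Every handler here ends in an indirect jump. If those jumps were rewritten at runtime into direct jumps, as the entry above describes, a static search for indirect branches would presumably no longer find the handlers.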