How Does It Work?
This is a small introduction to CrossHair’s implementation.
Background
We recommend the following reading, depending on your familiarity with other work in this space.
First, we can’t recommend https://www.fuzzingbook.org/ enough. CrossHair’s approach largely corresponds to the chapter on concolic fuzzing. Some parts of the symbolic fuzzing chapter are relevant as well. Unlike concolic execution, CrossHair’s values start each path execution as purely symbolic values, and do not get concrete values until they are needed.
CrossHair uses the Z3 SMT solver to perform its deductions. This Python-specific introduction is a good place to get a feel for what it does. Please note that we’ll say “Z3” below, but many statements about Z3 apply to all SMT solvers and/or the SMT-LIB language.
The Basic Idea
CrossHair repeatedly calls the function you want to analyze and passes in special objects that behave like things your function expects.
CrossHair doesn’t do any kind of AST or bytecode analysis. It just calls your function.
These “special objects” behave differently on each execution. That’s how CrossHair explores different paths through your function.
These special CrossHair objects hold one or more Z3 expressions which are used in the Z3 solver.
When your function takes a parameter with some python type (say, int
) CrossHair will
supply an object of a custom CrossHair type (SymbolicInt
), which holds a Z3
expression having some Z3 sort (IntSort()
). Here are some examples:
Python Type |
CrossHair Type |
Z3 Sort |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Let’s Explore
First, we need to create a context manager that manages a symbolic execution.
Normally, you’d do this by saying with standalone_statespace:
, but since we’re in
an interactive terminal, we’ll just open the context manually:
>>> from crosshair.core_and_libs import proxy_for_type
>>> from crosshair.core_and_libs import NoTracing
>>> from crosshair.core_and_libs import standalone_statespace
>>> space = standalone_statespace.__enter__()
We can initialize a CrossHair object by providing a type and a name:
>>> symbolic_x = proxy_for_type(int, 'x')
This object acts just like a regular integer:
>>> type(symbolic_x)
<class 'int'>
But it isn’t a real integer. We can see the real object better by entering a “NoTracing” context manager:
>>> no_tracing = NoTracing()
>>> no_tracing.__enter__()
>>> type(symbolic_x)
<class 'crosshair.libimpl.builtinslib.SymbolicInt'>
We can access the .var
attribute of any CrossHair object to get
the Z3 variable(s) that it holds:
>>> symbolic_x.var
x
>>> type(symbolic_x.var)
<class 'z3.z3.ArithRef'>
This takes the Z3 variable we just defined and adds one to it:
>>> import z3
>>> expr = symbolic_x.var + z3.IntVal(1)
>>> expr
x + 1
>>> type(expr)
<class 'z3.z3.ArithRef'>
We can create CrossHair objects not only for fresh variables, but
also for Z3 expressions.
So, if we wanted to wrap x + 1
back into a CrossHair object,
we’d write:
>>> from crosshair.libimpl.builtinslib import SymbolicInt
>>> symbolic_x_incr = SymbolicInt(symbolic_x.var + z3.IntVal(1))
Now, let’s exit the NoTracing context so that we can return to the illusion that
symbolic_x
is just a regular integer:
>>> no_tracing.__exit__()
The SymbolicInt
class defines the __add__
method, so we can add one just by
saying it in regular Python (i.e. symbolic_x + 1
).
SymbolicInt
does the necessary unwrapping and re-wrapping:
>>> (symbolic_x + 1).var
x + 1
SymbolicInt
also defines the comparison methods so that they return symbolic
booleans:
>>> (symbolic_x >= 0).var
0 <= x
So far, everything is symbolic. But eventually, the Python interpreter needs a real value; consider:
if symbolic_x > 0:
print('bigger than zero')
Should this execute the print or not? When python executes the if
statement, it calls __bool__
on the SymbolicBool
object. This method
does something very special. It consults Z3:
If the Z3 boolean expression must be True (or False), just return that value.
Otherwise, decide it to be True or False randomly. Take that decision and add it to the set of Z3 constraints for this execution path. Return the (concrete) bool that we decided.
CrossHair will remember what decisions it has made so that it can make different decisions on future executions. Ultimately, we’re looking for some target thing to happen: an exception to be raised, or a postcondition to return False. When that happens, we ask Z3 for a model and report it as a counterexample.
That’s the core of how CrossHair works.
Devil in the Details
Simple right?
Well, if there is an accomplishment about CrossHair, it’s that it tries hard to get the details right. And there are a lot of details.
Here are some of the topics that aren’t yet discussed. Reach out to help us prioritize documenting them!
Balancing the amount of work done inside and outside the solver.
Developing heuristics for effective path exploration.
Dealing with the cases that Z3 cannot. (concrete/symbolic scaling)
Interpreting logic that’s implemented in C.
Reconciling semantic differences between Python and Z3.
Dealing with mutable values.
Dealing with potentially aliased mutable values (x is y).
Creating symbolics for your custom classes.
Reconciling error behavior (ValueErrors, TypeErrors).
Implicitly converting types accurately.
Managing evaluation order. (under-approximation and over-approximation tactics)
Creating symbolics for base classes, or even for
object
.