CMC: A Pragmatic Approach to Model Checking Real Code/KennKnowles
From GradTurkey
Response to CMC: A Pragmatic Approach to Model Checking Real Code
[edit] State Space Issues
I'm new to model checking, but I saw so many more possibilities in each kind of heuristic.
- States are hashed once visited, but must be kept whole in the queue. If states are large (are they, usually? I don't know) then would storing a trace of the transitions, maybe from some periodic reference point, be smaller? The paper states that space rather than time is the limiting factor, but is it the size of the queue of unvisited states or the size of the hash table of visited states?
- For equivalent heap structures, C and C++ make their job difficult, but imagine a language where the heap was opaque, then any two heaps that are isomorphic graphs with the same data are the same. Maybe heaps in the queue could be compressed by storing their shape separate from their non-pointer data.
- I gather than the normal model checking thing to do is to define only non-equivalent states in one's model, but when checking code, discovering equivalent states could go a long ways. Deriving equivalence relations from the checking process or the safety predicate or whatever...
- Data types often have even more equivalent forms than that; there has been work on "Quotient Types" which is a bit beyond me in terms of category theory but the intuition is simply that you take an ordered list and "mod out" by the order to define an equivalence relation on lists. Likewise for binary search trees, etc. The observational equivalence according to their public interface is what matters (and even that might be too specific). Does model checking automatically hadle some of this?
- Dealing with large state space is one thing, but their examples have essentially infinite state spaces. Perhaps the infinite state space has a finite number of equivalence classes, but otherwise must correctness be defined inductively? On what well-founded order? Succesor?
[edit] Complaints
- CMC catches use-after-free by overwriting freed stuff with random values, but isn't it simpler to overwrite with special "error" values?
- Modelling the environment of the program is still a non-trivial amount of work, but perhaps since environments have more in common than programs it can be done once and packaged up.
- They caught bugs such as not properly checking that allocation succeeded, and invalid packets breaking things. There are almost certainly lighter-weight analyses that could catch both of these (basic information flow, and some kind of regex type, respectively) The deep bugs they caught are still nice.
[edit] Misc Notes
- It was an interesting epiphony for me that state spaces are much larger for erroneous programs. Perhaps according to some equivalence relation on states that collapses the erroneous ones this would not hold.
