Abstracting Abstract Control

J. Ian Johnson and David Van Horn
DL ACM Extended version, arXiv

The strength of a dynamic language is also its weakness: run-time flexibility comes at the cost of compile-time predictability. Many of the hallmarks of dynamic languages such as closures, continuations, various forms of reflection, and a lack of static types make many programmers rejoice, while compiler writers, tool developers, and verification engineers lament. The dynamism of these features simply confounds statically reasoning about programs that use them. Consequently, static analyses for dynamic languages are few, far between, and seldom sound.

The “abstracting abstract machines” (AAM) approach to constructing static analyses has recently been proposed as a method to ameliorate the difficulty of designing analyses for such language features. The approach, so called because it derives a function for the sound and computable approximation of program behavior starting from the abstract machine semantics of a language, provides a viable approach to dynamic language analysis since all that is required is a machine description of the interpreter.

The AAM recipe as originally described produces finite state abstractions: the behavior of a program is approximated as a finite state machine. Such a model is inherently imprecise when it comes to reasoning about the control stack of the interpreter: a finite state machine cannot faithfully represent a stack. Recent advances have shown that higher-order programs can be approximated with pushdown systems. However, such models, founded in automata theory, either breakdown or require significant engineering in the face of dynamic language features that inspect or modify the control stack.

In this paper, we tackle the problem of bringing pushdown flow analysis to the domain of dynamic language features. We revise the abstracting abstract machines technique to target the stronger computational model of pushdown systems. In place of automata theory, we use only abstract machines and memoization. As case studies, we show the technique applies to a language with closures, garbage collection, stack-inspection, and first-class composable continuations.