with Interactive Theorem Provers

Tjark Weber

Department of Information Technology, Uppsala University

14 September, 2011

System Overview

Validation

Automation

SMT

propositional logic

equality

uninterpreted functions

arrays

quantifiers

linear arithmetic

bit-vectors

data types

SAT

QBF

Boolean variables

negation

disjunction

conjunction

propositional logic

Boolean quantifiers

Demo

LCF-style Theorem Proving

Theorems are implemented as an abstract data type.

There is a fixed number of constructor functions - one for each axiom/inference rule of higher-order logic.

More complicated proof procedures must be implemented by composing these functions.

Selected HOL4 Inference Rules

**Sequent Clauses**

Resolution, LCF-style

Q-resolution is propositional resolution followed by forall-reduction.

Q-Resolution

Propositional Resolution

How can we implement this efficiently on top of existing LCF-style kernels?

Q-Resolution, LCF-style

Example

Forall-Reduction, LCF-style

Z3 uses 34 different inference rules.

Satisfiability Modulo Theories

Proof Structure

Proofs are directed acyclic graphs.

Each node corresponds to an inference step.

Children are premises of inferences. Leaves are axioms/input formulas.

A designated root note concludes .

Proofs can be checked by depth-first postorder traversal.

1. A single primitive inference/theorem instantiation.

2. Combinations of primitive inferences/theorem instantiations.

3. Automated proof procedures (e.g., for linear arithmetic).

4. Combinations of the above.

Combinations of Primitive Inferences

Avoiding Automated Proof Procedures

Combinations of primitive inferences

Instantiation of schematic theorems

Generalization wrt. background theories

Memoization

Speed-ups of up to 3 orders of magnitude!

**Evaluation**

**Integrating SAT, QBF and SMT Solvers**

with Interactive Theorem Provers

with Interactive Theorem Provers

**http://www.cl.cam.ac.uk/~tw333/**

SAT

QBF

SMT

LCF-style proof checking is feasible for SAT, QBF and SMT. It scales to large formulas and proofs with millions of inferences.

Naive solutions are slow - an efficient implementation is crucial! Also, some prover kernel optimizations were necessary.

Our integrations provide a high degree of automation in Isabelle and HOL4, enable new applications, and provide high assurance for the external solvers' results.

Conclusions

Evaluation on 69 invalid QBF problems from the 2005 fixed instance and 2006 preliminary QBF-Eval data sets

up to 131 alternating quantiers, 24,562 variables, 35,189 clauses

For QBF validation, HOL4's name-carrying kernel is about 75 times faster (after optimizations) than the kernel that uses de Bruijn indices internally.

QBF certificate validation in HOL4 is about 25 times faster than proof search with Squolem.

All certificates are checked successfully!

Certificate validation in Isabelle/HOL is about 19 times slower than proof search with Z3.

Timeouts are due to theory reasoning; Z3's proofs are not detailed enough.

75% of SMT-LIB problems are checked successfully!

Complex systems almost inevitably contain bugs.

Formal Verification

Program testing can be used to show the presence of bugs, but never to show their absence!

Edsger W. Dijkstra

Interactive Theorem Proving

A theorem prover is a computer program to prove theorems.

Given a precise description of a system, and a formal specification of its intended behavior, we obtain a computer-checked proof that the system meets its specification.

Lemma “ ”

Proof

assume “ ”

from this obtain x where X: “ ” ..

fix y

from X have “ ” ..

then show “ ” ..

qed

Interactive Proof

Notable ITP Projects

L4.verified: a machine-checked proof of the functional correctness of the seL4 microkernel

Flyspeck: a machine-checked proof of the Kepler conjecture

Lemma “ ”

by auto

With 431 million adult victims and at an annual price of $388 billion, cybercrime costs the world significantly more than the global black market in marijuana, cocaine and heroin combined.

Norton Cybercrime Report, Sept. 2011

Software errors cost the U.S. economy $60 billion annually.

NIST, 2002

**Applications**

**Tarski-Kleene Algebras**

**C++ Memory Model**

Algebra is helpful for modelling and analysing computing systems.

concise models

abstract semantics

equational reasoning

[JAL 2009]

[ITP 2010, ITP 2011]

[ITP 2010, CPP 2011]

Large TKA repository with over 1000 lemmas

Hierarchy of algebraic structures

High degree of automation (via SMT)

Important models (relations, languages, ...)

Point-free and point-wise reasoning

Theorems for free: dualities, instantiation

Automated Algebraic Theory Engineering

in Isabelle/HOL

[RAMiCS 2011, ICFEM 2011]

Solver Integration

... with special thanks to my colleagues and co-authors: Hasan Amjad, Mark Batty, Jasmin Blanchette, Sascha Böhme, Simon Foster, Anthony Fox, Walter Guttmann, Ramana Kumar, Scott Owens, Susmit Sarkar, Peter Sewell, Thomas Sewell, Georg Struth, and many others

[JAL 2009]

[ITP 2010]

[ITP 2010]

A new C++ standard (C++11) was approved on 12 August, 2011.

It contains a memory model that defines how threads interact through shared data.

Example: Store Buffering

atomic_int x = 0;

atomic_int y = 0;

x.store(1, seq_cst);

y.load(seq_cst);

y.store(1, seq_cst);

x.load(seq_cst);

Can both loads observe the old value of 0?

The memory model defines various relations over a program's memory actions.

happens-before

reads-from

modification-order

sequentially-consistent

[POPL 2011, PPDP 2011]

Different choices for these relations give rise to different executions.

...

...

...

A mechanised semantics of C++ concurrency in Isabelle/HOL and HOL4

CppMem: a tool for understanding what multi-threaded C++ programs will do

A proof of correctness for a compilation strategy to x86-TSO

Refinements to the Standard that will improve the language

Mathematizing C++ Concurrency