
Using Formal Methods to Eliminate Exploitable Bugs

Kathleen Fisher

Professor and Chair, Computer Science Department

Tufts University

Original Program Manager, DARPA's HACMS program

QUESTIONS?

Thanks to:

Within Reach?

  • Critical parts of critical systems are built out of verified components & their composition is verified.
  • Buggy software is no longer an easy attack vector.

  • Black Hat is reduced to non-scalable attack vectors:

Taking Stock

Lessons Learned

On-Going Research & Challenges

  • Producing and validating models of real systems

x86, LLVM, Linux, Browser APIs, POSIX interfaces

  • Increasing automation
  • Scaling
  • Proof engineering
  • Integrating with normal development processes
  • Getting buy-in/adoption & training

"Exhaustively testable pseudo-code"

[Using Formal Methods at Amazon Web Services]

  • Handling concurrency
  • Don't verify existing code artifacts

seL4, CompCert, FSCQ, certiKOS, Rocksalt, Nucleus, QUARK

  • Use static analyzers to eliminate obvious bugs before starting formal verification.
  • Don't verify all code:

Secure essential infrastructure & contain the rest.

QUARK, Verve/Nucleus, Rocksalt

  • Use DSLs to generate code and correctness proofs

Rocksalt, Ivory/Tower, SpiralGen, BilbyFS

  • Composition enables scaling

Facebook INFER; UC-KLEE, Rockwell Collins Workbench; mCertiKOS

  • Automation is essential:

Tactic libraries and SMT/SAT solver integration

FSCQ, Nucleus, mCertiKOS

What software is worth verifying?

seL4 Microkernel

  • General-purpose operating system microkernel.
  • Implemented and proven correct in Isabelle/HOL.
  • Size: 10K LoC, 480K LoP
  • Time to build: 13 (8) person years
  • Speed: 206 vs 227 cycles in 1-way IPC fastpath
  • Machine-checked theorems include
  • Access-control enforcement
  • Non-interference
  • Compilation to binary
  • IPC fast-path correctness

Available open source.

CompCert Verifying C Compiler

  • Subset of C used by aviation industry
  • Implemented and proven correct in Coq.
  • Size: 42K LoC+P
  • Time to build: 3 person years
  • Speed: 2x speed of gcc -O0, 7% slower than gcc -O1, 12% slower than gcc -O2

  • Poised to become the compiler for Airbus software.
  • Separation Kernel: seL4 [SOSP 2009, ToCS 2014]
  • Hypervisor: mCertiKOS [POPL 2015]
  • RTOSes: eChronos [echronos.org], ORIENTAIS [ICECCS 2012]
  • C Compiler: CompCert [POPL 2006]
  • File Systems: FSCQ [SOSP 2015], BilbyFS [OpSysRev 2014]
  • Web Browser: QUARK [USENIX 2012]
  • Browser Sandbox: RockSalt [PLDI 2012]
  • Crypto Algorithms: SHA-256, HMAC [USENIX 2015]; AES-128, SHA-384, ECDSA (NIST P-384) [HILT 2013]

  • Garbage Collector: Nucleus [PLDI 2010]

All verified to be functionally correct!

Formal Methods: An Overview

Formal Method Based Tools

What are "Formal Methods"?

Functional

Correctness

Formal methods are best described as the application of a fairly broad variety of theoretical computer science fundamentals ... to problems in software and hardware specification and verification.

Understanding Formal Methods, Jean-François Monin, 2003

No Run-Time

Errors

"I know it when I see it" —Justice Potter Stewart

Strength of Guarantee

Characteristics:

  • Based on math
  • Machine-checkable
  • Capable of proving properties of code and models
  • But, read the fine print!
  • Assumptions may be unreasonable.
  • Guarantees may be too weak.

Type

Safety

Automatic/Unlimited

PhD Years/Kloc

Level of User Effort/Scalability

Notional graph

Some Evidence:

DARPA's HACMS Program

Using Formal Methods to produce more secure vehicles.

The Setup

Baseline Security Assessment

Red Team: Attacker could crash legitimate ground control station & hijack quadcopter in flight.

Program thesis:

FM can yield vehicles less susceptible to remote cyber attack.

Threat model:

No physical access, full knowledge of system and source code.

Out-of-scope:

Hardware assumed to be correct.

Experimental Platforms:

Arducopter and Boeing's Unmanned Little Bird (ULB)

The Evolving SMACCMCopter

The SMACCMCopter: 18-Month Assessment

  • The SMACCMCopter flies:
  • Stability control, altitude hold, directional hold, DoS detection.
  • GPS waypoint navigation 80% implemented.

  • Air Team proved system-wide security properties:
  • The system is memory safe.
  • The system ignores malformed messages.
  • The system ignores non-authenticated messages.
  • All “good” messages received by SMACCMCopter radio will reach the motor controller.

  • Red Team:
  • Found no security flaws in six weeks with full access to source code.

  • Penetration Testing Expert: The SMACCMCopter is probably “the most secure UAV on the planet.”

Hypothesis:

Formal Methods can eliminate many exploitable vulnerabilities.

Been there, tried that, wasn't impressed.

Networked computers are unacceptably insecure

Survey Results on Using Formal Methods

Source: Formal Methods: Practice and Experience, ACM Computing Surveys, October 2009

Formal Methods:

Not just for implementation bugs!

Formal methods can help with other security challenges as well:

  • Faulty design

Ex: Amazon Web Services use of TLA+

  • Buggy specifications
  • Ex: Rockwell Collins detecting unencrypted comm channel
  • Side-channel information leaks

Ex: NICTA analysis of seL4

  • Dependence on 3rd party software

Why so insecure?

Today

many things are networked computers.

Many Exploitable Vulnerabilities

Ubiquitous and Pernicious

Exploit Kits leverage software bugs

Phoenix Exploit Kit, 2010

Remote code execution on all Windows platforms (July 20, 2015). Caused by a buffer underflow in the Adobe Type Manager Library.

https://technet.microsoft.com/en-us/library/security/ms15-078.aspx

CVE-2009-0836: Buffer Overflow

CVE-2009-0927: Missing Bounds Check

CVE-2009-1869: Integer Overflow

CVE-2010-0188: Unspecified

CVE-2010-0840: Unspecified

CVE-2010-0842: Invalid Array Index

CVE-2010-1297: Memory corruption

CVE-2010-1818: Unmarshalling bad ptr

CVE-2010-1885: Mishandling escapes

CVE-2010-2883: Buffer overflow
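Several of the CVE classes above (buffer overflow, integer overflow, invalid array index) trace back to arithmetic on attacker-influenced sizes. A minimal C sketch, with hypothetical function names not taken from any of these CVEs, of how an integer overflow in a size computation becomes an undersized allocation, and the standard guard that prevents it:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Buggy pattern: count * sizeof(uint32_t) can wrap around, so malloc
 * succeeds with a tiny size and later writes run past the block. */
uint32_t *alloc_records_buggy(size_t count) {
    return malloc(count * sizeof(uint32_t)); /* multiplication may wrap */
}

/* Fixed: reject any count whose byte size would overflow size_t. */
uint32_t *alloc_records_checked(size_t count) {
    if (count > SIZE_MAX / sizeof(uint32_t))
        return NULL; /* would overflow: refuse the allocation */
    return malloc(count * sizeof(uint32_t));
}
```

The guard is the whole fix: a formal tool can prove the checked version never computes a wrapped size, while the buggy version fails that proof obligation.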

  • Faulty Design
  • Buggy Specifications
  • Implementation Errors
  • Side-channel leaks
  • Misconfiguration
  • Gullible users
  • Weak passwords
  • Malicious insiders
  • Physical security failures
  • Reliance on 3rd party software
  • Faulty/malicious hardware
  • And the list goes on...

http://heartbleed.com

In 2012, 3,436 vulnerabilities had public exploits available, 42% of the total number of vulnerabilities reported.

Heartbleed: Missing bounds check in OpenSSL allows theft of secret keys, user names and passwords, instant messages, emails, documents without leaving a trace.
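Heartbleed's root cause was trusting an attacker-supplied length field. A minimal C sketch of that shape (hypothetical functions, not the actual OpenSSL heartbeat code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy echo: copies the attacker's claimed_len regardless of how much
 * payload was actually received, over-reading adjacent memory. */
size_t echo_buggy(char *out, const char *payload,
                  size_t claimed_len, size_t received_len) {
    (void)received_len;                /* the bounds check is missing */
    memcpy(out, payload, claimed_len); /* over-read if claimed > received */
    return claimed_len;
}

/* Fixed: never copy more than was actually received. */
size_t echo_checked(char *out, const char *payload,
                    size_t claimed_len, size_t received_len) {
    size_t n = claimed_len <= received_len ? claimed_len : received_len;
    memcpy(out, payload, n);
    return n;
}
```

A memory-safety proof of the kind produced for seL4 or the SMACCMCopter rules out the buggy variant by construction.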

Enabling the masses to launch pernicious attacks at scale.

Source: The Exploit Intelligence Project, Dan Guido

Source: IBM X-Force 2012 Trend and Risk Report

And attackers take control with distressing ease.

and essentially all computers are networked.

Why is now a good time to revisit the hypothesis?

More automation

Faster Hardware. More memory.

More infrastructure

[A] significant part of the effort in existing projects was spent on the further development of verification tools, on formal models for low-level programming languages and paradigms, and on general proof libraries... Future efforts will be able to build on these tools and reach far-ranging verification goals faster, better, and cheaper.

Gerwin Klein, Formal OS Verification—An Overview

At the 80-problem point, the best solver time has dropped from 1,000 seconds (2002) to 40 seconds (2010).

Astrée

Source: Daniel Le Berre, The International SAT Solver Competitions

Source: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

We have found that testing the code is inadequate as a method to find subtle errors in design, as the number of reachable states of the code is astronomical. So we looked for a better approach.

Use of Formal Methods at Amazon Web Services, 2014

Bind, QEMU

Symbolic Execution

(FuzzBALL, KLEE)
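Symbolic execution tools such as FuzzBALL and KLEE treat inputs as symbolic values and solve the constraints along each path, letting them reach defects that random testing almost never hits. A toy C sketch (illustrative only, not tied to either tool's API) of the kind of guarded bug they find:

```c
#include <assert.h>

/* A defect hidden behind nested comparisons. Random testing over a
 * 32-bit input is unlikely to stumble into it; a symbolic executor
 * collects the path constraint (x > 10 && x * 3 == 0x2A) and asks a
 * solver for a witness, getting x == 14 directly. */
int classify(int x) {
    if (x > 10) {
        if (x * 3 == 0x2A) /* only x == 14 reaches this branch */
            return -1;     /* the "bug": a hidden error path */
    }
    return 0;
}
```
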

Impediments to Using Formal Methods

Performance Consequences

The Required Level of Effort

The Problem of Expertise

Verified code is not intrinsically slower, but verifying faster code can be more time consuming.

Significant overhead in terms of lines of code/proof, but level of effort is becoming a reasonable investment for certain kinds of software.

  • seL4 Separation Kernel [SOSP 2009; ToCS 2014]

10K LoC, 480K LoP; 13(8) person years

  • CompCert Verifying C compiler [POPL 2006]

42K LoC+P; 3 person years

  • FSCQ File System [SOSP 2015]

24K LoC+P; <5 person years

  • certiKOS Hypervisor [POPL 2015]

3K LoC, 18.5K LoP; 1 person year

  • SHA-256/HMAC [USENIX 2015]

407 LoC, 14.4K LoP; not reported

  • Rocksalt Sandbox [PLDI 2012]

100 LoC, 10K LoP; <2 person years

  • Nucleus Allocator/Garbage Collector [PLDI 2010]

6K LoP+C; 0.75 person years

  • QUARK Web Browser [USENIX 2012]

5.5K LoP+C; 0.5 person years

  • seL4 Separation Kernel [SOSP 2009; ToCS 2014]

206 vs 227 cycles in 1-way IPC fastpath

  • CompCert Verifying C compiler [POPL 2006]

2x speed of gcc -O0, 7% slower than gcc -O1, 12% slower than gcc -O2

  • FSCQ File System [SOSP 2015]

Performance roughly 80% of xv6 file system.

  • certiKOS Hypervisor [POPL 2015]

<2x slowdown on most lmbench benchmarks

  • SHA-256/HMAC [USENIX 2015]

Performance equal to OpenSSL 0.9.1c (March 1999)

  • Rocksalt Sandbox [PLDI 2012]

1M instructions/second; faster than Google's checker

  • Nucleus Allocator/Garbage Collector [PLDI 2010]

“competitive performance on macro-benchmarks when compared to native garbage collection.”

  • QUARK Web Browser [USENIX 2012]

24% overhead relative to the WebKit baseline on the top 10 Alexa Web sites

Source: Formal Methods: Practice and Experience, ACM Computing Surveys, October 2009

Keeping Up

The Fine Print

Time to produce a proof can be serious impediment to adoption.

All proofs come with assumptions.

Violate those assumptions and all bets are off.

(Many of the artifacts we've considered lack features.)

John Regehr et al. found 325 bugs in various C compilers, including CompCert [PLDI 2011].

CompCert bugs were in:

  • the unverified front end (subsequently fixed & verified)
  • a hardware model

Facebook's INFER sound static analyzer processes millions of lines of code and thousands of diffs per day.

  • 4 hours for full Android/iOS code base
  • <10 minutes on average for single diffs

But, proves only the absence of null pointer exceptions and resource leaks.
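Those two properties correspond to simple but pervasive code patterns. A hedged C sketch (hypothetical functions, not Facebook code) of the resource-leak pattern such an analyzer reports, where a handle escapes on an early-return path:

```c
#include <assert.h>
#include <stdio.h>

/* Leaky: on the EOF early-return path the FILE* is never closed --
 * exactly the interprocedural leak a static analyzer flags. */
int first_byte_leaky(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int c = fgetc(f);
    if (c == EOF) return -1; /* leak: f not closed on this path */
    fclose(f);
    return c;
}

/* Fixed: every path through the function releases the handle. */
int first_byte_fixed(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    int c = fgetc(f);
    fclose(f);
    return c == EOF ? -1 : c;
}
```
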

"The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. ...CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task."

Source: Moving Fast with Software Verification, NASA Formal Method Symposium, 2015

Source: Finding and Understanding Bugs in C Compilers, PLDI 2011

http://xkcd.com/1559/

Type systems (C, Java, Haskell, ...)

John Launchbury, DARPA

Gerwin Klein, NICTA

Gary McGraw, Cigital

Greg Morrisett, Cornell

Aaron Tomb, Galois

Mike Walker, DARPA

Andrew Appel, Princeton

Adam Chlipala, MIT

Darren Cofer, Rockwell Collins

Drew Dean, SRI

Dan Guido, Trail of Bits

Joe Hendrix, Galois

OSEK-Certified Vehicle OS (China)

Nucleus Garbage Collector (Microsoft)

Automatic Theorem Provers

(Alt-Ergo, VCC, Z3)

Open source: autopilot and tools available from http://smaccmpilot.org

Source: DIY Drones

Comprehensive Formal Verification of an OS Microkernel, ACM Transactions on Computer Systems, February 2014.

http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/

Browser Sandbox (Google)

Verified Runtime Monitoring

(RockSalt)

Source: DARPA HACMS program slides

Aviation Software (Airbus)

SSL Stack (PolarSSL)

Mobile Applications (Facebook)

Sound Static Analyzers

(Astrée, Frama-C, INFER)

Source: Boeing

seL4 Microkernel (NICTA)

CompCert (INRIA)

QUARK (UCSD)

HMAC/SHA-256 (Princeton)

FSCQ (MIT)

Interactive Proof Assistants

(ACL2, Coq, Isabelle)

Distributed System Protocols (Amazon)

Device Drivers (Microsoft)

Model Checkers

(SLAM, TLC)

Source: Darren Cofer, Rockwell Collins

Xavier Leroy, Formal certification of a compiler back-end, POPL 2006
