**Quick Category Theory Intro for Software Developers, part 1**

**Quick Category Theory Intro for**

Software Developers (1)

Software Developers (1)

What is Category Theory and how does

it help me with day-to-day coding activities?

What is it?

Somewhat esoteric math study that is difficult to state in layman's terms, but as applied to computer science, it is roughly:

The study of functions and how they can be transformed and combined.

Function Domains

and Codomains

In Algebra, the domain of a function is the set of inputs it is valid for, and the range is the set of outputs it can generate.

In Category Theory, the Range is called the CoDomain.

For function composition, the range of the first function should match the domain of the second, or at least be a subset. If this isn't the case,

there will be trouble.

Function Classification

Describes Domain and CoDomain.

Referential Transparency

A function has no side effects (i.e. it is "pure").

It takes input (from its domain) and generates an output (in its codomain) and

does nothing else

.

It does not modify global variables, update databases, read stdin, write to stdout, or launch missiles.

Simple examples:

def f(x) = x * 2 // int-to-int

def g(x: Int): Int = x + 5

strlen(s: String): Int

Even polymorphic functions are morphisms (i.e. "generics", "templates", etc.)

Composition Notation

When combining two functions together, there is a special notation using a dot separator, where:

f(g(x)) = (f.g)(x)

[The dot is usually in the middle height-wise, but in ascii just a period is used.]

Read this as:

"f applied to the result of: g applied to x"

(right to left)

Domain Mismatch Example

f(x) = sqr_root(x)

g(x) = x * (-1)

f(64) = 8

(g.f)(64) = -8

(f.g)(64) = ??!

Injective

All inputs generate a

unique

output, but not all

possible

outputs. A monomorphism.

f(x) = x * 2 // injective

g(x) = abs(x) // no: abs(1) == abs(-1)

Implies that the function is reversible, but there may be more values in the output type than can be generated from the inputs.

h(x) = ord(x) // injective

Surjective

All possible outputs

can

be generated, although some outputs may be generated from

multiple

different inputs. An epimorphism.

f(x) = x * 2 // surjective

g(x) = abs(x) // not surjective (no negatives)

h(x) = sqr_root(x) // not surjective (no negatives)

cos(x) // surjective in CoDomain of -1..1

Not reversible.

Bijective

Every input value has exactly one output value (i.e. it is Injective and Surjective). An isomorphism.

Same number of inputs and outputs. Frequently reversible.

f(x) = x * 2 // bijective

g(x) = abs(x) // not bijective

cos(x) // not bijective (multiple inputs -> same output)

Uses?

Simplifies code (prevents violation of the "Area of Concern" rule).

Allows function composition.

Makes operations idempotent.

Allows operations to be temporally shifted (i.e. delayed or pre-computed as needed), which helps optimizers.

Databases and I/O are Useful!

Yes, they are. Any "real world" interaction is not referentially transparent, including:

Keyboard input

Database values

Disk I/O

Network traffic

Random numbers

All of these change the state of "stuff" outside the code.

Any function you can't call an infinite number of times is not referentially transparent).

So, which is it?

Keep the two separated!

Any non-referentially-transparent code should be separated out as independent functions which should have the minimum possible functionality.

Most changes occur in the referentially transparent code!

Testing

Testing referentially transparent code is easy!

No side effects, no external dependencies.

Just a set of inputs and outputs.

Same transformation.

Every time.

Testing

Injective: focus on inputs, for each input output must be unique.

Surjective: focus on outputs and verify surjections

Bijective: random sample testing useful, especially if reversible.

Some functions do not fit any of these 3 types.

f(x) = if (x < 0) 0 else (x mod 10)

Multiple inputs generate a single output (all negative values return 0), so not injective.

There is no input that can generate an output value > 10, so it is not surjective.

Category of Sets

A Class of objects is mapped to a new class of objects by a transformation, known as a "morphism".

A morphism is a function.

Composition

Single functions are easy. Putting them together is more interesting. This is called "composition".

def f(x) = x * 2

def g(x) = x + 5

Composing our two functions together:

(f.g)(x) = f(g(x)) = (x+5)*2

Why Use Composition?

Compiler optimization

f(x) = x * 2

(f.f) = (x * 2) * 2 = x * 4

Build complex systems from simple parts (maintains Separation of Concerns).

Increase both extensibility and re-useability.

Graphically

(sqr_root . strlen)

STRING

INTEGER

FLOAT

strlen

sqr_root