Abstractions Considered Harmful
and other opinions about writing maintainable code
Erik Stenman, 10 September 2016

Introduction
or know your algorithm
The Complexity of Complexity
The symptom
The hunt
the problem
the fix

...Unless you can avoid a crash.
Dropping Errors
Symptom
The Problem
More problems
Saved by luck
Take away
and some other problems...
Losing Sharing
Symptoms
The hunt
The test
The fix
The cause
Take aways
Symptom
Dangers of Abstractions
Symptoms
Finding the bug
Finding the real bug
The problem
Takeaways
One day our system died.
The symptom is just the tip of the iceberg
Write readable code
Side effects of side effects
Transactions
Counters
email
file writes
DSLs are not for your domain
How to do it
Callbacks
Behaviours
Callback interfaces
New Syntax
Known domains with specified languages
Logging
Use it.
Tag everything.
Be explicit
Be direct
Erik Happi Stenman
HappiHacking

Learn from the mistakes of others.
You can never live long enough
to make them all yourself.
-- Groucho Marx?
"Don't believe everything you read on the internet."
-- Abraham Lincoln
Smart people learn from their mistakes.

Geniuses learn from others' mistakes.

-- Unknown
I will tell you a story...
... or several stories ...
... so you know how not to do IT.
n = 1
After upgrading to the latest version of Erlang/OTP
we started to see congestion and dropped calls when
the load was high.
Looking at our monitoring:
There was no process that stood out.
There was nothing strange in the logs.
Things were just slow.
We had to use GDB to see that most of the
emulator time was spent in the functions
generating stack traces.

... and the call stack contained the
explicit stack trace BIF.

With R15, stack trace generation now also
generates line numbers.

We looked through the code for uses of
this BIF and found some contenders.
The Artificial Problem
I don't care about all these errors...
filter(F, X, Y) ->
    try F(X, Y) of
        true -> {true, X};
        _    -> false
    catch
        _:_ ->
            ST = erlang:get_stacktrace(),
            throw({error, ST, {X, Y}})
    end.
I don't care about all these errors
I just want the answers...
match(X, X) -> true.

get_matches(List1, Answer) ->
    F = fun(X, Y) -> match(X, Y) end,
    all(List1, Answer, F).

all([X|Xs], Y, F) ->
    try filter(F, X, Y) of
        {true, X} -> [X];
        false     -> []
    catch
        _:_ -> []
    end ++ all(Xs, Y, F);
all([], _, _) -> [].
Take Away
Don't throw away errors if you don't know what they mean.

Don't get a stacktrace unless you are crashing and logging.

Don't use throw() when you mean error(). (See the sketch after this list.)

Don't write layers of libraries hiding too much info.

Let it crash.
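
A minimal sketch (not from the talk; the module name is made up) of the difference between throw() and error():

-module(throw_vs_error).
-export([demo/0]).

%% throw/1 is a non-local return; use it only when a surrounding catch
%% expects that exact value. error/1 raises class 'error', carries a
%% stack trace, and gives a proper crash if nothing handles it.
demo() ->
    NonLocal = (catch throw(not_found)),   %% caught value: not_found
    Fault    = (catch error(not_found)),   %% {'EXIT', {not_found, StackTrace}}
    {NonLocal, Fault}.
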
At some point the system started
going down when new code was
loaded.

This seemed to happen about every
second time we loaded new code.

Fortunately we had 5 servers and just
about half of them fell over each time.
No trace
We could not see anything in the logs.
We could not see anything in the monitoring tools.
There was no crash dump.
There were no error messages.

We had a log of how far into the upgrade we were...


... the problem came while loading new code for our DSL.
We wrote a program that kept on loading
modules generated by our DSL compiler.

Our DSL had backtracking...
... implemented with try-catch.
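
Roughly what our loading program did; this is only a sketch with made-up module and file names, not the original code:

-module(load_loop).
-export([run/1]).

%% Keep compiling and loading distinct generated modules, each containing
%% a try...catch. Every loaded catch occupies a slot in the emulator's
%% beam_catches table.
run(0) ->
    ok;
run(N) ->
    Mod  = list_to_atom("gen_mod_" ++ integer_to_list(N)),
    File = atom_to_list(Mod) ++ ".erl",
    Src  = io_lib:format("-module(~p).~n"
                         "-export([f/1]).~n"
                         "f(X) -> try X of V -> V catch _:_ -> backtrack end.~n",
                         [Mod]),
    ok = file:write_file(File, Src),
    {ok, Mod, Bin} = compile:file(File, [binary]),
    {module, Mod} = code:load_binary(Mod, File, Bin),
    run(N - 1).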



Disclaimers
n = 1. These are just my experiences;
your mileage may vary.

I have changed and rewritten most of
the code to protect the not so innocent.

I have written this kind of code
myself many times.

These are not really my opinions but actual facts...
possibly clouded by the fog of old memories.
When trying this test program from the shell,
Erlang died and printed:

beam_catches_cons: no free slots :-(

Yep, including the sad face.
A grep of the ERTS source quickly found the sad face a couple of lines below this code:
/* XXX: should use dynamic reallocation */
#define TABSIZ (16*1024)
The problem was in how we started Erlang in our running system.

We started it in detached mode without any shell.

All stderr output ended up in /dev/null.

We now run Erlang through run_erl.
Another problem was that we were running an old version of Erlang.
- fprintf(stderr, "beam_catches_cons: no free slots :-(\r\n");
- exit(1);
+ /* No free slots and table is full: realloc table */
+ tabsize = 2*tabsize;
+ beam_catches = erts_realloc(ERTS_ALC_T_CODE,
+ beam_catches,
+ sizeof(beam_catch_t)*tabsize);
+ i = high_mark;
+ high_mark++;
Yes!
Take Away
Use run_erl
Use latest version of ERTS
Don't send errors to /dev/null
Use more than one machine...
... and keep them a bit out of sync
so they don't all crash at the same time.
I know... it isn't clear why this is a case against abstraction; that got lost in my effort to make the code less abstract and presentable...
We could see that
memory was increasing
just before the crash.
There were no crash dump files...
... probably because the system
was pushing 256GB in memory footprint.
We did have some monitoring.
We found a process using a lot of
memory.

We did have logs.
We found that that process was
monitoring an XML parser.
It turned out to be a parse error of an XML file.

The parser died.

The monitor got the parser state.

The state became huge when sharing was expanded.
If a list is used in several places, all that is repeated is the tagged pointer to the list:

L = [104, 101, 108, 108, 111],
T = {L, L}.

ADR  BINARY VALUE                              DESCRIPTION
144  1010000 00000000000000000000000001000001  128+CONS
140  1001100 00000000000000000000000001000001  128+CONS
136  1010100 00000000000000000000000010000000  2+ARITYVAL
This is nice...

... until you do

a send
or IO (any deep copy)

... then the sharing
is expanded.
share(0, Y) -> {Y,Y};
share(N, Y) -> [share(N-1, [N|Y]) || _ <- Y].

timer:tc(fun() -> test:share(10,[a,b,c]), ok end).
{1131,ok}

test:share(10,[a,b,c]), ok.
ok

byte_size(list_to_binary(test:share(10,[a,b,c]))), ok.
HUGE size (13695500364)
Abort trap: 6
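
One way to spot heavy internal sharing before copying it (a sketch, not from the talk): erts_debug:size/1 counts shared subterms once, while erts_debug:flat_size/1 counts what a send or IO would actually copy.

L = lists:seq(1, 1000),
T = {L, L, L, L},
erts_debug:size(T),        %% roughly 2000 words: the list is stored once
erts_debug:flat_size(T).   %% roughly 8000 words: what a deep copy expands to
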
Sharing
This was bad but not really what caused the crash.
We got a huge spike in memory.

Pushing beyond 256GB of ram.

No Erlang system had done that before.

We wrote a set of test programs to stress the
memory allocations.

When 1 process allocated more than 32 GB of mem
the GC crashed.

There were some subtle uint32/uint64 bugs...

Richard Carlsson patched things up.
Take aways
Avoid pushing deeply shared terms to other processes.

Don't crash while handling shared terms.

Be careful when pushing a system to places no Erlang system has gone before.

Monitor memory usage (see the sketch after this list).

Log processes.
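
For example, a quick shell snippet (a sketch, not from the talk) that lists processes by memory use:

ByMem = lists:keysort(2,
          [{P, M} || P <- erlang:processes(),
                     {memory, M} <- [erlang:process_info(P, memory)]]),
lists:last(ByMem).   %% the process currently using the most memory
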
So... one day our system ran out of memory.
We did have some memory in the machine.
I think 1 TB at the time.
We did have monitoring.
We could see which process.
We did have logs.
We could see what that process did.

It compared two sets.
This is a bit simplified...
... but basically the function tried to see if
one set of names matched another set of names.
Imagine a name could be
Names = "Erik Nils Stenman"
and it needed to match
FullName = "Mr Nils Erik Mårten Stenman"

The function basically had the following algorithm:
permute(Names) and compare to permute(FullName)
Now... when you permute a list you get N! different lists.

This is not a problem if you have e.g. 5 names in a list.

5! = 120.
The thing is, if you compare every permutation of Names to every permutation of FullName, you get

n! * m!

For n = m = 5 you get 14,400 combinations.

For n = m = 7 you get 25,401,600.

For n = m = 12 you get

~2.3 * 10^17
A simple sort fixed the problem.


(There were a few details, left out of this presentation, which complicated things and had led to the permutation solution.)
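
A rough sketch of the sort-based fix (the module name and details are made up; the real code differed):

-module(name_match).
-export([matches/2]).

%% Sketch of the fix: sorting both name lists reduces the n! * m!
%% permutation comparisons to a single O(n log n + m log m) subset check.
matches(Names, FullName) ->
    NameSet = ordsets:from_list(string:tokens(Names, " ")),
    FullSet = ordsets:from_list(string:tokens(FullName, " ")),
    ordsets:is_subset(NameSet, FullSet).

%% name_match:matches("Erik Nils Stenman", "Mr Nils Erik Mårten Stenman") -> true
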
Take Away
Know your algorithms,
and your complexity.
If you really need a permutation... avoid
generating it all in advance. Do it lazily.
It is so much fun to define your own
language.

Do not do it!
Don't use callbacks
Callbacks make the code harder to read.

(list_to_existing_atom(binary_to_list(B))):type()

CB:F(Args)
Don't use macros either
-module(time_wrap).
-export([new/0, new_page/1, export/1, delete/1]).
-include_lib("my_log.hrl").

-define(wrapper_0(Name),
        Res = pdf:Name(),
        ?logm("~p us process: ~p pdf: ~p " ??Name,
              [my_time:now_micro_sec(), self(), Res]),
        Res).

-define(wrapper_1(Name, Arg),
        ?logm("~p us process: ~p pdf: ~p " ??Name,
              [my_time:now_micro_sec(), self(), Arg]),
        pdf:Name(Arg)).

new() -> ?wrapper_0(new).
new_page(Pdf) -> ?wrapper_1(new_page, Pdf).
export(Pdf) -> ?wrapper_1(export, Pdf).
delete(Pdf) -> ?wrapper_1(delete, Pdf).

-module(time_wrap).
-export([new/0, new_page/1, export/1, delete/1]).

new() ->
    Res = pdf:new(),
    log:log("~p us process: ~p pdf: ~p new",
            [my_time:now_micro_sec(), self(), Res]),
    Res.

new_page(Pdf) ->
    log:log("~p us process: ~p pdf: ~p new_page",
            [my_time:now_micro_sec(), self(), Pdf]),
    pdf:new_page(Pdf).

export(Pdf) ->
    log:log("~p us process: ~p pdf: ~p export",
            [my_time:now_micro_sec(), self(), Pdf]),
    pdf:export(Pdf).

delete(Pdf) ->
    log:log("~p us process: ~p pdf: ~p delete",
            [my_time:now_micro_sec(), self(), Pdf]),
    pdf:delete(Pdf).

-module(game_behaviour).

-callback init(Args :: list(term())) ->
    'ok' | {'error', Reason :: string()}.

-callback action(Event :: atom(), State :: map()) ->
    'ok' | {'error', Reason :: atom()}.

-callback get_state(State :: map()) ->
    ExternalState :: map().

-callback type() -> Type :: string().
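
For illustration, a minimal module implementing this behaviour might look like the sketch below (the game module name is made up):

-module(tictactoe_game).
-behaviour(game_behaviour).
-export([init/1, action/2, get_state/1, type/0]).

%% Trivial implementations of the game_behaviour callbacks; the compiler
%% warns if one of them is missing.
init(_Args) -> ok.
action(_Event, _State) -> ok.
get_state(State) -> State.
type() -> "tictactoe".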

-module(call_game).
-export([init/2, action/3, get_state/2, type/1]).
-export([b_init/2, b_action/3, b_get_state/2, b_type/1]).

init(M, Args) -> M:init(Args).
action(M, Event, State) -> M:action(Event, State).
get_state(M, State) -> M:get_state(State).
type(M) -> M:type().

mod(B) ->
    list_to_existing_atom(binary_to_list(B)).

b_init(M, Args) -> (mod(M)):init(Args).
b_action(M, Event, State) -> (mod(M)):action(Event, State).
b_get_state(M, State) -> (mod(M)):get_state(State).
b_type(M) -> (mod(M)):type().

Instead of

(list_to_existing_atom(binary_to_list(B))):type()

you can now write

call_game:b_type(B).
"I find that when someone's taking time to do something right in the present, they're a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they're a master artisan of great foresight."
--Randall Munroe
Find your beam process:

export BEAMP=`ps -af | grep -v grep | grep beam | awk 'BEGIN{}{print $2}'`

Attach GDB to the process:

yes | sudo gdb -f -quiet -ex "thread apply all bt" -ex q ../otp/bin/x86_64-unknown-linux-gnu/beam.smp $BEAMP | grep "#0"
Thank you Björn-Egil!
A program is written once,
but read several thousand times
by other people!
Even a Swede writes comments in English.
For a Really Heavy Toolbelt