Dynamic Forests & Load Balancing for Data Gathering in

Wireless Sensor Networks Aravind Ranganathan

University of Cincinnati WSN Characteristics Number of sensor nodes deployed over a given area.

Sensor nodes

Measure physical parameters:

Temperature, Humidity, Pressure, Motion, etc.

Wireless communication capabilities Energy constrained Commercially available motes One or more high power sink nodes

Accessed remotely

Perform data storage & processing

Gather information from rest of network

Multi-hop communication for conserving power

Nodes forward data for other nodes Applications Applications

Wireless Sensor

Networks volcano monitoring seismic monitoring Geologic Sensor Web Ecosystem Monitoring Sensor Web habitat monitoring Military applications Public systems firetowers sensor web weather forecasting buoy sensor web disaster response traffic control home automation healthcare industry battlefield surveillance Data Gathering in WSNs of Network lifetime: * * "Online": Sequence of source nodes not known a priori (as in most WSN applications)

- Maximizing network lifetime for online data gathering is known to be NP-Hard! Loss of coverage

Loss of communication paths to sinks

Thus, node failure results in network failure. - gathering sensed data from sensor nodes at the sinks

- fundamental operation of a WSN (deliberately vague) Load Balancing in WSN Forwarding a data packet contributes a unit load on the node. Problem Statement Shortest-path DAG Modeling Vertices of DAG: Sensor nodes and Sinks

Nodes arranged in depths/levels using "shortest-hop count" from sinks

Assigns for each node:

- a set of parent nodes

- depth from sink(s) Sample WSN and corresponding shortest-path DAG Parent selection based Data Gathering Parent selection: Each node forwards data to one of its parents

Data reaches sink in minimum hops

Parent selection generates data gathering forests in DAG Disadvantages of Static Parent Selection 2. No Load Balancing on Edges 1. Poor Nodal Load Balancing node forwards its congestion to a parent same parent edge chosen all the time 3. Optimizing Load Balancing (Lifetime)

is NP-Complete Balanced Forests problem is NP-Complete

(reduction from 3-Exact Cover) Thus, we consider dynamic parent selection approaches.

But given a WSN how do we form the DAG? Optimal Network Lifetime Number of data packets successfully received by sinks until first node failure in the network. Network Lifetime: Optimizing "Nodal" Network Lifetime Define the lifetime of a node:

- Expected # packets forwarded by node before it dies

Each node has a different expected lifetime

Generate data gatherings forests that:

- Maximize the minimum expected lifetime over all the nodes

MU-balancing: Optimal Load Balancing Distribute load evenly among the sensor nodes during data gathering. Topological Imbalance in a WSN: Sample WSN Fundamental Question 1. There exists M-balanced load vector for the given network

2. There exists an unique vector that achieves ultimate balance at the sink

i.e., there exists unique load vector that minimizes 2-norm of loads on the sink

3. Load vector that minimizes 2-norm also minimizes the k-norm (for k=3, 4, ...)

4. There exists a MU-balanced load vector Min-Max + Ultimate Load Balancing: Part of the self-organization of WSN

Our algorithm:

- Initiated by sink

- Flooding-based asynchronous breadth-first search Dynamic Parent Selection Algorithms * Based on Frederickson’s asynchronous centralized BFS trees construction algorithms. State-based approach:

1. Max-Min Path Energy Algorithm

2. Weighted Path Energy Algorithm

Probabilistic approach:

3. Probabilistic Balanced Forests Algorithm

4. Adaptive-Probabilistic Balanced Forests Algorithm

Introduction to

Wireless Sensor Networks (WSNs)

Online data gathering problem in WSNs:

Load balancing based approaches to maximize network lifetime. (focus on load balancing on nodes) 3 Benchmark Algorithms 3. Optimal "Offline" Algorithm:

Assumes that the source sequence is known before hand

Computes optimal data gathering forests using flow theory

- Uses Algorithm to compute Optimal Markov Probabilities with source

loads obtained from the node frequencies in the source sequence . Lower bounds for Performance 1. Random Parent:

Choose a parent randomly

Equivalent to equal probabilities on parent edges

2. "Greedy" Parent Energy:

Choose parent with maximum residual energy

Special case of Weighted Path Energy algorithm with = 0. (naive probabilistic approach) Performance Measures 3 Existing Algorithms 1. Network Lifetime Measure: Number of data packets successfully received by sink until first node failure Compute the final load on each node and each network link

- Number of packets forwarded until network lifetime is reached

Use Jain's fairness Measure (measured depth-wise) 2. a) Nodal Load-balancing Measure: : individual share on nodes : perfect balancing Use Jain's fairness Measure on:

Nodes loads at each depth 2. b) Link Load-balancing Measure: Algorithm 1: (static parent selection) Minimum-cost Forwarding

Assigns a static cost metric to each path from the node to sink

Forward data along the least cost path

Bottleneck Networks:

Networks with few points of heavy congestion

Few nodes determine the network lifetime (irrespective of the algorithm)

Complete-bottleneck: A single point of very high congestion. Load balancing on Nodes / Links Harder to compare: dependent on network lifetime

Another parameter -- "depth" (networks have varying depths)

Better load balancing at depths 1, 2, ... is favorable Our Theorems / Results Data Gathering & Load balancing in WSNs Shortest-path DAG Modeling &

Parent Selection based Routing Performance Evaluation Applications of our Work Amount of time for which the network functions properly. Effects of Node Failure: Load on a node: Parent selection based data gathering in a DAG (source) (source) (source) (source) (source) (source) (source) (source) (source) ____________

What parent selection results in maximum network lifetime? Static parent selection:

- Choose one parent & always forward to same parent

Dynamic parent selection:

- Forward to different parents at different times Parent selection (done locally) must maximize network lifetime (globally).

Two possible approaches for Parent selection: Find dynamic parent selection approaches that perform efficient load balancing and maximize network lifetime Problem Statement (modified) Ensure bi-directional communication between nodes

Regulate DAG construction level by level DAG Initialization Algorithm Given a WSN (after running the DAG initialization algorithm): ______ * Sample WSN Optimizing Network Lifetime: Delay first node failure by as much as possible When node forwards data:

Chooses parent with probability

If parent was chose, it then decrements by 1 (similar for )

When sum of and reaches zero, they are rest to original values

Problem Introduction Our Model and Algorithms Dynamic Forests & Load Balancing for Data Gathering in Wireless Sensor Networks Performance Evaluation Conclusions Research Summary 2. Load balancing Measure: Measure of fairness: Use Jain's fairness Measure on:

Load on edges (links) at each depth

(i.e. depth to ) (naive state-based approach) Upper bound for Performance (lower bounds & upper bound for performance) Similar model, assumptions and optimization criteria

Consider only Multi-hop routing:

- Not Hierarchical/cluster-based algorithms

- Cluster heads assumed to directly transmit to sink

Nodes must be homogeneous, not mobile, have a fixed transmission radius, no GPS

Must not have centralized computation

- Eg: Many tree-based data gathering algorithms use centralized tree construction

Must focus on optimizing network lifetime

- Multi-path routing algorithms focus on reliability, choose longer paths

Focus on data gathering at sinks, not one-to-one communication Which algorithms to compare against? Algorithm 2: (dynamic parent selection) Dynamic Load Balanced Trees (DLBT)

Similar to our probabilistic approch on the shorest-path DAG

Maximizing network lifetime by load balancing

Recomputations from descendants to compute forwarding probabilities

2-versions: DLBT-1 and DLBT-16

Simulation Restriction:

Avoid complete-bottleneck networks Large Sized Single-sink Networks Randomly constructed large single sink networks usually have a large maximum depth

Less realitic modeling of a real-world WSN Random network generation often yields bottleneck networks Random single-sink networks, size = 25, 30, 35, 40 nodes

- Existing algorithms defined for single-sink model

- Avoid complete bottleneck networks

Initial energy on the nodes: 5000 units

- Steady state performance

Consider 10 different source sequences for each network 10 Algorithms Compared Simulation Setup Probabilistic: Prob. Balanced Forests, adaptive-PBF

State-based: Max-min Path Energy, Weighted Path Energy

Benchmark: Random Parent, "Greedy" Parent Energy, Optimal Offline

Existing: Min-Cost Forwarding, DLBT-1, DLBT-16

We compare relative performance (wrt Optimal Offline algorithm)

Depth-wise load balancing until depth = 4 _____________ _____________ _____________ _____________ Network Lifetimes Observed network lifetimes were normalized by that of Optimal Offline Algorithm

Static routing (Min Cost Forwarding) performs poorest

- Lower bound benchmarks (dynamic algorithms) perform much better

Our algorithms achieve network lifetimes close to theoretical optimal network lifetimes

Relative lifetime gain from 10% - 50% over lower bound benchmark algorithms (dynamic)

(we consider networks with lesser bottlenecks)

Our probabilistic approach gives a overall better load balancing at depth 1

- Expected to achieve U-balancing at sink nodes (or depth 1)

As expected, Weighted Path Energy algorithm uses the relative loss in network lifetime for better load balancing at depths 2, 3, ...

Comparing relative load-balancing of our algorithms: Load balancing wrt benchmarks & existing algorithms: Our algorithms achieve better load balancing at depth 1 RWB Balance Diagrams Measures topological imbalance at a given depth

Identifies hot-spots (Red), cold-spots (Blue), balanced-spots (White)

Intensity of color indicates extent of imbalance Based on our results on U-balancing

Identify "ultimate-balanced" load for given depth

Compare with the load in case of perfect balancing

The difference gives the imbalance measure:

hot-spot: more than ideal load (>> 0)

cold-spot: less than ideal load (<< 0)

balanced-spot: exactly ideal load (= 0)

Used to improve the performance and the security in the network

Adding more nodes near hot-spots

Securing hot-spots against vulnerabilities RWB Balance Diagram

for a sample WSN @ depth 1 Applications: Computing the imbalance measure: (colder-spot) (cold-spot) (hot-spot) (hotter-spot) Applications to Other Networks State-based Approach Parent Selection depends on the current state of the network

Residual energy of nodes = Parameter for network state

- Inversely proportional to load on the nodes Computing PSF Values Assign a weight on each data gathering path

PSF of a node given by its "best-weight path" Weighting on a Data Gathering Path Updating PSF Values 1. Global PSF Update: Residual energy of node is constantly changing

PSF value of a node defined in terms of its parents

Node computes its PSF value from its parents

- Its child nodes must re-compute their PSF values

- Potentially affect all descendants (huge overhead) - Top-down recomputation initiated by

sink after receiving each data

- High overhead (expected) 2. Delayed PSF Update: - "Bottom up" updating of PSF values:

1. Nodes piggyback their PSF value and forward data

2. Child nodes "overhear" parents' PSF and update

- Leaves inaccurate PSF values at nodes - Bottom-up PSF update (Delayed Scheme)

- Periodic Top-down recomputation

initiated by sinks 3. Refreshed PSF Update: WPE Routing: Computing Optimal Generated random networks:

- Nodes deployed randomly over a unit square area

Network Size: 50, 100, 150, 200, 250 nodes (10 in each)

Initial energy on the nodes: = 1000 units

Consider 10 different source sequence

- Sequence in which source data is generated

Varied : 0, 0.1, 0.2, ... , 1

Use = 0.15 for WPE routing Comparing PSF Update Schemes Global PSF Update, Delayed PSF Update, & Refreshed PSF Update

Compare network lifetime and overhead Network Size: 50, 100, 150, 200, 250 nodes (10 in each)

- = 500, 1000, 1500, 2000, 2500 units

Consider 10 different source sequences

Global PSF Update expected to perform best but also have the most overhead.

Comparing decrease in lifetime and decrease in overhead for Delayed and Refreshed PSF Update schemes:

Results for MPE routing Less than 1% decrease in lifetime

Over 85% decrease in overhead Use Refreshed PSF Update scheme For both Delayed/ Refreshed Update: State-based Approach "PSF" value of a node:

Measure of "How good is the node for forwarding data?"

Dependent on the residual energies of the nodes in the network. prob. prob. Routing Table at a Node Parent selection strategy: parent parent parent PSF PSF PSF Choose the parent with the best "PSF" value. 1. How to compute PSF values?

2. How to update PSF values? (residual energies are constantly changing) (PSF values must indicate goodness of the node) 4 paths from source node 10 7 8 Each path has a weight Each node keeps track of the best path in its PSF value 2 Nodes keeps track of best path How to define weights on the paths? 1. Max-min Path Energy Algorithm

2. Weighted Path Energy Algorithm Max-min Path Energy (MPE) Routing Uses the max-min fairness measure commonly used in networks for equal sharing of resources. Max-min Path Energy: - Each node has multiple paths to sinks

- Identify "weakest" node in the path

- Choose path with "strongest weakest node".

(uses node with most residual energy) PSF value of node , is given by: : energy on sinks : residual energy on the node (least residual energy) PSF Computation: 2 3 5 2 4 3 5 4 8 2 5 3 4 5 PSF computation in a sample WSN with residual energies Max-min Path Energy algorithm:

"weakest node" typically at depth 1

Expected to load balance nodes at depth 1 better

Weighted Path Energy algorithm:

Expected to perform better load balancing at depths 2, 3, ... Weighted Path Energy (WPE) Algorithm Previous algorithm considers only the "weakest node" in the path

Path Weighting in WPE includes residual energy of all nodes in the path

Capture "overall path strength"

WPE Distributed Definition: : path strength parameter (Use = 0.15 for WPE routing) (residual energy of all nodes) Comparing the two algorithms: Less than 1% decrease in lifetime

Over 85% decrease in overhead Use Refreshed PSF Update scheme For both Delayed/ Refreshed Update: "Greedy" Parent Energy Selection Greedy Parent Energy Selection:

Forward to parent with maximum residual energy

Does not give optimal network lifetime

Does not consider nodes further up the path 2 8 6 5 2 State-based Algorithms Design Sample residual energies on the nodes

in a WSN Probabilistic Approach Define Markov probability distribution P on the DAG

- Probability distribution on the parent edges of each node Parent selection strategy: Choose parent randomly based on P Compute P that achieves optimal expected load balancing (0/1-P yields static parent selection) Equal (Uniform) Probabilities may not give

Optimal Load Balancing Equal probabilities yields:

E[L(a)] = 2, E[L(b)] = 6

Optimal load balancing gives:

E[L(a)] = E[L(b)] = 4

Example: Probabilistic Approach parent prob. prob. parent parent prob. Routing Table at a Node Load - Flow Equivalence Computing the optimal flow in DAG Theorem:

f P = _____ uv f uv u

Then,

E[L(v)] = 30 units

E[L(w)] = 20 units 1. Compute "optimal flow"

2. Define P from the computed flow (using above result)

Then, the expected load in the DAG from P will also be "optimal". We use flow theory to compute a "MU-balanced" flow 1. Let each node have a source flow of "f" units.

2. For unit source flow from a node:

find an augmenting path to the sink with the least flow such that:

- the path does not increase the maximum flow on nodes, depth-wise.

3. Repeat step 2 until all souce flow reaches sinks

4. Compute P from the flow

The expected load corresponding to this P is expected to be "MU-balanced"

Algorithm for computing the Optimal P: (ultimate balance at sink) (max-min balance) Initial source flow "f": Given a WSN, using the initial energy on the nodes as the capacities, we compute the maximum possible value as "f" for which the max-flow in the network equals the entire source flow.

Probabilistic Balanced Forests (PBF) Routing Adaptive Probabilities Version Extended initialization phase:

1. Shortest-path DAG construction

- Assigns parent nodes for each nodes, probabilities unassigned

2. Sink gathers parent information from each node

- Sink can recreate the DAG structure

3. Sink computes the optimal probability distribution P

- Using the "Algorithm for computing Optimal Markov probability P"

4. Sink sends the parent probabilities for each node

Probabilistic Algorithms: 1. Probabilistic Balanced Forests (PBF) Routing

2. Adaptive-Probabilistic Balanced Forests (a-PBF) Routing Advantages:

No routing overhead after the initialization process.

The assigned probabilities are expected to yield optimal load balancing when the network lifetime is reached.

Expected vs. Actual Outcomes:

With probabilstic approach:

observed outcomes may sometimes be significantly off from expected outcomes

Example: 55 Heads / 45 Tails in 100 tosses of a unbiased coin.

b u Probabilistic parent selection:

1/2 a Node u maintains the parent probabilites in its table

Expected: Node a & Node b receive 50% packets each

Observed: Node a = 55%, Node b = 45% 1/2 Adaptive-PBF Algorithm: Same as PBF Algorithm except:

- Nodes maintain "parent flow values" instead of parent probabilities in their table

- These "parent flow values" are the flow values computed in the algorithm. b u = 50 = 50 a b a f f 50 / 100 50 / 100 a P b P = 1/2 = 1/2 u a b f f a a a f b a f + f f / ( ) a b Selecting parent uniformly randomly does not balance load optimally How to compute P such that the resulting expected load is optimally balanced?

Use flow theory to define P. Consider a network flow f in the DAG. Consider a Markov probability P, defined by: Then, the expected load on the nodes is equivalent to the node flows in f. Computing Optimal Markov probability P from the flow: 1. What is "optimal flow"?

2. How do you compute "optimal flow"? w 30 v Net Flow-in = 50 units w u Given a source flow at the nodes Example: 20 v uw 3/5 = 3/5 P uv u P 2/5 = 2/5

We say that the load on the nodes is optimally balanced only if it is both:

a) Min-Max balanced (M-balanced):

Lexicographically minimizes the maximum load on the nodes depthwise.

b) Ultimate balanced at sinks (U-balanced):

Minimizes the 2-norm of the load on the sink nodes.

* Based on results of Harvey et. al. on optimal semi-matching in bi-partite graphs.

We show that there exists a unique load vector that achieves "ultimate balancing" An optimal load balancing yields an optimal network lifetime. The converse is not true. M-balancing: Min-max balancing (M) + Ultimate (U) balancing at sinks: m1 m2 m3 m4 Max-Load @ depth s1 s2 * Example: U-balancing: lexicographically minimize(m1, m2, m3, m4) ||S|| = [ (s1) + (s2) ] minimize 2-norm of S=(s1, s2) 2 2 1/2 2 Results and algorithms applicable to other networks

Wireless Mesh Networks (sink) Architecture of a WMN

3 types of nodes:

Mesh Gateways

Mesh Routers

Mesh Clients

Mesh routers are not energy constrained.

Load-balancing and fairness are equally important issues

Our Algorithms Formalize the shortest-path DAG modeling

- Shortest-path DAG initialization algorithm

Dynamic Parent-selection approaches:

- State-based approach

-- MPE Algorithm, WPE Algorithm

-- PSF Update Schemes

- Probabilistic approach

-- PBF Algorithm (adaptive version)

- Optimal Offline Routing Algorithm

Performance evaluation:

- Consider Benchmark algorithms, Existing algorithms

- Measures: Network lifetime, Load balancing (nodes, links)

- Establish better performance of our algorithms

RWB Balance Diagrams 1. Balanced Forest problem is NP-Complete (reduction from 3-Exact cover)

2. Static parent selection is inefficient (NP-Hard to optimize)

3. Global Network Lifetime vs. Nodal Network Lifetime

Network Lifetime: Optimizing load balancing optimizes network lifetime. Converse is not true. Lifetime vs. Load balancing: Probabilistic Approach: 1. Expected load on a node is given by that of its child nodes weighted by the

corresponding edge probabilities.

2. Equivalence between flow and load (when P is computed from flow) Future Research Directions Establishing approximation bounds for probabilistic approach

- Similar to optimal offline routing

Extending to similar problems in other network

- Wireless mesh networks

Probabilistic: Focus on load balancing the links

- MUL-balancing:

-- Min-Max balancing + Ultimate @ sinks + Link-balancing

State-based:

- WPE routing:

-- Define a different path strength parameter

Testbed implementation

Genetic-Algorithm based approach to compute optimal Markov probabilities (instead of flow theory) Energy consumption in the network:

Sensor nodes use energy for: sensing and forwarding data

When a sensor node runs out of energy: "failed" or "died"

Efficient data gathering maximizes the lifetime of the network. Load Balancing on Links source node Provides greater reliability, security and trust

Data forwarded along multiple paths malicious node Advantages of Load balancing Malicious node does not receive all data

Better fault/intrusion detection

Greater reliability using Erasure codes:

- Convert data (k symbols to n symbols) and forward

- Cannot reconstruct original data with <k packets!

* Example: Load balancing on links Increases Network Lifetime:

Nodes have even energy consumption

Nodes run out of energy uniformly

Increases Network Throughput:

Decreases congestion at one node

Decreases network delay * We show that optimal load balancing optimizes the network lifetime as well. Load Balancing on Nodes Load balancing in a WSN: Nodes closer to sinks forward a lot more data

- Expected to generate more load

Perfect load balancing is usually not achievable

Example: Heavy load at a node (source) (source) (source) _____________ Why not forward to the parent with the most residual energy? Choosing parent u causes weaker energy distribution in the network. u u v w v w u v u w u v w 5 6 Simulation results: Note: Markov probability distribution P yields an expected load on the nodes Net flow = 100 units ____________ Choose with probability 50/100

Choose again with 49/99

Choose with 50/98 a a b Simulation Restriction:

Use Networks of Smaller Size Load balancing on Nodes Load balancing on Links Parent selection determines the energy consumption, network lifetime & load balancing Result: Conclusions Our data gathering approach:

Shortest-path DAG modeling

Dynamic parent selection based data gathering

- State-based and Probabilistic approaches

Related results and algorithms

Evaluation:

Achieve greater network lifetime

Achieve better load balancing in the network

- Greater lifetime, reliability, security & trust

Applicable to other networks (proved/verified correctness) Thank you!

### Present Remotely

Send the link below via email or IM

CopyPresent to your audience

Start remote presentation- Invited audience members
**will follow you**as you navigate and present - People invited to a presentation
**do not need a Prezi account** - This link expires
**10 minutes**after you close the presentation - A maximum of
**30 users**can follow your presentation - Learn more about this feature in our knowledge base article

# Dynamic Forests and Load Balancing for Data Gathering in Wireless Sensor Networks

Technical Presentation

by

Tweet