Transcript

Week 3

TASK :

Word Count
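A minimal sketch of the word-count task in the map/reduce style, written in plain Python rather than on the cluster. The function names `map_phase` and `reduce_phase` and the sample line are illustrative assumptions, not taken from the lab code.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the emitted counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input: one line of text.
word_counts = reduce_phase(map_phase(["to be or not to be"]))
```

On a real cluster the two phases run on different nodes and the pairs are shuffled by key in between; here they are simply chained.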

Week 2

Week 6

Prime Number Count

TASK:

TASK


Machine Learning - Classifiers

Week 9

Take away

  • Schedule differences
  • Mini talks complemented knowledge
  • Helpful on retakes
  • Slow but Sure
  • Amazing course structure

Encoding

K-mer count

Familiarize with snapshots

Minimizing collisions by k-mer subset selection

Week 3

We had more time because of the smaller sample size

  • Getting a better grasp of Hadoop & Spark concepts
  • Finally, Coding!
  • Master Crashed? RE-Networking

Install Spark


  • Rate of training and test: Cross Validation

  • Python Versioning – Our last headache

  • Finally adding new nodes

Parallelizing training set with RDD: Big performance improvement

Take I2DL and Machine Learning lectures


  • Changing the replication factor to 1, losing a node and crying over a bucket of ice cream

Using ‘links’ to view the Spark UI


Support Vector Machine

Learn Ansible

Random Forest Classifier

Maddy found a clever way to make it work


Week 7

TASK:

Decision Tree Classifier

Encoding

Week 2


Hadoop Ecosystem

MapReduce Limitations

Week 2

Ansible


Minimizing collisions by varying encoding scheme

We decided to explore multiple encodings; the best was the 7-bit group encoding, which yielded a 9.02% collision rate


  • Silly mistakes, re-computation

  • Memory issues, cleaning the system

Time crunch


Week 5

TASK :

Week 1

Setup

We further explore k-mer counting
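A minimal single-machine sketch of k-mer counting, assuming the input is a DNA string. The function name `count_kmers` and the sample sequence are illustrative; the lab ran this at cluster scale with other tools.

```python
from collections import Counter

def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Hypothetical sample sequence with six overlapping 3-mers.
counts = count_kmers("ACGTACGT", 3)
```

A sequence shorter than k yields an empty counter, since there is no window of that length.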

Week 8

TASK :

A

We start to hash the k-mers

K-means

Multi-Node

Bisecting K-means

TASK :

We look into optimizing the spark-submit

Week 5

Hadoop

This was the first time we experimented with a different configuration

Temporarily lost the master node. PANIC!!!

We documented the fix for that

Finally started working on Scala with sbt

L

D

A


Temporary loss of teammate


Hamming Distance
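For reference, a small sketch of the Hamming distance between two equal-length strings, for example two k-mers. The function name `hamming_distance` is illustrative, and how the distance was used in the lab pipeline is not shown on the slides.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical example: the two k-mers differ only in the last base.
distance = hamming_distance("ACGT", "ACGA")
```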

MCA

Week 8

Environment

  • Data sparsity is a big problem

  • Do NOT be afraid to try new things
  • Go for the cleaner approach to quantify the data

Lots of math!!

Best K – Using Highest squared Euclidean distance

K-means: 5

Bisecting K-means: 15


Week 1

Hafiz Sameeullah

Data Node 2

Data Node 1

Hadoop Multi-Node Installation

Informatics - Master of Science (M.Sc.)

Expectations

Master

lrz.large

  • Under the hood Data mining
  • Digestion of distributed-systems-buzz
  • It’s gonna be boring

Learning Outcome

VS

mapred-site.xml

core-site.xml

  • Linux World
  • Looking for the right tools
  • Different ways to look at the same problem
  • Python, Scala, Dask

Data Node 3

yarn-site.xml

hdfs-site.xml

Hadoop Configuration


TEAM 3


Distributed

Mohammad Shaharyar Shaukat

Jonathan Narvaez

Informatics - Master of Science (M.Sc.)

Data Engineering and Analytics

General concepts

  • Distributed systems IN2259

General concepts

  • Networking and Virtualization
  • Foundations of Data Engineering
  • Distributed Systems

  • Basic Linux administration skills

LAB

Expectations

  • Apply Theoretical Concepts
  • Map Reduce

  • Apply Theoretical Concepts
  • Hands on - Hadoop ecosystem

DATA


Mining

Group 3

Madhavendra S. Negi

Aka Maddy

Informatics - Master of Science (M.Sc.)

Expectations :-

  • Learn new tools and languages
  • Learning to work in a distributed environment


Week 4

TASK :

Working with Dask

Distributed Data Mining Lab

Cluster scale-out vs. scale-up dilemma

K-mer Count

29.07.2020

Week 4

  • Run time comparison for same problem using different tools
  • Memory Problems
  • Learning curve for Dask and Scala
  • Reconfiguration of Cluster

