(SDEV 4010) Software Frameworks for Large Data Sets

This module provides an introduction to programming with frameworks designed for the distributed processing of large data sets across clusters of computers. The module will describe big data in terms of its volume, velocity, and variety, and explain how software is used to handle such large volumes of unstructured data.
It will demonstrate how to scale software from a single server to multiple servers and the implications of doing so for computation and storage. Students will also learn to use software to detect and handle application failure, thereby delivering high availability across a cluster of computers.

The module will use a state-of-the-art software framework designed for handling large data sets (e.g., Hadoop, HPCC Systems, Spark). Teaching and learning will be based on practical implementations and problem solving related to the challenges described above.

*Curricular information is subject to change

What will I learn?

Software framework architecture/ecosystem and common utilities for large data

Distributed file systems – clusters, nodes, reads/writes, data integrity/replication, fault tolerance

MapReduce – processing/generating large data sets, map APIs, failover

Job scheduling and cluster management – fair scheduler, user queues

Data warehousing – data summarisation, data types/schemas, query language

Parallel processing – parallel evaluation, execution modes

Structured data storage – schema design, read/write optimisation

Multi-master databases – data replication, eventual consistency

Data mining – clustering, classification

How will I learn?

Lectures/labs, discussion, practical examples, problem-solving exercises, project work, self-directed learning.

Note: computer labs must have the relevant software installed and available to students.

How will I be assessed?

Module Content & Assessment
Assessment Breakdown	%
Other Assessment(s)	100
