Module Overview

Software Frameworks for Large Data Sets

This module provides an introduction to programming with frameworks designed for the distributed processing of large data sets across clusters of computers. The module will describe big data in terms of its volume, velocity, and variety, and how software is utilised to handle such large volumes of unstructured data.
It will demonstrate how to scale software from a single server to multiple servers and the implications of doing so for computation and storage. Students will also learn to use software to detect and handle application failure, thereby delivering high availability across a cluster of computers.

The module will utilise a state-of-the-art software framework designed for handling large data sets (e.g., Hadoop, HPCC Systems, Spark). The teaching and learning will be based on practical implementations and problem solving related to the challenges described above.
 

Module Code

SDEV 4010

ECTS Credits

5

*Curricular information is subject to change

Software framework architecture/ecosystem and common utilities for large data

Distributed file systems – clusters, nodes, reads/writes, data integrity/replication, fault tolerance

MapReduce – processing/generating large data sets, map APIs, failover 

Job scheduling and cluster management – fair scheduler, user queues

Data warehousing – data summarisation, data types/schemas, query language

Parallel processing – parallel evaluation, execution modes

Structured data storage – schema design, read/write optimisation

Multi-master databases – data replication, eventual consistency 

Data mining – clustering, classification
 

Content

Lectures/labs, discussion, practical examples, problem-solving exercises, project work, self-directed learning.

Note, computer labs must have the relevant software installed and available to students.
 

Module Content & Assessment
Assessment Breakdown    %
Other Assessment(s)     100