Module Overview

Programming for Big Data

Part-time / Level 9 / Online / 5 ECTS

Students taking this module will acquire the computer programming skills necessary to analyse and manipulate big data. Big data in this context refers to datasets too large for commonly used data analysis and manipulation tools to handle within a tolerable elapsed time. The context and challenges of processing large datasets form a core part of this course, so that students will be able to select appropriate approaches, tools or methods for big data problems, and to implement and evaluate solutions using a variety of programming tools and techniques.

  • Introduction to programming for big data

What is big data?

How is programming for big data different?

Distributed programming paradigms

Hadoop and HDFS

Map, Reduce, and Chained MapReduce Processes

Distributed programming tools for data storage and data analysis
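To illustrate the Map, Reduce and chained MapReduce topics listed above, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or HDFS. The helper `run_mapreduce` and the mapper and reducer names are hypothetical, introduced only for this illustration. The first job counts words and the second job takes the first job's output as its input, grouping words by their count.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Run one in-memory MapReduce job (hypothetical helper, not a Hadoop API)."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    mapped = chain.from_iterable(mapper(record) for record in records)
    # Shuffle phase: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def count_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, ones):
    # Sum the 1s to get the total count for this word.
    return word, sum(ones)


def invert_mapper(pair):
    # Swap (word, count) to (count, word) so the next job groups by count.
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    # List every word that occurred this many times.
    return count, sorted(words)


lines = ["big data big tools", "data tools", "big"]

# Job 1: word count.
word_counts = run_mapreduce(lines, count_mapper, count_reducer)
# Job 2 (chained): consumes the output of job 1.
words_by_count = run_mapreduce(word_counts, invert_mapper, invert_reducer)

print(word_counts)
print(words_by_count)
```

In a real cluster the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network. Here all three phases run in one process so the data flow is easy to follow.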

  • Advanced Big Data Analytics and Machine Learning

Examine various distributed analytics and machine learning frameworks (Mahout, Flink, Spark)

Spark architecture

RDDs

Creating Spark pipelines

Working with different data sources

Practical application of these various technologies for given problems and case studies

  • Big Data Storage Engines

Examine various distributed data storage engines for big data

Examine the use of Object Storage in big data environments

The module is designed to be delivered within a blended learning model, employing mixed modes (online and face to face) of learning, teaching and assessment.
TU059 will be delivered primarily in a face-to-face mode while TU060 will be delivered in a blended mode.
This module will employ traditional teaching methods such as lectures, seminars and tutorials, as well as more innovative, student-based learning methods such as group problem solving in both theoretical and practical situations.
Students will be encouraged to be proactive in their approach to learning through the use of case studies and simulation exercises, working independently and in groups. In some cases students will be expected to use computer-based learning material to supplement their studies.
There will be a strong emphasis on the practical element of the module, supported by supervised and independent practical sessions. Students will be able to explore the characteristics, advantages and limitations of the approaches they have learnt by applying them to suitable case studies and simulation exercises. Where appropriate, students will feed back the results of group research by cascading the knowledge to peers and through presentations. In-class discussions and reviews of leading research papers on each topic covered will also contribute towards the practical content.
Guest lecturers from industry and academia will be invited where appropriate to expose students to how topics covered in this module are used within the broader area of data analytics.
The most appropriate distribution methods will be used to distribute materials to students, between students and from students, e.g. a VLE, blogs, Twitter, a forum.
Students will be expected to develop independence in, and responsibility for, their own learning.

Module Content & Assessment

Assessment Breakdown (%)
Other Assessment(s): 100.00%

Contact school.cs@tudublin.ie for further information.

EU students: €230

Non-EU students: Contact international.city@tudublin.ie for more details.