Module Overview

Data Analysis

An introduction to the field of data analysis, including its position in the wider world and the main methods and concepts of its two disciplinary facets, statistics and machine learning.

Module Code

DATA H3008

ECTS Credits

5

*Curricular information is subject to change

Data preparation

Types of data sources, including structured and unstructured; types of data; detection and fixing of errors in data; normalisation and scaling; integration of disparate sources; data type conversion; attribute merging and creation; missing data imputation; introduction to multiple imputation.
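As an illustration of three of the topics above (missing data imputation, normalisation and scaling), a minimal Python sketch is given below. The function names and sample values are hypothetical and are not taken from the module materials. Simple mean imputation is shown only because it is the easiest method to read; it is not multiple imputation.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly onto [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Convert values to z-scores. Assumes the values are not all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical attribute with one missing entry.
raw = [2.0, None, 4.0, 6.0]
filled = impute_mean(raw)       # [2.0, 4.0, 4.0, 6.0]
scaled = min_max_scale(filled)  # [0.0, 0.5, 0.5, 1.0]
z_scores = standardise(filled)  # mean 0, population standard deviation 1
```

Min-max scaling keeps the shape of the distribution but is sensitive to outliers, while standardisation expresses each value as a distance from the mean in units of standard deviation.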

Data summarisation

Statistical description of data using measures of central tendency (mode, median, mean), of position (percentiles), of variability (range, inter-quartile range, variance, standard deviation, mean absolute deviation) and of shape (skewness, kurtosis); distributions; graphical representation of data; confidence intervals.
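The descriptive measures listed above can be sketched with Python's standard `statistics` module. The sample data is hypothetical, and the population (rather than sample) formulas are an assumption made here to keep the arithmetic simple.

```python
from statistics import mean, median, mode, pstdev, pvariance, quantiles

# Hypothetical sample, chosen so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
centre = {"mode": mode(data), "median": median(data), "mean": mean(data)}

# Variability (population formulas: divide by n, not n - 1)
value_range = max(data) - min(data)
q1, q2, q3 = quantiles(data, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1
variance = pvariance(data)
std_dev = pstdev(data)
mean_abs_dev = mean(abs(x - centre["mean"]) for x in data)

# Shape: standardised third and fourth central moments
def central_moment(values, k):
    mu = mean(values)
    return mean((x - mu) ** k for x in values)

skewness = central_moment(data, 3) / central_moment(data, 2) ** 1.5
kurtosis = central_moment(data, 4) / central_moment(data, 2) ** 2

print(centre, value_range, iqr, variance, std_dev, mean_abs_dev)
print(skewness, kurtosis)
```

A positive skewness, as this sample gives, indicates a longer tail to the right of the mean.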

Data relationships

Measures of relatedness, both for numeric and categorical data, such as correlation coefficients, entropy and the chi-square test; the connection between the attribute-target relationship and potential for prediction.
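Two of the measures named above can be sketched in a few lines of Python: the Pearson correlation coefficient for numeric data and Shannon entropy for categorical data. The function names and sample values are hypothetical, and Pearson is only one of several correlation coefficients the topic may cover.

```python
from collections import Counter
from math import log2, sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A perfectly linear relationship gives a coefficient of 1.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])

# Two equally likely categories carry one bit of uncertainty;
# a single repeated category carries none.
h_balanced = entropy(["yes", "yes", "no", "no"])
h_constant = entropy(["yes", "yes", "yes", "yes"])
```

An attribute whose values reduce the entropy of the target when the data is split on it is, in this sense, related to the target and potentially useful for prediction.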

Data models

Comparative overview of statistical and machine learning models; the purpose of modelling data; model creation; regression; logistic regression; model evaluation.
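Model creation and evaluation for the regression topic above can be illustrated with a minimal sketch of simple linear regression by ordinary least squares, scored with the coefficient of determination. The function names and data are hypothetical, and a single predictor is assumed for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
```

In practice the score would be computed on data held out from fitting, since a model evaluated on its own training data tends to look better than it is.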

Hypothesis testing

The purpose of hypothesis testing; parametric vs. non-parametric tests; univariate and multivariate tests; statistic and distribution tests; examples such as the t-test and ANOVA; application of tests in machine learning.

Module Content & Assessment
Assessment Breakdown    %
Formal Examination      50
Other Assessment(s)     50