An introduction to the field of data analysis, including its place in the wider world and the main methods and concepts of its two disciplinary facets, statistics and machine learning.
Data preparation
Types of data sources, including structured and unstructured; types of data; detection and fixing of errors in data; normalisation and scaling; integration of disparate sources; data type conversion; attribute merging and creation; missing data imputation; introduction to multiple imputation.
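As an illustration of three of the preparation steps listed above (missing data imputation, normalisation and scaling), the following is a minimal sketch using only the Python standard library. The function names and the sample values are assumptions made for this example, and mean imputation stands in for the more general imputation methods the module covers.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

raw = [2.0, None, 4.0, 6.0]       # hypothetical attribute with one missing value
filled = impute_mean(raw)         # [2.0, 4.0, 4.0, 6.0]
scaled = min_max_scale(filled)    # [0.0, 0.5, 0.5, 1.0]
standardised = z_score(filled)
```

Min-max scaling keeps the original shape of the data within a fixed range, whereas z-scores express each value as a distance from the mean in standard deviations.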
Data summarisation
Statistical description of data using measures of central tendency (mode, median, mean, percentiles), of variability (range, inter-quartile range, variance, standard deviation, mean absolute deviation) and of shape (skewness, kurtosis); distributions; graphical representation of data; confidence intervals.
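Several of the measures named above can be computed with Python's built-in `statistics` module. The sketch below uses a small made-up sample, and the population forms of variance and standard deviation are an arbitrary choice for the example.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

# Central tendency
mode = st.mode(data)
median = st.median(data)
mean = st.mean(data)

# Variability
value_range = max(data) - min(data)
q1, _, q3 = st.quantiles(data, n=4)       # quartile cut points
iqr = q3 - q1                             # inter-quartile range
variance = st.pvariance(data)             # population variance
std_dev = st.pstdev(data)                 # population standard deviation
mean_abs_dev = st.mean(abs(x - mean) for x in data)

print(mode, median, mean, value_range, variance, std_dev, mean_abs_dev)
```

Note that `st.quantiles` supports more than one interpolation method, so the inter-quartile range can differ slightly between tools.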
Data relationships
Measures of relatedness, both for numeric and categorical data, such as correlation coefficients, entropy and the chi-square test; the connection between the attribute-target relationship and potential for prediction.
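The three measures mentioned above can be sketched in a few lines of plain Python. The function names and toy inputs are assumptions for illustration. The chi-square function returns only the test statistic, since converting it to a p-value requires the chi-square distribution.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

r = pearson([1, 2, 3, 4], [2, 4, 6, 8])          # perfectly linear relationship
h = entropy(["a", "a", "b", "b"])                # two equally likely categories
chi2 = chi_square_statistic([[10, 10], [10, 10]])  # rows and columns independent
```

A strong relationship between an attribute and the target, by any of these measures, suggests the attribute may be useful for prediction.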
Data models
Comparative overview of statistical and machine learning models; the purpose of modelling data; model creation; regression; logistic regression; model evaluation.
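To make model creation and evaluation concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares and evaluated with the coefficient of determination. The function names and data are assumptions for this example, and logistic regression is not shown.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Proportion of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4]      # hypothetical attribute values
y = [3, 5, 7, 9]      # hypothetical target values, exactly y = 2x + 1
slope, intercept = fit_line(x, y)
score = r_squared(x, y, slope, intercept)
```

In practice a model would be evaluated on data held out from fitting. Scoring on the training data, as here, only keeps the sketch short.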
Hypothesis testing
The purpose of hypothesis testing; parametric vs. non-parametric tests; univariate and multivariate tests; statistic and distribution tests; examples, e.g. t-test and ANOVA; application of tests in machine learning.
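As one example of a parametric test, the sketch below computes the statistic for a two-sample t-test in its Welch (unequal variance) form, which is a choice made for this example. The function name and the two samples are hypothetical. Obtaining a p-value would additionally require the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t_statistic(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

group_a = [5.1, 4.9, 5.0, 5.2, 4.8]   # hypothetical measurements, mean 5.0
group_b = [5.6, 5.4, 5.5, 5.7, 5.3]   # hypothetical measurements, mean 5.5
t = welch_t_statistic(group_a, group_b)   # approximately -5.0
```

A large absolute value of the statistic is evidence against the null hypothesis that the two groups share the same mean.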
Module Content & Assessment

| Assessment Breakdown | % |
|---|---|
| Formal Examination | 50 |
| Other Assessment(s) | 50 |