Module Overview

Data Science 2

This module introduces the learner to the principles and practice of modelling in Data Science, focussing on dimensionality reduction and Bayesian methods. It emphasises applying these principles to scientific data, in particular physical data.

Module Code

PHYS 2004

ECTS Credits

10

*Curricular information is subject to change
  • Dimensionality reduction with principal components analysis (PCA):

    • Vector algebra and vector spaces;

    • Similarity and distance metrics;

    • Matrix operations – scaling, rotation, reflection, projection;

    • Singular value decomposition;

    • Covariance; Orthonormal basis and statistical independence;

    • Centering, standardization and rotation; scores and loadings;

    • Biplots and scores plots; Interpretation.

  • Probabilistic inference:

    • Overview and basics of Bayesian Inference;

    • Theory of Bayesian linear models;

    • Prior specification;

    • Bayesian Computing methods;

    • Bayesian Model Criticism and Selection;

    • Classification with Bayesian methods and Discriminant Analysis.

  • Evaluation of modelling performance:

    • Training and testing strategies; hold-out method, cross-validation, bootstrapping;

    • Selection of classification metrics; class distribution, confusion matrix, sensitivity and specificity, F-measure, kappa, MCC, ROC curve; ROC analysis;

    • Variable selection techniques; forward and backward selection.

  • Decision trees:

    • Introduction to splitting criteria (Gini, Entropy) within greedy decision trees;

    • Information Gain;

    • Development of DTs; Prediction of discrete and continuous targets;

    • Pruning and optimisation with cross-validation; Iteration;

    • Validation and performance evaluation;

    • Consensus models and Random Forests.
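
As a minimal sketch of the splitting criteria named in the decision-tree topics above, the following Python computes Gini impurity, entropy and information gain for one candidate split. The function names and the toy labels are illustrative assumptions and are not taken from the module material.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy data: class labels at a node and after one candidate split.
parent = ["a", "a", "a", "a", "b", "b", "b", "b"]
left = ["a", "a", "a", "b"]
right = ["a", "b", "b", "b"]

print("Gini impurity of parent:", gini(parent))
print("Entropy of parent (bits):", entropy(parent))
print("Information gain of split:", information_gain(parent, [left, right]))
```

A greedy tree builder would evaluate a quantity like this for every candidate split and keep the one with the largest gain; passing `impurity=gini` swaps the criterion without changing the rest of the calculation.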

Programming will be taught in the computer laboratory, supplemented by lectures. The computer laboratory will be used throughout the syllabus to maximise hands-on interaction with the subject matter.

Module Content & Assessment

Assessment Breakdown (%)

Other Assessment(s): 100