Module Review

Machine Learning for Data Analytics

Part-time / Level 8 / Online / 5 ECTS

The aim of this module is to provide the student with an understanding of machine learning techniques and data analytics. The module begins by introducing the student to the core concepts in machine learning. It then reviews four different families of machine learning algorithms (information-based, similarity-based, probability-based and error-based algorithms).

The module concludes with a review of best practice in designing evaluation experiments for machine learning. Throughout the module there is an emphasis on grounding the module content in real-world examples and linking the material covered to real-world problems. At the end of the module the student will have an excellent foundation in the fields of machine learning and data analytics.

  • Introduction to Machine Learning: Define the machine learning task of learning from data, distinguish between supervised and unsupervised machine learning, explain why machine learning is an ill-posed problem and why inductive bias is needed if learning is to take place, explain what can go wrong with machine learning (underfitting and overfitting), and introduce the bias-variance trade-off, the no-free-lunch principle, and the need to generalise beyond the given dataset.
  • Introduction to Supervised Learning: Introduce the curse of dimensionality and feature selection, the stationarity assumption and i.i.d. data, distinguish between lazy and eager learners, distinguish between classification and regression tasks, and give an overview of the approaches to learning: Probability, Reducing Error, Analogy, Information.
  • Learning using information (entropy-based approaches): Review information theory and entropy and show how these concepts can be used to develop decision tree classification models. Explain how decision trees can be used for regression problems. Explain how tree pruning can improve a model's generalizability, and explain techniques for tree pruning. Introduce model ensembles in terms of boosting and bagging techniques and Random Forest models.
  • Similarity-based learning: Introduce the concepts of a feature space and nearest-neighbour algorithms. Review a range of indexes for measuring similarity between data points (e.g., Minkowski distances, cosine similarity, Russell-Rao, Sokal-Michener, Mahalanobis). Introduce different methods for making similarity-based models robust to noise (k-nearest neighbour models, weighted k-NN).
  • Probability-based learning: Review basic probability including marginalization, conditional independence and Bayes' Theorem. Introduce the Naive Bayes classifier and the Maximum A Posteriori classification criterion. Review methods for handling continuous features within a probabilistic framework (e.g., binning and probability density functions). Introduce Bayesian Network models and Markov Chain Monte Carlo techniques.
  • Error-based learning: Review a range of error functions, including the sum of squared errors. Introduce the method of least squares and the gradient descent algorithm for fitting a multivariate linear model to a dataset. Demonstrate how to train a logistic regression model for binary classification and how this model can be extended to multinomial classification. Review how linear models can learn non-linear relationships through the use of basis functions and introduce the Support Vector Machine approach to classification.
  • Introduce best practice in model evaluation, including the problem of peeking and the design of evaluation experiments (e.g., hold-out test set, cross-validation methods). Review a number of evaluation metrics and match these metrics to characteristics within a dataset. Explain the problem of concept drift and review methods for evaluating models after deployment.
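
As a rough illustration of the entropy-based topic listed above, the following minimal sketch computes Shannon entropy and the information gain of a categorical feature, the quantity a decision tree learner typically uses to choose a split. It is not taken from the module materials: the function names `entropy` and `information_gain` and the toy dataset are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature_index):
    """Reduction in entropy from partitioning on one categorical feature."""
    # Group the labels by the value each row takes for the chosen feature.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the partitions (the "remainder").
    remainder = sum(
        (len(part) / len(labels)) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy dataset: each row is (outlook, windy); the label is "play".
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no"]

gain_outlook = information_gain(rows, labels, 0)  # outlook separates the classes
gain_windy = information_gain(rows, labels, 1)    # windy tells us nothing here
```

In this toy dataset the first feature separates the two classes perfectly while the second does not separate them at all, so a tree learner using information gain would split on the first feature.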

This module can be delivered either through standard delivery or blended delivery. In standard delivery this module is delivered through a series of lectures with associated practical assignments. In blended delivery this module is delivered through a series of live and recorded lectures with associated laboratory work and practical assignments. Both blended and standard delivery have the same overall number of teaching and self-directed learning hours.

Assessment Breakdown %
Formal Examination 70.00%
Other Assessment 30.00%

Start date: January 2025

Contact school.cs@tudublin.ie for further information.

EU students: €230

Non-EU students: Contact international.city@tudublin.ie for more details.