Module Overview

Text Analysis

The aim of this module is to give learners an understanding of how natural language processing (NLP) and machine learning techniques can be applied to text analytics.

Module Code

COMP H4016

ECTS Credits


*Curricular information is subject to change

Overview & Methodology

Text and web mining methodologies, such as adaptations of CRISP-DM. Application domains for text mining; case studies.

Applying structure to textual information

Extracting structure using NLP techniques such as tokenisation, phrase recognition, named entity recognition and concept chunking. Vector generation.
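
As a rough illustration of tokenisation and vector generation, the following Python sketch splits text into lower-case word tokens with a simple regular expression and builds a sparse TF-IDF vector for each document. The regex tokeniser and the weighting (raw term count multiplied by log(N/df)) are assumptions chosen for brevity, not the method prescribed by the module. Phrase recognition, named entity recognition and chunking would normally rely on an NLP library and are not shown.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document.

    TF is the raw count of a term in the document; IDF is log(N / df), where N is
    the number of documents and df the number of documents containing the term.
    """
    tokenised = [tokenise(doc) for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


docs = ["The cat sat on the mat.", "The dog sat on the log.", "Cats and dogs!"]
print(tokenise(docs[0]))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
for vector in tfidf_vectors(docs):
    # Show the three highest-weighted terms of each document.
    print(sorted(vector.items(), key=lambda kv: -kv[1])[:3])
```

In this example "the" still receives a non-zero weight because it occurs in only two of the three documents. A term found in every document would get a weight of 0. Stop-word removal is a common additional step.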

Web mining

Crawling the web. Web content mining; web usage mining. Information extraction. Semantic Web.
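
Crawling can be sketched as a breadth-first traversal: fetch a page, extract its links, and queue any URL not seen before. The sketch below uses only the Python standard library (`html.parser`, `urllib.parse`). The `fetch` argument is a hypothetical stand-in for an HTTP client. Here it is an in-memory dictionary of pages, so the example runs offline. A real crawler would also need to honour robots.txt, apply rate limits and handle errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # #fragment so the same page is not queued twice.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None if unavailable."""
    seen = {start_url}
    frontier = deque([start_url])
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<a href="/b">B</a> <a href="/">Home</a>',
    "https://example.org/b": '<a href="/c#top">C</a>',
}

print(crawl("https://example.org/", FAKE_WEB.get))
```

The queue gives breadth-first order, and the `seen` set stops the crawler revisiting pages that link back to each other.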

Classification

Learning methods for classification, such as similarity measures, nearest-neighbour methods, decision rules and probability-based methods. Evaluating model performance.
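
The following minimal sketch shows a nearest-neighbour classifier with a cosine similarity measure, followed by a simple accuracy check. It assumes bag-of-words term counts as document vectors and uses a tiny hand-made labelled dataset for illustration only. Decision rules and probability-based methods such as naive Bayes are not shown.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as a Counter of lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training documents."""
    query = bag_of_words(text)
    ranked = sorted(train, key=lambda item: cosine(query, bag_of_words(item[0])), reverse=True)
    votes = Counter(label for _doc, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out (text, label) pairs that are classified correctly."""
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)


# Illustrative toy data, not taken from the module.
train = [
    ("goal scored in the final minute", "sport"),
    ("the team won the league match", "sport"),
    ("striker signs for new team", "sport"),
    ("shares fell as markets closed", "finance"),
    ("the bank raised interest rates", "finance"),
    ("investors sold shares in the bank", "finance"),
]
test = [("late goal wins the match", "sport"), ("interest rates hit markets", "finance")]

print(accuracy(train, test))
```

Evaluating on documents kept out of the training set, as `accuracy` does here, is the basic idea behind held-out evaluation. Cross-validation extends it to several train/test splits.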

Clustering

Measures of similarity. Methods for clustering documents, such as k-means clustering and hierarchical clustering.
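
The k-means procedure alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. It stops when the assignments no longer change. This sketch uses 2-D points and Euclidean distance for readability. In document clustering the points would be document vectors (for example TF-IDF), and cosine similarity is often preferred. Initialising with the first k points is a simplifying assumption. Practical implementations typically use random or k-means++ initialisation with several restarts. Hierarchical clustering is not shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(points, k, max_iter=100):
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids


points = [[1.0, 1.0], [9.0, 9.0], [1.5, 2.0], [8.0, 8.5], [1.0, 0.5], [9.5, 8.0]]
labels, centres = k_means(points, k=2)
print(labels)   # [0, 1, 0, 1, 0, 1]
print(centres)
```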

Big Data analytics

Introduction to the challenges of working with Big Data and large-scale file systems. Adapting standard analytics techniques to run in a distributed environment. Emerging trends.
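
One common way of adapting an analytics task to a distributed environment is the MapReduce programming model, popularised by Hadoop. The syllabus does not name a specific framework. The sketch below therefore simulates the map, shuffle and reduce phases of a word count in a single Python process. The function names are illustrative and are not the API of Hadoop or any other framework.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(documents):
    # On a cluster each document (or file split) would be mapped on a separate
    # node, and each key group reduced independently; here both run sequentially.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data needs big clusters", "move the code to the data"])
print(counts)
```

Mappers and reducers share no state, so a framework can run them in parallel on the nodes that hold the data.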

Module Content & Assessment
Assessment Breakdown (%)
Other Assessment(s): 40
Formal Examination: 60