(COMP H4016) Text Analysis

The aim of this module is to give learners an understanding of how natural language processing (NLP) techniques and machine learning techniques can be applied to text analytics.

*Curricular information is subject to change

What will I learn?

Overview & Methodology

Text & Web Mining methodologies, such as adaptations of CRISP-DM. Application domains for text mining & case studies.

Applying structure to textual information

Extracting structure using NLP techniques such as Tokenisation, Phrase recognition, Named Entity Recognition & Concept Chunking. Vector generation.
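As a rough illustration of tokenisation and vector generation, the sketch below splits text into lowercase word tokens and builds term-count vectors over a shared vocabulary. It is not taken from the module material; the function names tokenise and count_vectors and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vectors(documents):
    """Return (vocabulary, vectors): one term-count vector per document."""
    tokenised = [tokenise(doc) for doc in documents]
    # Shared, sorted vocabulary so every vector uses the same column order.
    vocabulary = sorted({tok for toks in tokenised for tok in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents, for illustration only.
docs = ["Text mining finds structure in text.", "Web mining crawls the web."]
vocab, vecs = count_vectors(docs)
```

Each position in a vector corresponds to one vocabulary term, so documents of different lengths become comparable fixed-length rows.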

Web mining

Crawling the web. Web content mining; Web usage mining. Information Extraction. Semantic Web.

Classification

Learning methods for classification such as similarity measures; nearest neighbour methods; decision rules; probability-based methods. Evaluating model performance.

Clustering

Measures of similarity. Methods for clustering documents such as k-means clustering; hierarchical clustering.
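One common measure of similarity between document vectors is cosine similarity. The sketch below is a minimal illustration under that assumption; the syllabus does not say which measures the module covers, and the function name cosine_similarity is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the measure depends on direction rather than magnitude, a long and a short document with the same term proportions score close to 1.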

Big Data analytics

Introduction to the challenges of working with Big Data and Large scale file systems. Adapting standard analytics techniques to run in a distributed environment. Emerging trends.

How will I be assessed?

Module Content & Assessment
Assessment Breakdown	%
Other Assessment(s)	40
Formal Examination	60
