Module Overview

Data Set Management

The aim of this module is to introduce students to techniques and tools for data validation, manipulation and cleaning (sorting, counting, reformatting and aggregating). It also teaches students how to check for and deal with errors, missing values and inconsistencies in a data set.

Case studies will be used so that students can apply these techniques to real data.  Students will also gain experience in assessing the relevance and suitability of a data set for a specific data analysis.

Module Code

MATH 4002

ECTS Credits

5

*Curricular information is subject to change

Microsoft Excel

- Work with formulae; copy formulae using absolute, relative and mixed addresses.
- Assess a data set using descriptive statistics, to include AVERAGE, MEDIAN, MIN, MAX, QUARTILE.INC, VAR, STDEV, CORREL.
- Count number of cases, identify missing data and outliers, clean data, to include Sort, Filter, Remove Duplicates, Flash Fill, COUNTIF, AVERAGEIF.
- Recode and categorise new and existing variables.
- Date/time functions, to include DATE, DATEVALUE, DAY, MONTH, YEAR, WEEKDAY, WEEKNUM.
- String functions, to include MID, CONCATENATE, EXACT, FIND, TRIM, REPLACE, SUBSTITUTE, VALUE, MATCH, INDEX, ISBLANK.
- Logic functions, to include IF, AND, OR, NOT.
- Lookup and Reference, to include TRANSPOSE, VLOOKUP, HLOOKUP.
- Summarise and aggregate data, to include Pivot Tables, Subtotals.
- Graphs/Plots, to include Box and Whisker plots, scatter plots, histograms.

R

- Introduction to R: working directories, libraries (packages), scripting, knitting
- Reading and writing files
- Data structures: scalars, vectors and matrices
- Selecting complete cases and dealing with NA data
- Summary statistics
- Scatter plots, histograms, box plots
- Aggregating data and merging data files
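
As a rough illustration of two topics in the list above (selecting complete cases and aggregating data), the following minimal sketch may help. It is not taken from the module materials: it is written in Python with the standard library only, rather than in R, and the sample data and the names `records`, `complete_cases` and `mean_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; None marks a missing value (NA in R).
records = [
    {"site": "A", "temp": 12.5, "rain": 3.0},
    {"site": "A", "temp": None, "rain": 1.2},
    {"site": "B", "temp": 15.0, "rain": 0.0},
    {"site": "B", "temp": 17.0, "rain": None},
    {"site": "B", "temp": 16.0, "rain": 2.0},
]


def complete_cases(rows):
    """Keep only the rows in which no field is missing."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_by_group(rows, group_key, value_key):
    """Aggregate: mean of value_key per group, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


complete = complete_cases(records)
site_means = mean_by_group(records, "site", "temp")
print(len(complete))
print(site_means)
```

The two functions reflect two different ways of dealing with missing data: dropping every incomplete row, or skipping only the missing values of the variable being summarised. Which is appropriate depends on the analysis.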

Hands-on computer lab work using sample data sets and case studies.

Module Content & Assessment

Assessment Breakdown    %
Other Assessment(s)     100