Courses Taught
DAP5111
Introduction to Data Science for Decision-Making
The amount of data that companies record is increasing at an exponential rate. Capitalising on the information in this data can improve a company's performance. However, extracting insights from data often requires considerable technical expertise or mathematical knowledge. This course aims to empower non-expert individuals to explore data so that they can identify opportunities to investigate and possibly develop further. Working with data is the new norm, and this course aims to equip individuals with an understanding of this new paradigm.
Data becomes meaningful and valuable only when insights are derived from it. This course also aims to empower non-expert individuals to make informed decisions based on data rather than on gut feeling. Data-driven decision-making enables decision makers to be proactive, resulting in more productive and efficient solutions, which in turn give rise to higher returns on investment and better customer experiences.
DAP5222
Applied Regression Models using R
This course will introduce Regression Models for continuous and categorical data with practical applications. In particular, learners will be introduced to Multiple Linear Regression, Ridge/LASSO/Elastic Net Regression and Logistic Regression. We will be using the R Programming Language for this course.
The goal of multiple regression is to model the relationship between two or more predictor variables and your target variable of interest. As the name suggests, multiple regression allows the input of more than one predictor variable into the model. Using more than one predictor variable in your model generally leads to a more accurate and precise understanding of the association of each predictor variable with the target variable of interest.
Regression Models frequently suffer from overfitting and multicollinearity. In this course, you will learn how to overcome these problems using the regularisation techniques applied by Ridge/LASSO/Elastic Net Regression Models.
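As a rough illustration of how regularisation shrinks coefficients when predictors are nearly collinear, the sketch below fits least squares and ridge regression to a tiny made-up data set. The course itself uses R; Python is used here purely for illustration. The function names (`solve`, `ridge_fit`), the data and the penalty value are all assumptions, and LASSO and Elastic Net are not shown because they need iterative solvers.

```python
# Illustrative sketch only: names, data and penalty are hypothetical.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(rows, y, lam):
    """Solve (X'X + lam * I) beta = X'y; lam = 0 gives ordinary least squares.

    Assumes the predictors and the target are already centred, so no
    intercept is fitted.
    """
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up, centred data: the second predictor is almost a copy of the first.
rows = [(-2.0, -1.99), (-1.0, -1.02), (0.0, 0.0), (1.0, 1.02), (2.0, 1.99)]
y = [-4.1, -1.9, 0.1, 2.1, 3.8]

ols = ridge_fit(rows, y, lam=0.0)    # no penalty
ridge = ridge_fit(rows, y, lam=1.0)  # penalty pulls coefficients toward zero
print("least squares:", ols)
print("ridge:        ", ridge)
```

For any positive penalty, the sum of squared ridge coefficients is smaller than that of the least-squares fit. In practice, predictors are usually standardised before fitting and the penalty is chosen by cross-validation.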
In the typical Multiple Linear Regression Model, your target variable of interest is a continuous variable. When your target variable of interest is a categorical binary variable, a Logistic Regression Model is used. Logistic Regression Models may be used, for example, to classify whether an employee will leave or stay, or whether a student will pass or fail their final examination. We will teach you how to build a Logistic Regression Model, evaluate the performance of the model and interpret the model coefficients.
For each of these regression models, you will learn how to build the model, evaluate its fit and predictive accuracy, and make data-driven recommendations.
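To make the pass/fail example concrete, here is a minimal sketch of a one-predictor logistic regression fitted by gradient ascent on the log-likelihood. It is written in Python for illustration only (the course uses R), the study-hours data are invented, and `fit_logistic`, the learning rate and the step count are assumptions rather than anything prescribed by the course.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, rate=0.1, steps=20000):
    """Fit intercept b0 and slope b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += rate * sum(residuals) / n
        b1 += rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Invented data: hours studied and whether the student passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 1)   # estimated pass probability after 1 hour
p_high = sigmoid(b0 + b1 * 8)  # estimated pass probability after 8 hours
odds_ratio = math.exp(b1)      # multiplicative change in odds per extra hour
print(b0, b1, p_low, p_high, odds_ratio)
```

The exponentiated slope is the usual way to interpret the coefficient: it is the factor by which the odds of passing change for each additional hour studied.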
DAP5231
Unsupervised Learning
Unsupervised learning refers to a class of techniques used to discover interesting patterns or structures in data. The data used in unsupervised learning are not labelled. This means that only the predictor or input variables are given with no corresponding response or output variables. This module introduces several unsupervised learning techniques that are commonly used for clustering, dimensionality reduction and outlier detection. They can be used for exploratory data analysis and data visualisation, or as an intermediate step to facilitate other analyses. We will use the R programming language for data analysis.
Clustering is concerned with grouping the observed cases in such a way that cases in the same group (or cluster) are more “similar” to each other than to those in other groups (or clusters). We will see that clustering, when used appropriately, can produce interpretable results that can be used to formulate actions that take advantage of the underlying structure of the data.
Dimensionality reduction techniques transform data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation of the input variables retains as many meaningful properties of the original data as possible. We will see that they are useful for circumventing the curse of dimensionality and making data analysis computationally tractable.
Outlier detection refers to the identification of “unusual” cases that are “inconsistent” with the majority of the data. We will see how dimensionality reduction techniques and clustering can be used for outlier detection.
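As one possible illustration of clustering, the sketch below runs a basic k-means procedure on six made-up two-dimensional points. The description above does not name a specific clustering algorithm, so k-means is an assumption here, as are the function names, the data and the starting centroids. Python is used for illustration only; the module itself uses R.

```python
# Illustrative sketch only: k-means is one of several clustering techniques.

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
                  for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        centroids = updated
    return labels, centroids

# Made-up data: two visibly separate groups of unlabelled cases.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels)
print(centroids)
```

Cases that lie far from every centroid are natural candidates for the "unusual" cases that outlier detection looks for.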
DAP5234
Optimisation for Decision-Making
Optimisation involves choosing the best values of the decision variables from a set of available alternatives, subject to constraints, in order to optimise a specific criterion known as the objective function. There are various categories of optimisation problems that depend on the types of decision variables, constraints, and objective functions. Numerous techniques have been developed for each specific type of optimisation problem. This course aims to introduce two categories of widely used optimisation techniques: linear programming and integer programming, which will be implemented using the R programming language.
Linear programming is a technique for optimising a linear objective function subject to linear equality or inequality constraints. In this course, we will delve into the simplex method, a widely used approach for finding the optimal solution. Additionally, we will explore sensitivity analysis to understand how the optimal solution changes in response to changes in the objective function and constraints.
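The sketch below solves a small two-variable linear programme by checking every corner point of the feasible region. This is deliberately not the simplex method: it relies on the same fact that simplex exploits (when the feasible region is bounded and non-empty, an optimum occurs at a vertex), but it enumerates all vertices instead of moving between neighbouring ones. The problem data, the function name `solve_lp_2d` and the use of Python rather than R are assumptions made for illustration.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximise objective[0]*x + objective[1]*y over constraints a*x + b*y <= c.

    Assumes a bounded, non-empty feasible region. Returns (value, x, y).
    """
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines never intersect
        # Intersection of the two boundary lines (Cramer's rule).
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Made-up problem: maximise 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
value, x, y = solve_lp_2d((3, 5), constraints)
print(value, x, y)
```

Enumeration becomes impractical as the number of variables and constraints grows, which is one reason a vertex-to-vertex method such as simplex is used instead.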
Integer programming techniques address optimisation problems in which some or all of the decision variables are restricted to integer values. We will introduce the branch and bound algorithm for solving small-scale integer programming problems. We will also explore heuristic approaches that provide practical solutions to large-scale integer programming problems.
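A minimal sketch of the branch and bound idea is shown below for a 0/1 knapsack problem, a simple integer programme in which each decision variable is either 0 or 1. Each branch fixes one variable, and a branch is pruned when the bound from its linear relaxation (allowing a fraction of one item) cannot beat the best solution found so far. The example data and the function name are assumptions, and Python is used for illustration only; the course uses R.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Return the best total value of a 0/1 knapsack using branch and bound."""
    # Sort items by value per unit weight so the relaxation bound is greedy.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = 0

    def bound(k, value, room):
        """Upper bound from the linear relaxation of the remaining items."""
        for i in range(k, len(v)):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]  # fractional part of one item
        return value

    def branch(k, value, room):
        nonlocal best
        if value > best:
            best = value
        if k == len(v) or bound(k, value, room) <= best:
            return  # no items left, or this branch cannot improve on best
        if w[k] <= room:
            branch(k + 1, value + v[k], room - w[k])  # branch: take item k
        branch(k + 1, value, room)                    # branch: skip item k

    branch(0, 0, capacity)
    return best

# Made-up data: three items and a knapsack of capacity 50.
best_value = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best_value)
```

Here the optimum takes the second and third items for a total value of 220; the branch that skips both of the first two items is pruned because its bound of 120 cannot beat that.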
DSA3362
Predictive Data Analytics
Data analytics refers to the process of examining data sets to extract the information they contain. It may be of interest to predict the future values of one set of “output” variables in the data set using the future values of another set of “input” variables. This module introduces students to commonly used techniques in predictive analytics. Topics include the paradigm of statistical learning, model selection and validation, and a set of basic models for learning. Students will gain practical experience applying these techniques to real-world data.
GEA1000
Quantitative Reasoning with Data
This course aims to equip undergraduate students with essential data literacy skills to analyse data and make decisions under uncertainty. It covers the basic principles and practice for collecting data and extracting useful insights, illustrated in a variety of application domains. For example, when two issues are correlated (e.g., smoking and cancer), how can we tell whether the relationship is causal (e.g., smoking causes cancer)? How can we deal with categorical data? Numerical data? What about uncertainty and complex relationships? These and many other questions will be addressed using data software and computational tools, with real-world data sets.