OMSCS

Pages in this section

  • CS6035 O01 Introduction to Information Security
    Last edited: 2026-01-28

    Introduction to Information Security is a graduate-level introductory course in information security. It teaches the basic concepts and principles of information security, and the fundamental approaches to secure computers and networks.

    Main topics include:

    • Security basics
    • Security management and risk assessment
    • Software security
    • Operating systems security
    • Database security
    • Cryptography algorithms and protocols
    • Network authentication and secure network applications
    • Malware
    • Network threats and defences
    • Web security
    • Mobile security
    • Legal and ethical issues
    • Privacy

    # Links

  • CS6200 Graduate Introduction to Operating Systems
    Last edited: 2023-12-03

    This is a graduate-level introductory course in operating systems. This course teaches the basic operating system abstractions, mechanisms, and their implementations. The core of the course focuses on OS support for concurrency (threads) and synchronisation, resource management (CPU, memory, I/O), and distributed services. The practical component of the course teaches multi-thread programming, inter-process communication, and distributed interactions via RPC.

    Break

    I retook this course so there is a slight break in dates between lectures 4 and 5. The content was the same for both times I took it.

  • CS6210 Advanced Operating Systems
    Last edited: 2025-12-03

    Advanced Operating Systems is a graduate-level course that addresses a broad range of topics in operating system design and implementation, including:

    • Operating system structuring
    • Synchronization, communication, and scheduling in parallel systems
    • Distributed systems, their communication mechanisms, distributed objects and middleware
    • Failures and recovery management
    • System support for Internet-scale computing

    By tracing the key ideas of today’s most popular systems to their origins in research, the class highlights key developments in operating system design over the last two decades and illustrates how insight has evolved to implementation.

    # Links

  • CS6515 Introduction to Graduate Algorithms
    Last edited: 2023-12-03

    This is a graduate-level course in the design and analysis of algorithms. We study techniques for the design of algorithms (such as dynamic programming) and algorithms for fundamental problems (such as the fast Fourier transform, FFT). In addition, we study computational intractability, specifically the theory of NP-completeness. The main topics covered in the course include: dynamic programming; divide and conquer, including FFT; randomized algorithms, including the RSA cryptosystem; graph algorithms; max-flow algorithms; linear programming; and NP-completeness.

  • CS6250 Computer Networks
    Last edited: 2025-12-05

    This course provides a quick refresher on introductory material and offers broad coverage of protocols and algorithms that span all layers of the Internet protocol stack.

    The course begins with an overview of the evolution of the Internet architecture and a short refresher on transport layer algorithms and protocols such as TCP and congestion control. Next, the course dives into intradomain/interdomain routing, peering and network relationships, with a follow-up on router design and functionality. After discussing routing and router technologies, the course continues with Software Defined Networking technologies and discusses topics that intersect Network Security and Computer Networks, for example attacks on Internet routing such as BGP hijacking. Finally, the course wraps up with a discussion of multimedia applications and Content Delivery Networks.

  • CS7641 Machine Learning
    Last edited: 2024-01-10

    This is a 3-course Machine Learning Series, taught as a dialogue between Professors Charles Isbell (Georgia Tech) and Michael Littman (Brown University).

    Supervised Learning

    Supervised learning is a machine learning task that makes it possible for your phone to recognize your voice, your email to filter spam, and computers to learn a number of fascinating things. This sort of machine learning task is an important component in all kinds of technologies, from stopping credit card fraud, to finding faces in camera images, to recognizing spoken language. Our goal is to give students the skills they need to apply supervised learning to these technologies and interpret their output. This is especially important for solving a range of data science problems.

  • CS7642 Reinforcement Learning
    Last edited: 2024-01-10

    The course explores automated decision-making from a computational perspective through a combination of classic papers and more recent work. It examines efficient algorithms, where they exist, for learning single-agent and multi-agent behavioral policies and approaches to learning near-optimal decisions from experience.

    Topics include Markov decision processes, stochastic and repeated games, partially observable Markov decision processes, reinforcement learning, deep reinforcement learning, and multi-agent deep reinforcement learning. Of particular interest will be issues of generalization, exploration, and representation. We will cover these topics through lecture videos, paper readings, and the book Reinforcement Learning by Sutton and Barto.

  • CS8803 O08 Compilers - Theory and Practice
    Last edited: 2026-01-28

    The objective of this course is to learn the theory and practice behind building automatic translators (compilers) for higher level programming languages and to engineer and build key phases of a compiler in Java or C++ for a small language.

    # Links

  • CSE6220 Introduction to High Performance Computing
    Last edited: 2026-01-28

    This course is a graduate-level introduction to scalable parallel algorithms. “Scale” really refers to two things: efficient as the problem size grows, and efficient as the system size (measured in numbers of cores or compute nodes) grows. To really scale your algorithm in both of these senses, you need to be smart about reducing asymptotic complexity the way you’ve done for sequential algorithms since CS 101; but you also need to think about reducing communication and data movement. This course is about the basic algorithmic techniques you’ll need to do so.

    The techniques you’ll encounter cover the main algorithm design and analysis ideas for three major classes of machines: for multicore and manycore shared memory machines, via the work-span model; for distributed memory machines like clusters and supercomputers, via network models; and for sequential or parallel machines with deep memory hierarchies (e.g., caches). You will see these techniques applied to fundamental problems, like sorting, search on trees and graphs, and linear algebra, among others.

    The practical aspect of this course is implementing the algorithms and techniques you’ll learn to run on real parallel and distributed systems, so you can check whether what appears to work well in theory also translates into practice. (Programming models you’ll use include OpenMP, MPI, and possibly others.)

  • Online Masters of Science in Computer Science Notes

    This masters programme from Georgia Tech aims to make rigorous computer science education available to everyone, through an affordable degree delivered on an accessible platform.

    # Resources

  • Week 0 - Assumed knowledge
    Last edited: 2026-02-05

    # Notes from Fundamentals course

    I followed this Fundamentals course to understand the basics.

    # Lecture 1: Hosts and devices

    Some terminology first:

    Host (networks)

    Client

    Server

  • Week 1 - Are you ready?
    Last edited: 2026-02-05

    # C Programming

    Because this course focuses on operating systems, primarily covering Linux, it assumes that students have a reasonable understanding of the C programming language.

    # Concepts to Know

    For this course, a “reasonable understanding” would be defined as having used or been exposed to the following C concepts:

  • Week 1 - Basic Model of Locality
    Last edited: 2026-01-28

    This lecture is based on the paper:

    The Input/Output Complexity of Sorting and Related Problems

    High performance computing

    This is a bad name; all computing should be high performance. This course is really about supercomputers, where you need to extract every last ounce of compute you have to solve the problem faster.

  • Week 1 - Chapter 1 Machine Learning
    Last edited: 2024-01-13

    These are my notes from the first chapter of MachineLearningTomMitchell.pdf.

    1.1 Well posed learning problems

    Machine Learning

    Checkers

    If we want a programme to learn to play checkers, we simply need to define $T$, $P$, and $E$!

  • Week 1 - Chapter 3, Finite Markov Decision Process
    Last edited: 2026-02-05

    Compulsory reading from Reinforcement learning: An introduction.

    # Finite Markov Decision Process

    Finite Markov Decision Process

    Using the function $p$ (defined in the finite Markov decision process), we can calculate some other probabilities.

    State transition probability: When in state $s_t$ and you take the action $a_t$ the probability of ending up in $s_{t+1}$:
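    Completing the cut-off formula (my reconstruction, using Sutton and Barto's standard notation in which $p(s', r \mid s, a)$ is the four-argument dynamics function): the state-transition probability is obtained from $p$ by summing out the reward,

$$p(s_{t+1} \mid s_t, a_t) = \Pr\{S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t\} = \sum_{r \in \mathcal{R}} p(s_{t+1}, r \mid s_t, a_t)$$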

  • Week 1 - Course material
    Last edited: 2026-02-05

    # Reference Books

  • Week 1 - Decision Trees
    Last edited: 2024-01-10

    Supervised learning

    Supervised learning falls into two main problem sets:

    • Classification problems - mapping from a domain to some discrete set, and
    • Regression problems - mapping from a domain to some continuous space like the real numbers ($\mathbb{R}$).

    For example, making a decision on whether you should give someone a loan is a classification problem as there are only two outcomes - whereas mapping from someone’s picture to their age would be a regression problem.

  • Week 1 - Dynamic Programming
    Last edited: 2025-12-05

    Dynamic programming is a method to speed up recursion, especially when the recursive algorithm computes the same subproblem multiple times.
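    A minimal illustration (my own, not from the course notes): memoising the naive Fibonacci recursion turns an exponential-time algorithm into a linear-time one, because each subproblem is computed only once.

```python
from functools import lru_cache

def fib_naive(n: int) -> int:
    # Recomputes the same subproblems exponentially many times.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Each subproblem is computed once and cached: O(n) time.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(40))  # 102334155 -- instant, while fib_naive(40) would crawl
```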

  • Week 1 - Introduction
    Last edited: 2026-02-05

    Here are some questions you should be able to answer to start the course:

    • What are the advantages and disadvantages of a layered architecture? Advantages: modularity, separation of concerns, and the ability to interchange methods at different layers. Disadvantages: duplication of effort, more covert coupling, and additional overhead.
    • What are the differences and similarities between the OSI model and the five-layered Internet model? Connection between OSI and IPS models
    • What are sockets? Socket
    • Describe each layer of the OSI model. OSI
    • Provide examples of popular protocols at each layer of the five-layered Internet model. HTTP TCP ARP
    • What is encapsulation, and how is it used in a layered model? encapsulation
    • What is the end-to-end (e2e) principle? End to end principle
    • What are examples of violations of the e2e principle?

  • Week 1 - Introduction to Compilers
    Last edited: 2026-01-28

    Compiler

    A program that translates source code written in one programming language into another programming language. Commonly the target programming language is machine code that is executable.

  • Week 1 - Introduction to operating systems
    Last edited: 2026-02-05

    Operating system (OS)

    # Metaphor

    The OS is a lot like a toyshop manager. They are required to:

    • Directs the operational resources:
      • Controls the use of employees’ time,
      • Distributes tools and parts between workers.
    • Enforces working policies:
      • Ensures fairness between workers,
      • Makes sure toys are made safely,
      • Decides how to clean up after a job has been completed.
    • Mitigates the difficulty of complex tasks:
      • Simplifies operations for each worker,
      • Chooses optimal task allocations to improve performance.

    In comparison an OS does the following:

  • Week 1 - Knapsack Problem
    Last edited: 2026-02-05

    # Week 1 - Knapsack Problem

    Statement

    There are different versions of this problem:

    • Where you can have at most one version of each object,
    • Where you can have at most a given number of each object, and
    • Where you can have an infinite amount of each object.

    We will focus on the first problem for the next sections.
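    For reference, a standard dynamic-programming sketch of the first version (at most one copy of each object); the function and variable names are my own, not the lecture's:

```python
def knapsack_01(values, weights, capacity):
    """Max total value using each item at most once: O(n * capacity) time."""
    n = len(values)
    # dp[b] = best value achievable so far with capacity b.
    dp = [0] * (capacity + 1)
    for i in range(n):
        # Iterate capacities downwards so each item is used at most once.
        for b in range(capacity, weights[i] - 1, -1):
            dp[b] = max(dp[b], dp[b - weights[i]] + values[i])
    return dp[capacity]

print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # 220 (take items 2 and 3)
```

Iterating capacities upwards instead would allow an item to be reused, which gives the third (unbounded) version of the problem.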

  • Week 1 - Security Mindset
    Last edited: 2026-01-28

    We need security when two things are met:

    • We have something of value, and

    • We have a threat to that thing.

    # Things of value

    In the context of cyber security, the thing of value is usually information. However, there are examples of cyber attacks on things in the physical world, such as the Stuxnet attack on Iran’s nuclear facility or digital attacks on electric grids.

  • Week 1 - Smoov & Curly's Bogus Journey
    Last edited: 2025-05-12

    # Review

    The first part of this course is a review of:

    Week 12 - Reinforcement learning

    There is a change of notation within this course compared to the referenced lecture. Instead of using $U(s)$ for the utility, we use $V(s)$ for value. Instead of considering the reward function $R: S \rightarrow \mathbb{R}$, we consider it $R: S \times A \rightarrow \mathbb{R}$, i.e. the reward takes into account the action you have taken. Therefore, the Bellman equation in this notation is restated below.

  • Week 1 - Supplementary Introduction to Reinforcement Learning
    Last edited: 2025-05-13

    Going down in levels of specificity:

    • Artificial intelligence: The ability of machines to simulate human behaviour.
    • Machine learning: The ability of machines to simulate human behaviour by learning from data.
    • Reinforcement Learning: The ability of machines to simulate human behaviour by learning from data that is simultaneously sequential, evaluative, and sampled.

    Learning Classifications

    Later in the course we will be adding deep learning into our use of reinforcement learning. This mainly means using large neural networks to assist with the learning process.

  • Week 1 - Writing and reading papers
    Last edited: 2024-01-21

    # Ten simple rules for reading a scientific paper by Maureen A. Carey, Kevin L. Steiner, William A. Petri Jr

    Reading articles is a large part of research and you should try to keep up to date with your area of interest. However, many people struggle with reading articles, and this guide offers some tips and tricks for getting started.

  • Week 10 - Failures and Recovery
    Last edited: 2025-11-30

    # Week 10 - Failures and Recovery

    Within an operating system, crashes will occur. When they do, they need to leave us in a state we can recover from. To do this we need persistence and recovery.

    # Lightweight Reliable Virtual Memory (LRVM)

    While the operating system and its subsystems are changing objects in memory, a crash that occurs before those changes are saved to disk can leave the OS in a broken state. To combat this, LRVM creates an interface that makes selected objects in virtual memory persistent on disk. The key to making this work is keeping it fast and providing a clean, easy-to-use interface for subsystem designers.

  • Week 10 - Feature selection
    Last edited: 2026-02-05

    Feature selection is the process of deciding which features of the data are worth training on, then picking the subset of them that are the most important. The reasons to do this are:

    • Knowledge discovery, interpretability, and insight.
      • When you get the results back you are going to have to be able to explain them.
    • The curse of dimensionality
      • If you put too many dimensions in - it will decrease the accuracy of your models!

    # How hard is this problem?

    Assume we want to find the best subset of $m \leq n$ columns. The only way to verify it is the best is to check all candidates, so there are $\binom{n}{m}$ of them. If you don’t know $m$, there are $2^n$.
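    As a quick sanity check on those counts (a toy snippet, not course code), with $n = 20$ columns:

```python
from math import comb

n, m = 20, 5
print(comb(n, m))  # 15504   subsets of size exactly m = 5
print(2 ** n)      # 1048576 subsets to consider when m is unknown
```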

  • Week 10 - Feature transformation
    Last edited: 2026-02-05

    Feature transformation is the problem of pre-processing a set of features to create a new (smaller/compact) feature set, while retaining as much information as possible. It is a map $p: \mathbb{F}^N \rightarrow \mathbb{F}^M$ where you usually want $M < N$.

    In this course we will focus on linear feature reduction, where $p$ is a linear map.

    Note

    Feature selection is a special case of feature transformation

  • Week 10 - Graph problem complexity
    Last edited: 2023-11-19

    # Independent set problem

    independent set

    Therefore we have the following problem.

    Statement

    This problem is not known to be in NP, as verifying that a given set is the largest would require computing the size of the largest set. However, we can edit this question to make one that is in NP.

  • Week 10 - NP overview
    Last edited: 2023-11-11

    # Complexity classes

    Nondeterministic Polynomial time (NP)

    In this class we use the search problem definition.

    Search problems

    Note here we have.

    Statement

    There is an open problem.

    P equals NP or P not equals NP

  • Week 10 - NP-completeness
    Last edited: 2023-11-11

    We are going to assume the following.

    Statement

    # k-SAT

    We are going to show that the k-SAT problem is NP-Complete for $k \geq 3$.

    k-SAT

    To do this we will show 3-SAT is NP-complete; since 3-SAT is a subproblem of k-SAT for $k \geq 3$, this proves all we need.

  • Week 10 - Synchronization Constructs
    Last edited: 2025-03-26

    # Additional reading

    # Synchronization

    Synchronization

    Previously we discussed mutexes and condition variables. However, these had a couple of downsides:

  • Week 10 - Video applications
    Last edited: 2026-02-05

    # Additional reading

    # Important Readings

    VoIP: A comprehensive survey on a promising technology https://www.sciencedirect.com/science/article/abs/pii/S1389128609001200

    MPEG: A Video Compression Standard for Multimedia Application https://dl.acm.org/doi/pdf/10.1145/103085.103090

    The JPEG Compression standard https://ieeexplore.ieee.org/document/125072

  • Week 11 - CDNs and overlay networks
    Last edited: 2026-02-05

    # Reading

    # Important Readings

    The Akamai Network: A Platform for High-Performance Internet Applications https://dl.acm.org/doi/10.1145/1842733.1842736

    Open Connect Everywhere: A Glimpse at the Internet Ecosystem through the Lens of the Netflix CDN https://arxiv.org/pdf/1606.05519.pdf

  • Week 11 - IO management
    Last edited: 2025-04-09

    # Model of an IO device

    Whilst IO devices come in many forms, we model them all similarly.

    Io Device

    These have 3 registers:

    • Status: Indicates the state of the device.
    • Command: Used by the CPU to control the device.
    • Data: Used to transfer data in and out of the device.

    What happens internally is then controlled by the microcontroller on the device.

  • Week 11 - Linear Programming
    Last edited: 2023-11-13

    In broad terms a linear programme is an optimisation problem of a linear function where there are linear bounds on the variables.

    We are given a directed graph $G = (V,E)$ with capacities $c: E \rightarrow \mathbb{R}_{>0}$ and source $s \in V$ and sink $t \in V$.

  • Week 11 - Markov Decision Processes
    Last edited: 2024-04-03

    Markov decision process

    Find home

    Suppose we have a world with 12 states defined by a 2D grid $S = \{(x,y) \vert 1 \leq x \leq 4, 1 \leq y \leq 3\}$, where state $(2,2)$ cannot be entered, $(4,3)$ is home, $(4,2)$ is a pit of death, and $(1,1)$ is where we start.

  • Week 11 - Security
    Last edited: 2026-01-28

    # Principles of information security

    Information security is a major concern across the software industry. The field has whole masters courses dedicated to its study. However, much of the modern terminology was invented by visionaries in the early years of computing history.

    The notes in this section are based on the following paper:

    Protection and the Control of Information in Computer Systems

  • Week 12 - Future of the Internet
    Last edited: 2026-02-05

    # Reading

    # Optional Readings

    Machine Learning and Networking Applications

    Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning https://ieeexplore.ieee.org/document/7935536

    CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics https://www.usenix.org/system/files/conference/nsdi17/nsdi17-alipourfard.pdf

  • Week 12 - Halting problem
    Last edited: 2023-11-13

    Statement

    We want to show that this problem is computationally impossible, or in other words that it is undecidable.

    undecidable

    The Halting problem is undecidable

  • Week 12 - Knapsack complexity
    Last edited: 2023-11-13

    We are going to prove that Knapsack-search is NP-complete. We will do this using 3-SAT, but first we go through the Subset-sum problem.

    # Subset-sum problem

    Statement

    This can be solved using Dynamic Programming in a similar way to the Knapsack problem, in $O(nt)$. However, as with the Knapsack problem, this is not a polynomial-time algorithm in the length of the input, which is $n\log_2(t)$. Therefore it is Pseudo-polynomial.
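    A sketch of that $O(nt)$ dynamic programme (my own illustrative code; `a` is the list of numbers and `t` the target):

```python
def subset_sum(a, t):
    """Is there a subset of a summing to exactly t? O(n * t) time."""
    reachable = [False] * (t + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in a:
        # Go downwards so each number is used at most once.
        for s in range(t, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[t]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True  (4 + 5 = 9)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```

The table has $t + 1$ entries and each of the $n$ numbers sweeps it once, which is where the $O(nt)$, pseudo-polynomial bound comes from.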

  • Week 12 - Max-SAT approximation algorithm
    Last edited: 2023-11-19

    In this lecture we consider a problem similar to the SAT problem.

    Max-SAT

    This is NP-hard .

    Max-SAT is NP-hard

    Therefore we are unlikely to find an algorithm that solves the problem quickly. However, we will look at using linear programming to approximate a solution.

    # Approximate algorithms

    Let $f$ be a boolean function in CNF with $m$ clauses. Let $m^{\ast}$ denote the solution to the max-SAT problem for $f$.

  • Week 12 - Reinforcement learning
    Last edited: 2024-04-06

    Transitions (MDP)

    Reinforcement learning

    There are 4 kinds of process we think about:

    We can compose these strategies in two major ways.

  • Week 12 - Virtualization
    Last edited: 2025-04-12

    # Additional reading

    # Virtualization

    Virtualization

    Virtual machine monitor (VMM)

    Benefits for virtualization:

    • Consolidation: Allows you to split one machine into many, allowing you to consolidate different applications that need different environments to run.
    • Migration: Makes it easy to move applications from one machine to another.
    • Security: Isolates applications from one another.
    • Debugging: Makes it easy to spin up similar instances of crashed programs to debug.
    • Support for legacy OS: Easy to host an application that needs a legacy operating system to run.
    • Support for OS research: Makes it easy to change an operating system and observe the effects of that.

    There are two main methods of virtualization:

  • Week 13 - Game theory
    Last edited: 2026-02-05

    Game theory

    # 2-player zero-sum finite game of perfect information

    This is a lot of words, but these games are like tic-tac-toe (noughts and crosses). Whilst these are fun, they all boil down to a payoff matrix for the different strategies you take, like the following:

    $$ \begin{array}{c|ccc} 2 \backslash 1 & a & b & c\\ \hline x & 1 & 0 & -1 \\ y & -1 & 0 & 1 \\ \end{array} $$

    To read this table: if the first player (choosing from the columns at the top) picks action $a$ and the second player (choosing from the rows) picks $x$, then player one gets 1 point and player two gets -1 points. This is the zero-sum aspect of the game. (Note the game could be non-deterministic, but we just take expectations to find the values in the grid.)
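    As an illustrative computation on that exact matrix (my own snippet, not from the lectures): the row player can hold player one to at most the minimax value, while player one can guarantee at least the maximin value, and here the two differ, so this particular game has no pure-strategy saddle point.

```python
# Payoff matrix for player 1, who picks a column (a, b, c);
# player 2 picks a row (x, y) and receives the negated payoff.
M = {"x": {"a": 1, "b": 0, "c": -1},
     "y": {"a": -1, "b": 0, "c": 1}}

cols, rows = ["a", "b", "c"], ["x", "y"]

# Player 1's guaranteed value: pick the column whose worst case is best.
maximin = max(min(M[r][c] for r in rows) for c in cols)
# Player 2's cap on player 1: pick the row whose best case for player 1 is worst.
minimax = min(max(M[r][c] for c in cols) for r in rows)

print(maximin, minimax)  # 0 1 -- unequal, so no pure-strategy saddle point
```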

  • Week 13 - Known NP-complete problems
    Last edited: 2023-11-19

    # Week 13 - NP-complete problems

    The following problems are known to be NP-complete :

  • Week 13 - Remote Procedure Calls
    Last edited: 2025-04-12

    # Additional reading

    # Remote procedure calls (RPC)

    Remote procedure calls (RPC)

    To achieve this RPC has some requirements:

    • Offers a client and server interface.
    • Implements a procedure call Interface.
      • When RPC was invented, procedural languages were dominant; the call is synchronous, hence the name remote procedure call.
    • Type checking.
      • This offers error handling, and
      • packet bytes interpretation.
    • Cross-machine conversations.
    • Higher-level protocol.
      • Access control, fault tolerance, and
      • Can support different transport protocols.

    Below is an example flow of an RPC call:

  • Week 14 - Distributed file systems
    Last edited: 2026-02-05

    # Additional reading

    “Caching in the Sprite Network File System”

    # Distributed file system

    Distributed file system (DFS)

    There are different models for distributing files:

    • Client/server model: A client with the file system talks to a single server who has all the files.
    • Client/server-cluster: A client contacts a cluster of servers, each of which has files the client can fetch. This can take different forms:
      • Replicated: Each server has all the files.
      • Partitioned: Each server serves part of the file system.
      • Both: Files are replicated and partitioned between machines (this is functionally how most large providers operate).
    • Peer-to-peer: A cluster of computers act as peer backups for one another. (This will be covered in more detail next lecture.)

    # Implementation

    There are different models for a Distributed file system (DFS) in the client server model.

  • Week 15 - Distributed shared memory
    Last edited: 2025-04-13

    # Additional reading

    # Distributed shared memory

    Distributed shared memory is similar to a distributed file system; however, every node acts as both a client and a server to the others. They share state mutually.

  • Week 15 - Markov Chains
    Last edited: 2026-02-05

    To talk about Markov chains, it is good to recap what a probability matrix is:

    Stochastic matrix

    Now we can define a Markov chain .

    Markov chain

    Suppose we have 4 states with the following probabilities of moving between them. This is commonly represented as an edge-weighted directed graph (figure: simple_markov_chain). In terms of a probability matrix it would be represented as
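    The matrix for the chain in the figure isn't reproduced in these notes, so as a stand-in here is my own illustrative 4-state row-stochastic matrix, together with one step of the chain (multiplying the distribution over states by the matrix):

```python
# An illustrative 4-state row-stochastic matrix (my own example, not the
# one in the figure): each row sums to 1 and gives the transition
# probabilities out of that state.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.3, 0.5, 0.0],
    [0.0, 0.0, 0.4, 0.6],
    [0.1, 0.0, 0.0, 0.9],
]

assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)  # stochastic check

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0, 0.0]  # start in state 0 with certainty
print(step(dist, P))  # [0.5, 0.5, 0.0, 0.0]
```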

  • Week 16 - Datacenter technologies
    Last edited: 2026-02-05

    # Week 16 - Data-center technologies

    # Internet services

    An internet service is simply one offered over a web interface. These normally consist of 3 layers:

    • Presentation: Static content.
    • Business logic: Dynamic content.
    • Database tier: Data storage.

    Whilst these layers are separated conceptually, they may be implemented on the same machine. Middleware can be used to integrate the different layers.

  • Week 2 - Cache-Oblivious Algorithms
    Last edited: 2026-01-23

    Optional reading

    This lecture is optional reading in my course and will not be examinable.

  • Week 2 - Chain Matrix Multiply
    Last edited: 2023-12-03

    The goal of this problem is: given a number of matrices, what is the most efficient way to multiply them together?

    Let’s say we have matrices $A$, $B$, $C$ and $D$ with compatible sizes so that we can multiply $ABCD$. Whilst matrix multiplication is associative, some parenthesisations are much more efficient to compute than others!
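    A quick worked example of why the order matters (the dimensions are my own illustration): multiplying an $m \times n$ matrix by an $n \times p$ matrix costs roughly $mnp$ scalar multiplications, so for $A$ (10x100), $B$ (100x5), $C$ (5x50):

```python
# Cost of multiplying A (10x100), B (100x5), C (5x50) in two orders,
# counting m*n*p scalar multiplications per (m x n)(n x p) product.
cost_AB_then_C = 10 * 100 * 5 + 10 * 5 * 50    # (AB)C: 5000 + 2500
cost_A_then_BC = 100 * 5 * 50 + 10 * 100 * 50  # A(BC): 25000 + 50000

print(cost_AB_then_C, cost_A_then_BC)  # 7500 75000 -- a 10x difference
```

Both orders give the same product matrix; only the work differs, and that gap is exactly what the chain matrix multiply dynamic programme minimises.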

  • Week 2 - I/O Avoiding Algorithms
    Last edited: 2026-01-22

    In this lecture, we will examine different classical algorithms and try to calculate their number of transfers in the Von Neumann model introduced in the previous lecture.

    # Merge sort

    Merge sort divides the array into small chunks, then merges those chunks back together to produce a sorted array. This runs in $O(n \log n)$ time. So there are two steps to this:
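    Those two steps, dividing and merging, can be sketched as follows (illustrative code, ignoring I/O transfers for now):

```python
def merge_sort(a):
    """Sort by splitting in half and merging: O(n log n) comparisons."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge: repeatedly take the smaller front element of the two halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```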

  • Week 2 - Neural networks
    Last edited: 2026-02-05

    The basis of all neural networks is a simplified model of a neuron in the human body.

    Perceptron (neural network)

    Where we define an activation function as.

    Activation function

    First we will just take the activation function to be the binary step, which introduces an extra parameter $\theta$.

    Binary step
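    Putting the perceptron and the binary step together (a minimal sketch; the weight values and the AND-gate example are my own, not from the lectures):

```python
def binary_step(x, theta):
    """Fire (1) iff the weighted input reaches the threshold theta."""
    return 1 if x >= theta else 0

def perceptron(inputs, weights, theta):
    """Weighted sum of the inputs passed through the binary step."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return binary_step(activation, theta)

# An AND gate: with unit weights and theta = 1.5, the unit fires
# only when both inputs are 1.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, perceptron([a, b], [1.0, 1.0], theta=1.5))
```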

  • Week 2 - OS Structure Overview
    Last edited: 2025-08-29

    The OS structure is the way the OS is organised with respect to the applications it serves and the underlying hardware it manages.

    Some desirable traits for this to have are:

    • Protection: Defending the user from the OS’s activities, the OS from anything run by the user, and each user from one another.
    • Performance: The time taken to run the services the OS offers.
    • Flexibility: The ability to adapt the services the OS offers for different use cases.
    • Scalability: The performance of the OS increases as the resources provided increases.
    • Agility: Adapting to changes in application needs or resource availability.
    • Responsiveness: The ability to respond to events in a timely manner.

    # Different OS Structures

    # Monolithic OS

    In a monolithic OS, all OS services run in a separate address space from user applications. This means all access to the underlying hardware requires a context switch to kernel space.

  • Week 2 - Processes and Process Management
    Last edited: 2026-02-05

    Process

    # Aside

    Heap (OS)

    Stack (OS)

    address space

    # Metaphor

    A process is like an order of toys:

    • State of execution:
      • Completed,
      • waiting, or
      • In progress.
    • Parts and temporary holding area
      • Pieces used to make the toy, or
      • Containers to put the pieces.
    • May require special hardware:
      • Sewing machine or
      • glue gun.

    This is an analogy to an OS where a process has:

  • Week 2 - Regression and classification
    Last edited: 2024-01-17

    Regression history

    Regression problems

    The name comes from regression to the mean , this was the technique they used to first talk about it - then the word regression for the technique stuck.

    Polynomial Regression

    Polynomial regression

    Once we have an objective function we can use training data to fit the parameters $c_{p}$.

  • Week 2 - Regular expressions and DFA
    Last edited: 2026-01-28

    In the lexical analysis phase of the compiler we need to convert a sequence of characters into tokens. First, let’s define formally what a token is.

    Token

    A token is a tuple consisting of:

  • Week 2 - Reinforcement Learning Basics
    Last edited: 2025-05-14

    When evaluating different learners we normally evaluate them by the policy they produce. However, different methods of learning can create policies in different ways - therefore we may need to also consider:

    • Computation complexity: The time it takes for that learner to come up with that policy.
    • Sample complexity: The amount of interactions with its environment it needs to come up with that policy.

    Unlike in other subjects, we don’t normally think about space complexity, since it is not usually a limiting factor.

  • Week 2 - Shortest Paths
    Last edited: 2025-12-05

    Shortest path problem

    Given a directed graph $(V, E)$ with edge weights $w: E \rightarrow \mathbb{R}$ and a start vertex $s \in V$ - the shortest path problem is to find the shortest distance between $s \in V$ and $z \in V$ for every $z$. This is called $\mbox{dist}(z)$ with the formal definition
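    Since the weights are real-valued and may be negative, a Bellman-Ford-style sketch fits this definition (my illustration, assuming no negative cycles are reachable from $s$); it computes $\mbox{dist}(z)$ for every $z$:

```python
def shortest_paths(vertices, edges, s):
    """Bellman-Ford: dist(z) for every z, allowing negative edge weights.

    edges is a list of (u, v, w) triples; assumes no negative cycles
    are reachable from s.
    """
    dist = {v: float("inf") for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):  # relax every edge |V| - 1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

E = [("s", "a", 4), ("s", "b", 2), ("b", "a", -1), ("a", "t", 1)]
print(shortest_paths(["s", "a", "b", "t"], E, "s"))
# {'s': 0, 'a': 1, 'b': 2, 't': 2} -- the negative edge makes s->b->a cheaper
```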

  • Week 2 - Temporal Difference Learning
    Last edited: 2025-05-20

    Within reinforcement learning there are broadly 3 types of algorithms:

    1. Model-based: These algorithms first aim to understand the environment by building or learning an explicit model of it. This model describes how the environment behaves (e.g., transition probabilities and rewards).
      1. Model Learning/Update: Run through an example (an interaction with the environment) to collect data and update the parameters of an MDP model. This typically involves estimating state transition probabilities and reward functions.
      2. Planning/Solving: Use the learned MDP model (e.g., using algorithms like Value iteration (MDP) or Policy Iteration (MDP) ) to generate an optimal (or near-optimal) value function .
      3. Policy Derivation: Use the calculated value function and the learned model to derive an optimal policy . This policy dictates the best action to take in each state, often by choosing the action that maximizes the expected future reward (e.g., using argmax over the Q-values derived from the value function and model).
    2. Value-function based (model-free): These algorithms bypass the need to build an explicit model of the environment. Instead, they directly learn a value function (or action-value function) that estimates the goodness of states or state-action pairs.
      1. Value Function Update: Run through an example (an interaction with the environment) and use the observed outcomes (rewards and next states) to directly update the value function . This might be a state-value function V(s) or, more commonly, an action-value function Q(s,a), which estimates the expected return of being in state s and taking action a.
      2. Policy Extraction: Implicitly or explicitly derive a policy for each state by evaluating the actions that lead to the highest estimated value. For Q-functions, this often means choosing the action a that maximizes Q(s,a) for a given state s.
    3. Policy Search (or Policy Optimization): This approach directly searches for an optimal policy without necessarily learning a value function or a model. The goal is to find a policy that directly maps states to actions and maximizes the cumulative reward.
      1. Policy Evaluation and Update: Run through an example using the current policy . Based on the observed outcomes (rewards and trajectories), directly update the parameters of the policy to improve its performance. This often involves gradient-based methods where the policy parameters are adjusted in the direction that increases the expected return.

    The different approaches have varying trade-offs. Generally, model-based approaches are often more computationally intensive for the planning step but can be more sample efficient (requiring fewer interactions with the environment) because they can “plan” internally without needing new real-world data for every decision. Conversely, model-free approaches (including value-function based and policy search) are typically less computationally intensive per update (as they don’t involve a full planning step) but are often more sample intensive, requiring many interactions with the environment to learn effectively. Policy search methods, in particular, can be good for continuous action spaces and may sometimes converge faster than value-based methods in certain complex scenarios, but they can also be more prone to local optima.
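    As a minimal sketch of the value-function-based (model-free) family described above, one tabular Q-learning update (names and numbers are illustrative, not from the lecture):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One TD update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

Q = {}
# Toy transition: from state 0, action "go" yields reward 1.0 into state 1,
# whose Q-values are still all zero.
v = q_update(Q, 0, "go", 1.0, 1, actions=["go"])
print(v)  # 0.5  (= 0 + 0.5 * (1 + 0.9*0 - 0))
```

Note there is no model of transition probabilities anywhere: the update uses only the observed sample $(s, a, r, s')$, which is exactly what distinguishes this family from the model-based approach.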

  • Week 2 - Transport and application layer
    Last edited: 2024-05-24

    # Important Readings

    CUBIC: A New TCP-Friendly High-Speed TCP Variant https://www.cs.princeton.edu/courses/archive/fall16/cos561/papers/Cubic08.pdf

    # Book References

    Kurose-Ross (Edition 6): Sections 3.1.1, 3.2, 3.3, 3.4, 3.5.5, 3.6

    Peterson Section 6.3

    # Optional Readings

    Congestion Avoidance and Control https://ee.lbl.gov/papers/congavoid.pdf

  • Week 3 - Ensemble Bagging and Boosting
    Last edited: 2024-01-24

    Ensemble learning

    The idea of Ensemble learning is to look at a subset of the data and try to model that subset well. If we do this for enough small patches, we hope we can combine the results into a good overall understanding of the data.

    This is a powerful idea when the function we are trying to learn doesn’t generalise globally but does generalise locally, as when filtering spam out of your inbox.
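    A toy sketch of the idea (purely illustrative, not the lecture’s algorithm): each weak learner models a bootstrap sample of the data by its mean, and the ensemble averages the learners’ predictions:

```python
import random

def bagged_mean(data, n_models=10, seed=0):
    """Bagging sketch: each 'learner' memorises the mean of its bootstrap sample;
    the ensemble prediction averages the learners."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap: sample with replacement
        preds.append(sum(sample) / len(sample))
    return sum(preds) / len(preds)

data = [1.0, 2.0, 3.0, 4.0]
result = bagged_mean(data)
print(result)  # close to the true mean 2.5
```

Real ensembles use far more expressive weak learners (stumps, trees), but the combine-many-weak-views structure is the same.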

  • Week 3 - Fast Integer Multiplication
    Last edited: 2025-12-05

    Multiplying $n$-bit integers problem

    Given $n$-bit integers $x$ and $y$ what is $z = xy$?
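    One classic fast multiplication algorithm, sketched here for illustration, is Karatsuba’s divide-and-conquer scheme, which replaces 4 recursive multiplications with 3, giving $O(n^{\log_2 3})$ bit operations instead of the schoolbook $O(n^2)$:

```python
def karatsuba(x, y):
    """Multiply non-negative integers using 3 recursive multiplies instead of 4."""
    if x < 10 or y < 10:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> half, x & ((1 << half) - 1)   # split x into high/low halves
    yh, yl = y >> half, y & ((1 << half) - 1)
    a = karatsuba(xh, yh)                        # high * high
    b = karatsuba(xl, yl)                        # low * low
    c = karatsuba(xh + xl, yh + yl) - a - b      # cross terms via one multiply
    return (a << (2 * half)) + (c << half) + b

z = karatsuba(1234, 5678)
print(z)  # 7006652
```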

  • Week 3 - Instance Based Learning
    Last edited: 2025-12-05

    Instance-based learning

    # K-nearest neighbour

    k-nearest neighbour

    # Run times

    Instance-based learning is often called “lazy” learning because you delay computation until query time. As you can see in the table below, we assume $A = B = \mathbb{R}$ and that the input training data is sorted.
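    A minimal 1-D sketch of the “lazy” query-time computation (toy data; the unweighted average of neighbour labels is one common prediction rule):

```python
def knn_predict(train, x, k=3):
    """train: list of (x_i, y_i) pairs. Predict by averaging the labels of the
    k training points nearest to x under |x - x_i|. All work happens at query time."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in neighbours) / k

train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
pred = knn_predict(train, 1.4, k=2)
print(pred)  # nearest points are x=1 and x=2, so the average label is 1.5
```

“Training” here is just storing the data; the sort (or, with sorted input, a binary search) is deferred until a query arrives, which is what the run-time comparison is about.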

  • Week 3 - Intradomain routing
    Last edited: 2024-05-29

    # Important Readings

    Experience in Black-box OSPF Measurements http://conferences.sigcomm.org/imc/2001/imw2001-papers/82.pdf

    # Book References

    If you have access to the Kurose-Ross book and the Peterson book, you can find the list of chapters discussed in this lecture. As mentioned in the course schedule, purchasing the books is not required.

  • Week 3 - Linear-Time Median
    Last edited: 2025-12-05

    Median list problem

    Given an unsorted list $A = [a_1, \ldots, a_n]$ of $n$ numbers, you want to find the median: the $\lceil \frac{n}{2} \rceil$-th smallest element.
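    A randomised quickselect sketch of the problem (illustrative only: the lecture’s worst-case linear-time guarantee relies on a deterministic median-of-medians pivot, which this sketch omits in favour of a random pivot with expected linear time):

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element of arr (1-indexed)."""
    pivot = random.choice(arr)
    less = [a for a in arr if a < pivot]
    equal = [a for a in arr if a == pivot]
    if k <= len(less):
        return quickselect(less, k)                 # answer is left of the pivot
    if k <= len(less) + len(equal):
        return pivot                                # pivot itself is the answer
    return quickselect([a for a in arr if a > pivot],
                       k - len(less) - len(equal))  # answer is right of the pivot

A = [7, 2, 9, 4, 1]
m = quickselect(A, (len(A) + 1) // 2)  # ceil(n/2)-th smallest
print(m)  # 4
```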

  • Week 3 - Regular expressions and NFA
    Last edited: 2026-01-19

    This lecture focuses on how to construct a DFA from a given RegEx algorithmically. To do this we will:

    1. Introduce an easier object to work with called a Non-deterministic Finite Automaton (NFA).

    2. Show that operations in RegEx can be converted into ways to combine NFAs together.

    3. Prove that an NFA can be converted into a DFA.

    4. Minimise the resulting DFA.

    This will give us the algorithm to go from RegEx to a minimal DFA.

  • Week 3 - Solving Recurrences
    Last edited: 2023-11-11

    # $T(n) = 4T\left (\frac{n}{2}\right ) + O(n)$

    The first step is to replace the Big-O notation. Let $c > 0$ be such that

    $$T(n) \leq 4 T\left ( \frac{n}{2} \right ) + cn, \mbox{ and } T(1) \leq c.$$

    Let’s keep reapplying this inequality:

    $$\begin{aligned} T(n) \leq & cn + 4T\left ( \frac{n}{2} \right )\\ \leq & cn + 4\left [ 4T\left ( \frac{n}{2^2} \right ) + \frac{cn}{2} \right ]\\ \leq & cn (1 + \frac{4}{2}) + 4^2T\left ( \frac{n}{2^2} \right )\\ \leq & cn (1 + \frac{4}{2}) + 4^2\left[ 4T\left ( \frac{n}{2^3} \right ) + \frac{cn}{2^2} \right ]\\ \leq & cn \left (1 + \frac{4}{2} + \left ( \frac{4}{2} \right )^2 \right) + 4^3T\left ( \frac{n}{2^3} \right )\\ \leq & cn \left ( \sum_{i=0}^{k-1} \left ( \frac{4}{2} \right )^i \right ) + 4^kT\left ( \frac{n}{2^k} \right ) \end{aligned}$$

    Now let’s run this until $k = \log_2(n)$, so that $\frac{n}{2^k} = 1$.
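    Substituting $k = \log_2(n)$, so that $4^k = n^2$ and $\sum_{i=0}^{k-1} 2^i = 2^k - 1 = n - 1$, one way to finish the bound is:

    $$T(n) \leq cn\left(2^{\log_2 n} - 1\right) + 4^{\log_2 n}\, T(1) \leq cn(n-1) + cn^2 \leq 2cn^2 = O(n^2).$$

    This agrees with the Master theorem for $a = 4$, $b = 2$, $f(n) = O(n)$: since $n^{\log_2 4} = n^2$ dominates $f(n)$, we get $T(n) = O(n^2)$.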

  • Week 3 - Supplementary planning methods
    Last edited: 2025-05-22

    # Return

    In Reinforcement learning some people interpret the goal to be maximizing the rewards. However, since rewards can be distant from the current action, most algorithms functionally try to maximize the return.

    Return (RL)

    # Value function

    value function

  • Week 3 - Threading and concurrency
    Last edited: 2026-02-05

    “An Introduction to Programming with Threads”  by Birrell

    Thread

    # Visual metaphor

    Threads are like workers in a toy shop:

    • A thread is an active entity
      • Executing unit of a toy order
    • Works simultaneously with others
      • Many workers completing toy orders
    • Requires coordination
      • Sharing tools, parts, workstations

    In comparison to a thread:

  • Week 3 - Virtualization
    Last edited: 2025-09-04

    As a refresher please read:

    Week 12 - Virtualization

    # Memory Virtualisation

    # Address Translation

    When hosting guest operating systems on top of a hypervisor we need to make memory appear as it would to a normal OS. For the CPU cache, which is physically tagged, there are no special operations we need to perform. However, for virtual memory we need to be able to map it to ‘machine memory’ correctly.

  • Week 3 - Work-Span model
    Last edited: 2026-01-25

    In this lecture we explore a method of parallelisation called the work-span model. This uses DAGs to represent a computation, where each node is a computation that needs to be run and arrows between nodes represent dependencies. We will assume the DAG is connected and has a start and a stop vertex.

    Example DAG

    We will run these on a PRAM machine. This is a machine with multiple processors attached to some memory. Here we get ‘parallelism’ by running multiple operations on different processors. These can only be run once their dependencies have all been run already. How to assign the vertices to the processors is called a ‘scheduling problem’.
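    A small sketch (assuming unit-cost vertices) of computing the two quantities the model is named for: work (total number of nodes) and span (the length of the critical path, which no schedule can beat):

```python
def work_and_span(dag):
    """dag maps a node to its list of successors. Work = node count;
    span = longest path length, found by memoised DFS."""
    nodes = set(dag) | {v for succs in dag.values() for v in succs}
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = 1 + max((longest_from(v) for v in dag.get(u, [])), default=0)
        return memo[u]

    return len(nodes), max(longest_from(u) for u in nodes)

# Diamond DAG: start -> {a, b} -> stop; a and b can run in parallel.
dag = {"start": ["a", "b"], "a": ["stop"], "b": ["stop"]}
w, s = work_and_span(dag)
print(w, s)  # 4 3
```

With enough processors the best possible time is the span (3 here), while a single processor needs time equal to the work (4).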

  • Week 4 - AS relationships and interdomain routing
    Last edited: 2026-02-05

    Interdomain routing

    # Additional reading

    # Important Readings

    Interdomain Internet Routing https://web.mit.edu/6.829/www/currentsemester/papers/AS-bgp-notes.pdf

    BGP routing policies in ISP networks https://www.cs.princeton.edu/~jrex/papers/policies.pdf

    On the importance of Internet eXchange Points for today’s Internet ecosystem https://cryptome.wikileaks.org/2013/07/ixp-importance.pdf

  • Week 4 - Comparison-based Sorting
    Last edited: 2026-01-25

    Comparison-based sorting is done using a ‘comparator network’ which looks like a circuit with two types of symmetric connections.

    Comparison gate

    Positive gates look as below.

  • Week 4 - Fast Fourier Transforms
    Last edited: 2025-12-05

    Polynomial multiplication problem

    Given polynomials

  • Week 4 - Parallel Systems
    Last edited: 2025-12-05

    In modern computing, nearly all processors are multi-core. This presents an interesting problem - how to share memory between them?

    # Shared Memory Architectures

    Here we detail 3 different approaches to shared memory architectures:

    # 1. Dance Hall Architecture

    In this architecture, CPUs and distributed memory sit on either side of an interconnection network. All memory is accessible to all CPUs.

  • Week 4 - Parsing context free grammars
    Last edited: 2026-01-28

    In this lecture, we will cover the parser - in particular the rules for parsing token input. This has two major responsibilities:

    • Syntax analysis: Checking the program is using the correct grammar for the programming language.

    • Breaking the input into expressions: Parsing the input into a sequence of expressions.

    # Derivation tree

    Derivation tree

    A derivation tree is a tree representation of the process of parsing a program. The root is the starting symbol and the internal nodes represent non-terminal symbols.

  • Week 4 - PThreads
    Last edited: 2026-02-05

    # Additional reading

    # Background

    POSIX

    PThreads

    # Thread interface

    The core data structure in the PThreads library is pthread_t which represents a thread. This can be created through:

  • Week 4 - Support Vector Machines
    Last edited: 2024-01-29

    # Linearly separable

    We are going to revisit the idea of linearly separable points in the plane.

    Linearly separable

    Suppose we have training data $(x^t, y^t) \in \mathbb{R}^n \times \{1,-1\}$ for $t \in T$ such that $X_1 = \{x^t \vert y^t = 1\}$ and $X_2 = \{x^t \vert y^t = -1\}$ are linearly separable. This means that there exists some $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that

  • Week 5 - Beyond Multiprocessing ... Multithreading and the SunOS Kernel
    Last edited: 2026-02-05

    Ref: Beyond Multiprocessing … Multithreading the SunOS Kernel

    # Notes

    # Architecture

    In this paper they introduce the concept of a Light Weight Process (LWP) that sits as an intermediary between kernel level threads and user level threads. Each LWP is mapped to a kernel thread but not all kernel threads need to be mapped to an LWP.

  • Week 5 - Computational Learning Theory
    Last edited: 2025-12-05

    # Week 5 - Learning Theory

    Computational learning theory is fundamentally about 3 things:

    • defining learning problems,
    • showing algorithms work, and
    • showing problems are easy or hard.

    When considering problems it is useful to think about:

    1. The probability of successful training.
      1. The probability the process works $1 - \delta$
    2. Number of examples you need to train on.
      1. The size of the training set $m$.
    3. Complexity of the hypothesis class.
      1. How complex is the representation - modelling paradigm
    4. Accuracy to which the target concept is approximated.
      1. Within some $\epsilon$.
    5. Manner in which the examples are presented.
      1. Batch or online.
    6. Manner in which the training data is collected.
      1. Learner requests examples.
      2. Teacher gives examples.
      3. Fix distribution.
      4. Examples are picked poorly.

    # 20 Questions

    20 Questions is a game where there is a set of possible answers $A$ and a correct answer $c \in A$. There is a player (the questioner) who wants to find $c$, and a questionee who knows $c$. The player asks boolean questions about the item $c \in A$, which the questionee answers true or false.
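    With well-chosen questions each answer halves the candidate set, so $\lceil \log_2 \vert A \vert \rceil$ questions always suffice. A binary-search sketch over a hypothetical sorted answer set:

```python
def play_20_questions(answers, secret):
    """Find `secret` in the sorted list `answers` by asking 'is it <= mid?'
    questions. Returns (found value, number of questions asked)."""
    lo, hi = 0, len(answers) - 1
    asked = 0
    while lo < hi:
        mid = (lo + hi) // 2
        asked += 1
        if secret <= answers[mid]:  # questionee answers true/false
            hi = mid                # keep the lower half
        else:
            lo = mid + 1            # keep the upper half
    return answers[lo], asked

A = list(range(16))  # |A| = 16 candidate answers
val, asked = play_20_questions(A, 11)
print(val, asked)  # 11 4  -> log2(16) = 4 questions
```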

  • Week 5 - Distributed Systems
    Last edited: 2025-10-01

    The definition of what a distributed system is varies but in this course we use the following 3 properties:

    1. Nodes on the network are connected via some LAN or WAN.
    2. Nodes do not share memory.
    3. The computational event time is significantly smaller than the communication time.

    Property 3 is used a fair bit; under it, even a cluster of machines within the same rack of a data center counts as distributed computing.

  • Week 5 - Implementing Lightweight Threads (paper)
    Last edited: 2026-02-05

    Ref: Implementing Lightweight threads

    # Notes

    This paper describes an implementation of light weight threads in the SunOS architecture. This is laid out in Week 5 - Beyond Multiprocessing … Multithreading and the SunOS Kernel .

    • If one thread calls exit on the process then all threads will exit.
    • Threads have their own signal mask.
    • Signals to the process are sent to one of the threads which have that signal unmasked.
      • If all threads have it masked it is left pending on the process until a thread unmasks it.
    • Signals can be sent to a particular thread.
    • All threads within a process share signal handlers.

  • Week 5 - Infinite hypothesis spaces
    Last edited: 2024-02-16

    In Week 5 - Computational Learning Theory we developed a method to build a PAC learner on a finite hypothesis space. However, this breaks down when the hypothesis space is infinite. We need to get around this.

    # Lets just cheat

    Whilst the hypothesis space might be infinite theoretically, the actual number of distinct hypotheses could be finite. Then we can fall back on the previous work.

  • Week 5 - Recursive Descent Parsing
    Last edited: 2026-02-05

    In this lecture we are focusing on parsers. We can classify parsers into two types:

    • LR Bottom up parsers:
      • L: Scan left to right.
      • R: Traces rightmost derivation of the input string - starts from the end of the sentence.
      • This is bottom up as we start from the sentence and work our way back to the start symbol.
    • LL Top down parsers:
      • L: Scan left to right.
      • L: Traces leftmost derivation of the input string - starts from the beginning of the sentence.
      • This is top down as we start from the start symbol and work our way forward to the sentence.

    Within both types of parsers there is a dimension which reflects their efficiency: the maximum number of tokens the parser needs to ‘look ahead’ to decide which rule to apply. This is referred to as $k$ and appended to the type, as in $LL(k)$ for a top-down parser that looks ahead $k$ tokens. Functionally we want to keep $k$ as low as possible to make compiling code as fast as possible.

  • Week 5 - Router design and algorithms
    Last edited: 2026-02-05

    Part 1

    # Additional reading

    # Important Readings

    Survey and Taxonomy of IP Address Lookup Algorithms https://pdfs.semanticscholar.org/71d9/018e900f99ff60653b3769160131e775873f.pdf

    # Book References

    Kurose-Ross, 6th Edition, Section 4.3

  • Week 5 - Scans and list ranking
    Last edited: 2026-02-06

    In this lecture we will look at how to parallelise algorithms that use linked lists. They are notoriously hard to parallelise, as we only have a single linked list with its head pointer.

    # Scan

    Given an operation such as sum, product, or max, a scan takes a sequence and returns a new sequence in which each entry is the result of applying that operation to all entries up to and including that position.
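    A sequential sketch of an inclusive scan (the operation and its identity are supplied by the caller); the point of the lecture is that this apparently serial loop can in fact be parallelised:

```python
from operator import add

def scan(op, seq, identity):
    """Inclusive scan: out[i] = seq[0] op seq[1] op ... op seq[i]."""
    out = []
    acc = identity
    for x in seq:       # each step depends on the previous accumulator,
        acc = op(acc, x)  # which is what makes parallelising it non-obvious
        out.append(acc)
    return out

sums = scan(add, [1, 2, 3, 4], 0)
maxes = scan(max, [3, 1, 4, 1, 5], float("-inf"))
print(sums)   # [1, 3, 6, 10]
print(maxes)  # [3, 3, 4, 4, 5]
```

Parallel scan algorithms exploit associativity of the operation to break this apparent chain of dependencies.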

  • Week 5 - Thread design considerations
    Last edited: 2026-02-05

    # Additional reading

    # Thread level

    We will be revisiting the thread level from Thread level .

    Thread Level Summary

  • Week 5 - Top down parsing
    Last edited: 2026-02-05

    In the previous lecture we described an LL(1) parser that still required humans to write the functions for each expression. In this lecture we will produce an LL(1) parser that can be automatically generated from the grammar itself.

    # General overview

    This algorithm will have two stacks: