Machine Learning as a field has grown considerably over the past few decades. In this course, we will explore both classical and modern approaches, with an emphasis on theoretical understanding. There will be a significant math component (statistics and probability in particular), as well as a substantial implementation component (as opposed to using high-level libraries). However, during the last part of the course we will use a few modern libraries such as PyTorch. By the end of this course, you should be able to form a hypothesis about a dataset of interest, use a variety of methods and approaches to test your hypothesis, and interpret the results to form a meaningful conclusion. We will focus on real-world, publicly available datasets rather than generating new data.
This course is an introduction to the fundamental data structures of computer science: linked lists, stacks, queues, trees (BSTs, heaps, AVLs and other self-balancing BSTs), hash tables, sets (Union-Find), graphs and their accompanying algorithms. Principles of algorithmic analysis and object reasoning and design will be introduced using mathematical techniques for the notions of both complexity and correctness. More practical issues, such as memory management, searching, sorting and hashing, will also be covered.
This course covers foundational concepts in computational text analysis. The course is designed for Computer Science students interested in using text analysis methods to discover and measure concepts and phenomena in large amounts of text. Topics include core computational text analysis concepts, text-based machine learning, deep learning, basic statistical methods, and data collection. The course will culminate in research projects in which groups of students will formulate and iteratively refine an empirical question; collect relevant textual data; implement appropriate methods of analysis; and interpret and present their results.
This class covers the basics of computer programming using Java. Students will learn fundamental programming concepts, such as variables, conditional statements, loops, functions, and classes. Students will develop their ability to write programs to solve a variety of problems, read existing programs, and find and fix errors in existing code.
Computational Text Analysis is the use of computational tools to analyze and discover insights from large amounts of text. This research-based course will introduce students to the methods and tools used in computational text analysis, also known as text as data. Students will learn how to use quantitative methods to discover, measure, and infer latent information and phenomena from large amounts of text. The course will involve hands-on analysis of real-world textual datasets from social media (Twitter and Reddit), newswire (Wall Street Journal or NYTimes), and other corpora. This class is ideal for students who are interested in learning how to aggregate large amounts of text and apply statistical methods to it. Some prior programming experience is expected, though all necessary skills, including an overview of Unix and Python, will be covered at the beginning of the course.
This course will introduce students to the methods and tools used for developing Natural Language Processing and Machine Learning software. Students will work as a team to develop a machine learning system that can compete on a range of increasingly challenging problems in natural language semantics. Teams will choose which challenge to tackle from a collection of tasks for computational semantic analysis. Students will have an opportunity to compare their systems against those of teams from other institutions and present their results. Participation requires permission of the instructor.
This course will introduce students to the methods and tools used in data science to obtain insights from data. Students will learn how to analyze data arising from real-world phenomena while mastering critical concepts and skills in computer programming and statistical inference. The course will involve hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. This class is ideal for students looking to increase their digital literacy and expand their use and understanding of computation and data analysis across disciplines. No prior programming or math background is required.
The class is recommended for all scientists and engineers with a genuine curiosity about the fundamental obstacles to getting machines to perform tasks such as deduction, learning, planning and prediction, and about how to overcome those obstacles. The course covers, in depth, methods for automated reasoning, automatic problem solvers and planners, knowledge representation mechanisms, game playing, machine learning, and statistical pattern recognition.