Introduction to Machine Learning

Module 6: Preprocessing Categorical Variables

This module covers two encoding methods for categorical variables, ordinal encoding and one-hot encoding, and how to set each one up appropriately. It also introduces ColumnTransformer and CountVectorizer from the sklearn library and shows you how to implement them.
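As a rough preview of how these pieces can fit together, the sketch below applies ordinal encoding, one-hot encoding and CountVectorizer to different columns through a single ColumnTransformer. The toy DataFrame, its column names (`size`, `colour`, `review`) and the category order are invented for illustration and are not taken from the course materials; `sparse_threshold=0` is set only so the result comes back as a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data, made up for this sketch only.
df = pd.DataFrame(
    {
        "size": ["small", "large", "medium", "small"],       # ordered categories
        "colour": ["red", "blue", "red", "green"],            # unordered categories
        "review": ["good fit", "too big", "good colour", "fit too small"],  # free text
    }
)

preprocessor = ColumnTransformer(
    transformers=[
        # Ordinal encoding: the order is given explicitly (small < medium < large).
        ("ordinal", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
        # One-hot encoding: one binary column per colour.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["colour"]),
        # CountVectorizer expects a 1-D column of strings, so the column is
        # passed as a plain string rather than a list.
        ("text", CountVectorizer(), "review"),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

encoded = preprocessor.fit_transform(df)

# 1 ordinal column + 3 one-hot columns + 6 word-count columns
print(encoded.shape)
```

Passing the category order explicitly matters for the ordinal column: left to its defaults, OrdinalEncoder sorts the categories alphabetically, which would put `large` before `medium`.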

0 Module Learning Outcomes

1 Categorical Variables: Ordinal Encoding

2 Categorical Variables

3 True or False: Ordinal Encoding

4 Try Ordinal Encoding Yourself!

5 Categorical Variables: One-Hot Encoding

6 One-Hot Encoding Questions

7 One-Hot Encoding - Output

8 One-Hot Encoding True or False

9 Encoding - One-Hot Style!


11 Transforming Columns with ColumnTransformer

12 Transforming True or False

13 Your Turn with Column Transforming

14 Make - Pipelines & Column Transformers

15 Making Pipelines

16 Transforming True or False

17 Making Pipelines with make_pipeline()

18 Handling Categorical Features: Binary, Ordinal and More

19 Transforming Categorical Features

20 Categorical True or False

21 Transforming the Fertility Dataset

22 Text Data

23 Text Data Questions

24 Text Data True or False

25 CountVectorizer with Disaster Tweets

26 What Did We Just Learn?

About this course

This course covers introductory machine learning concepts from a data science perspective, with a focus on making predictions. It covers how to build different models such as K-NN, decision trees and linear classifiers, as well as important concepts such as data splitting and fundamental rules and laws. In addition, this course will teach you how to evaluate models properly and question their validity, all while streamlining the process with pipelines.

About the program

The University of British Columbia (UBC) is a comprehensive research-intensive university, consistently ranked among the 40 best universities in the world. The Key Capabilities in Data Science program was launched in September 2020 and is developed and taught by many of the same instructors as the UBC Master of Data Science program.