What Are Supervised and Unsupervised Learning?

Ayse Yaman
May 23, 2021

In this article, we’ll explore the basics of two core machine learning approaches, supervised and unsupervised learning, along with semi-supervised learning, which sits between them.

Supervised Learning

Supervised learning is a machine learning approach defined by its use of labeled datasets: every training example comes with the correct output, or label. These datasets are designed to train, or “supervise,” algorithms to classify data or predict outcomes accurately.

The goal is to use the inputs to predict the values of the outputs. In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data, comparing them with the correct answers, and adjusting itself to reduce the error.
It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
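To make the “predict, compare, adjust” loop concrete, here is a minimal Python sketch. It assumes the simplest possible model, a straight line y = w * x + b, fitted to a small made-up dataset by gradient descent on the mean squared error. The function name `train_linear_model`, the learning rate, and the data are hypothetical choices for illustration only.

```python
def train_linear_model(xs, ys, learning_rate=0.01, epochs=2000):
    """Fit y ≈ w * x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # 1. Predict with the current parameters.
        predictions = [w * x + b for x in xs]
        # 2. Compare each prediction with the correct answer (the label).
        errors = [pred - y for pred, y in zip(predictions, ys)]
        # 3. Adjust the parameters against the gradient of the mean squared error.
        grad_w = 2 * sum(err * x for err, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labeled data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear_model(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

After training, w and b should be very close to 2 and 1, the values used to generate the labels: the labels played the role of the teacher’s answer key.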

Supervised learning can be separated into two types of problems:

Classification: A classification problem is one where we predict qualitative (categorical) outputs. Qualitative variables are typically represented numerically by codes. The simplest case is when there are only two classes or categories, such as “success” or “failure,” or “survived” or “died.”
Naive Bayes, logistic regression, decision trees, and k-nearest neighbors are popular classification algorithms.
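As an illustration of classification, below is a bare-bones k-nearest neighbors classifier in plain Python. The two numeric features and the “success”/“failure” labels are invented for the example, and `knn_predict` is a hypothetical helper rather than a library function; in practice a library such as scikit-learn would usually be used instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # Euclidean distance from the query to every labeled training point, nearest first.
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    # Majority vote; on a tie in counts, the label seen first (the closest) wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set: two numeric features per example, each with a label.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
train_labels = ["failure", "failure", "failure", "success", "success", "success"]

low = knn_predict(train_points, train_labels, (1.2, 1.4))
high = knn_predict(train_points, train_labels, (6.8, 6.1))
print(low, high)
```

The first query lands among the “failure” examples and the second among the “success” examples, so they are classified accordingly.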

Regression: A regression problem is one where we predict quantitative (numerical) outputs. It is commonly used to make projections, such as forecasting sales revenue for a business.
Linear regression and polynomial regression are popular regression algorithms. (Despite its name, logistic regression is used for classification, not regression.)
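For regression, a minimal sketch is simple linear regression fitted with the closed-form least-squares formulas. The monthly revenue figures below are hypothetical, and the example simply projects the trend one month ahead.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates for a single input variable.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly revenue (in thousands) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
revenue = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
slope, intercept = fit_simple_linear_regression(months, revenue)
projection = slope * 7 + intercept  # projected revenue for month 7
print(f"slope={slope}, intercept={intercept}, month 7 projection={projection}")
```

Because this toy data lies exactly on a line, the fit recovers a slope of 2 and an intercept of 8, giving a projection of 22 (thousand) for month 7; real sales data would of course be noisier.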

Unsupervised Learning

Unsupervised learning is a machine learning approach in which algorithms learn patterns from unlabeled data. The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

In supervised learning, success can be measured directly by checking predictions against the known labels. In unsupervised learning, there is no such direct measure of success. It is difficult to ascertain the validity of inferences drawn from the output of most unsupervised learning algorithms. One must resort to heuristic arguments not only for motivating the algorithms, as is often the case in supervised learning as well, but also for judging the quality of the results. Algorithms are left to their own devices to discover and present the interesting structure in the data.
These algorithms discover hidden patterns in data without the need for human-provided labels (hence, they are “unsupervised”).

Unsupervised learning methods can be separated into two main families: clustering and association.

Cluster analysis, also called data segmentation, has a variety of goals, all of which relate to grouping or segmenting a collection of objects into subsets, or “clusters,” such that objects within a cluster are more closely related to one another than to objects in other clusters (for example, grouping customers by purchasing behavior).
Sometimes the goal is also to arrange the clusters into a natural hierarchy.
Cluster analysis is also used to form descriptive statistics to ascertain whether or not the data consists of a set of distinct subgroups, each representing objects with substantially different properties.
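The basic idea of clustering can be sketched with k-means (Lloyd’s algorithm), one of the most common clustering methods. This simplified version rests on a few assumptions: the customer data (visits per month, average spend) is made up, the first k points serve as the starting centroids so the result is reproducible, and the features are not rescaled, so the larger-valued spend feature dominates the distances.

```python
import math


def k_means(points, k, max_iterations=100):
    """Group points into k clusters with basic k-means (Lloyd's algorithm)."""
    # Assumption: the first k points are the initial centroids, for reproducibility.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(2, 20.0), (3, 25.0), (2, 22.0), (10, 80.0), (12, 85.0), (11, 90.0)]
centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On this data the algorithm settles after a few iterations into a low-visit, low-spend group and a high-visit, high-spend group. With real data, features are usually standardized first and several random initializations are tried.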

Association analysis has emerged as a popular tool for mining commercial databases.
It is most often applied to binary-valued data $X_j \in \{0, 1\}$, where it is referred to as “market basket” analysis; here $X_j = 1$ means that item $j$ was purchased in a given transaction.
Variables that frequently have joint values of one represent items that are frequently purchased together (for example, people who buy X also tend to buy Y).
This information can be quite useful for stocking shelves, cross-marketing in sales promotions, catalog design, and consumer segmentation based on buying patterns.
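A simple way to see market basket analysis in action is to count how often pairs of items appear together. The sketch below computes the support of each item pair (the fraction of baskets containing both items) and the confidence of rules such as “bread → milk” (the fraction of baskets with bread that also contain milk). The baskets and the `min_support` threshold are invented for illustration, and only pairs are considered; algorithms such as Apriori extend the same idea efficiently to larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count each unordered pair once per basket; sorting gives every pair one canonical order.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Confidence of "a -> b": share of baskets with a that also contain b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, the rules butter → bread and beer → chips have confidence 1.0, while bread → milk has confidence 2/3; pairs seen in only one basket fall below the support threshold.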

Semi-Supervised Learning

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data).

Unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render large, fully labeled training sets infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value.
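Self-training (also called pseudo-labeling) is one simple semi-supervised technique among several: fit a model on the labeled data, use it to label the unlabeled points it is most confident about, then refit. The sketch below assumes a one-dimensional toy dataset, a nearest-centroid classifier, and a hypothetical confidence rule (the nearest class centre must be at least `margin` closer than the runner-up). It is meant to show the idea, not to serve as a production method.

```python
def nearest_centroid_fit(xs, ys):
    """Return the mean value of each class: a very simple classifier."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=5, margin=1.0):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        centroids = nearest_centroid_fit(xs, ys)
        confident, remaining = [], []
        for x in pool:
            dists = sorted(abs(x - c) for c in centroids.values())
            # Hypothetical confidence rule: a clear gap between the two nearest classes.
            if len(dists) > 1 and dists[1] - dists[0] >= margin:
                confident.append(x)
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left that the model is confident about
        for x in confident:
            xs.append(x)
            ys.append(predict(centroids, x))  # pseudo-label from the current model
        pool = remaining
    return nearest_centroid_fit(xs, ys), pool


# Hypothetical data: only two labeled points, many unlabeled ones.
centroids, leftover = self_train(
    labeled_x=[1.0, 9.0],
    labeled_y=["A", "B"],
    unlabeled_x=[0.5, 1.5, 2.0, 2.5, 7.5, 8.0, 8.5, 9.5, 5.2],
)
print(centroids, leftover)
```

Here two labeled points are enough to pseudo-label eight of the nine unlabeled ones, which shifts the class centres to 1.5 and 8.5; the ambiguous point 5.2 is left unlabeled.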

Difference Between Supervised and Unsupervised Learning

The biggest difference between supervised and unsupervised learning is that supervised learning deals with labeled data, while unsupervised learning deals with unlabeled data.

A supervised learning model uses training data to learn a mapping between the inputs and the outputs. In contrast, an unsupervised learning algorithm does not use output data.

The goal of supervised learning is to train the model so that it can predict the output when it is given new data.
The goal of unsupervised learning is to find hidden patterns and useful insights in an unlabeled dataset.

Supervised learning needs supervision, in the form of labeled examples, to train the model.
Unsupervised learning models, in contrast, work on their own to discover the inherent structure of unlabeled data.

References:

The Elements of Statistical Learning, Stanford University

https://www.ibm.com

https://machinelearningmastery.com

https://en.wikipedia.org/wiki/Semi-supervised_learning
