Below are the major categories of machine learning algorithms.
Clustering
Clustering involves taking unlabeled data and grouping its items based on their similarity. It does this using a metric known as a similarity measure: the more features two items share, the more similar they are considered. Let’s look at a simple example to illustrate this definition. We may want a machine to scan images of cats and dogs and sort them into groups. It will look at a series of features that cats and dogs have, such as whiskers on a cat. Those features help the model place the cats in one group and the dogs in another.
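To make the idea of a similarity measure concrete, here is a minimal sketch that scores two items by cosine similarity over numeric feature vectors. Cosine similarity is only one common choice (Euclidean distance is another), and the feature names and values below are hypothetical, made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical binary features: [has_whiskers, pointed_ears, barks, retractable_claws]
cat_1 = [1, 1, 0, 1]
cat_2 = [1, 1, 0, 1]
dog_1 = [1, 0, 1, 0]

print(cosine_similarity(cat_1, cat_2))  # two cats share every feature: close to 1.0
print(cosine_similarity(cat_1, dog_1))  # a cat and a dog share only whiskers: much lower
```

Items that share more features score higher, which is what lets a clustering algorithm put the cats together and the dogs together.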
There are several types of clustering.
- Exclusive clustering, or hard clustering, in which each data point can belong to only one cluster.
- Overlapping clustering, or soft clustering, in which data points can belong to more than one cluster.
- Hierarchical clustering, which creates a hierarchy of clusters of data items.
The various clustering types are implemented using clustering algorithms.
- K-means is a popular clustering algorithm that puts data points into a predefined number of clusters, known as K. The algorithm does not determine the number of clusters for you; instead, you must specify K yourself. Each data point is then assigned to the nearest cluster center, called a centroid, and each centroid is moved to the mean of the points assigned to it. These two steps repeat until the assignments stop changing. You can run the algorithm with different values of K to find the clustering that fits the data best.
- Another algorithm is fuzzy K-means (also known as fuzzy C-means), which is used to perform overlapping clustering. Each data point can belong to more than one cluster, with a degree of membership that reflects how close it is to each cluster’s center.
- Lastly, we have hierarchical clustering. This algorithm may start with each data point assigned to its own cluster. The two clusters that are closest to one another are then merged into a single cluster. The process repeats until all of the data points are contained within one cluster. This is known as the bottom-up, or agglomerative, approach.
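K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below shows those steps in plain Python. It is a minimal illustration, not a production implementation: it assumes 2-D points and Euclidean distance, seeds the centroids with the first K points so the result is reproducible, and stops once no point changes cluster. The sample points are made up.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iters: int = 100) -> tuple[list[Point], list[int]]:
    """Cluster 2-D points into k groups; return (centroids, cluster index for each point)."""
    # Seed centroids with the first k points (a simplification; real implementations
    # usually choose them randomly or with k-means++).
    centroids = list(points[:k])
    assignments = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments


# Two made-up groups of points, one near (1, 1) and one near (8, 8).
data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 0.5), (8.5, 7.5), (7.5, 8.5)]
centroids, labels = kmeans(data, k=2)
print(labels)  # → [0, 1, 0, 0, 1, 1]
```

Because the result depends on where the centroids start, and because K is chosen up front, it is common to rerun K-means with different starting points and different values of k and compare the outcomes.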
To conclude, clustering is a machine learning technique used to group unlabeled data into clusters.
Classification
Classification is the process of categorizing a given set of data into classes. It can be used on both structured and unstructured data, and it predicts the class (also known as the target, label, or category) of each data point. There are four main types of classification tasks.
- The first is Binary Classification. As the name suggests, this refers to predicting one of two classes, such as spam or not spam, apples or bananas, cancer detected or cancer not detected.
- The second type is Multi-class Classification, which refers to classification tasks that have more than two classes. Plant species are an example, as there is a wide variety of plants. As in binary classification, each data point belongs to exactly one class, but the number of class labels can be large.
- The third type is Multi-label Classification. This refers to classification tasks in which each example can be assigned two or more class labels at the same time. For instance, a picture might contain books, flowers, and a lamp, so it would be assigned all three labels.
- Finally, we have Imbalanced Classification. This refers to classification tasks where the examples are unequally distributed across the classes. Examples of imbalanced classification include outlier detection and medical diagnostic tests.
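The four task types differ mainly in what a label looks like. The sketch below shows one common way to represent the targets for each type; the example data and label names are hypothetical, and the multi-hot encoding used for the multi-label case is one convention among several.

```python
from collections import Counter

# Binary: every example gets exactly one of two labels.
binary_labels = ["spam", "not spam", "not spam", "spam"]

# Multi-class: exactly one label per example, drawn from more than two classes.
multiclass_labels = ["rose", "tulip", "fern", "rose"]

# Multi-label: each example can carry several labels at once.
multilabel_targets = [
    {"book", "flower", "lamp"},  # a picture containing all three objects
    {"book"},
    {"flower", "lamp"},
]

# Multi-hot encoding: one 0/1 column per possible label.
all_labels = sorted(set().union(*multilabel_targets))
multi_hot = [[1 if label in target else 0 for label in all_labels] for target in multilabel_targets]

# Imbalanced: one class vastly outnumbers the other.
diagnosis_labels = ["healthy"] * 98 + ["disease"] * 2
class_counts = Counter(diagnosis_labels)

print(all_labels)  # → ['book', 'flower', 'lamp']
print(multi_hot)  # → [[1, 1, 1], [1, 0, 0], [0, 1, 1]]
print(class_counts)
```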
Below are some common classification algorithms.
- Logistic regression is a statistical method that predicts a binary outcome, such as yes or no, based on prior observations in a data set.
- K-nearest neighbors (KNN) is a supervised learning algorithm used for both classification and regression. It predicts the label of a new data point from the K training examples closest to it, for example by majority vote in classification.
- Decision trees repeatedly split a node into two or more sub-nodes, using a criterion such as Gini impurity or information gain to decide where to split. The final nodes, called leaves, each represent a class label.
- Random forest is a supervised machine learning algorithm used widely in classification and regression problems. It builds decision trees on different samples of the data and takes their majority vote for classification, or their average for regression.
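As an illustration of one of these algorithms, here is a minimal K-nearest neighbors classifier in plain Python. It assumes numeric feature vectors, Euclidean distance, and a simple majority vote among the K closest training examples; the fruit measurements are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train_x: list[list[float]], train_y: list[str], query: list[float], k: int = 3) -> str:
    """Predict the label of `query` by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented measurements: [length_cm, width_cm]
train_x = [[8.0, 7.5], [7.5, 8.0], [8.5, 8.0], [18.0, 3.5], [19.0, 3.0], [17.5, 4.0]]
train_y = ["apple", "apple", "apple", "banana", "banana", "banana"]

print(knn_predict(train_x, train_y, [8.2, 7.8]))  # → apple
print(knn_predict(train_x, train_y, [18.5, 3.2]))  # → banana
```

KNN has no separate training step; all the work happens at prediction time. For a two-class problem, an odd k avoids tied votes.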
In summary, classification has many forms, but it involves assigning a class label to input examples.
Regression
Regression is a type of supervised learning that predicts continuous, real-valued outputs. There is a family of machine learning algorithms for regression, such as linear regression, logistic regression (which, despite its name, is mostly used for classification), and Bayesian linear regression.
We will focus on the linear regression model, as it is the most popular model and will help you understand the more advanced models. Imagine you want to predict a group of individuals’ expenses based on their income. By training a regression algorithm on this data, we produce a model that can predict how much money an individual spends, for instance on clothes or food, based on their income. An easy way to visualize what the model is doing is to plot the data on a graph, with income on one axis and expenses on the other, and draw the line the model has fitted through the points. Similarly, a regression model can predict the price of a house given features such as its size, condition, and location.
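With a single feature such as income, a linear regression model is a straight line, expenses ≈ slope × income + intercept, and the best-fitting line in the least-squares sense has a closed-form solution. The sketch below fits that line in plain Python, assuming ordinary least squares as the fitting method; the income and expense figures are made up for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares; return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: yearly income and yearly spending on food and clothes, both in thousands.
income = [20.0, 30.0, 40.0, 50.0, 60.0]
expenses = [15.0, 19.0, 27.0, 31.0, 38.0]

slope, intercept = fit_line(income, expenses)
predicted = slope * 45.0 + intercept  # predicted expenses for an income of 45
print(f"expenses = {slope:.2f} * income + {intercept:.2f}; predicted at income 45: {predicted:.1f}")
```

The same idea extends to several features, such as a house’s size and condition, by fitting one coefficient per feature (multiple linear regression); non-numeric features such as location first need to be encoded as numbers.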
Anomaly Detection
In this image, we have a group of animated eggs. Can you spot the outlier? The red egg is the outlier. Anomaly detection looks at what is out of the norm; in other words, it looks for deviations from an established normal pattern. Anomaly detection can be used on images, for example to help identify whether a patient has cancer, or even coronavirus, from medical scans.
With time series data, anomaly detection lets us detect whether, within a time window, a shop has run out of stock on a particular item, which does not usually happen. This allows the shop owner to understand the problem and take action. For example, the output may show that during a particular holiday season, such as Halloween, a certain type of candy goes out of stock. This insight enables the shop owner to plan ahead for the next year.
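One simple way to express "deviation from an established normal pattern" is a z-score: measure how many standard deviations a new value lies from the mean of past, normal data, and flag values beyond a chosen cutoff. The sketch below applies this to made-up daily sales counts for one item. The cutoff of 3 standard deviations is a common rule of thumb, not a fixed standard, and real systems often use more sophisticated models.

```python
import statistics


def find_anomalies(baseline: list[float], new_values: list[float], cutoff: float = 3.0) -> list[int]:
    """Return indices of new_values lying more than `cutoff` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation of the normal pattern
    return [i for i, v in enumerate(new_values) if abs(v - mean) / stdev > cutoff]


# Made-up daily units sold for one item during a normal period.
normal_days = [40, 42, 38, 41, 39, 43, 40, 37]
# New days to check; the third day (index 2) shows zero sales, e.g. because the item is out of stock.
this_week = [41, 39, 0, 42]

print(find_anomalies(normal_days, this_week))  # → [2]
```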
There are several ways to detect anomalies.
- The first is manual detection, which we perform with the naked eye. This is, of course, not an ideal approach, as it is slow and subject to human error.
- The second is automatic detection, which is commonly used in IT maintenance. For instance, the operations team could configure an alert that is triggered every time a user hits a specific quota threshold while building Azure resources in a lab.
- The third is machine learning, which learns the normal patterns in the data and flags deviations from them. So why use machine learning models for anomaly detection? First of all, they can work in real time. Going through data manually to find anomalies takes time, which is a problem when detection is time critical.
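To contrast the automatic and machine learning approaches, the sketch below processes a stream of readings one at a time. The fixed rule alerts whenever a value meets a threshold someone chose by hand, like a quota limit. The learned detector instead keeps a rolling window of recent normal readings as its picture of "normal" and flags readings that deviate sharply from it. The rolling z-score is a deliberately simple stand-in for a machine learning model, and the threshold, window size, cutoff, and readings are all hypothetical.

```python
import statistics
from collections import deque

QUOTA_THRESHOLD = 90.0  # hypothetical hand-picked limit, e.g. percent of a resource quota


def threshold_alert(value: float) -> bool:
    """Automatic, rule-based detection: alert whenever the fixed threshold is met."""
    return value >= QUOTA_THRESHOLD


class RollingDetector:
    """Learns 'normal' from a window of recent readings and flags sharp deviations."""

    def __init__(self, window: int = 5, cutoff: float = 3.0) -> None:
        self.recent = deque(maxlen=window)  # most recent normal readings
        self.cutoff = cutoff

    def update(self, value: float) -> bool:
        """Judge one incoming reading; return True if it looks anomalous."""
        is_anomaly = False
        if len(self.recent) == self.recent.maxlen:  # wait until the window is full
            history = list(self.recent)
            mean = statistics.mean(history)
            stdev = statistics.stdev(history)
            # Skip windows with no spread to avoid dividing by zero.
            is_anomaly = stdev > 0 and abs(value - mean) / stdev > self.cutoff
        if not is_anomaly:
            self.recent.append(value)  # only normal readings update the baseline
        return is_anomaly


# Made-up stream of resource-utilization readings (percent), arriving one at a time.
readings = [50, 52, 49, 51, 50, 48, 51, 80, 50, 95]
detector = RollingDetector(window=5, cutoff=3.0)
rule_alerts, learned_alerts = [], []
for i, value in enumerate(readings):
    if threshold_alert(value):
        rule_alerts.append(i)
    if detector.update(value):
        learned_alerts.append(i)

print("fixed threshold alerts at:", rule_alerts)
print("learned baseline alerts at:", learned_alerts)
```

Running it, the fixed rule flags only reading 9 (the value 95), while the learned detector flags readings 7 and 9: the spike to 80 stays under the hand-picked threshold but is far outside the recent normal pattern. Because each reading is judged as it arrives, both checks can run in real time.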