Machine Learning is the study of computer algorithms that improve automatically through experience. It is a subset of Artificial Intelligence. ML algorithms build a mathematical model based on sample data, known as training data. In this blog we will look at various key factors to consider while choosing a machine learning model.
Well, there is no straightforward, sure-shot answer here. The choice depends on many factors, such as the problem statement, the kind of output required, and the size of the data and the number of features and observations in it.
Here are some important considerations while choosing an algorithm.
- Size of the training data:
It is usually recommended to gather a good amount of data to get reliable predictions. However, many a time, the availability of data is a constraint. So, if the training data is smaller, or if the dataset has fewer observations and a higher number of features, choose high bias/low variance algorithms like linear regression, Naïve Bayes, or a linear SVM.
If the training data is sufficiently large and the number of observations is high compared to the number of features, one can go for low bias/high variance algorithms like KNN, decision trees, or kernel SVM.
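As a minimal sketch, this rule of thumb could be captured in a small helper. The function name and the simple comparison it uses are illustrative assumptions, not a standard API:

```python
def suggest_model_family(n_observations, n_features):
    """Illustrative rule of thumb: pick a model family from dataset shape alone."""
    if n_observations <= n_features:
        # Few observations, many features: prefer high bias / low variance models.
        return "high bias/low variance (e.g. linear regression, Naive Bayes, linear SVM)"
    # Plenty of observations relative to features: flexible models become viable.
    return "low bias/high variance (e.g. KNN, decision trees, kernel SVM)"

print(suggest_model_family(50, 5000))     # e.g. a genetics dataset
print(suggest_model_family(100000, 20))   # e.g. a large tabular dataset
```

In practice this decision also interacts with the other factors below, so treat it as a starting point rather than a rule.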
- Accuracy and/or interpretability of the output:
Accuracy of a model means that it predicts a response value for a given observation which is close to the true response value for that observation. A highly interpretable algorithm is one where it is easy to understand how any individual predictor is associated with the response, while flexible models give higher accuracy at the cost of lower interpretability.
Now, which algorithm to use depends on the objective of the business problem.
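To illustrate what "interpretable" means here, the sketch below fits a minimal logistic regression by gradient descent on toy data (the data, learning rate, and iteration count are assumptions for the example). Because the model is linear, each learned weight directly shows how its predictor is associated with the response:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: the label depends only on feature 0, not on feature 1.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)

# Minimal logistic regression via gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # gradient step on the weights
    b -= 0.5 * np.mean(p - y)                # gradient step on the intercept

# The weights are directly readable: feature 0 gets a large positive weight,
# feature 1 stays near zero -- something a deep network would not expose as easily.
print(w)
```

A random forest or neural network might score higher on a harder dataset, but it would not give this one-number-per-predictor summary.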
- Speed or Training time:
Higher accuracy typically means higher training time. Also, algorithms require more time to train on large datasets. In real-world applications, the choice of algorithm is driven predominantly by these two factors.
Algorithms like Naïve Bayes and linear and logistic regression are easy to implement and quick to run.
Algorithms like SVM, which involve parameter tuning, neural networks with long convergence times, and random forests need a lot of time to train.
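Part of why tuned models like SVMs are slow is simple arithmetic: every candidate parameter combination is typically refit once per cross-validation fold. The grid sizes below are assumptions chosen just to make the multiplication concrete:

```python
# Illustrative cost of tuning an SVM with grid search + cross-validation.
c_candidates = 5       # values tried for the regularisation parameter C
gamma_candidates = 6   # values tried for the kernel width gamma
cv_folds = 5           # cross-validation folds per candidate pair

# Each (C, gamma) pair is trained once per fold.
total_fits = c_candidates * gamma_candidates * cv_folds
print(total_fits)  # 150 separate model fits for a single tuning run
```

By contrast, Naïve Bayes or linear regression usually needs just one fit, which is why they are the quick baselines.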
- Linearity:
Many algorithms work on the assumption that classes can be separated by a straight line. Examples include logistic regression and support vector machines.
The linear regression algorithm assumes that the data follows a straight line. If the data is indeed linear, then these algorithms perform quite well.
The best way to check for linearity is to fit a linear model, or to run a logistic regression or SVM, and inspect the residual errors.
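This residual check can be sketched with plain least squares. The synthetic datasets below are assumptions for illustration: one truly linear, one quadratic. A straight-line fit leaves only noise-sized residuals on the first, but large structured residuals on the second:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y_linear = 3 * x + 2 + rng.normal(scale=0.5, size=100)  # linear relationship
y_curved = x ** 2 + rng.normal(scale=0.5, size=100)     # non-linear relationship

def residual_std(x, y):
    """Fit a straight line by least squares and measure the leftover error."""
    A = np.vstack([x, np.ones_like(x)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.std(y - A @ coef)

# Small residuals -> a linear model fits; large residuals -> try non-linear models.
print(residual_std(x, y_linear))  # close to the noise level (~0.5)
print(residual_std(x, y_curved))  # several times larger
```

Plotting the residuals against x is even more telling: random scatter suggests linearity, while a visible curve or pattern suggests a non-linear model.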
- Number of features:
The dataset may have a large number of features, not all of which are relevant and significant. For certain types of data, such as genetic or textual data, the number of features can be very large compared to the number of data points. A large number of features can bog down some learning algorithms, making training time infeasibly long.
- SVM is better suited to data with a large feature space and fewer observations.
- PCA and feature selection techniques should be used to reduce dimensionality and select important features.
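As a minimal sketch of the PCA route, the function below centers the data and projects it onto the top principal components via SVD. The dataset shape (30 observations, 100 features, i.e. more features than data points) and the function name are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
# 30 observations, 100 features: more features than data points.
X = rng.normal(size=(30, 100))

def pca_reduce(X, n_components):
    """Project X onto its top principal components using SVD."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # coordinates in PC space

X_reduced = pca_reduce(X, 10)
print(X_reduced.shape)  # (30, 10): 100 features compressed to 10 components
```

Feature selection (e.g. dropping features with near-zero variance or low correlation with the target) is the alternative when you want to keep the original, interpretable features rather than linear combinations of them.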
This was a simple overview of some techniques and algorithms in machine learning, and more techniques keep emerging. Machine learning will play an increasingly important role in our daily lives, and these were just some of the key factors to keep in mind when choosing a machine learning model.