Machine Learning Workflow | Process Steps

Machine Learning-


Before you go through this article, make sure that you have gone through the previous article on Machine Learning.


We have discussed-

  • Machine learning is building machines that can adapt and learn from experience.
  • Machine learning systems are not explicitly programmed.


In this article, we will discuss machine learning workflow.


Machine Learning Workflow-


Machine learning workflow refers to the series of stages or steps involved in the process of building a successful machine learning system.


The various stages involved in the machine learning workflow are-



  1. Data Collection
  2. Data Preparation
  3. Choosing Learning Algorithm
  4. Training Model
  5. Evaluating Model
  6. Predictions


Let us discuss each stage one by one.


1. Data Collection-


In this stage,

  • Data is collected from different sources, such as files and databases.
  • The type of data collected depends upon the type of project.
  • The quality and quantity of the gathered data directly affect the accuracy of the resulting system.
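
As a minimal sketch of collecting data from a file and a database, the example below reads records from CSV text and from an SQLite table and combines them into one dataset. The column names, the `people` table and the sample values are hypothetical and only serve as illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be a file on disk.
CSV_TEXT = "height_cm,weight_kg\n170,65\n182,80\n"


def rows_from_csv(text):
    """Read records from CSV text as a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def rows_from_database(connection):
    """Read records from a hypothetical 'people' table."""
    cursor = connection.execute("SELECT height_cm, weight_kg FROM people")
    return [{"height_cm": h, "weight_kg": w} for h, w in cursor]


# An in-memory database stands in for a real one.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (height_cm REAL, weight_kg REAL)")
connection.execute("INSERT INTO people VALUES (158.0, 52.0)")

# Combine the records from both sources into a single raw dataset.
dataset = rows_from_csv(CSV_TEXT) + rows_from_database(connection)
connection.close()

print(len(dataset))
```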


2. Data Preparation-


In this stage,

  • Data preparation is done to clean the raw data.
  • Data collected from the real world is transformed to a clean dataset.
  • Raw data may contain missing values, inconsistent values, duplicate instances etc.
  • So, raw data cannot be directly used for building a model.


Different methods of cleaning the dataset are-

  • Ignoring the missing values.
  • Removing instances having missing values from the dataset.
  • Estimating the missing values of instances using mean, median or mode.
  • Removing duplicate instances from the dataset.
  • Normalizing the data in the dataset.


This is often the most time-consuming stage in the machine learning workflow.
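
Some of the cleaning methods listed above can be sketched in plain Python. The example removes duplicate instances, estimates a missing value using the mean, and normalizes each column to the range 0 to 1 (min-max normalization, chosen here as one possible normalization). The records and column names are hypothetical.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 20.0, "income": 30.0},
    {"age": 40.0, "income": None},
    {"age": 20.0, "income": 30.0},  # duplicate of the first record
    {"age": 60.0, "income": 90.0},
]


def remove_duplicates(rows):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, column):
    """Replace missing values in a column with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    fill = mean(known)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def min_max_normalize(rows, column):
    """Rescale a column so that its values lie between 0 and 1."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [{**row, column: (row[column] - low) / span} for row in rows]


cleaned = remove_duplicates(raw)
cleaned = impute_mean(cleaned, "income")
for column in ("age", "income"):
    cleaned = min_max_normalize(cleaned, column)

print(cleaned)
```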


3. Choosing Learning Algorithm-


In this stage,

  • A learning algorithm that is likely to perform best on the problem is researched.
  • The choice depends upon the type of problem that needs to be solved and the type of data we have.
  • If the problem is to classify and the data is labeled, classification algorithms are used.
  • If the problem is to perform a regression task and the data is labeled, regression algorithms are used.
  • If the problem is to create clusters and the data is unlabeled, clustering algorithms are used.
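
The three rules above can be written as a small lookup function. This is only an illustrative sketch: the function name and the task labels `"classify"`, `"regress"` and `"cluster"` are hypothetical, and a real project would go on to compare specific algorithms within the chosen family.

```python
def choose_algorithm_family(task, labeled):
    """Map a problem description to a family of learning algorithms.

    task: "classify", "regress" or "cluster" (hypothetical labels).
    labeled: True if each example comes with a known output.
    """
    if task == "classify" and labeled:
        return "classification"
    if task == "regress" and labeled:
        return "regression"
    if task == "cluster" and not labeled:
        return "clustering"
    # Combinations not covered by the rules in this article.
    raise ValueError(f"no rule for task={task!r}, labeled={labeled}")


print(choose_algorithm_family("classify", labeled=True))
print(choose_algorithm_family("cluster", labeled=False))
```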


The following chart provides the overview of learning algorithms-



4. Training Model-


In this stage,

  • The model is trained so that it learns to make better predictions.
  • The dataset is divided into a training dataset and a testing dataset.
  • The training/testing split is typically of the order of 80/20 or 70/30.
  • The exact split also depends upon the size of the dataset.
  • The training dataset is used for training the model.
  • The testing dataset is used for testing the model.
  • Training dataset is fed to the learning algorithm.
  • The learning algorithm finds a mapping between the input and the output and generates the model.
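
The steps above can be sketched as follows. The example splits a dataset 80/20 and feeds the training part to a very simple learning algorithm, a least-squares straight-line fit, which stands in for whatever algorithm was chosen in the previous stage. The data (points on the line y = 2x + 1), the helper names and the fixed random seed are assumptions made for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Learn a mapping y = slope * x + intercept by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical dataset: inputs x with outputs y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

train, test = train_test_split(data, test_fraction=0.2)
slope, intercept = fit_line(train)  # the "model" is the pair (slope, intercept)

print(len(train), len(test))
```

The testing part is deliberately left untouched here, so that it can be used later to evaluate the model.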



5. Evaluating Model-


In this stage,

  • The model is evaluated to check how well it performs.
  • The model is evaluated using the held-out testing dataset.
  • This allows the model to be tested against data that was never used for training.
  • Metrics such as accuracy, precision, recall etc. are used to measure the performance.
  • If the model does not perform well, it is re-built using different hyperparameters.
  • The accuracy may be further improved by tuning the hyperparameters.
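
As a minimal sketch of the metrics mentioned above, the example below computes accuracy, precision and recall for a binary classifier from its predictions on a testing dataset. The label lists are hypothetical, and class 1 is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were predicted as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical true labels and model predictions on the testing dataset.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```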



6. Predictions-


In this stage,

  • The built system is finally used to do something useful in the real world.
  • Here, the true value of machine learning is realized.

