What is Data Science?
    
     Data Science can be understood as the union of mathematical analysis (statistics), data analysis and machine learning, together with their interrelations. It is used to draw meaningful insights from both structured and unstructured data.

[Figure: Venn diagram for Data Science]

A data science process is essentially a sequence of steps that is improved iteratively, in an agile manner. The most popular formal process for data science is CRISP-DM (Cross-Industry Standard Process for Data Mining).

CRISP-DM:
1-Business Understanding.
2-Data Understanding.
3-Data Preparation.
4-Data Modelling.
5-Evaluation.
6-Deployment.

1-Business Understanding:
  • What are the business activities?
  • Can data science achieve the required objectives?
  • How do we define success metrics?
  • Are there ethical considerations in data usage?
  • What have other industries achieved (state of the art, SOTA)?
2-Data Understanding:
  • What are the sources of data?
  • Does new data need to be collected?
  • What are the quantity and quality of data available?
  • What do different data items represent?
  • Which data is relevant to the objective?
3-Data Preparation:
  • What are the different data forms?
  • Is there a need to annotate the data?
  • How can data be extracted, transformed and loaded?
  • How should data be standardized and normalized?
  • How can data be stored efficiently for analysis?
4-Data Modelling:
  • What assumptions should be made for the models?
  • Should the modelling be statistical or algorithmic?
  • Is the clean data sufficient for modelling?
  • Is the compute budget sufficient for modelling?
  • Are the results statistically significant?
5-Evaluation:
  • Does the model work correctly on test data?
  • Does the model achieve the business objective?
  • Does the model meet the performance requirements?
  • Is the model unbiased and robust?
  • What are the ways to improve the model?
6-Deployment:
  • Where is the model to be deployed?
  • What is the HW/SW stack for deployment?
  • Does it meet performance requirements?
  • Does it violate privacy requirements?
  • Does it meet user expectations?
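
As an illustration of the Data Preparation question on standardizing and normalizing data, here is a minimal Python sketch. It assumes the common definitions: standardization as z-scores (zero mean, unit variance) and normalization as min-max scaling into [0, 1]. The function names and the sample "ages" column are hypothetical and are not part of CRISP-DM itself.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # Assumption: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale values linearly into the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample column

print(normalize(ages))    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))  # z-scores centred on 0
```

In practice the scaling parameters (mean, standard deviation, minimum, maximum) would usually be computed on the training data only and then reused on test data, so that the Evaluation step is not influenced by information from the test set.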

Programming Tools:
1-No-code environments:
  • H2O
  • IBM Watson
  • DataRobot etc.
2-Spreadsheets and BI (Business Intelligence) tools:
  • Microsoft Excel
  • Power BI
  • Google Sheets etc.
3-Programming languages:
  • Python
  • R
  • MATLAB etc.

       By:         Roshan Kumar Singh
       Source:  PadhAI
