What is Data Science?
A process is essentially a flow of steps together with its iterative, agile improvement. The formal and most popular process for data science is CRISP-DM (Cross-Industry Standard Process for Data Mining).
CRISP-DM:
1-Business Understanding.
2-Data Understanding.
3-Data Preparation.
4-Data Modelling.
5-Evaluation.
6-Deployment.
1-Business Understanding:
- What are the business activities?
- Can data science achieve the required objectives?
- How do we define success metrics?
- Are there ethical considerations in data usage?
- What have other industries achieved (state of the art, SOTA)?
2-Data Understanding:
- What are the sources of data?
- Does new data need to be collected?
- What are the quantity and quality of the available data?
- What do different data items represent?
- Which data is relevant to the objective?
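The quantity and quality questions above can be approximated with a quick profile of the data. The following is a minimal sketch in plain Python, assuming the data arrives as a list of dictionaries; the `summarize` helper and the sample `records` are hypothetical and only for illustration.

```python
def summarize(records):
    """Return the row count and, per field, the number of missing values.

    A value counts as missing when the key is absent, None, or an empty string.
    """
    # Collect every field name that appears in at least one record.
    fields = sorted({key for row in records for key in row})
    missing = {
        field: sum(1 for row in records if row.get(field) in (None, ""))
        for field in fields
    }
    return {"rows": len(records), "missing": missing}


# Hypothetical sample data, not taken from any real source.
records = [
    {"age": 31, "city": "Pune"},
    {"age": None, "city": ""},
    {"age": 45},
]

profile = summarize(records)
print(profile)
```

A profile like this helps decide early whether more data needs to be collected or whether some fields are too sparse to be useful.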
3-Data Preparation:
- What are the different data forms?
- Is there a need to annotate the data?
- How can data be extracted, transformed, and loaded (ETL)?
- How should data be standardized and normalized?
- How can data be stored efficiently for analysis?
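To make the standardize and normalize question concrete, here is a minimal sketch using only the Python standard library. It assumes "standardize" means z-scoring (mean 0, standard deviation 1) and "normalize" means min-max scaling to the range [0, 1], which are common readings of the terms; the function names and the sample `ages` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column.
ages = [20, 30, 40, 50, 60]

z_scores = standardize(ages)
scaled = min_max_normalize(ages)
print(z_scores)
print(scaled)
```

Standardization is usually preferred when a model assumes roughly centered features, while min-max scaling is handy when values must stay within a fixed range.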
4-Data Modelling:
- What assumptions should be made for the models?
- Should the modelling be statistical or algorithmic?
- Is the clean data sufficient for modelling?
- Is the compute budget sufficient for modelling?
- Are the results statistically significant?
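One way to approach the statistical significance question is a permutation test, sketched below with only the standard library. This is one possible technique, not the only one; the `permutation_test` function, the choice of 2000 permutations, and the accuracy scores for the two models are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    The labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical accuracy scores from two models over five runs each.
model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b = [0.74, 0.76, 0.73, 0.75, 0.77]

p_value = permutation_test(model_a, model_b)
print(f"p-value: {p_value:.3f}")
```

A small p-value suggests the gap between the two models is unlikely to be due to chance alone, under the assumption that the runs are exchangeable.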
5-Evaluation:
- Does the model work correctly on test data?
- Does the model achieve the business objective?
- Does the model meet the performance requirements?
- Is the model unbiased and robust?
- What are the ways to improve the model?
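Checking a model on test data usually means holding out part of the data before training and scoring predictions on that part only. The sketch below shows the idea with the standard library; `train_test_split`, `accuracy`, the toy `rows`, and the stand-in `predict` rule are all hypothetical and do not represent any particular library's API.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labelled data: (feature, label), where label is 1 when feature >= 5.
rows = [(x, int(x >= 5)) for x in range(8)]
train, test = train_test_split(rows)


def predict(x):
    """Stand-in for a trained model; a real model would be fitted on `train`."""
    return int(x >= 5)


test_accuracy = accuracy([y for _, y in test], [predict(x) for x, _ in test])
print(f"test accuracy: {test_accuracy:.2f}")
```

Accuracy is only one metric; whether it matches the business objective depends on the success metrics defined in the first step.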
6-Deployment:
- Where is the model to be deployed?
- What is the HW/SW stack for deployment?
- Does it meet performance requirements?
- Does it violate privacy requirements?
- Does it meet user expectations?
Programming Tools:
1-No-code environments:
- H2O
- IBM Watson
- DataRobot, etc.
2-Spreadsheets and BI (Business Intelligence) tools:
- Microsoft Excel
- Power BI
- Google Sheets, etc.
3-Programming languages:
- Python
- R
- MATLAB, etc.
By: Roshan Kumar Singh
Source: PadhAI