Data profiling
Data profiling is the process of reviewing source data to understand its structure, content, and interrelationships.

Method Definition
Data Profiler
| Class Name | DataProfiler |
|---|---|
| Method Name | get_data_profile |
| Method Description | Returns summary insights about the data. |
| Input Parameter Names | self, dataframe |
| Input Parameter Description | dataframe: the input data just loaded from the source |
| Output | a) The number of rows |
| | b) The number of columns |
| | c) Number of missing values per column and their percentage |
| | d) Total number of missing values and its percentage |
| | e) Number of categorical columns and their list |
| | f) Number of numerical columns and their list |
| | g) Number of duplicate rows |
| | h) Number of columns with zero standard deviation and their list |
| | i) Size occupied in RAM |
| On Exception | Write the exception to the log file. Raise an exception with an appropriate error message. |
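
The specification above could be sketched roughly as follows. This is only an illustration under several assumptions that the table does not state: the dataframe is a pandas `DataFrame`, the result is returned as a plain dictionary with hypothetical key names, "numerical" means any numeric dtype and every other column counts as categorical, and failures are re-raised as `RuntimeError` after being logged through the standard `logging` module.

```python
import logging

import pandas as pd

# Assumption: the project logs through the standard logging module.
logger = logging.getLogger(__name__)


class DataProfiler:
    """Illustrative sketch of the profiler described in the table."""

    def get_data_profile(self, dataframe: pd.DataFrame) -> dict:
        try:
            n_rows, n_cols = dataframe.shape
            total_cells = n_rows * n_cols

            # c) and d): missing values per column and in total.
            missing = dataframe.isna().sum()
            missing_pct = missing / n_rows * 100 if n_rows else missing * 0.0
            total_missing = int(missing.sum())

            # e) and f): assumption, numeric dtypes are "numerical",
            # every remaining column is treated as categorical.
            numerical = dataframe.select_dtypes(include="number").columns.tolist()
            categorical = [c for c in dataframe.columns if c not in numerical]

            # h): numeric columns whose standard deviation is exactly zero.
            std = dataframe[numerical].std()
            zero_std = std[std == 0].index.tolist()

            # Key names below are hypothetical, chosen for readability.
            return {
                "rows": n_rows,
                "columns": n_cols,
                "missing_per_column": missing.to_dict(),
                "missing_pct_per_column": missing_pct.to_dict(),
                "total_missing": total_missing,
                "total_missing_pct": (
                    total_missing / total_cells * 100 if total_cells else 0.0
                ),
                "categorical_columns": categorical,
                "numerical_columns": numerical,
                "duplicate_rows": int(dataframe.duplicated().sum()),
                "zero_std_columns": zero_std,
                # i): deep=True also counts the contents of object columns.
                "memory_bytes": int(dataframe.memory_usage(deep=True).sum()),
            }
        except Exception as exc:
            # Log the full traceback, then raise with a readable message.
            logger.exception("Data profiling failed")
            raise RuntimeError(f"Data profiling failed: {exc}") from exc
```

Calling `DataProfiler().get_data_profile(df)` on a freshly loaded dataframe returns every item from the output list in one dictionary; the counts in e), f), and h) are the lengths of the corresponding lists.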