Data upload
Data Ingestion
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and and analyzed by the application.

upload file formats allowed |
csv files |
Excel file(xlsx) |
HTML |
JSON |
txt files |
MongoDB |
MySql |
Methods to upload Data
Read from CSV
Method Name |
read_data_from_csv |
|
|
Method Description |
This method will be used to read data from a csv file or a flat file |
|
Input parameter names |
self,file_name, header,names, use_cols, separator |
|
Input Parameter Description |
file_name: name of the file to be read header: Row number(s) to be used as column names names : array-like, optional List of column names to use. If file contains no header row, then you should explicitly pass header=None . Use_cols: To load a subset of columns Separator: Delimiter to use |
|
ouptput |
A pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from Excel
Method Name |
read_data_from_excel |
|
|
Method Description |
This method will be used to read data from an MS Excel File |
|
Input parameter names |
self,file_name,sheet_name, header,names, use_cols, separator |
|
Input Parameter Description |
file_name: name of the file to be read sheet_name: Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets. header: Row number(s) to be used as column names names : array-like, optional List of column names to use. If file contains no header row, then you should explicitly pass header=None . Use_cols: To load a subset of columns Separator: Delimiter to use |
|
ouptput |
A pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from HTML
Method Name |
read_data_from_html |
|
|
Method Description |
This method will be used to read data from an HTML web page Input parameter names self,url |
|
Input Parameter Description |
url: URL of the HTML page to be read. |
|
ouptput |
A pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from JSON
Method Name |
read_data_from_json |
|
|
Method Description |
This method will be used to read data from a json file. |
|
Input parameter names |
self,file_name |
|
Input Parameter Description |
file_name: name of the file to be read |
|
ouptput |
A pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from text
Method Name |
read_data_from_text |
|
|
Method Description |
This method will be used to read data from a text file. |
|
Input parameter names |
self,file_name, delimiter |
|
Input Parameter Description |
file_name: name of the file to be read header: Row number(s) to be used as column names names : array-like, optional List of column names to use. If file contains no header row, then you should explicitly pass header=None . Use_cols: To load a subset of columns Separator: Delimiter to use |
|
ouptput |
A pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Connecting to SQLDB
Method Name |
Connect_to_sqldb |
|
|
Method Description |
This method will be used to connect to a SQL Databases |
|
Input parameter names |
self,host,port, username, password |
|
Input Parameter Description |
host: the server hostname/IP where the DB server is hosted Port: the port at which the DB Server is running username: The username to connect to the DB server password: The password to connect to the DB server |
|
ouptput |
A DB connection object |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from SQLDB
Method Name |
read_data_from_sqldb |
|
|
Method Description |
This method will be used to read data from SQL Databases |
|
Input parameter names |
self,db_name,host,port, username, password, schema_name,query_string |
|
Input Parameter Description |
db_name: For example, SQL, MySQL, SQLLite etc. host: the server hostname/IP where the DB server is hosted Port: the port at which the DB Server is running username: The username to connect to the DB server password: The password to connect to the DB server schema_name: The name of the DB schema the user wants to connect to. query_string: the query to be executed to load the data |
|
ouptput |
A Pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Read from MONGODB
Method Name |
read_data_from_mongdb |
|
|
Method Description |
This method will be used to read data from Mongo DB |
|
Input parameter names |
self,host,port, username, password, db_name,collection_name, query_string. Input Parameter Description |
|
output |
A Pandas Dataframe |
|
On Exception |
Write the exception in the log file. Raise an exception with the appropriate error message |
Exceptions Scenarios
Step |
Exception |
Mitigation |
User gives Wrong Data Source |
Give proper error message |
Ask the user to re-enter the details |
User gives corrupted data |
Give proper error message |
|