I'm a data enthusiast, a music lover, a foodie and a travel buff.
I have a Master's degree in Information Management with a specialization in Data Science from Syracuse University, New York. I enjoy working with data and love building visualizations and predictive models to understand what the data is trying to say.
I believe that data is the most valuable resource of our age and that, if harnessed and used effectively, it can transform human lives in a positive way.
I've also developed my skills in Data Analytics, Data Visualization and Machine Learning using Python, R, SQL, Tableau and Power BI. I'm currently working as a Data Scientist at Ascend Innovations.
Analyze healthcare data from SQL databases using Python and R, and develop machine learning algorithms that give healthcare organizations insights and suggestions about general public health in the area.
Used an ETL pipeline to clean and manage data generated in the city of Syracuse. Created Power BI dashboards to identify trends and understand the causes of violations and complaints in the city.
Compiled a list of 1M website domains for web scraping, extracted the medical domains from it, and implemented topic modeling to determine the most important medical terms.
Collaborated with multiple teams to understand patient data in the Syracuse region. Implemented SSIS packages for data extraction and storage in a database, and created Tableau visualizations to identify the diseases prevalent in Syracuse.
Relevant Coursework: Data Science, Big Data, Database Management, Neural Networks
Relevant Coursework: Data Structures, Statistics, Data Analysis, Economics & Management
Here are some of the projects that I've worked on
(Click on the images for detailed information)
The goal of the project was to determine the chances of a patient being readmitted to a hospital. The features included gender, weight, race and admission type, and some patients also had underlying conditions such as diabetes and other illnesses. The patient data was collected from 1991-2008 and the dataset was obtained from the UCI Machine Learning Repository.
Skills: Python, PySpark, NumPy, Pandas, Seaborn, Regression, Random Forest, Jupyter Notebook, Tableau
The goal of the project was to determine satisfaction ratings, on a scale of 1 to 5 (1 being the lowest), for traveling with different airlines in the United States. The features used were age, gender, class of travel, airport, airline, etc.
Skills: R, RStudio, ggplot, Regression, Association Rule Mining, SVM
The objective of the project was to classify the questions posted by users on Quora as sincere or insincere and to understand the distribution and the reasons behind the classification. The dataset was obtained from Kaggle.
Skills: Python, NumPy, Pandas, Seaborn, Logistic Regression, SVM, CNN, LSTM, Jupyter Notebook
This project aimed to develop a data warehouse for the merger of two corporations, the first a movie rental service and the second an online retailer. The second objective was to identify the number of lag days from when a particular order is placed and to generate insights from the data for improving the delivery service.
Skills: MS SQL Server, Visual Studio, SSIS, SSAS, SQL, MS Excel, MS Power BI