Technical Projects
Types of accidents in Victoria Neighbourhood
The application was built in R Shiny, with custom HTML integrated into the app to improve the visualization. The Leaflet library was used to render the maps and examine the correlation between accidents and house types. Several types of graphs were created to support the accident analysis. For instance, a stacked bar chart was plotted to show the number of accidents in each suburb.
Analyzing the Egg Depositions of Lake Huron Bloaters
Converted the data frame into time series data using R. SARIMA models were built, and ACF/PACF, BIC, and EACF plots were plotted and analyzed to find the best model specification. Estimated the model parameters using LSE and/or MLE and tested their significance. The best-fitting model was selected and used to forecast values for the upcoming 5 years.
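The ACF step can be illustrated outside R. The following is a minimal Python sketch of the sample autocorrelation that an ACF plot displays; the function name and the toy series are hypothetical and are not the Bloater data or the project's own code.

```python
from typing import List


def sample_acf(series: List[float], max_lag: int) -> List[float]:
    """Return the sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Lag-0 sum of squares normalises every other lag (the 1/n factors cancel).
    c0 = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        ck = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        acf.append(ck / c0)
    return acf


# Hypothetical toy series, used only to exercise the function.
toy_series = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0]
acf_values = sample_acf(toy_series, max_lag=3)

# Approximate 95% band usually drawn on ACF plots (white-noise assumption).
bound = 1.96 / len(toy_series) ** 0.5
```

Lags whose autocorrelation falls outside the band are the ones that typically guide the choice of AR and MA orders.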
Analyzing and Tracking the sentiment and topics on Social Media
A connection to Twitter was established through the Tweepy library, and around 10k tweets were fetched over one week. The tweets were cleaned, normalized, and tokenized to find the sentiment of words and sentences. The OnePlus trend was traced through hashtags, which were visualized to surface interesting findings. Implemented topic modeling to find the relevant topics discussed in the tweets. Performed sentiment analysis on the tweets using VADER.
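The cleaning and tokenizing step might look roughly like the sketch below. It uses only the Python standard library; the regular expressions, function names, and sample tweet are assumptions made for illustration, and the Tweepy fetch and VADER scoring are not shown.

```python
import re
from typing import List

# Assumed patterns: strip URLs and @mentions, keep words and hashtags.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
TOKEN_PATTERN = re.compile(r"#?[a-z0-9']+")


def clean_and_tokenize(tweet: str) -> List[str]:
    """Lowercase a tweet, drop URLs and mentions, and split it into tokens."""
    text = tweet.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    return TOKEN_PATTERN.findall(text)


def extract_hashtags(tokens: List[str]) -> List[str]:
    """Return only the tokens that are hashtags."""
    return [token for token in tokens if token.startswith("#")]


# Hypothetical tweet text, not taken from the collected data.
sample = "Loving the new #OnePlus camera! @friend check https://example.com"
tokens = clean_and_tokenize(sample)
tags = extract_hashtags(tokens)
```

Keeping the leading `#` on hashtag tokens makes it easy to count hashtags separately from ordinary words when tracking a trend.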
Performance Comparison of Map-Reduce Algorithm
Analyzed two big datasets, Common Crawl and Amazon Review, from an AWS S3 bucket. Established a connection through the jump host to access the AWS EMR master node. Analyzed performance while scaling up the clusters, varying the data sizes of the datasets, and changing the number of mappers and reducers.
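To show what the mapper and reducer counts control, here is a minimal in-process Python sketch of a word-count job. It is not the EMR job used in the project; every function name and the sample input are hypothetical, and everything runs sequentially in a single process.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_phase(chunk: List[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in a chunk of lines."""
    return [(word, 1) for line in chunk for word in line.split()]


def partition(pairs: List[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Group values by key and route each key to exactly one reducer bucket."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Deterministic stand-in for a hash partitioner.
        index = sum(key.encode("utf-8")) % num_reducers
        buckets[index][key].append(value)
    return buckets


def reduce_phase(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each key in one bucket."""
    return {key: sum(values) for key, values in bucket.items()}


def run_job(lines: List[str], num_mappers: int, num_reducers: int) -> Dict[str, int]:
    """Split the input across mappers, shuffle by key, then reduce."""
    chunks = [lines[i::num_mappers] for i in range(num_mappers)]
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    result: Dict[str, int] = {}
    for bucket in partition(mapped, num_reducers):
        result.update(reduce_phase(bucket))
    return result


# Hypothetical input lines.
counts = run_job(["a b a", "b c", "a"], num_mappers=2, num_reducers=3)
```

The result is the same for any mapper or reducer count; on a real cluster those settings change only how the work is split, which is what the performance comparison measures.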
Predicting the revenue decline for a Portuguese Banking institution (Kaggle Project)
The Portuguese Banking dataset from Kaggle was loaded in a Jupyter Notebook. Data exploration was done in Tableau to find interesting insights. KNN and Random Forest machine learning models were built to predict the decline in the institution's revenue.
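The idea behind the KNN classifier can be sketched in a few lines of plain Python. This is an illustrative version with made-up two-feature points, not the project's model, which would more likely rely on a library implementation; the function name and data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], k: int = 3) -> int:
    """Predict a label by majority vote among the k nearest training points."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already-scaled feature vectors with binary labels.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]

prediction_low = knn_predict(train_X, train_y, (1.1, 1.0))
prediction_high = knn_predict(train_X, train_y, (5.0, 5.2))
```

Because KNN is distance based, features on different scales usually need to be standardized first; the toy points above are assumed to be scaled already.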
Deconstruct-Reconstruct-Web-Report using Python
The visualisation has multiple issues that can be fixed or improved. These issues can relate to one or more of the following: ethical issues, issues with data integrity, and perceptual or colour issues. The data visualisation is based on real data and includes variables that are sufficient to produce an interesting data visualisation.
DataExploration-with-Automobile dataset using Python
Libraries such as Pandas, NumPy, and Matplotlib are imported for ease of use of data structures and data analysis in Python. The data set is imported into the Jupyter notebook in order to analyse it and gain good insights into the data. It is imported with the function “read_csv”, using a separator of “#” and the corresponding column names.
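A load of this kind might look like the following sketch. The column names and the two sample rows are hypothetical placeholders rather than the real Automobile data, and an in-memory string stands in for the file path.

```python
import io

import pandas as pd

# Hypothetical "#"-separated rows without a header line.
raw = "toyota#sedan#8500\nbmw#coupe#21000\n"

# Hypothetical column names; the real data set has its own.
column_names = ["make", "body_style", "price"]

# sep="#" sets the delimiter; header=None plus names= assigns the columns.
df = pd.read_csv(io.StringIO(raw), sep="#", header=None, names=column_names)

# Quick first looks at the loaded frame.
n_rows, n_cols = df.shape
mean_price = df["price"].mean()
```

Passing `header=None` matters when the file has no header row; otherwise the first data row would be consumed as column names.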
Titanic-Survival-using-Logistic-Regression
The sinking of the Titanic on April 15th, 1912 is one of the most infamous tragedies in history. The Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. The number of survivors was low because there were not enough lifeboats for all passengers and crew. Some groups were more likely to survive than others, such as women, children, and upper-class passengers.
Predicting-Loan-status using Python
The insurance sector is one of the largest users of data analytics methods compared with other fields. This domain poses the challenge of working with data from an insurance company. This is a classification dataset, where the results depend mainly on the strategies used and on the variables that play an important role in determining the target variable.
Forecasting-on-the-Penguin-arrival-in-Auckland in R
The purpose of this project is to analyse penguin arrivals in Auckland, New Zealand, and to forecast arrivals for the next two years. It aims to identify any seasonality, trend, or pattern in the data and to predict the future pattern or trend based on that analysis.
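As a rough illustration of checking for seasonality and producing a baseline forecast, the Python sketch below computes per-season averages and a seasonal-naive forecast. This is a simple baseline named as such, not the model fitted in the project, and the quarterly toy numbers are invented.

```python
from typing import List


def seasonal_means(series: List[float], period: int) -> List[float]:
    """Average the observations that fall in each position of the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for index, value in enumerate(series):
        sums[index % period] += value
        counts[index % period] += 1
    return [total / count for total, count in zip(sums, counts)]


def seasonal_naive_forecast(series: List[float], period: int, horizon: int) -> List[float]:
    """Forecast each future step with the value observed one cycle earlier."""
    last_cycle = series[-period:]
    return [last_cycle[step % period] for step in range(horizon)]


# Hypothetical quarterly counts covering two years (period = 4).
arrivals = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]

means = seasonal_means(arrivals, period=4)
# Two years ahead at quarterly frequency means a horizon of 8 steps.
forecast = seasonal_naive_forecast(arrivals, period=4, horizon=8)
```

Clearly different per-season averages hint at a seasonal pattern, and the seasonal-naive forecast gives a reference point against which a fitted model can be compared.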
Forecasting-ozone-Layer-thickness using R
The sun emits UV radiation, and the ozone layer protects the earth from it. The thickness of the ozone layer is measured in Dobson units. The data covers the thickness of the ozone layer from 1927 to 2016, where a negative value indicates a decrease in thickness and a positive value indicates an increase.
Apple-vs-Samsung-with-higher-customer-satisfaction using Python
Every business in the world relies on clients as its source of profit. Customer satisfaction is one of the most important criteria that any business would like to excel in. Our primary focus in this project is to see whether Apple or Samsung, regardless of the brand value each holds, is more successful in retaining customers' trust and satisfaction.