Description
- I am a Data Analyst, Business Analyst, and Data Scientist.
- Primary Skills: R, Regression Models, Predictive Analytics, Decision Trees, Random Forest, Clustering, Python, HDFS, MapReduce, Click Schedule, Click SO Admin tool.
- Secondary Skills: Hadoop, Business Analysis, Data Analysis, Logistic Regression, Predictive Modeling, SQL, Cluster Analysis, Java, Linear Regression, Regression Analysis.
Technical Skills -
- Data Processing: R, MS Excel and SQL.
- Databases: Oracle 10g, SQL Server 2005, Hadoop.
- Languages: C, C++, Java, J2EE, Data Structures, HIVE, Python basics.
- CRM and Build tools: Click Schedule, Click SO Admin tool, Git, Continuous Integration Tools, BMC Remedy.
Work Experience (Projects as well) -
1. The company is a debt advisory firm that wants to offer a value-added service: helping each new client understand how banks compare with one another on the interest rates they offer that client.
- Built a predictive model that predicts interest rates for the company's clients.
- Used Linear Regression and Random Forest to assess variable importance and to predict interest rates.
- Applied R packages such as dplyr and tidyr for data cleaning, preparation, and variable selection to ensure the data is meaningful and analyzable.
- Conducted hypothesis testing to make probability statements about significant variables, measured model performance, and validated the model assumptions.
2. The company operates a bike sharing system in which users can easily rent a bike at one location and return it at another. Beyond their interesting real-world applications, bike sharing systems generate data whose characteristics make them attractive for research.
- Quantified the impact of environmental and macro factors on sales of the bike sharing system.
- Predicted the total number of bikes rented on any given day using a Linear Regression model.
3. Completed a project on census-income data from Kaggle.
- The goal was to categorize users by whether their salary is greater than 50K or less than or equal to 50K.
- Conducted data analysis using a logistic regression model to classify the user base.