Over 10 years of experience as an Azure data engineer using Microsoft Azure with Databricks, including the Databricks workspace for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
Hands-on experience with data extraction (schemas, corrupt record handling, and code parallelization), transformations and loads (user-defined functions, join optimizations), and production (optimizing and automating extract, transform, and load).
Extensive experience in IT data analytics projects, including hands-on experience migrating on-premises ETL pipelines to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.
Hands-on experience with unified data analytics on Databricks: the Databricks workspace user interface, managing Databricks notebooks, Delta Lake with Python, and Delta Lake with Spark SQL.
Design and develop Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Worked on projects using both waterfall and agile methodologies.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, and Azure SQL Database, and working with Presto from Databricks.
Experience configuring the connection to Presto.
Experience developing Spark applications using Spark SQL in Databricks for data
Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL
Created Azure SQL databases and performed monitoring and restores. Migrated Microsoft SQL Server databases to Azure SQL Database.
Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosted Trees, AdaBoost, PCA, LDA, Natural Language Processing
Good understanding of big data Hadoop and YARN architecture, including Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka.
Expertise in understanding and solving big data problems using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Oozie, and Autosys.
Expertise in various phases of the project life cycle (analysis, design, implementation, and testing).