Data Science

Professional Highlights
Supported transformation of SQL and PL/SQL from various data sources to target platforms: Spark, Hive, Azure, Redshift, Snowflake.
Automated one-click query conversion from source to target.
Replicating mainframe data processing in the data lake by migrating mainframe applications to the data lake.
Ingestion - CDC Enabler (an IBM tool) pulls data from the mainframe into a landing location in the data lake as .data, .meta, and .ctl files; a scheduler then processes these files and loads the data into Hive tables.
Extraction - Processing and transforming the loaded data and delivering it to the reporting team.
Automated data pipelines using Oozie.
Databridge pulls data from different source databases (Oracle, DB2, MySQL) and dumps it to HDFS in the required format. End users then process this large volume of data by running SQL queries, which an SQL parser internally translates into Pig and Scala scripts.
TECHNICAL SKILLSET:
Technologies and Languages: Big Data, Spark, Hadoop, Hive, Oracle, SQL, Java, Scala, Python, C, Node.js
Basic knowledge of: Teradata, Greenplum, Vertica, Redshift, Azure, Snowflake, shell scripting, Spark Streaming
Operating Systems: Windows, Linux