Skills Required:
· Spark, Python, Scala, Logstash, Pig, Hive, Impala, Splunk, Hadoop
Key Roles and Responsibilities:
· Understand functional requirements related to data processing
· Deliver scalable, high-performance, real-time data processing applications
· Develop various KPIs in Spark, MapReduce, and Hive
· Migrate Pig KPI scripts to Hadoop MapReduce Java jobs
· Develop scripts/code to schedule jobs and automate file management
· Deliver solutions for user-story-based requirements
· Engage with the Scrum team and deliver Sprint commitments
· Share responsibility for all team deliverables and favor informal communication with the Product Owner
· Implement and support the Hadoop environment
· Perform fine-tuning, performance tuning, and scaling
· Monitor Hadoop cluster connectivity and security
· Schedule jobs using tools such as Oozie and monitor the schedules
· Perform backup, space management, and recovery tasks
Experience and Skills Required:
· Good knowledge of Spark (in Python, Scala, or Java), HBase, Hive, Pig, and Impala
· Knowledge of the Hadoop 2.0 architecture
· Knowledge of Java and Kafka or any other message broker (ActiveMQ, RabbitMQ) is a plus
· Good knowledge of Linux and tools such as Splunk, Tableau, and Datameer
· Familiarity with open-source configuration management and deployment tools such as Puppet or Chef
· Knowledge of any scripting language (Bash, Perl, or Python); experience in statistical methods, machine learning, and AI is a plus
· Hands-on experience with Cloudera or MapR