· Work with the SaaS Operations and Product
Development teams to develop a monitoring platform using Azure
monitoring and APM tools (AppDynamics or similar tools)
· Participate as a member of a project team with other
product developers to develop and deliver reliable, cost-effective,
and high-quality site reliability solutions.
· Design, develop, and improve application,
Kubernetes and infrastructure logging, metrics, and monitoring
· Triage and troubleshoot production defects across the stack
· Identify toil, document fix processes, and automate them away
· Troubleshoot, perform root cause analysis, and resolve
production issues from the application and network layers all the way down to
the system level. This might include anything from digging into source code
(our own or from open source projects) and hunting memory leaks to tracing
bottlenecks in upstream networks or optimizing database queries.
· Design and implement the monitoring
platform for collecting metrics, crunching data, and improving service
monitoring to detect problems before they're visible to our customers
· Build monitoring for Service Level Indicators, Service Level
Objectives, and Service Level Agreement requirements
· Understand and communicate every characteristic of the
service stack, such as:
o Degradation and behavior under load of the services and their dependencies
o End-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate
o Instrumentation and metrics that clearly describe the service behaviors
o Scaling requirements and patterns
o Resiliency and recoverability, ensuring that backup / restore and disaster recovery
capabilities are implemented, tested and maintained
· Participate in code reviews: verify maintainability and
extensibility, and ensure complexity has been minimized.
· Participate in a collaborative, supportive, and fun environment
to bring out the best work in those around you.
Nice to Haves
· Experience with the DevOps tool chain.
· Comfortable in client- and server-side web development using
JavaScript, HTML, and REST.
· Experience with SQL query development (SQL Server /
Oracle / MySQL).
· Knowledge of High Availability, Disaster Recovery,
and Scalability best practices in a cloud environment.
· Experience with Git, Jenkins, Gradle, Docker, and Kubernetes
· Experience with at least one major cloud
platform such as GCP or Azure
· Experience with emerging
technologies such as Spark, Hadoop, Elasticsearch, Redis, Kafka, and machine
learning
· Experience defining and documenting the technical architecture of complex and
highly scalable products
· Bachelor’s degree in Computer Science or related field.