-
Senior Site Reliability Engineer specializing in automation, observability, and CI/CD.
-
Proficient in Shell and Python scripting for automating routine operational tasks.
-
Experienced in building end-to-end monitoring solutions with Grafana, Prometheus, Splunk, ELK, and ITRS.
-
Applies SRE best practices, including the "Four Golden Signals" (latency, traffic, errors, and saturation).
-
Professional Experience:
-
Automated manual tasks with Shell and Python scripts, reducing operational overhead.
-
Built monitoring solutions with Grafana, Prometheus, Splunk, ELK, and ITRS to speed up troubleshooting.
-
Implemented CI/CD pipelines for continuous software delivery, improving deployment speed and reliability.
-
Led the adoption of SRE practices, focusing on the "Four Golden Signals" (latency, traffic, errors, and saturation) to improve system reliability.
-
Managed and optimized cloud environments, including OpenShift, ECS clusters, Linux VMs, EKS, Starfleet, and Lightspeed.
-
Applied machine learning techniques for anomaly detection using Prophet, making dashboards easier to interpret.
-
Configured and maintained monitoring agents, such as Prometheus and Grafana agents, for continuous collection of performance data.
-
Skills:
-
Automation: Shell, Python
-
Observability: Grafana, Prometheus, Splunk, ELK, ITRS
-
CI/CD: Jenkins, GitLab CI
-
Cloud Technologies: OpenShift, ECS clusters, Linux VMs, EKS, Starfleet, Lightspeed
-
Anomaly Detection: Prophet
-
Dashboard Development: Grafana Enterprise
-
Collaboration: Cross-functional teamwork, stakeholder communication