Available for SRE Opportunities

PRODUCTION SUPPORT & OPERATIONS ENGINEER

KAVIN MANICKANNAN


8 years of keeping production systems alive at scale.
Banking · E-Commerce · Hospitality · Now levelling up to SRE.

8+ Years Exp.
3 Industries
4+ APM Tools
0% Unplanned Downtime

Who I Am

📍 Bangalore, India
🏢 Accenture — Senior Analyst
🎯 Targeting SRE Roles
🎓 PSM I Certified

I'm a Production Support & Operations Engineer with 8 years of hands-on experience keeping high-stakes systems running at scale across the banking, e-commerce, and hospitality sectors.

At Accenture, I moved teams from reactive fire-fighting to proactive, alert-driven operations: building dashboards that catch problems before they become incidents, cutting MTTR across P1/P2 events, and driving structured root cause analysis that stops recurring failures.

I've worked across the full incident lifecycle: triage, coordination, resolution, and post-mortems. I understand both the technical layer (Java microservices, observability stacks, APM tools) and the operational layer (SLA governance, change management, cross-functional coordination).

I'm currently upskilling in AWS and platform reliability engineering to transition into a full-time SRE role, where my ops depth meets infrastructure-as-code thinking.

💰 Investment Banking
🛍️ E-Commerce
🏨 Hospitality

Tech Arsenal

Operations & Incident Mgmt

P1/P2 Incident Response · Root Cause Analysis · MTTR Reduction · SLA Governance · Change Management · On-Call Support · Post-Mortem Reviews
👁️

Observability & Monitoring

Dynatrace · Splunk · AppDynamics · Sumo Logic · Synthetic Monitoring · Custom Dashboards · APM · Log Analysis
🏗️

Platforms & Infrastructure

Java Microservices · AWS (Foundational) · SAP (Cron/Batch) · ServiceNow · Postman · SoapUI
🔄

Methodologies & Tools

Agile / Scrum · Jira · Team Mentoring · Cross-functional Coordination · Release Planning · Deployment Execution

Certifications

🏅
Professional Scrum Master I (PSM I) · Scrum.org · 2026
Earned
☁️
AWS Certified Cloud Practitioner · Amazon Web Services
In Progress
📋
ITIL 4 Foundation · Axelos
In Progress

Career Timeline

ACCENTURE
Custom Software Engineering · Senior Analyst · May 2017 – Jun 2025 · 8 Years
Truist – Investment Banking

Reliability & Operations Lead

Apr 2024 – Jun 2025
💰 Banking
  • Managed production stability for Java-based Wealth & Trust microservices in a high-volume banking environment.
  • Shifted team from reactive to proactive monitoring — built Splunk dashboards separating system errors from user errors and added threshold-based alerts, catching problems before P1/P2 escalation.
  • Led full incident lifecycle in ServiceNow: triage, coordination with dev/ops, post-mortem reviews, driving MTTR reduction.
  • Conducted structured RCA on recurring application failures and coordinated permanent fixes.
  • Managed release planning and deployment execution across QA, Dev, and Ops — zero unplanned downtime during production changes.
Splunk · ServiceNow · Java Microservices · RCA · P1/P2
Chanel – E-Commerce

Operations & Process Lead

May 2023 – Mar 2024
🛍️ E-Commerce
  • Replaced hourly manual webpage checks with Dynatrace synthetic monitoring that simulates user journeys, cutting routine monitoring effort from ~15 min/hr to under 5 min/hr and enabling outage detection without a human watching.
  • Oversaw production monitoring across multiple global regions, supporting the e-commerce platform's rollout into new markets.
  • Trained and mentored junior team members on Dynatrace, log analysis, and incident triage best practices.
  • Monitored and troubleshot global scheduled batch jobs (crons) via SAP to ensure uninterrupted business processing across time zones.
Dynatrace · Synthetic Monitoring · SAP · Mentoring
Marriott – Hospitality (CIAM)

Technical Operations Analyst – CIAM

Aug 2021 – May 2023
🏨 Hospitality
  • Led observability stack migration from AppDynamics & Sumo Logic to Dynatrace & Splunk — tested monitoring coverage continuity and ramped team up on new tools.
  • Coordinated endpoint migrations to securely manage customer identities, user registration, and application access.
  • Served as technical point of contact for onboarding new partner integrations with Marriott.com.
  • Provided real-time technical support alongside dev teams during critical deployment windows.
Dynatrace · Splunk · AppDynamics · Sumo Logic · CIAM
Marriott – Hospitality (Profile Services)

Technical Operations Analyst – Profile Services

Jun 2017 – Jul 2021
🏨 Hospitality
  • Supported data migration and system integration across legacy platforms during the Marriott, Starwood, and Ritz-Carlton systems merger.
  • Transitioned team from manual status-checking to alert-driven monitoring by redesigning operational workflows and building dashboards for early system degradation detection during high-traffic release windows.
  • Onboarded and trained new resources on application architecture, internal support processes, and monitoring tools.
AppDynamics · Sumo Logic · Data Migration · Training

What I Bring

🚨

Incident Management

Full P1/P2 lifecycle ownership — from rapid triage and war-room coordination to structured post-mortems that prevent recurrence. MTTR reduction is my measurable output.

  • P1/P2 Triage & Escalation
  • War-room Coordination
  • Post-mortem Facilitation
  • MTTR Tracking
📊

Observability Design

Build dashboards and alert frameworks that shift teams from reactive fire-fighting to proactive operations — using Dynatrace, Splunk, AppDynamics, and Sumo Logic.

  • Custom Dashboard Design
  • Threshold-based Alerting
  • Synthetic Monitoring Setup
  • APM Configuration
⚙️

Reliability Engineering

SLA governance, production availability management, and structured reliability improvement for Java microservice environments at scale.

  • SLA/SLO Governance
  • Production Availability
  • Microservice Stability
  • AWS Foundational
🔄

Operations Process Design

Transform legacy manual workflows into alert-driven, automated operational processes — reducing toil, eliminating human error, and improving team scalability.

  • Workflow Redesign
  • Toil Reduction
  • Release Planning
  • Change Management
🎓

Team Training & Mentoring

Ramp up junior engineers on APM tools, incident triage, and ops best practices. I've trained teams on Dynatrace, Splunk, and log analysis across multiple projects.

  • APM Tool Onboarding
  • Incident Triage Training
  • Knowledge Transfer
  • Process Documentation
🤝

Cross-functional Coordination

Bridge dev, ops, and business teams during critical events — release windows, platform migrations, and partner integrations — ensuring alignment and zero unplanned downtime.

  • Dev/Ops Bridging
  • Partner Integration
  • Platform Migrations
  • Deployment Windows

Let's Connect

Looking for a production-tested operations engineer ready to grow into SRE?
Let's talk.

Open to SRE / Platform Engineering / Senior Ops roles
📨

Send Me a Message

Click below to open your mail client with my address pre-filled — I read every message and reply within 24 hours.

Email Me