Summary
Platform engineer with 12+ years of cloud experience and 14+ years in tech, specializing in AWS infrastructure, cloud cost optimization, and AI-powered operational tooling.
Track record of delivering measurable cost savings (>$100K/yr), architecting multi-agent AI systems used by 100+ engineers weekly, and building cross-company observability infrastructure. AWS/CKA-certified with expertise in Kubernetes, Python, Terraform, and data-driven infrastructure management.
Experience
- Built and maintain CloudOps Agent, an AI-powered operations platform on EKS using the Strands Agents SDK with 30+ subagents connected via the A2A protocol. Used by 100+ engineers weekly, handling 1,300+ queries per week with full observability and traceability via OpenTelemetry. Adopted as the standard agent pattern across multiple teams and group companies
- Architected the AWS infrastructure for a multi-tenant observability platform serving several business units. Built on VictoriaMetrics, OpenTelemetry, ClickHouse, and Grafana running on dedicated multi-region EKS clusters
- Reduced cloud spend by over $100K/yr within the first year across 100+ AWS accounts spanning multiple organizations. Built Savings Plans and Reserved Instance dashboards with renewal alerts, CUDOS reporting, and cost anomaly detection integrated directly into the CloudOps Agent
- Orchestrated a zero-downtime migration of mission-critical workloads from ECS Fargate to EKS, increasing system reliability while preserving all customer functionality
- Reduced AWS infrastructure costs by over 50% within the first year through strategic resource optimization, Savings Plans purchasing, and architecture refinements
- Strengthened application reliability by implementing Helm-based deployments with robust rollback capabilities, streamlining CI/CD pipelines, and enhancing observability through Prometheus, Grafana, and distributed tracing with Datadog
- Implemented comprehensive cost observability, including AWS CUDOS dashboards and custom data pipelines into Holistics, giving leadership cost visibility dashboards, anomaly detection, and proactive budget alerts
- Provided architectural and practical guidance to cloud engineering and software development teams to improve resiliency, efficiency, and performance while reducing costs
- Helped enterprise customers define and improve their overall workload observability posture and their reporting on service level objectives (SLOs)
- Assisted customers with capacity planning and management for AWS hosting based on end-to-end (E2E) user flow profiles
- Boosted enterprise cloud operations team velocity by escalating support cases while also helping troubleshoot and optimize AWS services
- Engineered an internal automation platform using PHP 7 (Laravel), JavaScript, and MySQL that reduced manual processes across various departments
- Designed and implemented CI/CD pipelines with Atlassian Bamboo, increasing deployment frequency while reducing errors
- Administered self-hosted Atlassian ecosystem (Jira, Confluence, Bamboo, Bitbucket, HipChat) on VMware infrastructure, including server maintenance, application upgrades, and MySQL database optimization
- Performed international system integration for C4ISR environments in Saudi Arabia, completing deployments 15% ahead of schedule by helping develop automation scripts that reduced system provisioning time from days to hours
- Earned an achievement award for completing the System Level Use Case 200+ hours under budget while exceeding all customer requirements
- Implemented and managed modern CI/CD pipelines with Jenkins and Git, replacing legacy ClearCase systems and reducing build times and deployment complexity
- Provided comprehensive IT support including desktop troubleshooting, SOP creation, SharePoint optimization, OS deployments, and hardware management
- Managed end-to-end support for a diverse technology stack, including desktops, printers, telecom equipment, servers, networks, surveillance systems, and mobile devices, across multiple facilities
- Enhanced infrastructure security and performance through server health monitoring, trend analysis, and strategic system upgrades while collaborating with the infrastructure team
- Assisted in the daily operations of the Information Systems department, working both independently and as part of a team
- Took on several large projects, including installing roughly 30 wireless access points, hardwiring a newly constructed building with Category 6 cabling, and installing server hardware and software
- Served as an additional communication channel between the IS manager and other department staff, working closely with the CIO
Skills & Proficiencies
Proficient
Amazon Web Services (AWS), Google Cloud Platform (GCP), Cost Engineering, Kubernetes, Python, Infrastructure as Code (Terraform, CDK, CloudFormation), AI/ML Agent Systems (Strands SDK, A2A Protocol, Bedrock), Observability (VictoriaMetrics, Grafana, ClickHouse, OpenTelemetry, Prometheus), DevOps, Linux, SQL (Postgres, MySQL, Athena), CI/CD (GitHub Actions, Jenkins, ArgoCD)
Projects
This project attempts to cut through AWS announcement noise by analyzing actual AWS usage through Cost Explorer data, fetching recent AWS announcements, and using Amazon Bedrock (Default: Amazon Nova Lite) to determine which announcements are relevant to your services. Notifications can be viewed in the CLI or sent directly to a Slack channel.
The k8s-autoscaler-benchmarker helps administrators and developers optimize the scaling behavior of their EKS clusters by providing a streamlined way to benchmark Karpenter against Cluster Autoscaler for EKS workloads.
This resume site is deployed on CloudFront using the AWS CDK. It is entirely serverless, built on S3, CloudFront with Origin Access Control, ACM, and Route 53, and is served worldwide from CloudFront's globally distributed edge network.