Cloud Infrastructure Maintenance

Explore top LinkedIn content from expert professionals.

  • View profile for Ajit Inamdar

    CTO & Co-Founder, Ampity | Featured in ET

    11,317 followers

    After working with Kubernetes and EKS for more than 4 years, here are a few pointers that can help you ace the microservices game (by no means am I an expert in all these areas):

    1. Optimize your Docker images using techniques like:
      ➜ Multi-stage Docker builds
      ➜ Tools such as Docker Slim and distroless base images, to ensure lightweight, efficient containers.
    2. Get familiar with ingress controllers like the AWS ALB Ingress Controller.
    3. Use Helm for deployments instead of raw Kubernetes manifest files.
    4. Learn how to observe your infrastructure and applications with tools like:
      ➜ Dynatrace
      ➜ Datadog
    5. Gain experience upgrading your EKS clusters and related components, including add-ons.
    6. Master different deployment strategies such as:
      ➜ Blue-green deployments
      ➜ Canary deployments
      These approaches help you ship updates with zero downtime.
    7. Learn about service discovery and service mesh tools like Istio. Understand the problems they solve and why they're necessary.
    8. Get acquainted with the Horizontal Pod Autoscaler (HPA) and cluster autoscalers like Karpenter to scale your EKS and Kubernetes components (see the sketch after this list).
    9. Learn cost optimization techniques such as:
      ➜ Right-sizing your worker nodes
      ➜ Setting service quotas
      ➜ Managing CPU and memory allocations in your pods.
    10. Understand security best practices in EKS:
      ➜ Use Secrets Manager, KMS, and IRSA
      ➜ Access private databases securely
      ➜ Secure images and containers with tools like Twistlock
      ➜ Manage users, create private clusters, encrypt data
      ➜ Securely deploy containers from private registries like ECR, JFrog, or Red Hat Quay.
    11. Use ArgoCD for GitOps to manage your Kubernetes applications declaratively.
    12. Implement topology spread constraints to ensure high availability and resilience in your applications. Also, use readiness and liveness probes to monitor application health and ensure smooth operation.
    13. Deploy your applications with CI/CD tools like GitHub Actions or Jenkins.
    14. Understand when and why to use StatefulSets in Kubernetes for managing stateful applications.
    15. Finally, learn how to manage multiple environments (development, UAT, production) effectively, considering cost and unique customer use cases.

    Please let me know in the comments if I have missed anything! #kubernetes #eks #bestpractices
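Point 8 mentions the Horizontal Pod Autoscaler. As a minimal illustration (not part of the original post), here is a sketch using the official `kubernetes` Python client to create a CPU-based HPA for a hypothetical `web` Deployment; the names, replica counts, and target utilization are assumptions. Karpenter-based node scaling is configured separately.

```python
# Sketch: create a CPU-based HorizontalPodAutoscaler (autoscaling/v2) for a
# hypothetical "web" Deployment. All names and thresholds are illustrative.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"),
        min_replicas=2,
        max_replicas=10,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(type="Utilization", average_utilization=70),
            ),
        )],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```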

  • View profile for Hijmen Fokker

    A smarter way to run Kubernetes for non-enterprise companies | Pionative

    8,775 followers

    I've spent 7 years obsessing over the perfect Kubernetes stack. These are the best practices I would recommend as a basis for every Kubernetes cluster.

    1. Implement an observability stack
    A monitoring stack prevents downtime and helps with troubleshooting. Best practices:
    - Implement a centralised logging solution like Loki. Logs will otherwise disappear, and centralising them makes troubleshooting easier.
    - Use a central monitoring stack with pre-built dashboards, metrics and alerts.
    - For microservices architectures, implement tracing (e.g. Grafana Tempo). This gives better visibility into your traffic flows.

    2. Set up a good network foundation
    Networking in Kubernetes is abstracted away, so developers don't need to worry about it. Best practices:
    - Implement Cilium + Hubble for increased security, performance and observability.
    - Set up a centralised ingress controller (like Nginx Ingress). This takes care of all incoming HTTP traffic in the cluster.
    - Auto-encrypt all traffic on the network layer using cert-manager.

    3. Secure your clusters
    Kubernetes is not secure by default, and securing your production cluster is one of the most important tasks. Best practices:
    - Regularly patch your nodes, but also your containers. This mitigates most vulnerabilities.
    - Scan for vulnerabilities in your cluster. Send alerts when critical vulnerabilities are introduced.
    - Implement a good secret management solution in your cluster, like External Secrets.

    4. Use a GitOps deployment strategy
    All desired state should be in Git. This is the best way to deploy to Kubernetes. ArgoCD is truly open source and has a fantastic UI. Best practices:
    - Implement the app-of-apps pattern. This simplifies the creation of new apps in ArgoCD (see the sketch after this post).
    - Use ArgoCD auto-sync. Don't rely on sync buttons. This makes Git your single source of truth.

    5. Data
    Try to use managed (cloud) databases if possible. This makes data management a lot easier. If you want to run databases on Kubernetes, make sure you know what you are doing! Best practices:
    - Use databases that are scalable and can handle sudden redeployments.
    - Set up a backup, restore and disaster-recovery strategy. And regularly test it!
    - Actively monitor your databases and persistent volumes.
    - Use Kubernetes Operators as much as possible to manage these databases.

    Are you implementing Kubernetes, or do you think your architecture needs improvement? Send me a message, I'd love to help you out! #kubernetes #devops #cloud
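As a small illustration of the app-of-apps pattern mentioned in point 4, here is a hedged sketch that registers a root Argo CD Application through the `kubernetes` Python client's CustomObjectsApi. It assumes Argo CD and its CRDs are already installed; the repository URL, path, and names are placeholders, not from the post.

```python
# Sketch: create a root "app of apps" Argo CD Application. The child apps live
# as Application manifests in the repo's "apps/" directory (hypothetical repo).
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

app_of_apps = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "root", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://github.com/example-org/gitops.git",  # placeholder repo
            "targetRevision": "main",
            "path": "apps",  # directory containing child Application manifests
        },
        "destination": {"server": "https://kubernetes.default.svc", "namespace": "argocd"},
        # Auto-sync with pruning and self-heal, so Git stays the single source of truth.
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

custom.create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1", namespace="argocd",
    plural="applications", body=app_of_apps)
```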

  • View profile for Vivek Anandaraman

    SRE/FinOps as a Service | Devops Community | Mentor | Speaker

    9,700 followers

    Your EC2 instances are running wild at 3 AM. Here's how I cut our AWS bill by 63% without disrupting prod 👀

    Last month, I discovered our team was burning through AWS credits faster than expected. The culprit? Development instances running 24/7 when our team only works 8 hours a day.

    Here's what I implemented (a sketch of the scheduler follows this post):
    1. Created an instance scheduler using AWS Lambda + EventBridge
    2. Tagged all non-prod instances with 'AutoStop: true'
    3. Set up start/stop times aligned with our global team's working hours
    4. Added override protection for critical testing periods

    The results were immediate:
    1. Monthly EC2 costs dropped from $8,500 to $3,145
    2. Dev environment uptime matched actual usage patterns
    3. Zero impact on production workloads
    4. Automated Slack notifications for any manual overrides

    Pro tip: Don't just stop instances. Also check for:
    1. Orphaned EBS volumes
    2. Unused Elastic IPs
    3. Over-provisioned RDS instances

    Bonus: I created a simple AWS Lambda function that checks for resources without cost allocation tags and sends daily reports. Caught $950 worth of untagged resources in the first week!

    Want the CloudFormation template for this setup? Drop a comment below, and I'll share the GitHub repo. #AWS #CloudCost #DevOps #CloudComputing #AWSCommunity
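For readers who want a concrete starting point, here is a minimal sketch (my illustration, not the author's actual implementation) of the kind of Lambda function such a scheduler could use: an EventBridge schedule invokes it with an action, and it stops or starts instances tagged `AutoStop: true`. The region and event shape are assumptions.

```python
# Sketch of an "AutoStop" scheduler Lambda. Two EventBridge rules invoke it,
# one with {"action": "stop"} and one with {"action": "start"}.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # placeholder region


def handler(event, context):
    action = event.get("action", "stop")

    # Find opted-in instances via the AutoStop tag (pagination omitted for brevity).
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:AutoStop", "Values": ["true"]},
            {"Name": "instance-state-name",
             "Values": ["running"] if action == "stop" else ["stopped"]},
        ]
    )["Reservations"]

    instance_ids = [
        inst["InstanceId"]
        for res in reservations
        for inst in res["Instances"]
    ]
    if not instance_ids:
        return {"action": action, "instances": []}

    if action == "stop":
        ec2.stop_instances(InstanceIds=instance_ids)
    else:
        ec2.start_instances(InstanceIds=instance_ids)

    return {"action": action, "instances": instance_ids}
```

Two EventBridge schedule rules could then trigger it on weekdays, for example cron(0 19 ? * MON-FRI *) to stop in the evening and cron(0 7 ? * MON-FRI *) to start in the morning (times in UTC).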

  • View profile for Danny Steenman

    Helping startups build faster on AWS while controlling costs, security, and compliance | Founder @ Towards the Cloud

    10,803 followers

    I recently completed a client's AWS infrastructure audit. The issues it uncovered are surprisingly common. Here's what I found:

    1. Unencrypted EBS Volumes
    Data at rest was not encrypted, posing a significant security risk.

    2. CloudTrail Disabled
    The account lacked crucial audit logs, limiting visibility into account activities.

    3. Public S3 Buckets
    Several S3 buckets were publicly accessible, potentially exposing sensitive data.

    4. SSH (Port 22) Open to the World
    Unrestricted SSH access increased the attack surface unnecessarily.

    5. VPC Flow Logs Disabled
    Network traffic insights were missing, hampering security analysis capabilities.

    6. Default VPC Still in Use
    The default VPC was being used, often lacking proper segmentation and security controls.

    These findings aren't unusual. Many organizations, from startups to enterprises, overlook these aspects of AWS security and best practices. That's why regular AWS account audits are crucial. They help identify potential vulnerabilities before they become problems.

    Key takeaways and solutions (see the sketch after this post):
    1. Encrypt data at rest: Enable default EBS encryption at the account level.
    2. Implement comprehensive logging: Enable CloudTrail across all regions and set up alerts.
    3. Restrict public access: Use S3 Block Public Access at the account level and audit existing buckets.
    4. Use modern, secure access methods: Implement AWS Systems Manager Session Manager instead of open SSH.
    5. Enable network monitoring: Turn on VPC Flow Logs and set up automated analysis.
    6. Design your network architecture intentionally: Create custom VPCs with proper security controls.

    By addressing these common issues, you significantly enhance your AWS security posture. It's not about perfection, but continuous improvement. When's the last time you audited your AWS environment?
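As an illustration of takeaways 1 and 3, here is a hedged boto3 sketch that enables default EBS encryption in one region and turns on account-level S3 Block Public Access; the region and account ID are placeholders, not from the post.

```python
# Sketch: enable default EBS encryption (a per-region setting) and block public
# S3 access for the whole account. Run the EBS call in every region you use.
import boto3

REGION = "us-east-1"          # placeholder: repeat per region
ACCOUNT_ID = "123456789012"   # placeholder account ID

# Takeaway 1: new EBS volumes in this region are encrypted by default.
ec2 = boto3.client("ec2", region_name=REGION)
ec2.enable_ebs_encryption_by_default()

# Takeaway 3: account-wide S3 Block Public Access.
s3control = boto3.client("s3control", region_name=REGION)
s3control.put_public_access_block(
    AccountId=ACCOUNT_ID,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```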

  • View profile for Govardhana Miriyala Kannaiah

    I help businesses with Digital & Cloud Transformation Consulting | Runs Job Surface helping job seekers find hidden DevOps & Cloud roles | 59,000+ read my Practical DevOps & Cloud newsletter

    135,765 followers

    If I were starting on AWS today as a beginner, here's how I'd use the 12-month Free Tier, stay within its limits, and avoid billing shocks.

    1) Know what's actually free
    The 12-month Free Tier includes:
    - 750 hours/month of t2.micro or t3.micro
    - 5 GB of S3 storage
    - 750 hours of RDS db.t2.micro, and so on…

    2) Set a billing alarm on Day 1
    - Go to Billing → Budgets → Create Budget
    - Set an alert at $1 or $5 to get notified early (a sketch follows this post)

    3) Monitor with Cost Explorer
    - Use filters by service or region
    - Spot growing usage before it crosses Free Tier limits

    4) Tag all resources
    Use tags like FreeTierTest or CleanupLater; it makes cleanup easier and usage visible.

    5) Clean up actively
    Even stopped resources can be billed:
    - EC2 stopped = OK
    - EBS volumes = charged until deleted
    - RDS = charged for storage even if paused, and so on.

    6) Restrict what can be launched
    - Use IAM to allow only Free Tier services
    - Block things like NAT Gateway and large EC2 instance types

    If you're starting today, save this list. An hour of setup can save you from a five-figure mistake.

    45K+ read my free newsletter: https://lnkd.in/gg3RQsRK
    What we cover: DevSecOps, Cloud, Kubernetes, IaC, GitOps, MLOps

    🔁 Consider a Repost if this is helpful
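Step 2 can also be done programmatically. Here is a minimal boto3 sketch (my addition, not part of the original post) that creates a $5 monthly cost budget with an email alert once 100% of the limit is spent; the budget name and email address are placeholders.

```python
# Sketch: a $5/month cost budget with an email alert at 100% of actual spend.
import boto3

budgets = boto3.client("budgets", region_name="us-east-1")  # Budgets endpoint lives in us-east-1
ACCOUNT_ID = "123456789012"  # placeholder account ID

budgets.create_budget(
    AccountId=ACCOUNT_ID,
    Budget={
        "BudgetName": "free-tier-guardrail",
        "BudgetLimit": {"Amount": "5", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 100.0,           # percent of the $5 limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}  # placeholder
            ],
        }
    ],
)
```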

  • View profile for Brijesh Akbari

    I will reduce your AWS bill by 30% or I'll do it for free | Founder @Signiance

    10,526 followers

    I have used this method on 100+ projects. Now I'm giving it away for free: the battle-tested playbook I've used with 100+ teams, from startups to enterprises, to cut their AWS bill by 30%. No fluff. No fancy dashboards. Just what actually works.

    Day 1–2: Cost Explorer + tagging audit
    → Open AWS Cost Explorer
    → Enable hourly + resource-level granularity
    → Filter by service, then by linked accounts
    → Identify your top 3 spend categories (e.g., EC2, S3, data transfer)
    Now tag everything:
    - Project
    - Owner
    - Environment (dev/stage/prod)
    - CostCenter (if needed)
    Why? Untagged = invisible = unaccountable. Without tags, you're flying blind.
    Pro tip: Use AWS Resource Groups to group untagged items.

    Day 3–4: Right-size your compute
    → Use AWS Compute Optimizer
    → Check EC2 instances with <20% CPU and memory utilization over 7–30 days
    → Consider:
    - Downgrading (e.g., m5 → t3)
    - Switching to Graviton (ARM-based, 20–40% cheaper)
    - Moving to Fargate or Lambda if the infra is often idle
    Also review:
    - RDS instances: auto-pause in dev
    - ECS services: scale down unused services
    Why? Compute is often 60–70% of your bill. Fix this first.

    Day 5: Delete zombie infra (a sketch follows this post)
    → Use Trusted Advisor + AWS Config to find:
    - Orphaned EBS volumes (left behind by terminated EC2 instances)
    - Idle load balancers (no traffic for 14+ days)
    - Old RDS snapshots (more than 7–14 days old)
    - Elastic IPs not attached to running instances
    - Unused S3 buckets storing logs from years ago
    Set deletion policies where safe. For dev resources, enforce auto-termination tags.
    Why? They don't show up in dashboards, but they quietly drain your budget.

    Day 6: Set storage lifecycle policies
    → For S3 buckets:
    - Archive logs after 30 days (Glacier or Deep Archive)
    - Delete test files after 90 days
    - Enable versioning cleanup
    → For EBS volumes:
    - Schedule snapshot pruning
    - Auto-delete unused volumes after instance termination
    Why? Storage rarely gets optimized until it explodes. But small tweaks = big gains over time.

    Day 7: Set budgets + alerts
    → Go to AWS Budgets
    → Create:
    - An overall budget (with 80%, 90%, 100% thresholds)
    - Service-specific budgets (e.g., EC2, S3)
    - Linked-account budgets if using Organizations
    → Set alerts via email or Slack (SNS integration)
    → Bonus: Add alerts for sudden cost spikes using anomaly detection
    Why? No alert = no awareness = no action.

    What happens after 7 days? You've got:
    ✅ Visibility
    ✅ Ownership
    ✅ Quick wins
    ✅ A repeatable process
    And most teams save 25–40% in the first month alone.

    We do this for AWS customers all the time. Want me to run this playbook for your infrastructure? DM me "audit" and I'll spend 30 mins on your AWS account for free. Let's make your cloud cost-efficient, not chaotic.
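To make Day 5 concrete, here is a read-only boto3 sketch (my illustration, not the author's tooling) that lists unattached EBS volumes and idle Elastic IPs in one region; it only reports cleanup candidates and deletes nothing. The region is a placeholder.

```python
# Sketch: find two common kinds of zombie infra in one region (read-only).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder: run per region

# Unattached ("available") EBS volumes left behind by terminated instances.
paginator = ec2.get_paginator("describe_volumes")
orphaned_volumes = [
    vol["VolumeId"]
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
    for vol in page["Volumes"]
]

# Elastic IPs with no association (billed while sitting idle).
idle_eips = [
    addr.get("PublicIp")
    for addr in ec2.describe_addresses()["Addresses"]
    if "AssociationId" not in addr
]

print("Orphaned EBS volumes:", orphaned_volumes)
print("Idle Elastic IPs:", idle_eips)
```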

  • View profile for BRINE NDAM KETUM

    Lead Cloud Platform Engineer with Hands-on in AWS | Azure | AIOps | VMware | DevOps | DevSecOps | Kubernetes | SRE | Solution Architect | SDLC | Network Security | FlutterFlow | Ansible | Golang | Python | GenAI/ML | Author

    10,824 followers

    🚀 Kubernetes Best Practices You Can't Ignore

    Managing Kubernetes at scale is tough: one wrong step can cause downtime or security risks. I've been diving into some battle-tested practices that every engineer should know:

    1. Multi-tenancy & Isolation:
    • Use Namespaces for logical separation of teams/workloads.
    • Apply RBAC and Azure AD for precise access control.

    2. Scheduling & Resource Management (see the sketch after this post):
    • Enforce resource quotas and Pod Disruption Budgets (PDBs).
    • Use taints & tolerations to dedicate nodes for critical workloads.

    3. Security First:
    • Scan container images and disable root privileges.
    • Regularly patch and upgrade Kubernetes clusters.

    4. Networking & Storage:
    • Implement network policies and WAF for traffic security.
    • Use dynamic provisioning and regular backups for persistent volumes.

    5. Enterprise Workloads:
    • Plan for multi-region deployments with traffic routing and geo-replication.

    🔔 Follow me for more Kubernetes & DevOps insights.

    #Kubernetes #K8s #CloudNative #DevOps #InfrastructureAsCode #KubernetesBestPractices #AzureKubernetesService #Security #RBAC #Helm #CI_CD #PlatformEngineering #CloudEngineering
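As a small illustration of point 2 (my addition, not from the post), here is a sketch using the official `kubernetes` Python client, assuming a recent client version with the policy/v1 API: it creates a ResourceQuota and a Pod Disruption Budget. The namespace, labels, and limits are illustrative assumptions.

```python
# Sketch: cap a tenant namespace with a ResourceQuota and protect an app with a
# Pod Disruption Budget. "team-a" and "web" are placeholder names.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

# Resource quota: limit what the "team-a" namespace can request in total.
core = client.CoreV1Api()
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-a-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={"requests.cpu": "4", "requests.memory": "8Gi", "pods": "20"}
    ),
)
core.create_namespaced_resource_quota(namespace="team-a", body=quota)

# PDB: keep at least 2 "web" replicas running during voluntary disruptions
# such as node drains.
policy = client.PolicyV1Api()
pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="web-pdb"),
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=2,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
    ),
)
policy.create_namespaced_pod_disruption_budget(namespace="team-a", body=pdb)
```

min_available also accepts a percentage string such as "50%" if you prefer a relative guarantee.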

  • View profile for Vijay Roy

    Founder | OpsRabbit.io | AI for ITOps | Applied AI Consulting | Product Engineering | AI Agents

    10,454 followers

    Most cloud problems aren't technical. They're the result of architecture decisions no one questioned. After auditing 100+ AWS setups, these are the 3 mistakes I see again and again. Let's break them down 👇

    1️⃣ "We'll fix it later" architecture
    Rushed MVPs. Stacked services. No documentation. Six months later? Deployments are fragile. Costs are up. DevOps is chaos.
    ✅ Start with simple, well-tagged infra.
    ✅ Go serverless where you can.
    ✅ Leave breadcrumbs for future-you.

    2️⃣ Treating cloud like on-prem
    Big mistake. Old habits don't work in the cloud.
    → Oversized EC2s "just in case"
    → S3 used as a dumping ground
    → Logs kept forever
    ✅ Use autoscaling.
    ✅ Set lifecycle rules (a sketch follows this post).
    ✅ Use managed services when possible.
    The cloud rewards smart, not heavy.

    3️⃣ No cost visibility
    If you're only looking at costs after finance flags it... you're already in trouble. What I see often:
    → Untagged resources
    → Zombie infra
    → No budgets or alerts
    ✅ Set up AWS Budgets on Day 1
    ✅ Track spend like KPIs
    ✅ Forecast, don't guess

    If your cloud setup feels bloated or unpredictable... you're not alone. But you don't need a rebuild. You need a reset, guided by someone who knows what matters. I've helped teams save 30–60% without changing a single line of code. Want that? Drop a "review" in the comments or DM me. Let's clean up the mess and turn your cloud into a growth engine.
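As one concrete example of "set lifecycle rules" (my addition, not from the post), here is a hedged boto3 sketch that archives logs to Glacier Deep Archive after 30 days and expires temporary objects after 90; the bucket name, prefixes, and day counts are placeholders.

```python
# Sketch: S3 lifecycle rules so the bucket stops being a dumping ground.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # Move month-old logs to the cheapest archival tier.
                "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            },
            {
                "ID": "expire-tmp",
                "Status": "Enabled",
                "Filter": {"Prefix": "tmp/"},
                # Delete temporary objects after 90 days.
                "Expiration": {"Days": 90},
            },
        ]
    },
)
```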

  • View profile for Dwan Bryant

    Sr. DevOps Engineer | Azure DevOps Certified | Empowering Cloud Infrastructure with CI/CD & Automation

    1,608 followers

    🚀 Whether you're deploying your first Kubernetes cluster or managing production workloads at scale, following best practices can make or break your infrastructure. This visual from DevOpsCube nails the essentials:

    🔹 Networking – Collaborate with your networking team on CIDRs, ingress/egress, and proxy setup
    🔐 Security – Address CIS benchmarks, pod security, and vulnerability scans with your security team
    🧑‍💼 RBAC – Apply policy as code, use service accounts, and enforce user auditing
    📦 High Availability – Focus on pod topology, availability zones, and chaos experiments (see the sketch after this list)
    🌐 Ingress – Use ingress controllers, enforce SSL/TLS, and consider API gateways
    💾 Backup/Restore – Plan etcd backups, disaster recovery, and data migration strategies
    🛡 Patching – Patch nodes and containers regularly and run image scans
    ⬆️ Cluster Upgrades – Test in parallel, upgrade in place, and validate networking changes
    📊 Capacity Planning – Optimize for multiple vs. single clusters, stateful workloads, and throughput
    📈 Logging & Monitoring – Centralized logging, KPIs, and monitoring are non-negotiable

    Solid infrastructure is never an accident. It's engineered with care, cross-team communication, and a clear roadmap.
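To make the high-availability item concrete (my addition, not part of the visual), here is a sketch using the official `kubernetes` Python client: a Deployment whose replicas spread across availability zones via topology spread constraints and expose readiness/liveness probes. The image, health path, and names are illustrative assumptions.

```python
# Sketch: a zone-spread Deployment with readiness and liveness probes.
from kubernetes import client, config

config.load_kube_config()

container = client.V1Container(
    name="web",
    image="nginx:1.27",  # placeholder image
    ports=[client.V1ContainerPort(container_port=80)],
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=80),
        initial_delay_seconds=5, period_seconds=10,
    ),
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=80),
        initial_delay_seconds=15, period_seconds=20,
    ),
)

pod_spec = client.V1PodSpec(
    containers=[container],
    # Spread replicas across zones so a single AZ outage can't take them all down.
    topology_spread_constraints=[client.V1TopologySpreadConstraint(
        max_skew=1,
        topology_key="topology.kubernetes.io/zone",
        when_unsatisfiable="ScheduleAnyway",
        label_selector=client.V1LabelSelector(match_labels={"app": "web"}),
    )],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=pod_spec,
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

ScheduleAnyway keeps pods scheduling even when zones are temporarily imbalanced; use DoNotSchedule if you want the spread enforced as a hard constraint.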
