By Ajibola Aiyedogbon
Server Monitoring
(scaling while bootstrapped)
About me
Co-founder Amebo App
Mobile Developer (Jobberman, GTBank WP, etc.)
DevOps enthusiast
Before before...
1 server for everything
-1 users
J2ME only
What throughput!
Cloudinary as CDN
Deployment fails
High costs: ignorance is very expensive
Now
5+ servers
Hundreds of thousands of users
Multi Platform apps
18000 req/min throughput
Cloudflare as CDN
Deployments with zero downtime
Managed costs
Scaling rhymes with Failing!
Server Stack
1 load balancer (layer 4, high availability, failover*)
3 web servers (vertically & horizontally scalable)
1 database server (replication*, redundancy*)
1 staging server
$65 per month, serving over 100 million requests
Cloudflare is the secret weapon: it caches static requests (70%).
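A quick sanity check of the figures on this slide and the 18000 req/min quoted earlier, assuming the 100 million requests are per month and that the 70% is the share of all requests answered from Cloudflare's cache (both are interpretations, not stated on the slide):

$$
\begin{aligned}
\text{requests reaching the origin} &\approx (1 - 0.70) \times 100\ \text{M} = 30\ \text{M per month} \\
\text{cost per million requests} &\le \frac{\$65}{100\ \text{M}} \times 1\ \text{M} = \$0.65 \\
18{,}000\ \text{req/min} \div 60 &= 300\ \text{req/s}
\end{aligned}
$$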
Technology Stack
HAProxy (load balancer)
Nginx, PHP-FPM (web server, PHP process manager)
Phalcon, PHP-Resque (framework, background job queue)
Redis, MongoDB, MariaDB (in-memory cache, datastores)
Git (Bitbucket), Packer, Ansible (version control, server provisioning, code provisioning)
SetCronJob, Cloudflare, Fastly (3rd party)
Why IaaS, not PaaS?
All about the pricing page!
Bandwidth costs too high
Code optimizations are hidden behind computing power
Mission critical? Offload to PaaS selectively, e.g. Parse EOL: death by acquisition...
Why Monitor?
Don’t end up like these guys...
Why monitor?
Get visibility
Improve usability & stability
Complicated technology stacks with hard-to-trace errors
Mission critical
More sleep!
What to monitor apart from everything?
Server Metrics (infrastructure)
RAM usage, spikes
Bandwidth usage, highs vs lows
CPU usage over time, peak usage
Disk I/O
Open source vs SaaS
Mostly free
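A minimal sketch of turning raw readings into the RAM and spike figures above, assuming a Linux host whose /proc/meminfo exposes MemTotal and MemAvailable (kernel 3.14 or later). The function names and the fixed threshold are illustrative choices, not part of any monitoring tool; open-source and SaaS agents do this collection for you.

```python
from typing import Dict, List, Sequence


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values are in kB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def ram_used_percent(text: str) -> float:
    """RAM in use as a percentage, treating MemAvailable as free memory."""
    fields = parse_meminfo(text)
    total = fields["MemTotal"]
    return 100.0 * (total - fields["MemAvailable"]) / total


def spikes(samples: Sequence[float], threshold: float) -> List[int]:
    """Indices of samples (e.g. CPU % polled once a minute) at or above threshold."""
    return [i for i, value in enumerate(samples) if value >= threshold]
```

On a live server, `ram_used_percent(open("/proc/meminfo").read())` gives the current figure; polling it on a schedule and feeding the history to `spikes()` shows highs vs lows over time.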
Server Metrics (services)
HAProxy stats
Nginx stats
MySQL performance, etc.
Service status checks (service <name> status)
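HAProxy's stats page can export the same data as CSV (append ;csv to the configured stats URI, or run show stat on the stats socket). A minimal sketch that flags proxies and servers not reporting a healthy status; the HEALTHY set below is an assumption to adapt to your own setup.

```python
import csv
import io
from typing import List, Tuple

# Statuses treated as healthy (assumption): servers "UP", frontends "OPEN",
# and servers without health checks ("no check").
HEALTHY = {"UP", "OPEN", "no check"}


def unhealthy_servers(stats_csv: str) -> List[Tuple[str, str, str]]:
    """Return (proxy, server, status) for every row whose status is not healthy.

    Transitional states such as "UP 1/3" (failing checks, going down) are
    flagged as well, since they are not in HEALTHY.
    """
    # The CSV header line starts with "# pxname,svname,...": drop the "# " prefix.
    text = stats_csv.lstrip("# ")
    rows = csv.DictReader(io.StringIO(text))
    return [
        (row["pxname"], row["svname"], row["status"])
        for row in rows
        if row.get("status") not in HEALTHY
    ]
```

Feeding it the body of the ;csv stats URL on a schedule and alerting on any non-empty result turns the stats page into a failure alert.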
Application Errors
Catch-all exception handler (PHP)
User-defined errors
3rd-party library errors
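In PHP the catch-all is set_exception_handler() (with set_error_handler() for warnings and notices). A minimal sketch of the same idea in Python: one hook that logs every uncaught exception with its stack trace and tags it as a user-defined or unexpected (often 3rd-party) error. AppError and the "user_defined"/"unexpected" tags are hypothetical names for illustration.

```python
import logging
import sys
import traceback
from types import TracebackType
from typing import Optional, Type

logger = logging.getLogger("app.errors")


class AppError(Exception):
    """Hypothetical base class for the app's own (user-defined) errors."""


def classify(exc: BaseException) -> str:
    """Tag app-raised errors separately from everything else (often 3rd-party code)."""
    return "user_defined" if isinstance(exc, AppError) else "unexpected"


def report(exc_type: Type[BaseException], exc: BaseException,
           tb: Optional[TracebackType]) -> str:
    """Log the full stack trace with its tag; returns the tag so it can be tested."""
    kind = classify(exc)
    trace = "".join(traceback.format_exception(exc_type, exc, tb))
    logger.error("[%s] %s", kind, trace)
    return kind


def install() -> None:
    """Route every uncaught exception through report(), like set_exception_handler()."""
    sys.excepthook = report
```

Call install() once at startup; in practice report() would also forward the trace to whichever error tracker you pick.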
Tech Stack (Application Performance Monitoring)
Request throughput
Resource usage
Service Health
Database monitoring
Infrastructure bottlenecks
Failure Alerts
Code Errors
High-level overview with deep dive
Log Tracking
A better way to tail -f
HTTP stack errors & anomalies
Multiple log files from different services
Manual tailing is difficult
Get preconfigured graphs based on logs
All server traffic is logged (access_log)
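A minimal sketch of what a log-based graph computes under the hood: counting 5xx responses per minute from an Nginx access_log. It assumes Nginx's default combined log format; a custom log_format needs a different pattern.

```python
import re
from collections import Counter
from typing import Iterable

# Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LINE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count 5xx responses per minute, keyed like '10/Oct/2016:13:55'."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("status").startswith("5"):
            # time_local looks like '10/Oct/2016:13:55:36 +0100'; keep up to the minute.
            counts[match.group("time")[:17]] += 1
    return counts
```

Lines that do not match the pattern are skipped. Pointing it at a rotated log (on Debian/Ubuntu typically /var/log/nginx/access.log) gives the raw series that hosted log tools chart for you.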
Client Errors (Mobile)
Client-side stack traces post-deployment
Valuable version & device insight
Very handy at debug time & post-release
Catches all errors… mostly
Memory leaks & stack traces
3rd-party library errors or platform errors
Open Source vs Proprietary
Vendor lock-in
Community support
DIY vs training
Industry standards & experience
Fault tolerance
Enterprise customer experience
3rd-party vs native monitoring tools
Core business?
Pricing again!
Support lifecycle and responsiveness
Product version, beta or 5.0?
Dashboard simplicity
Security implications? Firewalled?
HTTPS? Localhost only? Install certs?
Too many alerts…!
What now?
Congratulations, your reward is more work!
Customize alerts
Fix errors
Webhooks
Send to Slack
Ignore at own risk
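A minimal sketch of the "webhooks, send to Slack" step with a per-alert cooldown, one way to keep "too many alerts" in check. It assumes a Slack incoming webhook, which accepts an HTTP POST with a JSON body such as {"text": "..."}; the webhook URL is a placeholder, and SlackAlerter and its five-minute default cooldown are hypothetical choices.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


class SlackAlerter:
    """Post alerts to a Slack incoming webhook, at most once per key per cooldown."""

    def __init__(self, webhook_url: str, cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.webhook_url = webhook_url  # placeholder, e.g. your own incoming-webhook URL
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str) -> bool:
        """True if this alert key has not fired within the cooldown window."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_sent[key] = now  # recorded before sending, so failed sends are not retried
        return True

    @staticmethod
    def build_request(webhook_url: str, text: str) -> urllib.request.Request:
        """Build the JSON POST an incoming webhook expects."""
        body = json.dumps({"text": text}).encode("utf-8")
        return urllib.request.Request(
            webhook_url, data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )

    def alert(self, key: str, text: str) -> bool:
        """Send the alert unless it is a repeat within the cooldown; returns True if sent."""
        if not self.should_send(key):
            return False
        request = self.build_request(self.webhook_url, text)
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
        return True
```

Keying alerts by cause (for example "web2-down") means a flapping server produces one Slack message per cooldown window instead of one per check.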
Be like this guy… or not!
Graphs on graphs on graphs on graphs
Information overload is real
Customize dashboard
Overviews only
Deep dive early to get familiar with the dashboard
What Next?
Set up BugSnag
Conclusion
Why Monitor
What to Monitor
How to monitor
Pricing
Dashboards
Discuss your stack with peers
Thank You
@Ajibz
