💡 Introduction to Observability

Md. Ashraf Bhuiya
3 min read · Oct 12, 2024


  • Observability is the ability to understand the internal state of a system by analyzing the data it produces: logs, metrics, and traces.
  • Monitoring (metrics): tracks system metrics such as CPU usage, memory usage, and network performance, and raises alerts based on predefined thresholds and conditions.
  • Monitoring tells us what is happening.
  • Logging (logs): collects log data from the various components of a system.
  • Logging explains why it is happening.
  • Tracing (traces): follows a request or transaction as it moves through the different services and components of a system.
  • Tracing shows how it is happening.
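
To make the three signals concrete, here is a minimal sketch in plain Python (standard library only) of one request handler that emits all three. The service name `checkout`, the `handle_request` function, and the metric names are hypothetical, and the in-memory `metrics` and `spans` stand in for whatever real backend you would use.

```python
import logging
import time
import uuid
from collections import Counter

# Metrics: numeric counters that a monitoring system would collect.
metrics = Counter()
# Traces: finished spans, each tied to one request by its trace_id.
spans = []

logger = logging.getLogger("checkout")  # hypothetical service name


def handle_request(order_id, fail=False):
    """Handle one hypothetical request and emit a metric, a log, and a span."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    metrics["requests_total"] += 1
    status = "ok"
    try:
        if fail:
            raise ValueError("payment declined")
    except ValueError as exc:
        status = "error"
        metrics["errors_total"] += 1
        # Log: explains why the request failed, tagged with the trace_id.
        logger.error("order=%s trace_id=%s error=%s", order_id, trace_id, exc)
    finally:
        # Span: records how long this step took and how it ended.
        spans.append({
            "trace_id": trace_id,
            "name": "handle_request",
            "duration_s": time.perf_counter() - start,
            "status": status,
        })
    return trace_id


handle_request("A-1")
handle_request("A-2", fail=True)
print(dict(metrics))
```

The counters say what happened (two requests, one error), the log line says why, and the span shows how the failing request moved through the handler.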

Why Monitoring?

  • Monitoring helps us keep an eye on our systems to ensure they are working properly.
  • Purpose: to maintain the health, performance, and security of IT environments.
  • It enables early detection of issues, so they can be addressed before they cause significant downtime or data loss.
  • We use monitoring to:
    • Detect problems early
    • Measure performance
    • Ensure availability

Why Observability?

  • Observability helps us understand why our systems are behaving the way they are.
  • It’s like having a detailed map and tools to explore and diagnose issues.
  • We use observability to:
    • Diagnose issues
    • Understand behavior
    • Improve systems

🆚 What is the Exact Difference Between Monitoring and Observability?

  • 🔥 Monitoring is the when and what of a system error, and observability is the why and how.

🔭 Does Observability Cover Monitoring?

  • Yes. Monitoring is a subset of observability.
  • Observability is a broader concept that includes monitoring as one of its components.
  • Monitoring focuses on tracking specific metrics and alerting on predefined conditions.
  • Observability provides a comprehensive understanding of the system by collecting and analyzing a wider range of data, including logs, metrics, and traces.
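
"Alerting on predefined conditions" can be sketched in a few lines. The example below is a simplified illustration, not how any particular monitoring tool implements it: the `ThresholdRule` class, the `for_samples` field, and the metric name `cpu_usage_percent` are all hypothetical. It fires only when the threshold is exceeded for several consecutive samples, so that a single spike does not page anyone.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    metric: str
    threshold: float
    # Number of consecutive samples above the threshold before alerting.
    for_samples: int = 3


def should_alert(rule, samples):
    """Return True if the last `for_samples` samples all exceed the threshold."""
    if len(samples) < rule.for_samples:
        return False
    recent = samples[-rule.for_samples:]
    return all(value > rule.threshold for value in recent)


cpu_rule = ThresholdRule(metric="cpu_usage_percent", threshold=90.0)
print(should_alert(cpu_rule, [50.0, 95.0, 60.0, 92.0]))  # brief spikes only
print(should_alert(cpu_rule, [50.0, 91.0, 95.0, 97.0]))  # sustained high usage
```

A rule like this tells you that CPU is high and when it started. It says nothing about the cause, which is where logs and traces come in.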

🖥️ What Can Be Monitored?

  • Infrastructure: CPU usage, memory usage, disk I/O, network traffic.
  • Applications: Response times, error rates, throughput.
  • Databases: Query performance, connection pool usage, transaction rates.
  • Network: Latency, packet loss, bandwidth usage.
  • Security: Unauthorized access attempts, vulnerability scans, firewall logs.
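
As an illustration of the application metrics listed above, the following sketch computes throughput, error rate, and a 95th-percentile response time from raw request records. The `summarize` function and the `(latency_ms, http_status)` record shape are assumptions made for this example, and counting only 5xx statuses as errors is one possible convention.

```python
import math


def summarize(requests, window_seconds):
    """Compute throughput, error rate, and p95 latency for one time window.

    `requests` is a list of (latency_ms, http_status) tuples.
    """
    if not requests:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_latency_ms": None}
    latencies = sorted(latency for latency, _ in requests)
    errors = sum(1 for _, status in requests if status >= 500)
    # Nearest-rank percentile: the smallest value with at least 95% of
    # the samples at or below it.
    rank = math.ceil(95 * len(latencies) / 100)
    return {
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p95_latency_ms": latencies[rank - 1],
    }


# 18 fast requests, one slow server error, and one very slow success.
sample = [(20, 200)] * 18 + [(250, 500), (900, 200)]
print(summarize(sample, window_seconds=10))
```

Percentiles are often preferred over averages for response times because a handful of very slow requests can hide behind a healthy-looking mean.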

👀 What Can Be Observed?

  • Logs: Detailed records of events and transactions within the system.
  • Metrics: Quantitative data points like CPU load, memory consumption, and request counts.
  • Traces: Data that shows the flow of requests through various services and components.

🆚 Monitoring on Bare-Metal Servers vs. Monitoring Kubernetes

  • Bare-Metal Servers:
    • Direct Access: Easier access to hardware metrics and logs.
    • Fewer Layers: Simpler environment with fewer abstraction layers.
  • Kubernetes:
    • Dynamic Environment: Challenges with monitoring ephemeral containers and dynamic scaling.
    • Distributed Nature: Requires tools that can handle distributed systems and correlate data from multiple sources.
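
One common way to cope with ephemeral containers is to aggregate by a stable label instead of by pod name, since pod names change on every restart or scale-up. The sketch below shows the idea with made-up data: the sample records, the `app` label values, and the `cpu_by_app` helper are all hypothetical, not output from a real cluster.

```python
from collections import defaultdict

# Hypothetical samples: pod names are short-lived, while the "app" label
# stays the same across restarts and scaling.
samples = [
    {"pod": "web-7d9f-abcde", "labels": {"app": "web"}, "cpu_millicores": 120},
    {"pod": "web-7d9f-fghij", "labels": {"app": "web"}, "cpu_millicores": 180},
    {"pod": "api-5c4b-klmno", "labels": {"app": "api"}, "cpu_millicores": 300},
]


def cpu_by_app(samples):
    """Sum CPU usage per "app" label and count the pods behind each total."""
    totals = defaultdict(lambda: {"pods": 0, "cpu_millicores": 0})
    for sample in samples:
        app = sample["labels"].get("app", "unknown")
        totals[app]["pods"] += 1
        totals[app]["cpu_millicores"] += sample["cpu_millicores"]
    return dict(totals)


print(cpu_by_app(samples))
```

On a bare-metal server the hostname is usually a good enough key. In Kubernetes the label is the stable identity, and the pod name is just a detail.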

🆚 Observing on Bare-Metal Servers vs. Observing Kubernetes

  • Bare-Metal Servers:
    • Simpler Observability: Easier to collect and correlate logs, metrics, and traces due to fewer components and layers.
  • Kubernetes:
    • Complex Observability: Requires sophisticated tools to handle the dynamic and distributed nature of containers and microservices.
    • Integration: Necessitates the integration of multiple observability tools to get a complete picture of the system.
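
Getting "a complete picture" usually comes down to joining data from different tools on a shared identifier. Here is a minimal sketch, assuming every log line and every span carries the same `trace_id` field. The records, service names, and the `correlate` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records, as they might arrive from separate logging and
# tracing backends.
logs = [
    {"trace_id": "t1", "service": "cart", "message": "item added"},
    {"trace_id": "t2", "service": "payment", "message": "card declined"},
]
spans = [
    {"trace_id": "t1", "service": "cart", "duration_ms": 12},
    {"trace_id": "t2", "service": "cart", "duration_ms": 15},
    {"trace_id": "t2", "service": "payment", "duration_ms": 840},
]


def correlate(logs, spans):
    """Group log lines and spans that share a trace_id into one view per request."""
    view = defaultdict(lambda: {"logs": [], "spans": []})
    for record in logs:
        view[record["trace_id"]]["logs"].append(record)
    for span in spans:
        view[span["trace_id"]]["spans"].append(span)
    return dict(view)


request = correlate(logs, spans)["t2"]
print([span["service"] for span in request["spans"]])
print([record["message"] for record in request["logs"]])
```

For request `t2`, the spans show that the time was spent in the payment service and the log line explains why, even though the two records came from different sources.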

⚒️ What are the Tools Available?

  • Monitoring Tools: Prometheus, Grafana, Nagios, Zabbix, PRTG.
  • Observability Tools: ELK Stack (Elasticsearch, Logstash, Kibana), EFK Stack (Elasticsearch, Fluentd or Fluent Bit, Kibana), Splunk, Jaeger, Zipkin, New Relic, Dynatrace, Datadog.
