Amazon CloudWatch: Metrics, Logs, and Alarms
When you run systems on AWS, you need to monitor the state of your resources and catch anomalies early. Amazon CloudWatch is a managed service that centralizes monitoring and observability (the ability to understand a system's internal state from the outside) for AWS resources and applications. This article covers CloudWatch's core concepts, how metrics, logs, and alarms work, and basic operations using the AWS CLI.
What is Amazon CloudWatch
CloudWatch is a fully managed monitoring and observability service from AWS. It automatically collects metrics from AWS resources such as EC2 instances, RDS, and Lambda, and provides a wide range of monitoring capabilities including dashboards, notifications, and log management.
With CloudWatch, you can monitor your system's health in real time, receive alarm notifications when issues occur, and analyze logs to identify root causes.
Key Features
CloudWatch is made up of four main capabilities.
| Feature | Overview |
|---|---|
| Metrics | Collects and displays performance data such as CPU usage and network throughput as a time series |
| Logs | Collects, stores, and searches logs output by applications and AWS services |
| Alarms | Triggers notifications or automated actions when a metric exceeds a threshold |
| Dashboards | Visualizes multiple metrics and logs on a single screen |
By combining these features, you can build comprehensive monitoring across both infrastructure and application layers.
Metrics
Metrics are the core of CloudWatch. They store performance data from AWS services as time series and visualize it as graphs.
AWS services like EC2 and RDS automatically send standard metrics. If you want to send your own application-specific data, you can publish custom metrics with any value to CloudWatch.
Metrics are grouped by namespace — AWS service metrics fall under namespaces like AWS/EC2 or AWS/RDS. Each metric is identified by key-value pairs called dimensions. For example, EC2 metrics use the instance ID as a dimension.
| Item | Details |
|---|---|
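| Resolution | Depends on the service: EC2 publishes every 5 minutes with basic monitoring and every 1 minute with detailed monitoring; custom metrics can be high-resolution, down to 1 second |
| Retention | Depends on the data point period: 3 hours for periods under 60 seconds, 15 days for 1 minute, 63 days for 5 minutes, and 15 months for 1 hour |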
| Free tier | Up to 10 custom metrics and 1 million API requests per month at no charge |
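As a rough sketch of publishing a custom metric, the helper below wraps `aws cloudwatch put-metric-data`. The function name, the `MyApp` namespace, the `QueueDepth` metric, and the `Environment` dimension are all made-up names for illustration; none of them are defined by CloudWatch.

```shell
# Hypothetical helper: publish one data point for a custom metric.
# "MyApp", "QueueDepth", and the Environment dimension are illustrative names.
publish_queue_depth() (
  value="$1"                 # numeric value to record
  environment="${2:-dev}"    # dimension value, defaults to "dev"

  aws cloudwatch put-metric-data \
    --namespace "MyApp" \
    --metric-name "QueueDepth" \
    --dimensions "Environment=${environment}" \
    --value "$value" \
    --unit Count
)
```

Calling `publish_queue_depth 42 prod` would record the value 42 under the `Environment=prod` dimension. Note that `put-metric-data` takes dimensions as `Name=Value` pairs, which differs from the `Name=...,Value=...` form used when reading metrics.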
CloudWatch Logs
CloudWatch Logs is a feature for centralizing the management of logs output by applications and AWS services.
Logs are organized into units called log groups, each containing multiple log streams. For example, Lambda creates one log group per function by default, and each execution environment (a running instance of the function) writes to its own log stream, so a single stream typically holds the logs of many invocations.
To search and analyze logs, use CloudWatch Logs Insights, a query feature whose purpose-built query language chains commands such as fields, filter, and stats with pipes to filter and aggregate log data. For instance, you can aggregate error log counts over time or filter logs by a specific request ID.
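To make the error-count example concrete, here is a minimal sketch that starts a Logs Insights query from the CLI. The function name and the log group passed to it are hypothetical, and the query assumes error lines contain the literal text `ERROR`.

```shell
# Hypothetical helper: count log lines containing "ERROR" in 5-minute buckets
# over the past hour. The log group name is supplied by the caller.
count_errors() (
  log_group="$1"                     # e.g. /aws/lambda/my-function (illustrative)
  end_time="$(date +%s)"             # now, as epoch seconds
  start_time=$((end_time - 3600))    # one hour earlier

  query='filter @message like /ERROR/ | stats count(*) as error_count by bin(5m)'

  # start-query is asynchronous: it returns a queryId to poll later.
  aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --query-string "$query" \
    --query 'queryId' --output text
)
```

The returned ID can then be passed to `aws logs get-query-results --query-id <id>`, which reports the query status and, once complete, the aggregated rows.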
Alarms
Alarms continuously monitor metric values and automatically trigger notifications or actions when a configured condition is met.
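When configuring an alarm, the most important settings are the threshold, the period, and the number of evaluation periods. The threshold is the boundary value the metric is compared against. The period is the length of time over which each data point is aggregated, and the evaluation periods setting is the number of most recent data points the alarm looks at. By default, every one of those data points must breach the threshold before the alarm enters the ALARM state (a separate "datapoints to alarm" setting relaxes this to M out of N). If you don't want to fire on temporary spikes, set the number of evaluation periods to 2 or more.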
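The effect of requiring several consecutive breaches can be illustrated with a toy model. This is only a sketch of the idea, not how CloudWatch is implemented: it ignores missing-data handling and the "M out of N" setting, and the function name is made up.

```shell
# Toy model of alarm evaluation: look at the last N datapoints and report
# ALARM only when all of them are >= the threshold.
evaluate_alarm() (
  threshold="$1"; periods="$2"; shift 2
  # Remaining arguments are datapoints, oldest first.
  printf '%s\n' "$@" | tail -n "$periods" | awk -v t="$threshold" -v n="$periods" '
    $1 >= t { breaching++ }
    END {
      if (NR < n) print "INSUFFICIENT_DATA"
      else print (breaching == n ? "ALARM" : "OK")
    }'
)

evaluate_alarm 80 2 35 92 41   # single spike, then recovery -> OK
evaluate_alarm 80 2 35 85 92   # two consecutive breaches    -> ALARM
```

With two evaluation periods, the isolated spike to 92 does not trigger the alarm, while two breaching readings in a row do.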
An alarm has three states and transitions from OK to ALARM when its condition is met.
| State | Meaning |
|---|---|
| OK | The metric is within the configured threshold |
| ALARM | The metric has exceeded the threshold and the alarm condition is met |
| INSUFFICIENT_DATA | Not enough data to determine the alarm state |
When an alarm enters the ALARM state, you can send notifications through Amazon SNS topics (for example to email, or to Slack through an integration subscribed to the topic), or trigger actions such as stopping, rebooting, terminating, or recovering an EC2 instance.
Trying CloudWatch with the AWS CLI
Let's use the AWS CLI to explore metrics and practice creating and deleting alarms.
Listing Metrics
List the CPUUtilization metrics that exist in the AWS/EC2 namespace. Use --namespace to specify the target namespace and --metric-name to specify the metric name. Each result shows the dimensions that identify the metric, here an instance ID.

```shell
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization
```

```json
{
    "Metrics": [
        {
            "Namespace": "AWS/EC2",
            "MetricName": "CPUUtilization",
            "Dimensions": [
                {
                    "Name": "InstanceId",
                    "Value": "i-0a1b2c3d4e5f67890"
                }
            ]
        }
    ]
}
```
Getting Metric Statistics
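Fetch the average CPU utilization for the past hour in 5-minute intervals. Use --start-time and --end-time to define the time range (given in UTC here), and --period to set the aggregation interval in seconds. The timestamps in the sample output are displayed in UTC+9, and datapoints are not guaranteed to come back in chronological order.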
```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0a1b2c3d4e5f67890 \
  --start-time 2026-04-25T01:30:00Z \
  --end-time 2026-04-25T02:30:00Z \
  --period 300 \
  --statistics Average
```

```json
{
    "Label": "CPUUtilization",
    "Datapoints": [
        {
            "Timestamp": "2026-04-25T11:00:00+09:00",
            "Average": 0.2966666666666667,
            "Unit": "Percent"
        },
        {
            "Timestamp": "2026-04-25T10:55:00+09:00",
            "Average": 0.2502405052421769,
            "Unit": "Percent"
        }
    ]
}
```
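The command above uses hard-coded timestamps. One way to compute a rolling "past hour" window instead is sketched below; the helper name is made up, and the fallback assumes either GNU `date` (Linux) or BSD `date` (macOS) is available.

```shell
# Print a UTC timestamp N hours in the past, in the ISO 8601 form
# accepted by --start-time and --end-time.
utc_hours_ago() {
  # GNU date understands -d; BSD/macOS date uses -v instead.
  date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-"$1"H +%Y-%m-%dT%H:%M:%SZ
}

start_time="$(utc_hours_ago 1)"
end_time="$(utc_hours_ago 0)"
echo "--start-time $start_time --end-time $end_time"
```

The two variables can then replace the literal values, as in `--start-time "$start_time" --end-time "$end_time"`.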
Creating an Alarm
Configure an alarm that fires when average CPU utilization is at or above 80% for two consecutive 5-minute periods. Specifying an SNS topic ARN with --alarm-actions sends a notification to that topic when the alarm enters the ALARM state.

```shell
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu-alarm \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0a1b2c3d4e5f67890 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:exrecord-topic
```
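The alarm command assumes the SNS topic already exists and has a subscriber. If it does not, something along these lines could set one up; the helper name is hypothetical, and the topic name and email address are placeholders supplied by the caller.

```shell
# Hypothetical helper: create an SNS topic, subscribe an email address,
# and print the topic ARN for use with --alarm-actions.
create_alarm_topic() (
  topic_name="$1"
  email="$2"

  # create-topic is idempotent: it returns the existing ARN if the topic exists.
  topic_arn="$(aws sns create-topic --name "$topic_name" \
    --query 'TopicArn' --output text)" || exit 1

  # The recipient must confirm the subscription email before delivery starts.
  aws sns subscribe \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$email" >/dev/null || exit 1

  printf '%s\n' "$topic_arn"
)
```

For example, `topic_arn="$(create_alarm_topic my-alerts ops@example.com)"` captures the ARN so it can be passed as `--alarm-actions "$topic_arn"`.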
Checking Alarm State
Check the current state of the alarm you created. A StateValue of OK means the metric is within normal range. ALARM means the threshold has been exceeded.
```shell
aws cloudwatch describe-alarms --alarm-names high-cpu-alarm
```

```json
{
    "MetricAlarms": [
        {
            "AlarmName": "high-cpu-alarm",
            "AlarmArn": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-cpu-alarm",
            "AlarmConfigurationUpdatedTimestamp": "2026-04-25T11:13:07.691000+09:00",
            "ActionsEnabled": true,
            "OKActions": [],
            "AlarmActions": [
                "arn:aws:sns:us-east-1:123456789012:exrecord-topic"
            ],
            "InsufficientDataActions": [],
            "StateValue": "OK",
            "StateReason": "Threshold Crossed: 2 datapoints [0.3124812407599757 (25/04/26 02:09:00), 0.3033163083601058 (25/04/26 02:04:00)] were not greater than or equal to the threshold (80.0).",
            "StateReasonData": "{\"version\":\"1.0\",\"queryDate\":\"2026-04-25T02:14:13.262+0000\",\"startDate\":\"2026-04-25T02:04:00.000+0000\",\"statistic\":\"Average\",\"period\":300,\"recentDatapoints\":[0.3033163083601058,0.3124812407599757],\"threshold\":80.0,\"evaluatedDatapoints\":[{\"timestamp\":\"2026-04-25T02:09:00.000+0000\",\"sampleCount\":4.0,\"value\":0.3124812407599757}]}",
            "StateUpdatedTimestamp": "2026-04-25T11:14:13.263000+09:00",
            "MetricName": "CPUUtilization",
            "Namespace": "AWS/EC2",
            "Statistic": "Average",
            "Dimensions": [
                {
                    "Name": "InstanceId",
                    "Value": "i-0a1b2c3d4e5f67890"
                }
            ],
            "Period": 300,
            "EvaluationPeriods": 2,
            "Threshold": 80.0,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "StateTransitionedTimestamp": "2026-04-25T11:14:13.263000+09:00"
        }
    ],
    "CompositeAlarms": []
}
```
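When only the state is of interest, the CLI's `--query` option can pull out that one field rather than printing the whole document. A small sketch, with a made-up helper name:

```shell
# Hypothetical helper: print just the StateValue (OK, ALARM, or
# INSUFFICIENT_DATA) of a single alarm.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}
```

Running `alarm_state high-cpu-alarm` against the alarm shown above would print its current state as a single word, which is convenient in scripts.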
Deleting an Alarm
Delete an alarm that's no longer needed. Deleting an alarm does not affect the underlying metric data — it remains intact.
```shell
aws cloudwatch delete-alarms --alarm-names high-cpu-alarm
```
Summary
- Amazon CloudWatch is a managed service that handles monitoring and observability for both AWS resources and applications
- Metrics let you collect and visualize performance data like CPU and network usage as time series
- CloudWatch Logs centralizes log management for applications and AWS services, with Logs Insights for query-based analysis
- Alarms monitor metric thresholds and automatically execute actions like SNS notifications or EC2 operations when conditions are met