-title: High CPU Utilization Across Azure Database for PostgreSQL Elastic Clusters
-description: Troubleshoot high CPU utilization across Azure Database for PostgreSQL elastic clusters.
+title: Troubleshoot High CPU Utilization in Elastic Clusters
+description: How to troubleshoot high CPU utilization across Azure Database for PostgreSQL Elastic Clusters.
 author: GayathriPaderla
 ms.author: gapaderla
-ms.reviewer: jaredmeade
-ms.date: 01/28/2026
+ms.reviewer: jaredmeade, maghan
+ms.date: 02/17/2026
 ms.service: azure-database-postgresql
 ms.subservice: performance
 ms.topic: troubleshooting-general
 ---

-# Troubleshoot High CPU Utilization in Azure Database for PostgreSQL Elastic Clusters
+# Troubleshoot high CPU utilization in Azure Database for PostgreSQL Elastic Clusters

 This article describes how to identify the root cause of high CPU utilization. It also provides possible remedial actions to control CPU utilization when using [Elastic clusters in Azure Database for PostgreSQL](../elastic-clusters/concepts-elastic-clusters.md).

 In this article, you learn about:

-- How to use tools like Azure Metrics, pg_stat_statements, citus_stat_activity, and pg_stat_activity to identify high CPU utilization.
-- How to identify root causes, such as long running queries and total connections
-- How to resolve high CPU utilization by using EXPLAIN ANALYZE and vacuuming tables.
+- How to use tools like Azure Metrics, `pg_stat_statements`, `citus_stat_activity`, and `pg_stat_activity` to identify high CPU utilization.
+- How to identify root causes, such as long-running queries and total connections.
+- How to resolve high CPU utilization by using `EXPLAIN ANALYZE` and vacuuming tables.

-## Tools to Identify High CPU Utilization
+## Tools to identify high CPU utilization

-Consider the use of the following list of tools to identify high CPU utilization:
+Use the following tools to identify high CPU utilization:

 ### Azure Metrics

-Azure Metrics is a good starting point to check the CPU utilization for a specific period. Metrics provide information about the resources utilized during the period in which you are monitoring. You can use the **Apply splitting** option and **Split by Server Name** to view the details of each individual node in your elastic cluster. You can then compare the performance of **Write IOPs, Read IOPs, Read Throughput Bytes/Sec**, and **Write Throughput Bytes/Sec** with **CPU percent**, to view the performance of individual nodes when you observe your workload consuming high CPU.
+Azure Metrics is a good starting point to check CPU utilization for a specific period. Metrics show the resources used during the period you're monitoring. Use the **Apply splitting** option and **Split by Server Name** to view each individual node in your elastic cluster. Then compare **Write IOPS**, **Read IOPS**, **Read Throughput Bytes/Sec**, and **Write Throughput Bytes/Sec** with **CPU percent** to see how individual nodes perform while your workload consumes high CPU.

-Once you have identified a particular node (or nodes) with higher than expected CPU utilization, you can connect directly to one more nodes in question and perform a more in-depth analysis using the following Postgres tools:
+After you identify a particular node (or nodes) with higher-than-expected CPU utilization, connect directly to the nodes in question and analyze them in more depth by using the following Postgres tools:

 ### pg_stat_statements

 The `pg_stat_statements` extension helps identify queries that consume time on the server. For more information about this extension, see the detailed [documentation](https://www.postgresql.org/docs/current/pgstatstatements.html).
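A minimal sketch for getting started with the extension on a node, assuming `pg_stat_statements` is already allowed and preloaded through server parameters (on Azure, typically `azure.extensions` and `shared_preload_libraries`). It ranks statements by average time per call, which complements a ranking by total time. The column names are the PostgreSQL 13 and later names (`mean_exec_time`, `total_exec_time`).

```sql
-- Enable the extension in the current database on this node (no-op if it already exists).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top five statements by average execution time per call, in milliseconds.
SELECT
    query,
    calls,
    round(mean_exec_time::numeric, 2)  AS mean_ms,
    round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;
```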

-#### Calls/Mean & Total Execution Time
+#### Calls/mean and total execution time

 The following query returns the top five SQL statements by highest total execution time:

@@ -47,7 +47,7 @@ DESC LIMIT 5;

 ### pg_stat_activity

-The `pg_stat_activity` view shows the queries that are currently being executed on the specific node. Monitor active queries, sessions, and states on that node.
+The `pg_stat_activity` view shows the queries that are currently running on the specific node. Use it to monitor active queries, sessions, and states on that node.

 ```sql
 SELECT *, now() - xact_start AS duration
@@ -58,7 +58,7 @@ ORDER BY duration DESC;

 ### citus_stat_activity

-The `citus_stat_activity` view shows the distributed queries that are executing on all nodes, and is a superset of `pg_stat_activity`. This view also shows tasks specific to subqueries dispatched to workers, task state, and worker nodes.
+The `citus_stat_activity` view is a superset of `pg_stat_activity`. It shows the distributed queries that are running on all nodes, along with the subquery tasks dispatched to workers, their state, and the worker nodes that run them.

 ```sql
 SELECT *, now() - xact_start AS duration
@@ -67,18 +67,18 @@ WHERE state IN ('idle in transaction', 'active') AND pid <> pg_backend_pid()
 ORDER BY duration DESC;
 ```

-## Identify Root Causes
+## Identify root causes

-If CPU consumption levels are high in general, the following scenarios could be possible root causes:
+If CPU consumption levels are high, the following scenarios might be the root causes:

 ### Long-running transactions on a specific node

-Long-running transactions can consume CPU resources that lead to high CPU utilization.
+Long-running transactions can consume CPU resources and lead to high CPU utilization.

 The following query provides information on long-running transactions:

 ```sql
-SELECT
+SELECT
     pid,
     datname,
     usename,
@@ -91,19 +91,19 @@ SELECT
     wait_event,
     wait_event_type,
     query
-FROM pg_stat_activity
-WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
+FROM pg_stat_activity
+WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
 ORDER BY now() - query_start DESC;
 ```
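To narrow the results to transactions that exceed a threshold, you can filter on `xact_start`. This is a sketch; the five-minute interval is an arbitrary example, not a recommended limit.

```sql
-- Transactions on this node that have been open for more than five minutes (example threshold).
SELECT
    pid,
    usename,
    state,
    now() - xact_start AS xact_duration,
    left(query, 100)   AS query_preview
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND state IN ('idle in transaction', 'active')
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_duration DESC;
```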

 ### Long-running transactions on all nodes

-Long-running transactions can consume CPU resources that lead to high CPU utilization.
+Long-running transactions can consume CPU resources and lead to high CPU utilization.

 The following query provides information on long-running transactions across all nodes:

 ```sql
-SELECT
+SELECT
     global_pid, pid,
     nodeid,
     datname,
@@ -117,19 +117,19 @@ SELECT
     wait_event,
     wait_event_type,
     query
-FROM citus_stat_activity
-WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
+FROM citus_stat_activity
+WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
 ORDER BY now() - query_start DESC;
 ```
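Because one distributed query can use backends on several nodes, grouping `citus_stat_activity` by `global_pid` shows how many node-level sessions each distributed transaction holds and how long it has been running. A sketch, assuming the view exposes `global_pid` and `nodeid` as used in the preceding query:

```sql
-- One row per distributed transaction: how many backends and nodes it spans,
-- and how long its oldest statement has been running.
SELECT
    global_pid,
    count(*)                 AS backends,
    count(DISTINCT nodeid)   AS nodes,
    max(now() - query_start) AS longest_running
FROM citus_stat_activity
WHERE pid <> pg_backend_pid()
  AND state IN ('idle in transaction', 'active')
GROUP BY global_pid
ORDER BY longest_running DESC;
```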

 ### Slow queries

-Slow queries can consume CPU resources that lead to high CPU utilization.
+Slow queries can consume CPU resources and cause high CPU utilization.

-The following query helps identify queries taking longer run times:
+The following query helps you identify queries that take longer to run:

 ```sql
-SELECT
+SELECT
     query,
     calls,
     mean_exec_time,
@@ -151,94 +151,95 @@ ORDER BY total_exec_time DESC;

 ### Total number of connections and number of connections by state on a node

-Many connections to the database might also lead to increased CPU utilization.
+Many connections to the database can lead to increased CPU utilization.

 The following query provides information about the number of connections by state on a single node:

 ```sql
-SELECT state, COUNT(*)
-FROM pg_stat_activity
-WHERE pid <> pg_backend_pid()
-GROUP BY state
+SELECT state, COUNT(*)
+FROM pg_stat_activity
+WHERE pid <> pg_backend_pid()
+GROUP BY state
 ORDER BY state ASC;
 ```
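The preceding query breaks connections down by state. To see the total number of client connections on the node next to its `max_connections` limit, a sketch like the following can help. Filtering on `backend_type = 'client backend'` excludes background processes such as autovacuum workers.

```sql
-- Total client connections on this node compared with the configured limit.
SELECT
    count(*)                                AS total_connections,
    current_setting('max_connections')::int AS max_connections,
    round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_of_max
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
```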

 ### Total number of connections and number of connections by state on all nodes

-Many connections to the database might also lead to increased CPU utilization.
+Many connections to the database can lead to increased CPU utilization.

 The following query provides information about the number of connections by state across all nodes:

 ```sql
-SELECT state, COUNT(*)
-FROM citus_stat_activity
-WHERE pid <> pg_backend_pid()
-GROUP BY state
+SELECT state, COUNT(*)
+FROM citus_stat_activity
+WHERE pid <> pg_backend_pid()
+GROUP BY state
 ORDER BY state ASC;
 ```
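To spot a single worker with an unusually high connection count, you can add `nodeid` to the grouping. A minimal sketch:

```sql
-- Connections by state, broken down per node in the cluster.
SELECT nodeid, state, count(*) AS connections
FROM citus_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY nodeid, state
ORDER BY nodeid, state;
```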

-### Vacuum and Table Stats
+### Vacuum and table stats
+
+Keeping table statistics up to date helps improve query performance. Check that autovacuum runs regularly.

-Keeping table statistics up to date helps improve query performance. Monitor whether regular auto vacuuming is being carried out.
+The following query helps you identify the tables that need vacuuming:

-The following query helps to identify the tables that need vacuuming:
 ```sql
-SELECT *
-FROM run_command_on_all_nodes($$
-    SELECT json_agg(t)
-    FROM (
+SELECT *
+FROM run_command_on_all_nodes($$
+    SELECT json_agg(t)
+    FROM (
         SELECT schemaname, relname
             , n_live_tup, n_dead_tup
             , round(n_dead_tup::numeric / n_live_tup, 2) AS bloat
             , last_autovacuum, last_autoanalyze
-            , last_vacuum, last_analyze
-        FROM pg_stat_user_tables
-        WHERE n_live_tup > 0 AND relname LIKE '%orders%'
-        ORDER BY n_dead_tup DESC
+            , last_vacuum, last_analyze
+        FROM pg_stat_user_tables
+        WHERE n_live_tup > 0 AND relname LIKE '%orders%'
+        ORDER BY n_dead_tup DESC
     ) t
 $$);
 ```

-The following image highlights the output resulting from the above query. The "result" column is a json datatype containing information on the stats.
+The following image shows the output of the preceding query. The `result` column is a JSON value that contains the table statistics from each node.

 :::image type="content" source="./media/how-to-high-cpu-utilization-elastic-clusters/elastic-clusters-cpu-utilization-result.png" alt-text="Results returned from the query, including the `result` column as a JSON data type." lightbox="./media/how-to-high-cpu-utilization-elastic-clusters/elastic-clusters-cpu-utilization-result.png":::

-The last_autovacuum and last_autoanalyze columns provide the date and time when the table was last auto vacuumed or analyzed. If the tables aren't being vacuumed regularly, take steps to tune autovacuum.
+The `last_autovacuum` and `last_autoanalyze` columns show the date and time when the table was last autovacuumed or autoanalyzed. If the tables aren't autovacuumed regularly, tune autovacuum.
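One common way to tune autovacuum for a large or frequently updated table is to lower its per-table scale factors. In this sketch, `orders` is a hypothetical table name (matching the filter in the earlier query) and the values are illustrative, not recommendations. Run it on the coordinator; for a distributed table, Citus is expected to propagate the storage parameters to the shards, but verify that on the workers.

```sql
-- Trigger autovacuum and autoanalyze after roughly 5% and 2% of rows change,
-- instead of the PostgreSQL defaults (20% and 10%). Table name is hypothetical.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- Confirm the per-table settings.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';
```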

-The following query provides information regarding the amount of bloat at the schema level:
+The following query provides information about the amount of bloat at the schema level:

 ```sql
-SELECT *
-FROM run_command_on_all_nodes($$
-    SELECT json_agg(t) FROM (
+SELECT *
+FROM run_command_on_all_nodes($$
+    SELECT json_agg(t) FROM (
         SELECT schemaname, sum(n_live_tup) AS live_tuples
             , sum(n_dead_tup) AS dead_tuples
-        FROM pg_stat_user_tables
-        WHERE n_live_tup > 0
-        GROUP BY schemaname
+        FROM pg_stat_user_tables
+        WHERE n_live_tup > 0
+        GROUP BY schemaname
         ORDER BY sum(n_dead_tup) DESC
-    ) t
+    ) t
 $$);
 ```

-## Resolve High CPU Utilization
+## Resolve high CPU utilization

 To resolve high CPU utilization, use `EXPLAIN ANALYZE` to examine slow queries, terminate transactions that run longer than they should, clear excessive bloat, and consider using the built-in PgBouncer connection pooler.

 ### Use EXPLAIN ANALYZE

-Once you know the queries that are consuming more CPU, use **EXPLAIN ANALYZE** to further investigate and tune them.
+After you identify the queries that consume the most CPU, use **EXPLAIN ANALYZE** to investigate and tune them further.

-For more information about the **EXPLAIN ANALYZE** command, review its [documentation](https://www.postgresql.org/docs/current/sql-explain.html).
+For more information about the **EXPLAIN ANALYZE** command, see its [documentation](https://www.postgresql.org/docs/current/sql-explain.html).
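A sketch of examining a distributed query. `orders`, `customer_id`, and `order_date` are hypothetical names. The `citus.explain_all_tasks` setting makes Citus show the plan for every shard task instead of a single representative task. Keep in mind that `EXPLAIN ANALYZE` actually executes the statement, so wrap data-modifying statements in a transaction that you roll back.

```sql
-- Show plans for all shard tasks in this session (Citus shows only one task by default).
SET citus.explain_all_tasks = on;

-- EXPLAIN ANALYZE executes the query; BEGIN/ROLLBACK keeps any side effects from persisting.
BEGIN;
EXPLAIN ANALYZE
SELECT customer_id, count(*) AS order_count
FROM orders
WHERE order_date >= now() - interval '7 days'
GROUP BY customer_id;
ROLLBACK;
```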

 ### Terminate long-running transactions on a node

-You can consider terminating a long running transaction as an option if the transaction is running longer than expected.
+Consider terminating a long-running transaction if it runs longer than expected.

-To terminate a session's PID, you need to find its PID by using the following query:
+To terminate a session, first find its PID by using the following query:

 ```sql
-SELECT
+SELECT
     pid,
     datname,
     usename,
@@ -251,29 +252,29 @@ SELECT
     wait_event,
     wait_event_type,
     query
-FROM pg_stat_activity WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
+FROM pg_stat_activity WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
 ORDER BY now() - query_start DESC;
 ```

-You can also filter by other properties like usename (user name), datname (database name), etc.
+You can also filter by other properties, such as `usename` (user name) and `datname` (database name).

-Once you have the session's PID, you can terminate it using the following query:
+After you get the session's PID, terminate the session by using the following query:

 ```sql
 SELECT pg_terminate_backend(<pid>);
 ```

-Terminating the pid ends the specific sessions related to a node.
+Terminating the PID ends that session on the node.
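Two related options, as a sketch: `pg_cancel_backend` cancels only the running statement and leaves the session connected, and a filtered `pg_terminate_backend` call can end several stale sessions at once. `<pid>` is a placeholder, as in the preceding query, and the 30-minute threshold is an example value.

```sql
-- Gentler option: cancel the current statement but keep the session open.
SELECT pg_cancel_backend(<pid>);

-- End every session on this node that has been idle inside a transaction
-- for more than 30 minutes (example threshold).
SELECT pid, usename, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND state = 'idle in transaction'
  AND now() - state_change > interval '30 minutes';
```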

 ### Terminate long-running transactions on all nodes

-You could consider ending a long running transaction as an option.
+Consider ending a long-running transaction if it runs longer than expected.

-To terminate a session's PID, you need to find its PID, global_pid by using the following query:
+To terminate a session, first find its `pid` and `global_pid` by using the following query:

 ```sql
-SELECT
-    global_pid,
+SELECT
+    global_pid,
     pid,
     nodeid,
     datname,
@@ -287,22 +288,22 @@ SELECT
     wait_event,
     wait_event_type,
     query
-FROM citus_stat_activity WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
+FROM citus_stat_activity WHERE state != 'idle' AND pid <> pg_backend_pid() AND state IN ('idle in transaction', 'active')
 ORDER BY now() - query_start DESC;
 ```

-You can also filter by other properties like usename (user name), datname (database name), etc.
+You can also filter by other properties, such as `usename` (user name) and `datname` (database name).

-Once you have the session's PID, you can terminate it using the following query:
+After you get the session's PID, terminate the session by using the following query:

 ```sql
 SELECT pg_terminate_backend(<pid>);
 ```
 Terminating the PID ends that session on the worker node.

-The same query running on different worker nodes might have same global_pid’s. In that case, you can end long running transaction on all worker nodes use global_pid.
+The same query running on different worker nodes can share the same `global_pid`. In that case, you can use the `global_pid` to end the long-running transaction on all worker nodes.

-The following screenshot shows the relativity of the global_pid’s to session pid’s.
+The following screenshot shows how `global_pid` values relate to session PIDs.
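A sketch for ending a distributed transaction on every node at once, assuming Citus 11 or later, where `pg_terminate_backend` (and `pg_cancel_backend`) accept a `global_pid` when you run them on the coordinator. `<global_pid>` is a placeholder and the 10-minute threshold is an example value.

```sql
-- Terminate one distributed transaction on all nodes by its global PID.
SELECT pg_terminate_backend(<global_pid>);

-- Terminate every distributed query that has been running for more than
-- 10 minutes. DISTINCT in the subquery ensures each global_pid is signaled once.
SELECT global_pid, pg_terminate_backend(global_pid) AS terminated
FROM (
    SELECT DISTINCT global_pid
    FROM citus_stat_activity
    WHERE pid <> pg_backend_pid()
      AND state = 'active'
      AND now() - query_start > interval '10 minutes'
) AS long_running;
```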
-> To terminate long running transactions, it is advised to set server parameters `statement_timeout` or `idle_in_transaction_session_timeout`.
+> To automatically end long-running statements and sessions left idle in a transaction, set the server parameters `statement_timeout` and `idle_in_transaction_session_timeout`.
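A sketch of applying these timeouts for a session or a role. `app_user` is a hypothetical role name and the values are examples. For server-wide defaults, set the parameters through the server parameters page in the Azure portal or the Azure CLI instead.

```sql
-- Current session only: cancel any statement that runs longer than 5 minutes.
SET statement_timeout = '5min';

-- Current session only: end the session if it stays idle inside a transaction for 10 minutes.
SET idle_in_transaction_session_timeout = '10min';

-- Apply to all future sessions of one role (hypothetical role name).
ALTER ROLE app_user SET statement_timeout = '5min';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '10min';
```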

 ### Clear bloat

-A short-term solution would be to manually vacuum and then analyze the tables where slow queries are seen:
+As a short-term fix, manually vacuum and analyze the tables that slow queries use:

 ```sql
 VACUUM ANALYZE <table>;
 ```
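A concrete sketch, with `orders` as a hypothetical distributed table. Running `VACUUM` on the coordinator is expected to be propagated to the table's shards; the follow-up query reuses the `run_command_on_all_nodes` pattern from the earlier bloat queries to confirm that `last_vacuum` and `last_analyze` changed on each node, matching shard tables whose names start with `orders`.

```sql
-- Manually vacuum and analyze one table (hypothetical name).
VACUUM ANALYZE orders;

-- Confirm the run on every node; shard tables are named orders_<shardid>.
SELECT *
FROM run_command_on_all_nodes($$
    SELECT json_agg(t) FROM (
        SELECT relname, n_dead_tup, last_vacuum, last_analyze
        FROM pg_stat_user_tables
        WHERE relname LIKE 'orders%'
    ) t
$$);
```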

-## Managing Connections
+### Manage connections

-In situations where there are many short-lived connections, or many connections that remain idle for most of their life, consider using a connection pooler like PgBouncer.
+If your application opens many short-lived connections, or many connections that stay idle for most of their life, consider using a connection pooler like PgBouncer.

 ### Use PgBouncer, the built-in connection pooler

-For more information about PgBouncer, see [connection pooler](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/not-all-postgres-connection-pooling-is-equal/ba-p/825717) and [connection handling best practices with PostgreSQL](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/connection-handling-best-practice-with-postgresql/ba-p/790883)
+For more information about PgBouncer, see [connection pooler](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/not-all-postgres-connection-pooling-is-equal/ba-p/825717) and [connection handling best practices with PostgreSQL](https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/connection-handling-best-practice-with-postgresql/ba-p/790883).

 Azure Database for PostgreSQL Elastic Clusters offer PgBouncer as a built-in connection pooling solution. For more information, see [PgBouncer](../connectivity/concepts-pgbouncer.md).

 ## Related content

-- [Server parameters in Azure Database for PostgreSQL](../server-parameters/concepts-server-parameters.md).
-- [Autovacuum tuning in Azure Database for PostgreSQL](how-to-autovacuum-tuning.md).
+- [Server parameters in Azure Database for PostgreSQL](../server-parameters/concepts-server-parameters.md)
+- [Autovacuum tuning in Azure Database for PostgreSQL](how-to-autovacuum-tuning.md)