At $WORK we were running a MySQL setup that required an unusually large number of concurrent connections. Most of them were idle. Only a few dozen connections were running queries. The total number of connections was around twelve thousand and was growing slowly.
When the total number of connections reached close to 12,200, things started to break. New clients could no longer connect to the database. The error in the MySQL log was:
Can't create thread to handle request (errno=11)
Once the database reached that state, it was effectively stuck. Any administrative action required opening a new database connection first, and those connections could no longer be created. Shutting the database down cleanly was not possible.
First response
We lowered max_connections and enabled a shell script that aggressively killed sleeping connections every minute.
That kept the database running and prevented it from entering a fully stuck state.
Many applications relied on connection pools with long-lived connections. Killing those connections underneath the pools introduced secondary issues. From the application side, logs started filling up with driver warnings, connection resets, and retry messages.
At that point it was clear that killing connections at the database layer bought us some time, but it was not a sustainable solution.
Why this was hard to diagnose
errno=11 (EAGAIN) is not a very helpful error.
It only indicates that the operating system refused a resource request due to resource limits.
Searching for this error returns many threads and articles.
Most focus on adjusting process limits in limits.conf.
A few discuss memory allocation and thread stack sizes.
Our process limits were much higher than 12K. Memory was less clear. The instance exhibiting the issue had a few hundred gigabytes of RAM, most of it used by the InnoDB buffer pool.
We could not reproduce the issue in a test environment. Most load-testing tools fail long before they can create tens of thousands of threads. There are also no tools that create ten thousand idle connections alongside a small number of active ones.
Percona’s own investigations describe multiple possible causes without converging on a single root explanation:
https://www.percona.com/blog/mysql-challenge-100k-connections/
https://www.percona.com/blog/cant_create_thread_errno_11/
At that point, we wanted to determine whether the issue was caused by a process limit or by a memory allocation limit.
Testing thread creation directly
Can Linux create this many threads at all?
To answer that question without introducing additional variables, I wrote a small C program that does nothing except create threads and keep them alive.
create_threads.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#define NUM_THREADS 100000
#define STACK_SIZE (64 * 1024)
/*
* Simple thread function.
* Threads do nothing and stay alive.
*/
void* thread_function(void* arg) {
pause();
return NULL;
}
int main() {
pthread_t threads[NUM_THREADS];
pthread_attr_t attr;
size_t stacksize;
int i, ret;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, STACK_SIZE);
pthread_attr_getstacksize(&attr, &stacksize);
printf("Stack size set to %zu bytes\n", stacksize);
for (i = 0; i < NUM_THREADS; i++) {
ret = pthread_create(&threads[i], &attr, thread_function, NULL);
if (ret != 0) {
fprintf(stderr,
"Error: pthread_create failed at iteration %d: %s (errno=%d)\n",
i, strerror(ret), ret);
exit(EXIT_FAILURE);
}
if (i % 1000 == 0) {
printf("Created %d threads\n", i);
}
}
printf("Successfully created %d threads\n", NUM_THREADS);
pause();
return 0;
}
compile as:
gcc -pthread -o create_threads create_threads.c
create_threads failed very consistently at around 12,288 threads. One detail stood out - when the output was piped through another process:
./create_threads | grep error
The maximum number of threads dropped by one, which suggested external task accounting limit rather than memory exhaustion.
systemd and UserTasksMax
Focusing the search on the 12K thread limit yielded:
https://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux https://www.percona.com/blog/tasksmax-another-setting-that-can-cause-mysql-error-messages/
These showed that the 12K ceiling matched a systemd limit called UserTasksMax.
On many systems, the default value is UserTasksMax=12288.
This limit is enforced per user and implemented using cgroups, tt is independent of traditional ulimit settings.
MySQL creates one Linux thread per connection, once this limit is reached, thread creation fails and MySQL surfaces it as errno=11.
After increasing UserTasksMax, create_threads was able to create 100,000 threads without issue.
Verifying with real MySQL connections
At that point it was still necessary to confirm that we could create 30K actual MySQL connections, not just idle threads, 30K is roughly three times our production connection count.
A second C utility was written to create real MySQL connections using the MySQL client library.
create_mysql_threads.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <mysql.h>
#define STACK_SIZE (256 * 1024)
typedef struct {
pthread_t thread_id;
MYSQL *conn;
int thread_num;
} ThreadInfo;
ThreadInfo *threads_info;
int NUM_THREADS;
void* thread_function(void* arg) {
ThreadInfo *thread_info = (ThreadInfo *)arg;
/*
* MySQL requires per-thread initialization.
*/
if (mysql_thread_init()) {
fprintf(stderr, "mysql_thread_init() failed\n");
pthread_exit(NULL);
}
thread_info->conn = mysql_init(NULL);
mysql_options(thread_info->conn,
MYSQL_READ_DEFAULT_GROUP,
"create_mysql_threads");
/*
* NULL means mysql client library will look at my.cnf for credentials
*/
if (!mysql_real_connect(thread_info->conn,
NULL, NULL, NULL, NULL, 0, NULL, 0)) {
fprintf(stderr,
"mysql_real_connect() failed: %s\n",
mysql_error(thread_info->conn));
pthread_exit(NULL);
}
/*
* Keep the connection open long enough
* to observe total connection count.
*/
sleep(10);
mysql_close(thread_info->conn);
mysql_thread_end();
pthread_exit(NULL);
}
int main(int argc, char *argv[]) {
pthread_attr_t attr;
size_t stacksize;
int i;
mysql_library_init(0, NULL, NULL);
NUM_THREADS = atoi(argv[1]);
threads_info = malloc(NUM_THREADS * sizeof(ThreadInfo));
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, STACK_SIZE);
pthread_attr_getstacksize(&attr, &stacksize);
printf("Stack size set to %zu bytes\n", stacksize);
for (i = 0; i < NUM_THREADS; i++) {
threads_info[i].thread_num = i;
pthread_create(&threads_info[i].thread_id,
&attr,
thread_function,
&threads_info[i]);
}
for (i = 0; i < NUM_THREADS; i++) {
pthread_join(threads_info[i].thread_id, NULL);
}
free(threads_info);
mysql_library_end();
return 0;
}
Compile and run as:
sudo apt-get install --reinstall libmysqlclient-dev gdb # adjust for your OS
gcc -g -pthread -o create_mysql_threads create_mysql_threads.c $(mysql_config --cflags --libs) # mysql_config returns list of -I and -L flags needed to compile against mysql client libs`
# run
./create_mysql_threads 1000 # num of mysql connections to create
With UserTasksMax adjusted, this test was able to create 30K connections, which was sufficient for our workload.
Later on, using the same C-based tools, we were also able to validate both thread creation and MySQL connection handling at the 100,000 level in a test environment.
Final change
Using configuration management, UserTasksMax was adjusted only on systems where the MySQL max_connections setting exceeded the default systemd limit.
No other changes were required. Both the thread-only test and the MySQL connection test were used to validate the fix before and after the rollout.
Takeaway
This incident happened before these workloads were moved to CloudSQL, when the primary databases were running on-prem.
Running a heavily loaded MySQL system on-prem requires understanding how MySQL maps its execution model onto the operating system and where OS-level limits apply.
Managed platforms like CloudSQL remove this entire class of problem, as host-level tuning is handled by the platform.
On-prem, host-level boundaries are your responsibility. At scale, you eventually run into them.
References
MySQL 100K Connections Challenge (Percona) https://www.percona.com/blog/mysql-challenge-100k-connections/
“Can’t create thread to handle request (errno=11)” (Percona) https://www.percona.com/blog/cant_create_thread_errno_11/
TasksMax: another setting that can cause MySQL errors (Percona) https://www.percona.com/blog/tasksmax-another-setting-that-can-cause-mysql-error-messages/
Maximum number of threads per process in Linux (Stack Overflow) https://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux
MySQL Client C API https://dev.mysql.com/doc/c-api/5.7/en/c-api-function-descriptions.html
POSIX Threads Basics https://www.cs.cmu.edu/afs/cs/academic/class/15492-f07/www/pthreads.html#BASICS