Avoid loading certificates from system store for every connection by SReicheltPTV · Pull Request #6418 · npgsql/npgsql

SReicheltPTV · 2026-01-21T18:39:41Z

Cache loaded custom root certificates, in particular if the PGSSLROOTCERT environment variable is set. This fixes #6417.

Original text:

This PR contains two changes:

An improvement for commit 7320260: After that commit, it should no longer be necessary to validate against two different certificate stores.

An attempt to fix Severe performance issue in Npgsql 10 when PGSSLROOTCERT is set to system store #6417, by guessing whether the value of PGSSLROOTCERT matches the system store.

I hope the changes are OK; unfortunately I can only test them in limited ways.

roji · 2026-01-21T22:29:18Z

src/Npgsql/Internal/NpgsqlConnector.cs

-            if (verifyFull && sslPolicyErrors.HasFlag(SslPolicyErrors.RemoteCertificateNameMismatch))
-                return false;
+            // Also check against the default systme certificates location on Debian/Ubuntu.
+            if (certRootPath == "/etc/ssl/certs/ca-certificates.crt")


FWIW we avoid hard-coding filesystem paths in the source code like this, as different Linux distributions may have their certificate store in completely different places etc.

I understand. Unfortunately, the SSL_CERT_FILE env var doesn't usually seem to be set. As I wrote in #6417, I'm a bit out of ideas here.
In the meantime, it occurred to me that a completely different alternative could be to just cache the custom certificates similarly to how the system certificates are cached. Then there would no longer be a need to figure out what the path points to. I'll try to come up with something.

/cc @vonzshik on all this, who actually knows stuff about all the certificate stuff

vonzshik · 2026-01-22T11:40:13Z

So, from my understanding, in your particular case you had certificate in store and specified in PGSSLROOTCERT, which after 7320260 resulted in it being verified twice.
If we consider loading certificates from disk per each physical open as the main issue, then we can just cache them in memory (looking at OpenSSL's code, it seems that's exactly what they're doing) after the first load. Building the chain twice should be quite fast, as long as certificate revocation isn't involved (and even then I'm pretty sure the results are cached for all certificates).
Personally, I'm just very hesitant to touch CertificateChainPolicy as it seems to be very easy to introduce potential security issues through it (for example, there is a comment on SslClientAuthenticationOptions.CertificateChainPolicy which says If not null, CertificateRevocationCheckMode and SslCertificateTrust are ignored. And while CertificateRevocationCheckMode seems to be easy enough, though it doesn't say what instead should be set, where is 0 information about SslCertificateTrust).

roji · 2026-01-22T11:42:16Z

If we consider loading certificates from disk per each physical open as the main issue, then we can just cache them in memory (looking at OpenSSL's code, it seems that's exactly what they're doing) after the first load.

Just noting that this seems somewhat unlikely - a physical connection open is quite a heavy thing (with potentially multiple network roundrips etc.), and I'd expect just loading a certificate from disk to be negligible. It may be a good idea to first prove that there's a problem via a benchmark.

vonzshik · 2026-01-22T11:48:04Z

One more thing. libpq also supports the special value system for sslrootcert (Root Certificate in npgsql). The way it works is similar to npgsql as if Root Certificate isn't specified, at which point it'll load certificate from the store (and also changes the ssl mode to VerifyFull). We can also add this to allow ignoring environment variables and force using the store.

SReicheltPTV · 2026-01-22T14:06:14Z

So, from my understanding, in your particular case you had certificate in store and specified in PGSSLROOTCERT, which after 7320260 resulted in it being verified twice.

That's correct, though I suspect the problem may not be the double verification of this specific certificate, but the size of the system store if PGSSLROOTCERT is set to it.

If we consider loading certificates from disk per each physical open as the main issue, then we can just cache them in memory (looking at OpenSSL's code, it seems that's exactly what they're doing) after the first load. Building the chain twice should be quite fast, as long as certificate revocation isn't involved (and even then I'm pretty sure the results are cached for all certificates).

I'll check whether caching the loaded certificates is sufficient, and will report back on this tomorrow.

Personally, I'm just very hesitant to touch CertificateChainPolicy as it seems to be very easy to introduce potential security issues through it (for example, there is a comment on SslClientAuthenticationOptions.CertificateChainPolicy which says If not null, CertificateRevocationCheckMode and SslCertificateTrust are ignored. And while CertificateRevocationCheckMode seems to be easy enough, though it doesn't say what instead should be set, where is 0 information about SslCertificateTrust).

While I don't have a good grasp on these details, note that the current Npgsql code ignores the result of the verification against the system certificates if PGSSLROOTCERT is set, and then performs its own verification using the custom chain policy. So my (uncertain) impression is that whatever could be ignored is already being ignored.

One more thing. libpq also supports the special value system for sslrootcert (Root Certificate in npgsql). The way it works is similar to npgsql as if Root Certificate isn't specified, at which point it'll load certificate from the store (and also changes the ssl mode to VerifyFull). We can also add this to allow ignoring environment variables and force using the store.

That sounds useful, but just to clarify: In my case, simply unsetting PGSSLROOTCERT did the job. But since the problem was somewhat difficult to pin down, and the situation with PGSSLROOTCERT set to the system cert store seems somewhat standard, I'm trying to find a way to prevent others from running into the same issue.

vonzshik · 2026-01-22T14:13:43Z

While I don't have a good grasp on these details, note that the current Npgsql code ignores the result of the verification against the system certificates if PGSSLROOTCERT is set, and then performs its own verification using the custom chain policy. So my (uncertain) impression is that whatever could be ignored is already being ignored.

Yes but not exactly, as that while we do throw away the previous verification result, the only thing we do is we add specified certificate to custom store and ask the chain to trust whatever there is, after which we try to build the chain again. That is to say, if there were some other issues unrelated to root certificate not being trusted (for example, getting a completely different root certificate), it'll still return an error.

…onal collection

SReicheltPTV · 2026-01-23T10:05:13Z

If we consider loading certificates from disk per each physical open as the main issue, then we can just cache them in memory (looking at OpenSSL's code, it seems that's exactly what they're doing) after the first load. Building the chain twice should be quite fast, as long as certificate revocation isn't involved (and even then I'm pretty sure the results are cached for all certificates).

I can confirm that caching the certificates fixes the problem we had, and I have updated the PR accordingly.

Just noting that this seems somewhat unlikely - a physical connection open is quite a heavy thing (with potentially multiple network roundrips etc.), and I'd expect just loading a certificate from disk to be negligible. It may be a good idea to first prove that there's a problem via a benchmark.

I agree a benchmark would be useful, but I don't know exactly how to do it. In the web service where we experienced the problem, two factors appear to influence the result:

As PGSSLROOTCERT is set to the system store, quite a few certificates are loaded.
The CPU resources in the development stage are intentionally kept very low.

In that setting, the CPU usage had a rather serious side effect: If many requests reached the service at the same time, all database connection attempts ran into a timeout while validating the certificate chain. As a result, these connections also never made it into the connection pool, so every incoming request caused another certificate validation, making the problem even worse.

Yes but not exactly, as that while we do throw away the previous verification result, the only thing we do is we add specified certificate to custom store and ask the chain to trust whatever there is, after which we try to build the chain again. That is to say, if there were some other issues unrelated to root certificate not being trusted (for example, getting a completely different root certificate), it'll still return an error.

I see, thanks. I've removed this change from the PR.

roji · 2026-01-23T11:37:31Z

BTW just to state the obvious - IIUC caching certificates would mean that users can't deploy a new certificate without restarting the application. That might not be a problem, and if OpenSSL does this as @vonzshik indicated, then we definitely can do the same, I think.

SReicheltPTV · 2026-01-23T11:57:51Z

BTW just to state the obvious - IIUC caching certificates would mean that users can't deploy a new certificate without restarting the application. That might not be a problem, and if OpenSSL does this as @vonzshik indicated, then we definitely can do the same, I think.

Oh, right. We could check the file timestamp if necessary, of course.

roji · 2026-01-23T12:50:19Z

BTW just to state the obvious - IIUC caching certificates would mean that users can't deploy a new certificate without restarting the application. That might not be a problem, and if OpenSSL does this as @vonzshik indicated, then we definitely can do the same, I think.

Oh, right. We could check the file timestamp if necessary, of course.

If you're doing I/O to check the file timestamp, and certificate files are generally quite small, then it really would seem weird that just checking the file timestamp is so much faster than just loading the file... Seeing an actual benchmark around all this would be very helpful.

vonzshik · 2026-01-23T14:44:18Z

I decided to dive in deeper in regards to OpenSSL's certificate cache, and things are not as straightforward as I thought at first.

libpq uses SSL_CTX_load_verify_locations to load root cert in memory, which in the end calls X509_load_cert_crl_file_ex, with no caching in sight. Now, there seems to be some caching involved whenever SSL_CTX_load_verify_locations is called for a folder instead of a specific file, but that's not our case and not that interesting anyway.

vonzshik · 2026-01-23T14:49:50Z

@SReicheltPTV I see in your pr you added a reference to runtime issue dotnet/runtime#123058. Reading through it, I see that the dev there recommends to instead use CertificateChainPolicy with CustomTrustStore. If that's indeed enough to implement a one-time certificate verification with no other side effects (which seems to be the case, judging by their response), I think we can go back to the previous implementation and go from there.

After commit 7320260, it is no longer necessary to perform the validation twice against the two different certificate stores.

…crt file Fixes npgsql#6417

SReicheltPTV · 2026-01-23T17:45:19Z

If you're doing I/O to check the file timestamp, and certificate files are generally quite small, then it really would seem weird that just checking the file timestamp is so much faster than just loading the file... Seeing an actual benchmark around all this would be very helpful.

Checking the timestamp proved unproblematic in terms of performance. I've added it as a separate commit now so you can decide whether to include it.

I haven't done any systematic benchmarking, but for the instance in our low-resource dev environment, the service becomes completely overloaded when a certain number of database connections are opened simultaneously (triggering the certificate validation), and requests time out after 30-45s. (Not sure why it varies; I'm guessing that incoming requests are being blocked at some point.) Normal response times are around 50-300ms. I don't think the numbers can be compared meaningfully, though, because under normal circumstances, when the database connections succeed, they remain in the connection pool for a while.
The certificate file is 220KB in this case; I think it contains the standard root certificates of a Linux distribution plus some custom ones. (As I wrote above, although it may not be a good idea to set PGSSLROOTCERT to this file, the actual issue was that we had no idea that this was causing the problems.)

@SReicheltPTV I see in your pr you added a reference to runtime issue dotnet/runtime#123058. Reading through it, I see that the dev there recommends to instead use CertificateChainPolicy with CustomTrustStore. If that's indeed enough to implement a one-time certificate verification with no other side effects (which seems to be the case, judging by their response), I think we can go back to the previous implementation and go from there.

Oh, I hadn't seen that actually. Yes, they point to https://learn.microsoft.com/en-us/dotnet/core/extensions/sslstream-best-practices#custom-certificate-trust, but that page doesn't seem to address your specific objection. I'm not an expert on this at all unfortunately. Anyway, I have rebased onto that commit again as you suggested.

Fix debug assertion

b93487d

SReicheltPTV requested review from roji and vonzshik as code owners January 21, 2026 18:39

roji reviewed Jan 21, 2026

View reviewed changes

Minor cleanup: use caCertificates directly instead of creating additi…

e948041

…onal collection

SReicheltPTV force-pushed the main branch from a45e21e to 420e128 Compare January 23, 2026 10:02

SReicheltPTV added 3 commits January 23, 2026 17:23

Run validation only once in case of a custom certificate chain policy

30aeace

After commit 7320260, it is no longer necessary to perform the validation twice against the two different certificate stores.

Cache custom root certificates set via PGSSLROOTCERT env var or root.…

85e039e

…crt file Fixes npgsql#6417

Check file timestamp to avoid using stale cache entry

1908315

SReicheltPTV force-pushed the main branch from 420e128 to 1908315 Compare January 23, 2026 17:15

Conversation

SReicheltPTV commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

SReicheltPTV Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

roji Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

vonzshik commented Jan 22, 2026

Uh oh!

roji commented Jan 22, 2026

Uh oh!

vonzshik commented Jan 22, 2026

Uh oh!

SReicheltPTV commented Jan 22, 2026

Uh oh!

vonzshik commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SReicheltPTV commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji commented Jan 23, 2026

Uh oh!

SReicheltPTV commented Jan 23, 2026

Uh oh!

roji commented Jan 23, 2026

Uh oh!

vonzshik commented Jan 23, 2026

Uh oh!

vonzshik commented Jan 23, 2026

Uh oh!

SReicheltPTV commented Jan 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SReicheltPTV commented Jan 21, 2026 •

edited

Loading

vonzshik commented Jan 22, 2026 •

edited

Loading

SReicheltPTV commented Jan 23, 2026 •

edited

Loading