Avoid loading certificates from system store for every connection #6418
SReicheltPTV wants to merge 5 commits into npgsql:main
Conversation
```csharp
if (verifyFull && sslPolicyErrors.HasFlag(SslPolicyErrors.RemoteCertificateNameMismatch))
    return false;

// Also check against the default system certificates location on Debian/Ubuntu.
if (certRootPath == "/etc/ssl/certs/ca-certificates.crt")
```
FWIW we avoid hard-coding filesystem paths in the source code like this, as different Linux distributions may have their certificate store in completely different places etc.
I understand. Unfortunately, the SSL_CERT_FILE env var doesn't usually seem to be set. As I wrote in #6417, I'm a bit out of ideas here.
In the meantime, it occurred to me that a completely different alternative could be to just cache the custom certificates similarly to how the system certificates are cached. Then there would no longer be a need to figure out what the path points to. I'll try to come up with something.
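To make the idea concrete, here is a minimal sketch of such a cache - hypothetical names, not the actual PR code - keyed by the certificate file path:

```csharp
// Hypothetical sketch of the caching idea (not the actual PR code): load the
// custom root certificates once per path and reuse them on subsequent opens.
using System.Collections.Concurrent;
using System.Security.Cryptography.X509Certificates;

static class CustomRootCertCache
{
    private static readonly ConcurrentDictionary<string, X509Certificate2Collection> Cache = new();

    public static X509Certificate2Collection Get(string certRootPath)
        => Cache.GetOrAdd(certRootPath, static path =>
        {
            // Assumes a PEM bundle, like libpq's sslrootcert.
            var certs = new X509Certificate2Collection();
            certs.ImportFromPemFile(path);
            return certs;
        });
}
```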
/cc @vonzshik on all this, who actually knows stuff about all the certificate stuff
So, from my understanding, in your particular case you had the certificate in the system store and also specified it via `PGSSLROOTCERT`.
Just noting that this seems somewhat unlikely - a physical connection open is quite a heavy thing (with potentially multiple network roundtrips etc.), and I'd expect just loading a certificate from disk to be negligible. It may be a good idea to first prove that there's a problem via a benchmark.
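For what it's worth, a minimal BenchmarkDotNet sketch along these lines could look like this (the bundle path is an assumption matching the diff above):

```csharp
using System.Security.Cryptography.X509Certificates;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class CertLoadBenchmark
{
    // Assumption: a large PEM bundle, e.g. the Debian/Ubuntu system store.
    private const string BundlePath = "/etc/ssl/certs/ca-certificates.crt";

    [Benchmark]
    public X509Certificate2Collection LoadBundle()
    {
        // Measures how long loading the bundle from disk actually takes.
        var certs = new X509Certificate2Collection();
        certs.ImportFromPemFile(BundlePath);
        return certs;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<CertLoadBenchmark>();
}
```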
One more thing. libpq also supports the special value `system` for `sslrootcert`, which makes it validate against the platform's default trusted CA roots.
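For illustration, a libpq connection string using that value might look like this (hedged; see the libpq docs for the exact semantics):

```
host=db.example.com sslmode=verify-full sslrootcert=system
```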
That's correct, though I suspect the problem may not be the double verification of this specific certificate, but the size of the system store if `PGSSLROOTCERT` points at the entire system bundle.
I'll check whether caching the loaded certificates is sufficient, and will report back on this tomorrow.
While I don't have a good grasp on these details, note that the current Npgsql code ignores the result of the verification against the system certificates if a custom root certificate is specified.
That sounds useful, but just to clarify: In my case, simply unsetting `PGSSLROOTCERT` should then have been equivalent, right?
Yes, but not exactly: while we do throw away the previous verification result, the only thing we do is add the specified certificate to a custom store and ask the chain to trust whatever is there, after which we try to build the chain again. That is to say, if there were some other issue unrelated to the root certificate not being trusted (for example, getting a completely different root certificate), it would still return an error.
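A rough sketch of that rebuild step (hypothetical helper, not Npgsql's actual code) using `X509Chain` with a custom root trust:

```csharp
using System.Security.Cryptography.X509Certificates;

static bool VerifyWithCustomRoot(X509Certificate2 serverCert, X509Certificate2 customRoot)
{
    using var chain = new X509Chain();
    chain.ChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
    chain.ChainPolicy.CustomTrustStore.Add(customRoot);
    // Assumption: revocation checking is handled elsewhere.
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
    // Build() fails for *any* chain error - e.g. a completely different root
    // certificate - not just "untrusted root", matching the behavior above.
    return chain.Build(serverCert);
}
```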
I can confirm that caching the certificates fixes the problem we had, and I have updated the PR accordingly.
I agree a benchmark would be useful, but I don't know exactly how to do it. In the web service where we experienced the problem, two factors appear to influence the result: the number of connections opened simultaneously, and the limited CPU resources available to the service.
In that setting, the CPU usage had a rather serious side effect: If many requests reached the service at the same time, all database connection attempts ran into a timeout while validating the certificate chain. As a result, these connections also never made it into the connection pool, so every incoming request caused another certificate validation, making the problem even worse.
I see, thanks. I've removed this change from the PR.
BTW just to state the obvious - IIUC caching certificates would mean that users can't deploy a new certificate without restarting the application. That might not be a problem, and if OpenSSL does this as @vonzshik indicated, then we definitely can do the same, I think.
Oh, right. We could check the file timestamp if necessary, of course.
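A hedged sketch of what that check could look like (hypothetical names), reusing the cached certificates only while the file's last-write time is unchanged:

```csharp
using System.IO;
using System.Security.Cryptography.X509Certificates;

sealed class TimestampedCertCache
{
    private readonly object _lock = new();
    private string? _path;
    private DateTime _lastWriteUtc;
    private X509Certificate2Collection? _certs;

    public X509Certificate2Collection Get(string path)
    {
        // One stat call per connection open; reload only when the file changes.
        var lastWriteUtc = File.GetLastWriteTimeUtc(path);
        lock (_lock)
        {
            if (_certs is null || path != _path || lastWriteUtc != _lastWriteUtc)
            {
                var certs = new X509Certificate2Collection();
                certs.ImportFromPemFile(path);
                (_path, _lastWriteUtc, _certs) = (path, lastWriteUtc, certs);
            }
            return _certs;
        }
    }
}
```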
If you're doing I/O to check the file timestamp, and certificate files are generally quite small, then it really would seem weird that just checking the file timestamp is so much faster than just loading the file... Seeing an actual benchmark around all this would be very helpful.
I decided to dive deeper into OpenSSL's certificate cache, and things are not as straightforward as I thought at first. libpq uses SSL_CTX_load_verify_locations to load the root cert into memory, which in the end calls X509_load_cert_crl_file_ex, with no caching in sight. Now, there does seem to be some caching involved whenever a hashed certificate directory (`CApath`) is used, since that lookup loads certificates on demand and keeps them in the store.
@SReicheltPTV I see in your PR you added a reference to runtime issue dotnet/runtime#123058. Reading through it, I see that the dev there recommends to instead use the custom certificate trust mechanism described in the SslStream best practices.
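As a hedged illustration of that mechanism - assuming the `CertificateChainPolicy` route from the linked best-practices page (.NET 7+), not necessarily what this PR should do:

```csharp
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

static SslClientAuthenticationOptions CreateClientOptions(
    string targetHost, X509Certificate2Collection customRoots)
{
    // Trust only the supplied custom roots when validating the server chain.
    var policy = new X509ChainPolicy { TrustMode = X509ChainTrustMode.CustomRootTrust };
    policy.CustomTrustStore.AddRange(customRoots);
    return new SslClientAuthenticationOptions
    {
        TargetHost = targetHost,
        CertificateChainPolicy = policy,
    };
}
```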
After commit 7320260, it is no longer necessary to perform the validation twice against the two different certificate stores.
Checking the timestamp proved unproblematic in terms of performance. I've added it as a separate commit now so you can decide whether to include it. I haven't done any systematic benchmarking, but for the instance in our low-resource dev environment, the service becomes completely overloaded when a certain number of database connections are opened simultaneously (triggering the certificate validation), and requests time out after 30-45s. (Not sure why it varies; I'm guessing that incoming requests are being blocked at some point.) Normal response times are around 50-300ms. I don't think the numbers can be compared meaningfully, though, because under normal circumstances, when the database connections succeed, they remain in the connection pool for a while.
Oh, I hadn't seen that actually. Yes, they point to https://learn.microsoft.com/en-us/dotnet/core/extensions/sslstream-best-practices#custom-certificate-trust, but that page doesn't seem to address your specific objection. I'm not an expert on this at all unfortunately. Anyway, I have rebased onto that commit again as you suggested.
Cache loaded custom root certificates, in particular if the `PGSSLROOTCERT` environment variable is set. This fixes #6417.