Skip to content
New issue

Have a question about this project?Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of serviceand privacy statement.We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SslInfoContributor and SslHealthIndicator #41205

Closed

Conversation

jonatan-ivanov
Copy link
Member

@jonatan-ivanov jonatan-ivanov commented Jun 21, 2024

Production incidents because of invalid certificates are common issues in the industry.SslInfoContributorandSslHealthIndicatorin this PR can help to mitigate them, they:

  • Provide extra information about the used certificates (issuer,subject,validity,etc.)
  • Indicate if a certificate is invalid and the reason of it
  • Indicate if a certificate will be invalid within a configurable threshold (e.g.: in a week)

Example/infoand/healthoutputs:

/infoof aVALIDcert (click here to expand)
{
"ssl":{
"bundles":[
{
"name":"ssldemo",
"certificateChains":[
{
"alias":"spring-boot",
"certificates":[
{
"version":"V3",
"signatureAlgorithmName":"SHA256withRSA",
"validityStarts":"2024-06-21T21:17:02Z",
"validityEnds":"2024-07-05T21:17:02Z",
"validity":{
"status":"VALID"
},
"serialNumber":"9fbcacfed83af328",
"issuer":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"subject":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
]
}
}
/healthof aVALIDcert (click here to expand)
{
"status":"UP",
"components":{
"ping":{
"status":"UP"
},
"ssl":{
"status":"UP"
}
}
}
/infoof anEXPIREDcert (click here to expand)
{
"ssl":{
"bundles":[
{
"name":"ssldemo",
"certificateChains":[
{
"alias":"spring-boot-ssl-sample",
"certificates":[
{
"version":"V3",
"validity":{
"status":"EXPIRED",
"message":"Not valid after 2014-10-21T19:48:43Z"
},
"validityStarts":"2014-07-23T19:48:43Z",
"validityEnds":"2014-10-21T19:48:43Z",
"signatureAlgorithmName":"SHA256withRSA",
"serialNumber":"7207ee6e",
"issuer":"CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
"subject":"CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
}
]
}
]
}
]
}
}
/healthof anEXPIREDcert (click here to expand)
{
"status":"OUT_OF_SERVICE",
"components":{
"ping":{
"status":"UP"
},
"ssl":{
"status":"OUT_OF_SERVICE",
"details":{
"certificateChains":[
{
"alias":"spring-boot-ssl-sample",
"certificates":[
{
"version":"V3",
"validityEnds":"2014-10-21T19:48:43Z",
"validityStarts":"2014-07-23T19:48:43Z",
"signatureAlgorithmName":"SHA256withRSA",
"validity":{
"status":"EXPIRED",
"message":"Not valid after 2014-10-21T19:48:43Z"
},
"serialNumber":"7207ee6e",
"issuer":"CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
"subject":"CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
}
]
}
]
}
}
}
}
/infoof a cert thatWILL_EXPIRE_SOON(click here to expand)
{
"ssl":{
"bundles":[
{
"name":"ssldemo",
"certificateChains":[
{
"alias":"spring-boot",
"certificates":[
{
"version":"V3",
"validityEnds":"2024-06-22T21:32:22Z",
"validityStarts":"2024-06-21T21:32:22Z",
"signatureAlgorithmName":"SHA256withRSA",
"validity":{
"status":"WILL_EXPIRE_SOON",
"message":"Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
},
"subject":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"serialNumber":"64d019d1dd94eee0",
"issuer":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
]
}
}
/healthof a cert thatWILL_EXPIRE_SOON(click here to expand)
{
"status":"UP",
"components":{
"ping":{
"status":"UP"
},
"ssl":{
"status":"WARNING",
"details":{
"certificateChains":[
{
"alias":"spring-boot-ssl-sample",
"certificates":[
{
"version":"V3",
"validityEnds":"2024-06-22T21:32:22Z",
"validityStarts":"2024-06-21T21:32:22Z",
"signatureAlgorithmName":"SHA256withRSA",
"validity":{
"status":"WILL_EXPIRE_SOON",
"message":"Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
},
"subject":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"serialNumber":"64d019d1dd94eee0",
"issuer":"CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
}
}
}

If you want to play with it, startspring-boot-smoke-test-tomcat-ssl,the cert inresources/sample.jksis alreadyEXPIRED,you can generate aVALIDone via

keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname"CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US"-validity 14 -alias spring-boot -keyalg RSA -ext"SAN=DNS:localhost,IP:::1,IP:127.0.0.1"

or one thatWILL_EXPIRE_SOONvia:

keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname"CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US"-validity 1 -alias spring-boot -keyalg RSA -ext"SAN=DNS:localhost,IP:::1,IP:127.0.0.1"

@jonatan-ivanov jonatan-ivanov added the status: waiting-for-triage An issue we've not yet triaged label Jun 21, 2024
@jonatan-ivanov jonatan-ivanov mentioned this pull request Jun 21, 2024
15 tasks
@mhalbritter
Copy link
Contributor

mhalbritter commented Jun 27, 2024

Hey@jonatan-ivanov,that's quite a cool feature, thank you! The implementation looks good, too.

This works for all SSL bundles, and not only for the one used by the webserver, right?

@jonatan-ivanov
Copy link
Member Author

Great to hear! The main focus is the webserver but I think this should work for all SSL bundles (not tested yet) since using an expired cert can cause issues in every place they are used.

I can go ahead and work on the TODO items/docs/tests, in the meantime, can I get some feedback on two important items?

  1. Can/should we introduce a new (common) health status:WARNING(returns 200 but indicates that something is not right)? Or should we keep the current custom status in the PR (WILL_EXPIRE_SOON_STATUS)?
  2. Is callingWebServerSslBundle.getmultiple times ok? See in thePR,inAbstractConfigurableWebServerFactory,and inNettyRSocketServerFactoryor should there be just one instance (i.e.: a@Bean)?

@scottfrederick
Copy link
Contributor

Is callingWebServerSslBundle.getmultiple times ok?

I think this is fine. However, I'm not sure we need to useWebServerSslBundlehere.

That class adapts the discreteserver.ssl.*properties toSslBundledesign. If we ever decide to deprecate and remove the discreteserver.ssl.*properties and only support bundles, then that class would go away.

We could just focus theInfoContributoron SSL bundles and document that you would need to convert fromserver.ssl.*properties to bundle properties to get the benefit of theInfoContributor.

@jonatan-ivanov
Copy link
Member Author

That class adapts the discreteserver.ssl.*properties toSslBundledesign. If we ever decide to deprecate and remove the discreteserver.ssl.*properties and only support bundles, then that class would go away.

👍🏼 That's the exact same use-case I'm usingWebServerSslBundlein this PR for: to be backwards compatible with theserver.ssl.*properties. If only bundles will be supported andWebServerSslBundlewill be removed in the future, the small block of code inSslInfowhereWebServerSslBundleis used should also be removed since there won't be any properties to be backward compatible with.

We could just focus the InfoContributor on SSL bundles and document that you would need to convert from server.ssl.* properties to bundle properties to get the benefit of the InfoContributor.

That would simplify the changes in the PR a bit but also complicate the life of the users: simply enabling the feature might not do anything for them if they are not using bundles or it would work just half-way for them if they use bundles just not for the webserver. If it is not a big issue to useWebServerSslBundle,I would prefer supporting theserver.ssl.*properties unless you have plans to deprecate them (and I guess eventually remove them).

@scottfrederick
Copy link
Contributor

Along withserver.ssl.*(andmanagement.server.ssl.*), we havespring.rsocket.server.ssl.*,spring.rabbitmq.ssl.*,andspring.kafka.*.ssl.*properties where users can currently choose to use discrete keystore and truststore properties or use a bundle. The web servers are the only place where we have an adapter likeWebServerSslBundlethat can be used to easily implement support for the discrete properties in theInfoContributor.We could choose to treat the discrete web server SSL properties specially as you've done by using the adapter (they are almost certainly the most commonly used SSL properties), or only support bundles so everything is treated equally and document that only bundles are supported. Let's see what others think about this.

In either case, I don't think it's necessary to makeWebServerSslBundlea bean.

@jonatan-ivanov
Copy link
Member Author

Treating the discrete web server SSL properties specially somewhat makes sense to me since that's whatWebServerSslBundleis also doing but having more "non-bundle" properties (I forgot about rabbitmq, kafka, and rsocket) drives me to the other direction: even if it would be nice to have backward-compatible support for the web server properties, it might make more sense to only support bundles and treat everything equally.:)

@jonatan-ivanov
Copy link
Member Author

I removed support forserver.ssl.*properties (WebServerSslBundle) and added configprops support to set the warning threshold. Please let me know if this seems approximately ok and I can go ahed to write tests and docs.

@wilkinsona wilkinsona added for: team-meeting An issue we'd like to discuss as a team to make progress status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team type: enhancement A general enhancement and removed for: team-meeting An issue we'd like to discuss as a team to make progress status: waiting-for-triage An issue we've not yet triaged labels Jul 15, 2024
@jonatan-ivanov jonatan-ivanov force-pushed the cert-info branch 2 times, most recently from 1c6695d to 2cabf11 Compare August 3, 2024 00:17
@scottfrederick scottfrederick added the for: team-meeting An issue we'd like to discuss as a team to make progress label Aug 8, 2024
@mhalbritter
Copy link
Contributor

We need to discuss if we want to have this warning status and if we do, to which http code to map it.

@jonatan-ivanov
Copy link
Member Author

My two cents about having a dedicatedWARNINGstatus:

  • I imagine this status as something that means "Right now it's fine and not immediately actionable but if you don't do something, it might turn into an error sometime in the future".
  • Within this context, I think the appropriate HTTP status code is200since at the time of the response everything is still successfully working and load balancers should not remove the instance out of rotation, letting traffic flowing is totally fine.
  • Now if you ask me about the reason phrase... 😈 I think there is some wiggle room there since in the HTTP status line, the reason phrase is optional andOKis "only" recommended (not mandated) for200.So theoretically,200 Warningcan also be returned but200 OKwould describe a situation just fine: things are still OK. (But if you really want to make Spring Boot cool, my personal recommendation is200 🧨🧐🤔🤨🙊😬😱.🙃)

Other than the feature introduced in this PR (the certificate will expire soon), I think there could be a few more use-cases where aWARNINGstatus can be useful. For example for the disk space health indicator, when the disk is close to be full (within threshold), it's status can beWARNING.Another example can be updating local cache from an external resource periodically. If the update job fails the local cache might be stale. This could be fine for a while but can cause errors in the future.

@jonatan-ivanov jonatan-ivanov marked this pull request as ready for review August 14, 2024 05:30
@wilkinsona wilkinsona removed the status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team label Aug 14, 2024
@wilkinsona
Copy link
Member

We discussed this today and a couple of things came up:

  1. We're considering merging this without the custom status and then possibly adding that more broadly in a later milestone.
  2. It should be possible to see details of the certificates even when none of them have expired.

@wilkinsona wilkinsona removed the for: team-meeting An issue we'd like to discuss as a team to make progress label Aug 14, 2024
@jonatan-ivanov
Copy link
Member Author

What do you mean by the custom status? Do mean what is in the PR right now (WILL_EXPIRE_SOON) or the proposedWARNINGstatus? If he former, what should we use instead (UP)?

You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?

@wilkinsona
Copy link
Member

Sorry, I quickly wrote that comment in the meeting and it was obviously too terse.

What do you mean by the custom status? Do mean what is in the PR right now (WILL_EXPIRE_SOON) or the proposedWARNINGstatus? If the former, what should we use instead (UP)?

We're not keen on an indicator-specific status such asWILL_EXPIRE_SOONbut quite like that idea of a more general purposeWARNINGstatus. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.

In the meantime, I thinkUPwould be the status to use. Moritz is going to take a look at this.

You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?

We'd like it to show something as additional details (so they'll only appear when details are visible) but we're not yet 100% sure exactly what it should be. Moritz is going to take a look at this too.

@jonatan-ivanov
Copy link
Member Author

jonatan-ivanov commented Aug 14, 2024

[..] but quite like that idea of a more general purposeWARNINGstatus. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.

Great to hear! Btw since this is targeted to a milestone release would it make sense to usenew Status( "WARNING" )temporarily only in the health indicator without adding it to theStatusclass and see if we get any user feedback? Based on it (if any) we can introduce a generalWARNINGstatus in theStatusclass if it makes sense or change the health indicator to reportUPbefore the RC. /cc@mhalbritter

@mhalbritterfyi: I modified the health indicator a bit so that now it returns the whole chain instead of the separate certs since if a cert is invalid in a chain, the whole chain is invalid. I think this also helps troubleshooting if there is an invalid cert (I updated the examples in the description).

@mhalbritter
Copy link
Contributor

I'd remove the warning bit completely for M2. Then we can take our time and come up with aWARNINGstatus design, and if that's ready, we'll add the warning back to the ssl indicator (and possible to more health indicators). Does that sound good?

@jonatan-ivanov
Copy link
Member Author

jonatan-ivanov commented Aug 16, 2024

Makes sense, I changed it toUPalso rebased it onmainand squashed the changes.

@mhalbritter mhalbritter added this to the3.4.xmilestone Aug 19, 2024
mhalbritter pushed a commit that referenced this pull request Aug 19, 2024
@mhalbritter
Copy link
Contributor

Thanks@jonatan-ivanov!

The health endpoint now always adds details about invalid and valid chains.

@mhalbritter mhalbritter modified the milestones: 3.4.x, 3.4.0-M2 Aug 19, 2024
mhalbritter added a commit that referenced this pull request Aug 19, 2024
@jonatan-ivanov jonatan-ivanov deleted the cert-info branch August 19, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants