Cloud Security
This handbook aims to provide Cloud Architects and DevOps Engineers with information on correctly securing their cloud accounts on Exoscale. As such, a basic understanding of internet infrastructure is required.
Note that this handbook is only an introductory guide to securing your cloud setup and is by no means comprehensive. Before building a cloud service, be sure to educate yourself about current best practices and evaluate your security risks.
Basics
The first and primary means of protecting your cloud environment is protecting access to it. An attacker can get into your cloud environment either by stealing your API keys and accessing your cloud account programmatically, or by stealing a staff member's credentials.
A loss of your API keys could mean the compromise of the entire cloud account. Some cloud providers can restrict API keys to certain services, but operations personnel often need an unrestricted API key.
As such, API keys must be guarded carefully. If the keys are leaked, unauthorized persons could access the cloud account and run servers to mine bitcoin, for example, or, even worse, steal company data, which may have to be publicly reported under the GDPR and can lead to irreparable damage to your company's reputation.
Therefore, it is recommended that operations personnel's machines be treated with the same scrutiny as a production server. Additionally, tools like HashiCorp Vault can be deployed to better manage access to keys and cloud resources.
Note
“Third Party Software, Services and Products” - This document is not intended as a statement of quality or suitability of HashiCorp Vault or any other third-party software, services, or products. It is ultimately up to you to decide which tools you use.
A second point you need to consider is passwords: a user may misplace or leak their passwords, allowing an attacker to access the cloud console. Two-factor authentication can be deployed to prevent this. Two-factor authentication means that a staff member must provide more than one of the following factors to authenticate:
- Something you know (a password)
- Something you have (a hardware token or a generated time code)
- Something you are (biometric data)
A prevalent option for deploying MFA is TOTP (time-based one-time passwords), popularly known by the Google Authenticator app. TOTP is an open standard that anyone can implement with very little work, and cloud providers often support it.
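To illustrate how little work is involved, the sketch below generates and verifies TOTP codes using the third-party pyotp library (pip install pyotp). The secret handling is simplified: in practice, the secret is generated once per user and transferred to their authenticator app, typically via a QR code.

```python
# Minimal TOTP sketch using the third-party pyotp library.
import pyotp

# Generated once per user and shared with their authenticator app,
# typically via a QR code. Created on the fly here only for illustration.
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

code = totp.now()                   # what the authenticator app would display
print("Current code:", code)
print("Valid?", totp.verify(code))  # server-side check during login
```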
When access is secured, the next possible point of attack is services that are publicly available: SSH or RDP remote access left open to the world with weak credentials, databases with default credentials, and unpatched services with known security vulnerabilities. The IPv4 address space can be scanned for such weak points in a matter of hours, so putting such a service online will lead to a breach within a day.
The first line of defense against such exposure is the Exoscale built-in firewall with security groups. All services that are not meant for the general public should be restricted to your company's network. If you need to access any such service while on the road, deploy a VPN instead of accessing it directly. Similarly, object storage buckets should be configured with a non-public ACL. Additionally, it is recommended that services that do not support encryption only be accessed using a VPN or an SSH jump host: otherwise, passwords would travel over the internet in plain text, and an IP restriction alone is not enough to protect them.
As a second layer of defense, it is imperative that you also harden your systems: use cryptographic keys instead of passwords, deploy TLS encryption wherever possible, and update your systems regularly, ideally weekly. Failing to implement a second layer of defense means that your services will be wide open if somebody makes a mistake at the first layer. Do not trust your firewall as your only security measure!
Deploy Scanning Tools
Trust is good, and control is better. However, we are humans, and humans make mistakes. As operations personnel, we should accept that somewhere, someone will make a mistake.
Instead of assigning blame, it is recommended that operations teams follow a blameless culture and try to harden their systems by deploying tools to detect security vulnerabilities before others do. These tools can be run in an automated fashion regularly to identify issues before they become incidents.
The Exoscale API can be scripted to extract all IP addresses currently in use and to scan them automatically. Noteworthy tools for performing the actual scan include nmap, Burp Suite, Nessus, and OpenVAS.
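As a rough sketch of such automation, assuming the address list has already been extracted via the Exoscale API or CLI (the addresses below are placeholders), the following runs nmap, which must be installed, against each address:

```python
# Scan each address with nmap and write a human-readable report per host.
import subprocess

# Placeholder addresses; in practice, fetched from the Exoscale API.
addresses = ["198.51.100.10", "198.51.100.11"]

for addr in addresses:
    # -sV probes service versions; -oN writes a plain-text report.
    subprocess.run(["nmap", "-sV", "-oN", f"scan-{addr}.txt", addr], check=True)
```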
Firewall - Best Practices
As a general rule, your firewall should be implemented so that it filters both ingress (going towards your servers) and egress (going out from your servers). Furthermore, your servers should be deployed into separate security groups (security zones) based on their role.
For example, a database should allow connections from the application servers but not from the internet. Therefore, database servers should be put into their own security group that allows connections only from the application server security group.
If remote access to the database servers is desired, it is recommended that this access go through a jump host or a VPN, as database servers do not always support encrypted connections using TLS (see below). Passwords may travel over the internet in plain text if such servers are accessed directly.
Public services like HTTP (port 80) or HTTPS (port 443) should, of course, be allowed from anywhere (0.0.0.0/0 and ::/0), but private dashboards, again, should be limited to internal IPs or accessed via a VPN and be protected by a password.
Outbound access to HTTP and HTTPS should generally be enabled. Your operating system will need to download packages from the internet unless you deploy a package proxy, which is strongly recommended. DNS (port 53 TCP and UDP) should also be allowed outbound but not inbound.
ICMP should generally be allowed both inbound and outbound; otherwise, IP fragmentation will not work, and users behind a VPN or another type of tunnel will not be able to access your services. ICMP echo requests and responses (ping) can be disabled if needed, but doing so brings minimal security benefit.
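The zoning described above can be summarized in a sketch like the one below. This is purely illustrative data, not Exoscale API syntax; the company network range, the bastion group, and the database port are placeholders.

```python
# Illustrative ingress rules per security group (role). Everything not
# listed is denied; egress would be restricted separately to HTTP(S) and DNS.
INGRESS_RULES = {
    "web": {
        80:  ["0.0.0.0/0", "::/0"],        # public HTTP
        443: ["0.0.0.0/0", "::/0"],        # public HTTPS
        22:  ["203.0.113.0/24"],           # SSH from the company network only
    },
    "database": {
        5432: ["security-group:web"],      # app servers may reach the database
        22:   ["security-group:bastion"],  # admin access via jump host only
    },
}
```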
SSH - Best Practices
When deploying SSH as a service, you need to keep in mind two essential aspects: the security component itself, such as cipher suites and login credentials, and the user aspect, as in who has access.
Ensure that your SSH server uses only modern cipher suites (see the notes on cipher suites in the TLS section below). It is also strongly recommended to deploy SSH keys and disable password login entirely. If you need to jump over one server to get to another, use SSH agent forwarding and never copy your SSH private key to a server.
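As a minimal sketch of checking these recommendations, the script below flags risky directives in a local sshd_config. The directive names are standard OpenSSH options; the script deliberately ignores Match blocks and compiled-in defaults, so treat it as a starting point rather than a full audit.

```python
# Flag sshd_config directives that deviate from the hardening advice above.
RECOMMENDED = {
    "passwordauthentication": "no",  # SSH keys only
    "permitrootlogin": "no",         # no direct root login
    "pubkeyauthentication": "yes",
}

def audit_sshd(path="/etc/ssh/sshd_config"):
    seen = {}
    with open(path) as config:
        for line in config:
            line = line.split("#", 1)[0].strip()  # drop comments
            parts = line.split(None, 1)
            if len(parts) == 2:
                seen[parts[0].lower()] = parts[1].strip().lower()
    for key, wanted in RECOMMENDED.items():
        actual = seen.get(key, "<unset/default>")
        status = "OK" if actual == wanted else "CHECK"
        print(f"{status}: {key} = {actual} (recommended: {wanted})")

audit_sshd()
```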
SSH keys should always be 4096 bits or better and should be protected by a passphrase. Hardware tokens, like the YubiKey, can also store SSH keys, but the configuration can be fairly involved and may not be supported on all platforms.
On the access side of things, it is crucial to think about who has access to a server and what the procedure is for adding or removing an employee's access. This can be achieved in multiple ways, ranging from configuration management tools that manage the keys to LDAP authentication. However, keep in mind that if your authentication relies on a third-party service, such as an LDAP server, that service must be online for your employees to log in.
In addition to managing who can access the server, it is also essential to make sure that people cannot log in from unauthorized locations. Therefore, the SSH service should be restricted to company IP ranges only. If an SSH service has to be publicly accessible, it is strongly recommended that it not run on port 22 or 2222. This keeps your logs free of script kiddie-style probes, making it much easier to spot serious attacks.
HTTP - Best Practices
HTTP, or HyperText Transfer Protocol, is a text-based communication standard that runs most of today's internet. Nowadays, even database servers, such as Redis, can be accessed over HTTP, and many systems expose an HTTP-based API. This makes it especially important to secure HTTP servers properly.
Deploying TLS, as mentioned above, helps against man-in-the-middle (MITM) attacks, where credentials and sensitive information are snooped off the wire. As TLS is free, both computationally and certificate-wise, it is strongly recommended that it be deployed on all services. (TLS can even improve HTTP performance, since it enables HTTP/2.)
When TLS is deployed, all HTTP traffic should be permanently redirected to the HTTPS version of the same page, and an HSTS header should be used. After an initial period, the HSTS header should be set to a one-year expiration so that browsers and clients can cache the fact that the site is only available over HTTPS.
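A minimal standard-library sketch of the redirect is shown below; real deployments usually configure this in the web server or load balancer instead. Note that browsers only honor the HSTS header when it is received over HTTPS, so the header itself belongs on the HTTPS responses.

```python
# Answer every plain-HTTP request with a permanent redirect to HTTPS.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectToHTTPS(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "example.com").split(":")[0]
        self.send_response(301)
        self.send_header("Location", f"https://{host}{self.path}")
        self.end_headers()
        # The HTTPS side would then send, e.g.:
        # Strict-Transport-Security: max-age=31536000; includeSubDomains

HTTPServer(("", 8080), RedirectToHTTPS).serve_forever()
```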
Additionally, all headers reported by the securityheaders.com service should be considered for deployment, especially when sensitive company data or private data is involved. The Mozilla Observatory is an excellent resource for an all-in-one check of public services for these kinds of things.
HTTP services can provide a large attack surface, and describing all potential vulnerabilities is beyond this handbook's scope. Best practices in software development mitigate the most common problems, such as XSS, SQL injection, CSRF, and insecure URL access. In addition to securing the application itself, deploying a Web Application Firewall is recommended, especially when the web application isn't undergoing regular maintenance and security checks.
Server Side Request Forgery
When deploying services in a closed security group, these services should still use some form of authentication. This is because the application running on the application server may be tricked into accessing internal services, such as a Redis database, and exposing data that way. This kind of attack is called Server Side Request Forgery (SSRF).
Server Side Request Forgery is especially problematic in a cloud setup because almost every cloud provider offers a metadata API that the instance can use to fetch data about itself, including password hashes, SSH public keys, and, most importantly, the user data: a provisioning data set that often contains sensitive information such as TLS private keys.
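One common mitigation, sketched below with only the standard library, is to resolve a user-supplied URL and refuse to fetch private, loopback, or link-local addresses, which include the metadata endpoint 169.254.169.254. A production implementation would also need to guard against DNS rebinding by connecting to the validated address directly.

```python
# Reject URLs that resolve to internal address ranges before fetching them.
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True

print(is_safe_url("http://169.254.169.254/latest/meta-data/"))  # False
print(is_safe_url("https://www.exoscale.com/"))                 # True
```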
Understanding TLS Encryption
In addition to authentication for all services, connections should also be encrypted wherever possible. This is an additional layer of security, and it can even enable you to safely access these services over an otherwise untrusted network. TLS client certificates can also serve as an authentication mechanism.
TLS is based on the X.509 certificate infrastructure (public-private key cryptography). Each service must present a certificate signed by a certificate authority that the connecting client accepts. The certificate must include the hostname of the service in the common name or subjectAltName field; otherwise, it will not be accepted. This check must remain enabled at all times, or a man-in-the-middle attack could go undetected.
The certificate authority signature ensures that the server is the server it claims to be, and no man-in-the-middle attack occurs.
Similarly, the client may also present a certificate signed by a certificate authority to authenticate itself. This is only required if the service is configured to do certificate-based authentication. In this case, the client certificate will often contain the username of the connecting client in the common name field.
The certificate authority can be one of the public CAs, like Let's Encrypt, or, for internal services, it can be privately managed with tools like OpenSSL, cfssl, or the Terraform TLS provider.
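As a minimal sketch of the client side, assuming a privately managed CA with placeholder file paths and hostname, Python's ssl module can be configured to trust the internal CA and present a client certificate:

```python
# TLS client trusting a private CA and presenting a client certificate.
import socket
import ssl

# check_hostname is True by default, enforcing the common name /
# subjectAltName check described above. Never disable it.
context = ssl.create_default_context(cafile="internal-ca.pem")
context.load_cert_chain(certfile="client.pem", keyfile="client-key.pem")

with socket.create_connection(("service.internal.example", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="service.internal.example") as tls:
        print("Server certificate subject:", tls.getpeercert()["subject"])
```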
Once the TLS infrastructure is deployed, the server providing the encryption must be configured correctly. Tools like the Mozilla Observatory scan public services and report if old cipher suites are still in use. A list of modern cipher suites to configure can be found on the Mozilla wiki.
As mentioned above, scanning tools provide a great way to automatically check whether the cipher suites in use have become inadequate.
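A small standard-library sketch of such a check: connect to a service and report the negotiated TLS version and cipher suite. The hostname is a placeholder; if the output shows TLS 1.0/1.1 or an outdated cipher, the server configuration needs attention.

```python
# Report the TLS protocol version and cipher suite a server negotiates.
import socket
import ssl

def report_tls(host: str, port: int = 443) -> None:
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            name, _protocol, bits = tls.cipher()
            print(f"{host}: {tls.version()}, cipher {name} ({bits} bits)")

report_tls("www.exoscale.com")  # placeholder target
```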
Software Updates
Even the most secure system can easily fall prey to attackers if software updates are not applied regularly. Updates often come out together with announcements of security vulnerabilities (CVEs), so if they are not applied in a timely fashion, the published vulnerabilities will be exploited quite quickly.
You may be tempted to roll out security updates to only your most critical services, but keep in mind that most breaches happen due to a fringe system that was not appropriately secured. In addition, once inside your infrastructure on a fringe system, an attacker can work their way through to the more critical servers because systems often trust each other out of necessity to conduct business.
As you may know, applying security updates is quite hard, for two reasons: first, updates often require downtime, and rolling back can be difficult or impossible. Second, there are usually no tests of the system's functionality to tell you whether the upgrade went through properly.
The first problem can be alleviated by a concept called immutable infrastructure. This approach uses Infrastructure as Code tools, such as Terraform, Ansible, and Docker: instead of upgrading a server in place, a new server is installed in an automated fashion and the service is failed over to it. This, of course, is not always possible.
The issue of testing is closely related to Infrastructure as Code, as testing a system that can be installed in an automated fashion is much easier than having to build a test setup manually. Automated tests can be conducted in various ways, but more often than not, they involve end-to-end testing tools such as Selenium. A useful abstraction on top of Selenium is a behavior description language called Cucumber, which can be run, for example, with Python Behave or Behat. These tools allow you to decouple the high-level description from the low-level implementation and make tests easier to maintain.
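As a hypothetical illustration of such a decoupled test, a Behave step file (pip install behave) implementing the steps for a simple Gherkin scenario might look like this; the scenario and URL are made up for the example.

```python
# steps/health_steps.py -- step definitions for a scenario such as:
#   Scenario: The service still answers after an upgrade
#     Given the service is reachable at "https://example.com/health"
#     Then the response status is 200
from urllib.request import urlopen

from behave import given, then

@given('the service is reachable at "{url}"')
def step_reach(context, url):
    context.response = urlopen(url, timeout=10)

@then("the response status is {status:d}")
def step_status(context, status):
    assert context.response.status == status
```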
Once you have tests in place and Infrastructure as Code to help you build test environments, it becomes easier to test and roll out updates.
Logging and Backup
Your servers should send all relevant logs to a central log server that is otherwise not reachable from them. This log server should also be deployed to a separate Exoscale organization whose access keys are not stored on any servers. These logs can be used to trace the steps of an attack in case of a security breach.
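A minimal sketch of shipping application logs to such a host with the standard library is shown below; the hostname is a placeholder, and in practice, rsyslog, syslog-ng, or a dedicated log shipper would usually be configured system-wide instead.

```python
# Send application logs to a central syslog server over UDP port 514.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address=("logs.internal.example", 514))
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user login from 203.0.113.42")  # example audit event
```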
Similarly, all servers should be backed up in such a way that even the compromise of the server doing the backups does not let an attacker destroy the backups. This can be achieved with a multi-tier backup architecture, such as the one used by Bacula/BareOS. Alternatively, a secondary backup of SOS buckets can be created in a second account to prevent backup loss.
Various backup solutions are supported on our platform.
Ongoing
Your system security is only as strong as its weakest link. Therefore, you must follow a layered security approach and conduct regular tests of everything where humans can fail.
In general, no handbook can provide a comprehensive guide to securing your cloud service, as each infrastructure presents unique challenges. Be sure to educate yourself on current best practices and do your own risk assessment.