This handbook aims to provide Cloud Architects and DevOps Engineers with information on correctly securing their cloud accounts on Exoscale. As such, a basic understanding of Internet infrastructure is required.

Note that this handbook is only an introductory guide to securing your cloud setup, not a comprehensive one. Before building a cloud service, be sure to educate yourself about current best practices and evaluate your security risks.

Basics

The first and primary means of protecting your cloud environment is protecting access to it. An attacker can get into your cloud environment either by stealing your API keys and accessing your cloud account programmatically, or by stealing one of your staff’s credentials.

A loss of your API keys could mean the compromise of the entire cloud account. Some cloud providers can restrict API keys to certain services, but ops personnel often need an unrestricted API key.

As such, API keys must be guarded carefully. If the keys are lost, unauthorized persons could access the cloud account and run servers, for example, to mine Bitcoin, or even worse, steal company data. Such a theft may have to be publicly reported under the GDPR and can lead to irreparable damage to your company’s reputation.

Therefore, it is recommended that the ops personnel machines be treated with the same scrutiny as a production server. Additionally, tools like HashiCorp Vault can be deployed to manage access to keys and cloud resources better.

Note

“Third Party Software, Services and Products” - This document is not intended as a statement of quality or suitability of HashiCorp Vault or any other third party software, services, or products. It is ultimately up to you to decide which tools you use.

A second factor you need to consider is passwords: a user may misplace or leak their passwords, allowing an attacker to access the cloud console. To prevent this, multi-factor authentication can be deployed. Multi-factor authentication means that a staff member must provide more than one of the following factors to authenticate:

  • Something they know (e.g., a password)
  • Something they have (e.g., a hardware token or a generated time code)
  • Something they are (e.g., biometric data)

A prevalent option for deploying MFA is TOTP (time-based one-time passwords), popularly known as Google Authenticator. TOTP is an open standard that anyone can implement with very little work, and cloud providers often support it.
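Since TOTP is an open standard (RFC 6238, built on the HOTP algorithm from RFC 4226), the whole code-generation step fits in a few lines. A minimal sketch using only the Python standard library:

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret, timestamp=None, step=30, digits=6):
    """Derive a time-based one-time password (RFC 6238) from a shared secret."""
    if timestamp is None:
        timestamp = int(time.time())
    counter = timestamp // step
    # HOTP (RFC 4226): HMAC-SHA1 over the big-endian counter, then dynamic truncation.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp_from_base32(b32_secret, **kwargs):
    """Authenticator apps exchange the secret as Base32 (e.g. in otpauth:// URIs)."""
    return totp(base64.b32decode(b32_secret.upper()), **kwargs)
```

A server verifying TOTP codes should also accept the codes of one or two adjacent time steps to tolerate clock drift between the server and the user’s device.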

When access is secured, the next possible point of attack is publicly available services. These can be SSH or RDP remote access left open to the world with weak credentials, databases with default credentials, or unpatched services with known security vulnerabilities. The entire IPv4 address space can be scanned for such weak points in a matter of hours, so putting such a service online can lead to a breach within a day.

The first line of defense against such exposure is the Exoscale built-in firewall with security groups. All services that are not meant for the general public should be restricted to your company’s network. If you need to access such a service while on the road, deploy a VPN instead of accessing it directly. Similarly, object storage buckets should be configured with a non-public ACL. Additionally, it is recommended that services that do not support encryption only be accessed using a VPN or an SSH jump host: otherwise, passwords would travel over the Internet in plain text, and an IP restriction alone is not enough to protect them.

As a second layer of defense, it is imperative that you also harden your systems: use cryptographic keys instead of passwords, deploy TLS encryption wherever possible and update your systems regularly, ideally weekly. Failing to implement a second layer of defense means that your services will be wide open if somebody makes a mistake on the first level. Do not trust your firewall as your only security measure!

Deploy Scanning Tools

Trust is good, and control is better. We are humans, and humans make mistakes. As ops personnel, we should accept that someone, somewhere, will make a mistake.

Instead of trying to assign blame, it is recommended that ops teams follow a blameless culture and try to harden their systems by deploying tools to detect security vulnerabilities before others do. These tools can be run in an automated fashion regularly to identify issues before they become incidents.

The Exoscale API can be scripted to extract all IP addresses currently in use and scan them automatically. Noteworthy tools to perform the actual scan include nmap, the Burp Suite, Nessus, OpenVAS, or CScanner.
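Such automation can be sketched as below. The `exo` CLI subcommands, flags, and the `ip_address` JSON field name are assumptions that may differ between CLI versions; check `exo --help` and a sample of the JSON output before relying on them.

```python
import json
import subprocess


def extract_ips(instances):
    """Collect unique public IPv4 addresses from a list of instance records.

    The "ip_address" field name is an assumption; adjust it to whatever
    your CLI or API client actually returns.
    """
    return sorted({i["ip_address"] for i in instances if i.get("ip_address")})


def scan_all_instances():
    """Sketch: list instances via the Exoscale CLI and hand each address to nmap.

    Both external commands are assumed to be installed and configured.
    """
    raw = subprocess.check_output(
        ["exo", "compute", "instance", "list", "--output-format", "json"]
    )
    for ip in extract_ips(json.loads(raw)):
        # -sV probes open ports for service/version information.
        subprocess.run(["nmap", "-sV", ip], check=False)
```

Running this regularly from a scheduled job turns the scan into the automated control described above.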

Firewall - Best Practices

As a general rule, your firewall should be implemented so that it filters both ingress (going towards your servers) and egress (going out from your servers). Furthermore, your servers should be deployed into separate security groups (security zones) based on their role.

For example, a database will accept connections from the application servers, but not from the Internet. Therefore, database servers should be put into their own security group that allows connections only from the application server security group.
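The group-to-group layout can be illustrated with a toy rule evaluator. This is not the Exoscale security group API, only a sketch of the logic; the group names and ports are illustrative.

```python
# Toy model: each security group holds a list of (source, port) ingress rules.
# A source is either a CIDR wildcard or another security group ("sg:<name>").
RULES = {
    "app": [("0.0.0.0/0", 443)],   # public HTTPS, reachable from anywhere
    "db":  [("sg:app", 5432)],     # PostgreSQL, only from the "app" group
}


def is_allowed(target_group, source, port):
    """Return True if an ingress rule of the target group matches the connection."""
    for rule_source, rule_port in RULES.get(target_group, []):
        if rule_port == port and rule_source in (source, "0.0.0.0/0"):
            return True
    return False
```

With this model, the database group accepts port 5432 from the application group but rejects the same port from the open Internet, which is exactly the layering described above.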

If remote access to the database servers is desired, it is recommended that this remote access go through a jump host or a VPN, as database servers do not always support encrypted connections using TLS (see below). If accessed directly, passwords may travel over the Internet in plain text.

Public services like HTTP (port 80) or HTTPS (port 443) should, of course, be allowed from anywhere (0.0.0.0/0 and ::/0), but private dashboards, again, should be limited to internal IPs or accessed via a VPN and be protected by a password.

Outbound access to HTTP and HTTPS should generally be enabled: your operating system will need to download packages from the Internet unless you deploy a package proxy, which is strongly recommended. DNS (port 53, TCP and UDP) should also be allowed outbound but not inbound.

ICMP should generally be allowed both inbound and outbound; otherwise, path MTU discovery and IP fragmentation handling will not work, and users behind a VPN or another type of tunnel will not be able to access your services. ICMP echo requests and responses (ping) can be disabled if needed, but the security benefit is minimal.

SSH - Best Practices

When deploying SSH as a service, you need to keep in mind two essential aspects: the security component itself, such as cipher suites and login credentials, and the user aspect, as in who has access.

On the security component, make sure that your SSH server uses only modern cipher suites (see the TLS section below). It is also strongly recommended to deploy SSH keys and disable password login entirely. If you need to jump over one server to get to another, use SSH agent forwarding and never copy your private SSH key to a server.

SSH keys should always be 4096 bits or better and should be protected by a password. Hardware tokens, like the Yubikey, can also store SSH keys, but the configuration can be quite complicated and may not be supported on all platforms.

On the access side of things, it is crucial to think about who has access to a server and what the procedure is to add or remove an employee’s access. This can be achieved in multiple ways, ranging from configuration management tools that manage the keys to LDAP authentication. However, keep in mind that if your authentication relies on a third-party service such as an LDAP server being available, that server must be online for your employees to log in to the server.

In addition to managing who can access the server, it is also essential to make sure that people cannot log in from unauthorized locations. The SSH service should be restricted to company IP ranges only. If an SSH service has to be publicly reachable, it is strongly recommended not to run it on port 22 or 2222. The reason is that you will want to filter your logs for serious attacks, which is quite hard if they are full of script kiddie-style probes.

HTTP - Best Practices

HTTP or HyperText Transfer Protocol is a text-based communication standard that runs most of today’s Internet. Nowadays, even database servers, such as Redis, can be accessed over HTTP, and many systems possess an HTTP-based API. This makes it especially important to secure HTTP servers properly.

Deploying TLS, as mentioned above, helps against man-in-the-middle (MITM) attacks in which credentials and sensitive information are snooped off the wire. As TLS is essentially free, both computationally and certificate-wise, it is strongly recommended that it be deployed on all services. (TLS even brings a performance improvement for HTTP, since it enables HTTP/2.)

When TLS is deployed, all HTTP traffic should be permanently redirected to the HTTPS version of the same page, and an HSTS header should be used. After an initial trial period, the HSTS header should be set to a one-year expiry so browsers and clients can cache the fact that the site is only available over HTTPS.
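In framework-neutral terms, the scheme looks like this. The helper names are illustrative; note that browsers ignore a Strict-Transport-Security header sent over plain HTTP, so the header belongs on the HTTPS responses only.

```python
# One year in seconds; start with a much shorter max-age during the trial period.
HSTS_VALUE = "max-age=31536000; includeSubDomains"


def redirect_to_https(host, path):
    """Build the permanent (301) redirect from HTTP to the HTTPS version."""
    return 301, {"Location": f"https://{host}{path}"}


def https_response_headers():
    """Headers to add to every HTTPS response."""
    return {"Strict-Transport-Security": HSTS_VALUE}
```

In practice, the redirect and the header are usually configured directly in the web server or load balancer rather than in application code; the logic is the same.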

Additionally, all headers covered by the securityheaders.io service should be considered for deployment, especially when sensitive company data or private data is involved. The Mozilla Observatory is an excellent resource for an all-in-one check of public services for these kinds of issues.

HTTP services can provide a large attack surface, and describing all of the potential vulnerabilities is beyond this handbook’s scope. However, best practices in software development mitigate the most common problems such as XSS, SQL injection, CSRF, insecure URL access, etc. In addition to securing the application, however, it is recommended to deploy a Web Application Firewall, especially when the web application itself isn’t undergoing regular maintenance and security checks.

Server Side Request Forgery

When deploying services in a closed security group, these services should still use some form of authentication. The reason is that the application running on the application server may be tricked into accessing internal services, such as a Redis database, and exposing data that way. This kind of attack is called Server Side Request Forgery (SSRF).

Server Side Request Forgery is especially problematic in a cloud setup because almost every cloud provider offers a metadata API that the instance can use to fetch data about itself, including password hashes, SSH public keys and most importantly, the user data, which is a provisioning data set that often contains sensitive information such as TLS private keys, etc.
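An application that fetches user-supplied URLs can defend itself by resolving the target host and refusing private and link-local destinations, which include the metadata endpoint. This is a sketch only: a hardened implementation must also pin the resolved address for the actual request, or the DNS answer can change between the check and the fetch (DNS rebinding).

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Destinations a server-side fetcher should refuse to contact. 169.254.0.0/16
# covers the link-local metadata endpoint (commonly 169.254.169.254).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
              "192.168.0.0/16", "169.254.0.0/16", "::1/128", "fc00::/7")
]


def is_url_safe_to_fetch(url):
    """Resolve the URL's host and reject private/link-local destinations."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if any(addr in net for net in BLOCKED_NETWORKS):
            return False
    return True
```

Authentication on the internal services remains essential even with such a filter, since an attacker may find a bypass the filter author did not anticipate.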

Understanding TLS Encryption

In addition to authentication for all services, connections should also use encryption wherever possible. This is an additional layer of security: it can even enable you to access these services over an otherwise untrusted network, and TLS client certificates can also serve as an authentication mechanism.

TLS is based on the X.509 certificate infrastructure (public-key cryptography). Each service must present a certificate signed by a certificate authority that the connecting client accepts. The certificate must include the hostname of the service in the common name or subjectAltName field; otherwise, it will not be accepted. This check must remain enabled at all times, as disabling it would let a man-in-the-middle attack go undetected.
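As an illustration, Python’s `ssl` module enables exactly these checks by default, and the common “quick fix” for certificate errors is precisely the thing that must never be done:

```python
import ssl

# The default context already enforces both checks described above:
# certificate validation against trusted CAs and hostname matching.
ctx = ssl.create_default_context()
assert ctx.check_hostname is True
assert ctx.verify_mode == ssl.CERT_REQUIRED

# The dangerous anti-pattern is turning these off to silence certificate
# errors; doing so allows man-in-the-middle attacks to go undetected:
#
#   ctx.check_hostname = False        # never do this in production
#   ctx.verify_mode = ssl.CERT_NONE   # never do this in production
```

Most TLS libraries have equivalent switches; the correct response to a certificate error is to fix the certificate, not to disable verification.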

The certificate authority signature ensures that the server is indeed the server it claims to be, and no man in the middle attack is taking place.

Similarly, the client may also present a certificate signed by a certificate authority to authenticate itself. This is only required if the service is configured to do certificate authentication. In this case, the client certificate will often contain the username of the connecting client in the standard name fields.

The certificate authority can be one of the public CAs, like Let’s Encrypt, or, for internal services, it can be privately managed with tools like OpenSSL, cfssl, or Terraform TLS.

Once the TLS infrastructure is deployed, the server providing the encryption must be configured correctly. Tools like the Mozilla Observatory scan public services and report whether old cipher suites are still in use. A list of modern cipher suites to configure can be found on the Mozilla wiki.

As mentioned above, scanning tools provide a great way to automatically check whether the cipher suites in use have become inadequate.

Software Updates

Even the most secure system can fall prey quite easily if software updates are not applied regularly. Updates often come out together with announcements of security vulnerabilities (CVEs), so if they are not applied in a timely fashion, these vulnerabilities will be exploited quite quickly.

You may be tempted to roll out security updates to only your most critical services, but keep in mind that most breaches happen due to a fringe system that was not appropriately secured. Once inside your infrastructure on a fringe system, an attacker can work their way through to the more critical servers because systems often trust each other out of necessity to conduct business.

As you may know, applying security updates is quite hard, for two reasons: first, applying updates often requires downtime, and rolling back can be hard to impossible. Second, there are usually no tests for the system’s functionality to tell you whether the upgrade went through properly.

The first problem can be alleviated by a concept called immutable infrastructure. This concept relies on Infrastructure as Code tools such as Terraform, Ansible, and Docker: instead of upgrading a server in place, a new server is installed in an automated fashion and the service is failed over to it. This, of course, is not always possible.

The issue of testing is closely related to Infrastructure as Code, as testing a system that can be installed in an automated fashion is much easier than building a test setup manually. Automated tests can be conducted in various ways, but more often than not involve end-to-end testing tools such as Selenium. A useful abstraction on top of Selenium is a behavior description language called Cucumber, which can be run, for example, with Python Behave or Behat. These tools decouple the high-level description of a test from its low-level implementation, making the tests easier to maintain.

Once you have tests in place and Infrastructure as Code to help you build test environments, it becomes much easier to test and roll out updates.

Logging & Backup

Your servers should send all relevant logs to a central log server that cannot otherwise be reached from the monitored systems. This log server should also be deployed in a separate Exoscale organization whose access keys are not stored on any server. These logs can be used to trace the steps of an attack in case of a security breach.

Similarly, all servers should be backed up in such a way that even a compromise of the server doing the backups does not let an attacker destroy the backups. This can be achieved with a multi-tier backup architecture, such as the one used by Bacula/BareOS. Alternatively, a secondary backup of SOS buckets can be created in a second account to prevent backup loss.

Ongoing

Your system’s security is only as strong as its weakest link. Therefore, you must follow a layered security approach and regularly test everything where humans can fail.

In general, no handbook can provide a comprehensive guide to securing your cloud service, as each infrastructure presents unique challenges. Be sure to educate yourself on current best practices and do your own risk assessment.