Nesting XenBlanket on AWS

As a security company focused on hypervisor technology, the team at Star Lab wanted to research potential hypervisor-based solutions for cloud security. Demand for cloud services has skyrocketed worldwide, and a growing number of critical services are being migrated to the cloud. Sophisticated adversaries will seek to exploit cloud platforms to access critical or sensitive information, and these new threats demand novel virtualization-based cyber security tools and techniques to keep pace and blunt the adversaries' attacks.

Configuring XenBlanket on AWS EC2

Amazon Web Services (AWS) is a well-known cloud computing platform, making it a natural choice for researching realistic solutions. For this effort, we used Amazon’s Elastic Compute Cloud (EC2) service. EC2 allows users to create virtual machines (VMs) on demand; however, Amazon does not give users access to the hypervisor underlying the VMs. The figure below depicts a typical EC2 deployment in which Amazon uses Xen as the hypervisor:

A typical EC2 deployment utilizing a Xen hypervisor

In order to implement our new cyber defenses, we needed control of the hypervisor layer. Digging into existing research, we found the XenBlanket project, which uses nested virtualization to achieve similar goals. However, XenBlanket was not actively maintained and did not work with modern versions of Xen.

As part of our research efforts, we updated XenBlanket to work with modern versions of Xen and submitted our changes to the upstream project.

The figure below depicts an EC2 deployment utilizing nested virtualization to enable user control of the inner hypervisor:

An EC2 deployment utilizing a XenBlanket hypervisor

XenBlanket modifies Xen so that the Xen hypervisor can run on Amazon EC2 instances as well as VM instances from other cloud providers. It acts as an abstraction layer between the VMs and the cloud provider’s platform, which homogenizes infrastructure across providers and enables transparent deployment on diverse platforms such as AWS, Microsoft Azure, and Google Cloud Platform. Users gain flexibility because they can freely transfer VMs without worrying about the hypervisor constraints of a specific cloud provider, and this cloud-agnostic behavior keeps them from being locked into any one provider.

New Security Features Enabled by XenBlanket

With XenBlanket functioning, we were ready to move on to developing security mechanisms that leverage the hypervisor. We investigated techniques to provide cyber defense, threat detection, and response.

One technique to defend against attacks is live migration. Live migration transfers a running VM between two instances of the XenBlanket hypervisor (on separate physical machines) without interrupting the VM. This capability is the foundation for a moving target defense strategy that may be triggered by specific events or threats.
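To make the event-triggered idea concrete, here is a minimal sketch of how a moving target policy might decide where a VM goes next. It is an illustration only, not Star Lab's implementation: the `Placement` type, the `on_event` function, the event names, and the host names are all hypothetical, and the transfer itself (which a stock Xen toolstack would typically perform with `xl migrate`) is deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of which hypervisor host currently runs a VM."""
    vm: str
    host: str


def pick_destination(current_host, hosts, rng):
    """Choose a destination host that differs from the current one."""
    candidates = [h for h in hosts if h != current_host]
    if not candidates:
        raise ValueError("no alternative host available")
    return rng.choice(candidates)


def on_event(placement, event, hosts, trigger_events, rng):
    """Return a new placement if the event warrants a move, else the old one.

    Only the decision is modeled here; carrying out the live migration
    is outside the scope of this sketch.
    """
    if event not in trigger_events:
        return placement
    new_host = pick_destination(placement.host, hosts, rng)
    return Placement(placement.vm, new_host)


if __name__ == "__main__":
    rng = random.Random()
    hosts = ["host-a", "host-b", "host-c"]  # assumed XenBlanket instances
    current = Placement("web", "host-a")
    print(on_event(current, "port_scan", hosts, {"port_scan"}, rng))
```

Choosing the destination at random is one simple way to keep the VM's location unpredictable; a real policy could also weigh host load or trust level.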

The figure below depicts the live migration concept:

Live migration between XenBlanket instances

It’s not enough to just defend a system. One must also assume that dedicated adversaries will eventually find a way around defenses and obtain some level of persistence or implantation in a VM. Without continuous monitoring or introspection of the VMs, an attacker will likely go undetected.

VM introspection allows us to run a monitoring agent outside of the infected VM. From outside the VM, we can observe the VM state and verify its status or trigger alerts when anomalies are detected. In addition, a Linux Security Module (LSM) runs within each Linux VM and reports events to an external monitor. Sensor fusion of the LSM-collected data with the VM introspection state feeds into our analysis tools to detect and respond to attacks.
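The fusion step can be pictured with a small sketch. The one below assumes, purely for illustration, that each sensor emits weighted observations per VM and that a VM is flagged once its combined evidence reaches a threshold. The `Observation` type, the signal names, the weights, and the threshold are hypothetical and do not describe the actual analysis tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical evidence record from one sensor about one VM."""
    vm: str
    source: str    # "lsm" (in-guest module) or "vmi" (external introspection)
    signal: str    # free-form description of what was seen
    weight: float  # assumed suspicion score contributed by this signal


def fuse(observations, threshold=1.0):
    """Combine evidence per VM and return the VMs that meet the threshold.

    The result maps each flagged VM to the sorted list of sensor sources
    that contributed evidence, so an analyst can see whether the in-guest
    and out-of-guest views corroborate each other.
    """
    scores = defaultdict(float)
    sources = defaultdict(set)
    for obs in observations:
        scores[obs.vm] += obs.weight
        sources[obs.vm].add(obs.source)
    return {
        vm: sorted(sources[vm])
        for vm, score in scores.items()
        if score >= threshold
    }


if __name__ == "__main__":
    observations = [
        Observation("vm1", "lsm", "unexpected exec", 0.6),
        Observation("vm1", "vmi", "unknown kernel module", 0.5),
        Observation("vm2", "lsm", "failed login", 0.4),
    ]
    print(fuse(observations))
```

Summing weights is the simplest possible fusion rule; it is used here only to show how two independent views of the same VM can reinforce each other.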

Once the system determines that a VM is under attack, cyber response techniques are launched to combat the threat and mitigate damage. For example, if one VM is under attack, the other co-resident VMs can be migrated to safety while the compromised VM is shut down. This technique drastically reduces the time an attacker has to compromise a neighboring VM.
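The response described above can be sketched as a small planning function. This is an assumption-laden illustration rather than the prototype's logic: the action names (`migrate`, `capture_state`, `halt`), the `plan_response` function, and the ordering (neighbors moved first, then state captured, then the VM halted) are all hypothetical choices.

```python
def plan_response(co_resident_vms, compromised, safe_host):
    """Build an ordered list of response actions for one compromised VM.

    Each action is a tuple (action_name, vm, destination). The destination
    is None for actions that do not move a VM.
    """
    if compromised not in co_resident_vms:
        raise ValueError(f"{compromised!r} is not running on this host")

    # Move the neighbors away first so the attacker loses nearby targets.
    actions = [
        ("migrate", vm, safe_host)
        for vm in co_resident_vms
        if vm != compromised
    ]
    # Preserve evidence for incident response, then cut off access.
    actions.append(("capture_state", compromised, None))
    actions.append(("halt", compromised, None))
    return actions


if __name__ == "__main__":
    for step in plan_response(["db", "web", "mail"], "web", "host-b"):
        print(step)
```

Returning a plan instead of executing it keeps the policy easy to test and lets a separate component issue the real hypervisor commands.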

We integrated these techniques into a complete approach to cloud security for an R&D project. In our implementation, we used a customized XenBlanket image that contains various introspection techniques. Each virtual machine that runs on top of the XenBlanket image uses a hardened Linux kernel, similar to one that would be provided through the use of Titanium.

These techniques come together in our prototype to form a layered defense-in-depth strategy. We anticipate that, given the state of modern software, an attacker will eventually compromise a VM. When this happens, we mitigate the risk by detecting the intrusion and then (a) preventing the attacker from extending the attack to other VMs by migrating those VMs away and (b) time-limiting the attacker's access by halting the compromised VM. As an added bonus, rapid shutdown allows us to capture data that is useful for forensics and incident response. If a virtual machine is compromised, the attacker is still confined to the XenBlanket image, and the custom XenBlanket’s introspection techniques would detect that one of the guest VMs has been compromised. The XenBlanket image would then take appropriate actions, such as migrating the other VMs or stopping the compromised VM and saving related data for incident response teams.

Despite the virtualization and security isolation, each service running on a guest VM still behaves as normal: it runs as usual, unaware that it is in a VM that is migrated at various intervals.

Implementation Results

Our biggest concern with this approach was the usability of the service during migration events. We assumed users would notice considerable lag. To test this, we built a representative system running target applications, such as Firefox and Thunderbird, and measured interactive performance during migrations. We were excited to discover that there was no perceptible lag.

Star Lab has integrated these techniques with a custom XenBlanket image and a hardened Linux kernel for the running VMs. In our prototype, this defense-in-depth approach proved successful in providing a more secure cloud platform. Moreover, initial testing with applications like Firefox and Thunderbird showed no noticeable lag or interruption, even during migration events.

For more information, please see Kelli Little and Christopher Clark’s presentation at the 2019 Xen Summit.