One of the great things about cloud-based and open-source software is how many options there are and how fast everything moves. That is also one of the most difficult things about it. Python, machine learning, data science, and the other ecosystems we work in are diverse and fast-moving. Managing compatibility between containers, packages, and your own code is administrative overhead. Managing compatibility across 400 developers and their competing needs can get out of hand very quickly.

The golden container

When we started Beacon, we thought that we could address this problem by providing a ‘golden container’ for everyone, carefully evaluating package compatibility and moving everyone forward together. This worked well initially, but became too confining as the needs of our clients grew and became more diverse. Since all our services are containers, we worked on a mechanism to build Docker images on the fly. We created templates that let you derive an image from a standard foundation and customize it for your needs. With images built on demand, you can easily schedule tests against a new version to verify that it is compatible. You can run and evaluate applications, grid compute tasks, and batch jobs with this new ‘fork’. And once you have stabilized it, you can publish it to your team or end-users.
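To make the idea of a derived image concrete, a template like this might look as follows. The registry path, tag, and packages here are purely illustrative, not Beacon's actual images:

```dockerfile
# Derive from a hypothetical standard foundation image
FROM registry.example.com/beacon/core:stable

# Layer on team-specific additions without touching the base
RUN pip install --no-cache-dir "scikit-learn==1.4.*" "pyarrow>=15"

# Label the fork so it can be tested and published to a team
LABEL team="quant-research" layer="experimental"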

It was still hard to build an image from scratch. Since our ecosystem is primarily Python and C++ libraries, we built a package management framework based on Conda that encapsulates the Beacon Core environment and lets you extend it. It then became easy to add things like Snowflake compatibility, C# language support, and GPU or machine learning packages as layers on top of our standard core environment. The result is a flexible and powerful package environment built on layered containers, one that lets every client operate at their optimal mix of consistency and independence. Clients can schedule containers onto elastic grid compute for a wide range of workloads, including application backends, batch jobs, and Python kernels within Jupyter or VSCode.
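An extension layer like the ones described above could be declared with an ordinary Conda environment file. The environment name, channels, and packages below are illustrative assumptions, not Beacon's actual package set:

```yaml
# Hypothetical layer definition: GPU/ML and Snowflake additions
# declared on top of a standard core environment
name: core-ml-layer
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pytorch
  - cudatoolkit
  - snowflake-connector-python
```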

What the fork?

When you have multiple groups of developers, each with their own objectives, package dependencies, and sometimes working in different languages, trying to keep everyone in the same programming environment is unrealistic. As soon as one group needs the next version of a package, but that update breaks some older functions, people create different branches, or forks, in the code. This is often done with the best of intentions: the plan is to bring everything back together after the next update. But after a few iterations of this, reconciling the number of branches becomes a herculean task, one that may slow or even stop new functionality from being implemented until it is complete. Since this is not an acceptable scenario for most firms in our industry, we needed to find another solution.

Adding layers of functionality

Working with the Conda package, dependency, and environment manager, we built a system of layered containers that delivers the flexibility of using different packages with the consistency of a common, tested environment for everyone. As developers add new packages or versions, we create additional containers holding just the differences between the base environment and the additions. Workloads run in the minimum-acceptable layer, significantly improving operational stability. For example, functions that do not have additional packages or dependencies run in the base layer, while experimental ones run in a different layer.
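The idea of building a layer from just the differences can be sketched in plain Python. Package names and versions here are illustrative; a real implementation would compare fully solved Conda environments rather than flat name-to-version maps:

```python
def layer_delta(base_env: dict[str, str], requested: dict[str, str]) -> dict[str, str]:
    """Return only the packages that differ from the base environment,
    i.e. what the additional container layer needs to install."""
    return {
        name: version
        for name, version in requested.items()
        if base_env.get(name) != version
    }

base = {"numpy": "1.26", "pandas": "2.1"}
wanted = {"numpy": "1.26", "pandas": "2.2", "xgboost": "2.0"}

# Only the upgraded pandas and the new xgboost go into the extra layer
print(layer_delta(base, wanted))  # {'pandas': '2.2', 'xgboost': '2.0'}
```

If nothing differs, the delta is empty and the workload stays in the base layer, which is exactly the minimum-acceptable-layer behavior described above.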

Container layers are completely independent and isolated from one another, allowing developers to direct functions to a specific container and quickly test in a variety of different environments. For example, new or updated functions can be quickly tested against the production environment. Or experimental functions with unique dependencies can be quickly deployed and called from within production code, without affecting overall stability or having to re-test everything. This layered container model can greatly boost development speed and agility, without affecting overall performance or stability. Firms can get new and updated analytics to the front office faster, and more easily introduce experimental features such as machine learning.
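One way to picture directing functions to specific containers is a routing table that maps each function to the layer it requires. This is a hypothetical sketch, not Beacon's actual API; the image names and functions are invented for illustration:

```python
# Hypothetical mapping from layer names to container images
LAYER_IMAGES = {
    "base": "core:stable",
    "ml-experimental": "core-ml:dev",
}

_registry: dict = {}

def runs_in(layer: str):
    """Decorator recording which container layer a function requires."""
    def wrap(fn):
        _registry[fn.__name__] = layer
        return fn
    return wrap

@runs_in("base")
def price_portfolio(trades):
    # Stable production function: no extra dependencies needed
    return sum(trades)

@runs_in("ml-experimental")
def forecast_volatility(series):
    # Experimental function: would rely on packages in the ML layer
    return max(series)

def image_for(fn) -> str:
    """Resolve the container image a function should be scheduled onto."""
    return LAYER_IMAGES[_registry[fn.__name__]]
```

With this shape, `price_portfolio` is scheduled onto the stable base image while `forecast_volatility` runs in the experimental layer, so a call into the experimental function never disturbs the production environment.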

Layers also provide other benefits

Beacon’s layered container architecture provides some additional useful benefits. Conda’s dependency management is precise: it installs only the updates that are needed, and gives users visibility into and control over cascading dependencies, such as version pinning and conflict avoidance. This improves stability and reduces the likelihood of introducing unintended consequences.

Organizations can define environments for specific uses. For example, in addition to the base packages used in production, each developer can be given a standard configuration with a set number of CPUs and amount of RAM. Or exploratory options can provide easy access to different hardware, types and versions of operating systems, GPUs, languages, and other valuable variations. Beacon also supports Docker Compose, for running multiple related containers in an isolated network stack, and is infrastructure agnostic to support your choice of cloud computing service.
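A multi-container setup of the kind described above can be expressed declaratively with Docker Compose. This is a minimal hypothetical example; the service names, images, and resource figures are assumptions for illustration:

```yaml
# Hypothetical compose file: an application backend and a compute worker,
# each on its own container layer, sharing an isolated network stack
services:
  backend:
    image: registry.example.com/beacon/core:stable
    ports:
      - "8080:8080"
  ml-worker:
    image: registry.example.com/beacon/core-ml:dev
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8g
```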

Turning myth into reality

We did not find the golden container, but we think that our quest has turned up something much better. Combining flexible layers of containers with sophisticated package management provides everyone in the organization with the environment they need. Developers can safely experiment with a wide variety of packages and environments. And production gets control and stability without having to constantly manage compatibility for everyone.