MLflow’s Solutions to ML Systems Configuration Debt

Awadelrahman M. A. Ahmed
Apr 14, 2024

We can all largely agree that machine learning and artificial intelligence constitute a powerful set of tools, enabling a vast array of use cases that have allowed many businesses to conduct their activities differently: more efficiently and more accurately.

This may be why, over the past decade, so many businesses hired data scientists and machine learning engineers and built AI teams to explore these use cases.

However, as the saying goes, there is no such thing as a free lunch: harvesting these benefits comes with costs. This "technical debt," as framed in a 2015 paper by researchers at Google, captures the underlying challenges. The paper is titled "Hidden Technical Debt in Machine Learning Systems," and I view it as a wonderful starting point for breaking down the components of "real-world" ML systems.

Honestly, I am particularly drawn to the sole figure presented in the paper, shown below. It distinctly illustrates that only a small fraction of a real-world ML system consists of the actual ML code (that black box there)!

Source: Sculley, David, et al. "Hidden Technical Debt in Machine Learning Systems." Advances in Neural Information Processing Systems 28 (2015).

In this post, I aim to explore one of the major themes (blocks in Figure 1) and illustrate how MLflow addresses it. Specifically, I will discuss "Configuration Debt", a subtle type of technical debt that, although often underappreciated, plays a pivotal role in the scalability and efficiency of these systems.

The focus of the post: How MLflow can alleviate Configuration Debt

What We Mean by Configuration Debt in ML Systems

To understand what we mean by Configuration Debt, it might be helpful to first clarify what we do NOT mean! The configuration debt we are discussing does NOT include ML system infrastructure configuration, as this falls under the “Serving Infrastructure” block in the figure. Instead, we are referring to the configurations directly related to the ML models, including feature selection, algorithm selection, parameter settings, and so on.

In that sense, ML systems can have an enormous number of configurable options. For instance, there are countless combinations of features that can be selected, each potentially leading to a different model with varying performance levels. At the algorithm level, there are also numerous settings and parameters that can be configured, not to mention the various data preprocessing and post-processing options available.

The authors of the mentioned paper highlight that as ML systems evolve, the volume of configurations can surpass the traditional codebase, amplifying the potential for errors! Each line of configuration, whether it involves the inclusion of a specific data feature or the application of a particular algorithm parameter, carries the risk of mistakes. This complexity underscores the importance of careful management to maintain system integrity and performance.

What does this have to do with MLflow?

Now, as we move into my argument and exploration of how MLflow addresses these issues, let’s first understand what MLflow is. According to their official website, MLflow is defined as follows:

MLflow is an open-source platform, purpose-built to assist machine learning practitioners and teams in handling the complexities of the machine learning process.

Well, it is appealing to map the two: the complexities that the Google paper highlighted, and what MLflow promises to offer in addressing and handling them. That is what I am trying to do in this post, and I hope I do it well.

Four Main Challenges Regarding Configuration Debt in ML Systems:

Although the paper does not explicitly number these challenges, reading the relevant section makes it clear that it outlines four main pitfalls of inadequate configuration management in ML systems.

Challenge 1: Dynamic and Inconsistent Feature Handling

This challenge revolves around the fact that ML models often require frequent configuration updates due to changes in data formats or feature availability over time. This need for constant adjustment can lead to errors if not managed carefully. In fact, this is a challenge that occurs very often.

An example could be from the retail industry. Suppose we have a machine learning model designed to predict consumer behavior based on online shopping data. This model uses several features derived from user activity, such as the number of items viewed, time spent on the website, and recent purchases.

Imagine that the method for collecting data on the time users spend on the website is updated, moving from session-based tracking to a more granular, event-driven tracking system. This change means that the data format for the time_spent feature might now include more detailed timestamps or additional data points that weren't previously captured. Our consumer-behavior model might then fail to adjust to the new data collection method, leading to incorrect predictions.
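To make this concrete, here is a minimal sketch of the pitfall; the column names and aggregation logic are hypothetical, but they show how the same time_spent feature can silently change meaning when the tracking method changes:

```python
import pandas as pd

# Hypothetical session-based tracking: one row per session with a duration.
sessions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session_duration_sec": [120, 300, 45],
})
time_spent_v1 = sessions.groupby("user_id")["session_duration_sec"].sum()

# Hypothetical event-driven tracking: one row per raw event instead.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00:00", "2024-01-01 10:02:00",
        "2024-01-01 10:07:00", "2024-01-01 11:00:00",
    ]),
})
# Naively recomputing "time spent" as last event minus first event per user
# produces values on a different scale than the session-based feature.
time_spent_v2 = events.groupby("user_id")["timestamp"].agg(
    lambda ts: (ts.max() - ts.min()).total_seconds()
)
```

Unless the feature pipeline's configuration is updated in lockstep, and that update is recorded somewhere auditable, the model keeps consuming a feature whose semantics have quietly drifted.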

Challenge 2: Operational Constraints in Production

This is a pretty common challenge: the features used in development may not be viable in production due to operational constraints, which requires configurations that can adapt without compromising system performance.

I clearly remember a predictive maintenance model I worked on that used high-resolution images during development. In production, however, its performance was drastically lower because bandwidth constraints forced it to work with lower-resolution images.

Challenge 3: Resource Optimization Issues

Even though the paper raised this point nearly a decade ago, the increasing adoption of LLMs has only made this challenge more frequent. Such misconfigurations can lead to inefficient resource use during model training and execution, such as excessive memory consumption. For example, a natural language processing model might consume unnecessary GPU resources for certain tasks due to an oversight in configuration.

Challenge 4: Inter-feature Dependencies and Constraints

The dependencies between different features can introduce constraints, such as latency, which require careful management to ensure the model’s performance isn’t compromised.

This might be especially common in real-time systems. For example, a real-time recommendation system may struggle with latency issues if the model’s complex feature interactions are not optimally configured.

Principles of good configuration systems and how MLflow can help with each:

I feel one of the invaluable contributions of the authors regarding configuration is that they proposed best practices, or principles, to alleviate this configuration debt, and I think these are easy to comply with if we integrate MLflow into our ML system. Here is how:

Principle 1: It should be easy to specify a configuration as a small change from a previous configuration.

What comes to my mind when thinking about this principle is MLflow Projects. I feel it is one of the underappreciated components of MLflow. In this context, MLflow Projects allow users to define their machine learning code and dependencies in a repeatable way, using MLproject files. This setup facilitates easy iteration and modification of configurations.

I always think about how easy and beneficial it is to use version control systems (e.g., Git) along with MLflow, and I think you would agree if you realized that each MLflow project can be either a directory of files or a Git repository containing your code!

Designers of ML systems can make and track small configuration changes incrementally, ensuring that each alteration is documented and reversible. We can keep multiple project files with different configurations while the differences between them stay minimal, as the sketch below illustrates.
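As a minimal sketch (the repository URI, entry point, and parameter names are all hypothetical), launching a variant of a project looks like this; the "configuration change" is literally one value in a dictionary:

```python
import mlflow

# The project's MLproject file declares a "train" entry point together with
# its tunable parameters, so the training code itself never needs editing.
baseline = mlflow.projects.run(
    uri="https://github.com/example/consumer-model",  # a Git repo can be a project
    entry_point="train",
    parameters={"learning_rate": 0.01, "max_depth": 6},
)

# A new configuration expressed as a small change from the previous one:
# same project, same entry point, one knob moved.
variant = mlflow.projects.run(
    uri="https://github.com/example/consumer-model",
    entry_point="train",
    parameters={"learning_rate": 0.01, "max_depth": 8},
)
```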

Principle 2: It should be hard to make manual errors, omissions, or oversights.

In this context, I think MLflow offers several features that align with the principle of making it hard to make manual errors, omissions, or oversights. I see this in three features of MLflow.

First, MLflow centralizes the management of configurations, which is pivotal for ensuring consistency across various deployment environments like development, staging, and production. This centralized approach simplifies the management and auditing of configuration changes, helping maintain correctness wherever the models are deployed.

Moreover, by integrating seamlessly with version control systems such as Git, MLflow helps ensure that all modifications to configurations and code are meticulously tracked, reviewed, and approved. This integration aids in detecting potential errors early in the development cycle and preserves a history of changes, which is invaluable for quick rollbacks if needed.

With MLflow’s tracking capabilities, we can log detailed information about parameters, metrics, and models used in each experiment. This feature is instrumental in pinpointing which configurations work best and ensuring successful setups are easily reproducible and well-documented, minimizing the chance of oversight.
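Here is a minimal sketch of what this looks like in practice; the experiment name, parameters, and metric are hypothetical:

```python
import mlflow

mlflow.set_experiment("consumer-behavior")

with mlflow.start_run(run_name="baseline"):
    # Every configuration choice is written down explicitly, so nothing
    # lives only in a notebook cell or in someone's memory.
    mlflow.log_params({
        "features": "items_viewed,time_spent,recent_purchases",
        "algorithm": "gradient_boosting",
        "learning_rate": 0.01,
        "max_depth": 6,
    })
    # ... train and evaluate the model here ...
    mlflow.log_metric("val_auc", 0.87)
```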

Principle 3: It should be easy to see, visually, the difference in configuration between two models.

I think this is one of the areas where MLflow has pioneered. We can always use its user-friendly web interface, which allows us to compare experiments side-by-side.

This interface visually displays differences in parameters and metrics, making it straightforward to see how configurations differ across experiments or model versions. It literally lifts a huge coding burden!
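The UI covers most day-to-day needs, and the same comparison can also be scripted through the tracking API. Here is a minimal sketch, assuming the hypothetical experiment name from the earlier example, that diffs the logged parameters of two runs:

```python
import mlflow

# search_runs returns a pandas DataFrame with one "params.<name>" column
# per logged parameter.
runs = mlflow.search_runs(experiment_names=["consumer-behavior"])

param_cols = [c for c in runs.columns if c.startswith("params.")]
run_a, run_b = runs.iloc[0], runs.iloc[1]

# The configuration difference between two models is a column-wise diff.
diff = {c: (run_a[c], run_b[c]) for c in param_cols if run_a[c] != run_b[c]}
print(diff)
```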

Principle 4: It should be easy to automatically assert and verify basic facts about the configuration.

In this sense, MLflow can be configured to incorporate automated validation checks within its workflows. This means we can set up scripts or use built-in features to verify that configurations meet certain predefined criteria before they are deployed or used in experiments. For example, we can automate checks to ensure that all required parameters are not only present but also fall within acceptable ranges. We only need MLflow’s tracking APIs along with some basic conditional logic to validate parameters.
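Here is a minimal sketch of such a check, with hypothetical parameter names and ranges; the run starts only after the configuration passes validation:

```python
import mlflow

# Hypothetical acceptable ranges for this model's parameters.
VALID_RANGES = {"learning_rate": (1e-4, 0.5), "max_depth": (1, 12)}

def validate_config(config: dict) -> None:
    """Assert basic facts about a configuration before it is used."""
    for name, (low, high) in VALID_RANGES.items():
        if name not in config:
            raise ValueError(f"Missing required parameter: {name}")
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} outside [{low}, {high}]")

config = {"learning_rate": 0.01, "max_depth": 6}
validate_config(config)  # fail fast on a bad configuration

with mlflow.start_run():
    mlflow.log_params(config)
    # ... training proceeds only with a verified configuration ...
```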

Another approach is to integrate MLflow with existing testing frameworks. By leveraging these integrations, we can run automated tests against our configurations as part of our continuous integration (CI) pipelines. This ensures that any configuration change is tested for compliance with established rules and norms before being merged into production environments.
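For instance, a small pytest-style check (the experiment name and the approved-algorithm list are hypothetical) can query the tracking server in CI and assert facts about the most recent run's configuration:

```python
# test_config.py -- run by the CI pipeline before a change can be merged.
import mlflow

APPROVED_ALGORITHMS = {"gradient_boosting", "random_forest"}

def test_latest_run_uses_approved_algorithm():
    runs = mlflow.search_runs(
        experiment_names=["consumer-behavior"],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )
    assert not runs.empty, "No runs logged for this experiment"
    assert runs.loc[0, "params.algorithm"] in APPROVED_ALGORITHMS
```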

Principle 5: It should be possible to detect unused or redundant settings.

One thing that ML engineers might overlook is analyzing their logs — yes, the logs themselves. By using MLflow’s parameter tracking capabilities, we can regularly review the logged parameters for our experiments.

Over time, this analysis can help identify parameters that do not affect model performance or are consistently set to default values without variation. Such reviews can highlight candidates for removal or consolidation.
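A minimal sketch of such a review, again assuming the hypothetical experiment name from earlier:

```python
import mlflow

runs = mlflow.search_runs(experiment_names=["consumer-behavior"])
param_cols = [c for c in runs.columns if c.startswith("params.")]

# A parameter that has taken exactly one value across every logged run is
# a candidate for removal, or for hard-coding with an explanatory comment.
constant_params = [c for c in param_cols if runs[c].nunique(dropna=True) <= 1]
print("Never varied across runs:", constant_params)
```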

Principle 6: Configurations should undergo a full code review and be checked into a repository.

Well, MLflow itself doesn't directly handle version control, but it can be seamlessly integrated with version control systems like Git. We can manage our ML project configurations (including MLproject files and parameter configurations) under version control.

This integration allows changes to be reviewed through pull requests or merge requests, ensuring that all configurations undergo inspection before being finalized.

Another related feature is the MLflow Model Registry, which allows for tracking model versions and their configurations. When models are registered or updated in the registry, each version stays linked to the run that produced it and, for Git-backed projects, to the code version as well. This makes it possible to review a model's configuration and the operational settings under which it was trained and evaluated.
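As a minimal sketch (the model, the registry name, and the parameters are hypothetical), registering a model ties a reviewable registry entry back to the exact run, and therefore the exact configuration, that produced it:

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier

with mlflow.start_run() as run:
    model = GradientBoostingClassifier(learning_rate=0.01, max_depth=6)
    # ... fit the model on training data here ...
    mlflow.log_params({"learning_rate": 0.01, "max_depth": 6})
    mlflow.sklearn.log_model(model, "model")

# Each registered version points back at its source run, so reviewers can
# inspect the full configuration under which the model was trained.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "consumer-behavior-model")
```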

Conclusion

I wrote this post based on my own experience, and others might have had different experiences. I use MLflow very often and believe it is an extensive tool that can create significant value when used to follow best practices that reduce ML systems debt.

Of course, everyone has their own approach to alleviating this debt, but I believe the principles outlined in this paper, which I highly recommend reading, and the ways MLflow can be used to adopt them are quite valuable for any project. I hope this adds to your experience.
