Takeaway from MLOps NYC: Open Source Frameworks Need TLC

The growing number of open source options for managing and developing machine learning applications still requires customization and care before enterprises dive in.

There are more ways to oversee the creation and deployment of machine learning applications thanks to open source frameworks such as Kubeflow and MLflow -- but enterprises might be a wee bit hesitant to get on board. Naturally, enterprise standards and other considerations may come into play before any open source option is implemented. A panel at last week’s MLOps NYC conference discussed best practices for multiplatform MLOps with Kubeflow and MLflow that might make it easier to get enterprises on board.

The panel consisted of Clemens Mewald, director of product management, machine learning and data science at Databricks; Thea Lamkin, open source program manager at Google; and David Aronchick, head of open source machine learning strategy at Microsoft. Yaron Haviv, CTO of Iguazio, the host of the conference, moderated.

Mewald, Lamkin, Aronchick, and Haviv at MLOps NYC (Image: Joao-Pierre Ruth)

Databricks’ MLflow is an open source framework to manage the complete machine learning lifecycle. Mewald said the cofounders of Databricks, a platform for automated cluster management and unified analytics, conceived their platform with three components in mind: tracking, reproducibility, and diversity of machine learning frameworks. They started off, Mewald said, looking to data scientists as their first intended users but also took DevOps into consideration in the development of their platform.

The other open source platform discussed, Kubeflow, was created by Google for building machine learning applications. Lamkin said that in her role she focuses on the open source community and collaboration aspects of Kubeflow, which she sees evolving as more users engage with it. “One of the things we’re trying to address with the project is there is a diversity of tools in our [use] cases and standards that we’re working with within the machine learning ecosystem,” she said. “We’re trying to provide a way, with a solid platform, that allows people to leverage back best practices no matter if they’re using TensorFlow or PyTorch.”

There is also an intent, she said, to tap into underlying Kubernetes standards that come out of the community. Kubeflow, Lamkin said, can be a platform that lets people create tools without sacrificing the standards they have been working with. However, Lamkin also said that with any open source tool such as Kubernetes, the enterprise may need an extra layer of support tied to how it implements the tool and continues to run it at scale.

“That could be support on your own team, or maybe you want to leverage the tooling that a platform offered,” she said. “Having Kubeflow running on-prem, on GKE (Google Kubernetes Engine) on-prem, for example, makes it easy to deploy and use Google Cloud AI features.” An effort is being made, Lamkin said, to ensure Kubeflow runs well on all the largest cloud providers. “Ultimately, we want Kubeflow to be ubiquitous,” she said.

Though open source frameworks might not be ready for every enterprise right out of the gate, Aronchick does not see this as a deterrent. “Every platform will require some form of customization in order to work with a hosted or managed solution,” he said. Aronchick cofounded the Kubeflow project and was the first non-founding product manager on Kubernetes.

Such customization work has been part of the learning process with many open source frameworks and might help further adoption. During the development of Kubernetes, he said, the team realized early on that creating a container was easy but that new challenges arose from there. “Taking it one step further to run it in production in any way was actually quite hard,” he said. “How do you distribute it? How do you start it? What policies do you put in place?”

In 2017, Aronchick said, it became clear that machine learning was going through a very similar experience. Getting TensorFlow to run locally on a machine, he said, was easy, but doing anything more complicated than that became a challenge. Aronchick said he now sees an opportunity to help data scientists through collaboration and the complementary connection between tools such as MLflow and distributed systems on the backend. “Those two are really halves of the same coin,” he said.

Joao-Pierre S. Ruth has spent his career immersed in business and technology journalism first covering local industries in New Jersey, later as the New York editor for Xconomy delving into the city's tech startup community, and then as a freelancer for such outlets as ...
