Federated learning is not only a promising technology but also a potential new business model for AI. As a consultant, I was recently tasked with recommending how a healthcare company could create a “data alliance” with some of its competitors by building a federated learning framework. The goal of this article is to explain how FL might give birth to a new data ecosystem and enable data alliances.

What is Federated Learning (FL)?

Without getting too much into technical details, FL could be defined as a distributed machine learning framework that allows a collective model to be constructed from data that is distributed across data owners.

AI projects need data from many sources, and our capacity to build great AI projects is limited by the data we can reach. Access to external data is very restricted, which is a real issue when building advanced AI applications. Worse, because of industry competition, privacy and security concerns, and administrative procedures, even integrating data across departments of the same company is a challenge.

In general, centralized ML is far from perfect. Training the models requires companies to amass mountains of relevant data on central servers or in data centers. In some projects, that means collecting users’ sensitive data.

As a consequence, centralized machine learning is out of reach for many businesses. Needless to say, the “simple” task of gathering all the data a project needs is expensive and time-consuming.

I am often confronted with two issues when working on an ML project:

First, depending on the project, the owner of the data you need may simply not want to share it with your company. This is the case with competitively sensitive data or legally protected medical data.

Second, a significant amount of valuable training data is created on hardware at the edge of slow and unreliable networks, such as smartphones or equipment in industrial facilities. Communication with such devices can be slow and expensive for the company.

Federated learning addresses many of the issues of centralized machine learning. Algorithm training moves to the edge of the network, so that data never leaves the device, whether it’s a mobile phone or a hospital branch’s servers. Once the local model has learned from the data, its updates are uploaded and aggregated with the updates from all the other devices on the network. The improved model is then shared with the entire network. (1)
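
To make this train-locally-then-aggregate loop concrete, here is a minimal sketch in plain Python. It assumes a toy linear model and a sample-weighted average of client models (the scheme usually called federated averaging); the function names, the synthetic data, and the hyperparameters are all hypothetical and only meant as an illustration, not as a production setup.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of gradient descent on one client's private data.

    `weights` is [slope, intercept]; `data` is a list of (x, y) pairs that is
    only ever read here, mirroring data that stays on the device.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]

def federated_average(client_weights, client_sizes):
    """Aggregate client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]

def make_client_data(n, rng):
    """Hypothetical private dataset drawn from y = 3x + 1 plus small noise."""
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01))
            for x in (rng.uniform(-1, 1) for _ in range(n))]

rng = random.Random(0)
clients = [make_client_data(n, rng) for n in (20, 50, 30)]  # three data owners

global_weights = [0.0, 0.0]
for _ in range(200):  # communication rounds
    # Each client trains on its own data; only the weights are sent back.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])
```

After the rounds complete, `global_weights` should sit close to the slope and intercept used to generate the data, even though the server in this sketch never reads any client's `(x, y)` pairs.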

A New Business Model?

The cloud computing model is being challenged like never before. Companies can no longer ignore the growing importance of data privacy and data security. Moreover, the relationship between a company’s profits and its data is becoming more and more obvious in the AI age. In this context, federated learning offers a new paradigm, and a new business model, for applications that leverage data.

When the isolated dataset held by each company is not enough to train an accurate model, federated learning makes it possible for companies to share a common model without exchanging data directly. Companies could then draw on more data and train better models.

Equitable data sharing can be achieved either by building a meta-model from the sub-models each party builds, so that only model parameters are transferred, or by using encryption techniques to secure communications between the parties. Blockchain techniques could also help reinforce control over data.
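
One way to picture how parameters can be combined without any party seeing another party's individual update is pairwise additive masking, a simplified form of what is often called secure aggregation. The sketch below is my own illustration, not a technique prescribed by the sources cited here: the names `mask_updates` and `aggregate`, the modulus, and the fixed-point scale are all assumptions, and a single function simulates every party for brevity, whereas in a real protocol each pair of parties would agree on its mask privately.

```python
import secrets

MODULUS = 2**31 - 1  # all arithmetic is done modulo this value (assumed)
SCALE = 10_000       # fixed-point scale used to encode floats as integers

def encode(value):
    """Map a float to an integer in [0, MODULUS)."""
    return round(value * SCALE) % MODULUS

def decode(value):
    """Map an integer in [0, MODULUS) back to a (possibly negative) float."""
    if value > MODULUS // 2:
        value -= MODULUS
    return value / SCALE

def mask_updates(updates):
    """Return one masked update per party.

    For every pair (i, j), a random mask is added to party i's values and
    subtracted from party j's, so each masked update looks random on its own
    but the masks cancel out in the sum.
    """
    masked = [[encode(v) for v in update] for update in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            for k in range(len(updates[0])):
                mask = secrets.randbelow(MODULUS)
                masked[i][k] = (masked[i][k] + mask) % MODULUS
                masked[j][k] = (masked[j][k] - mask) % MODULUS
    return masked

def aggregate(masked):
    """Average the masked updates; only the sum is ever decoded."""
    n = len(masked)
    return [decode(sum(column) % MODULUS) / n for column in zip(*masked)]

# Three hypothetical parties, each holding a two-parameter model update.
updates = [[0.5, -1.0], [1.5, 2.0], [1.0, 2.0]]
average = aggregate(mask_updates(updates))
```

The aggregator in this sketch only works with the masked values, yet `average` matches the plain mean of the three updates. A real deployment would also need to handle parties dropping out mid-round, which this sketch ignores.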

To put it simply, federated learning makes it possible for different data owners to collaborate at the organizational level and benefit from each other’s data without exchanging it. In a recent paper, Qiang Yang et al. describe the different configurations in which this can happen.

Vertical & Horizontal Federated Learning

Let’s take the example of two banks from the same country. Although they have non-overlapping clientele, their data will have similar feature spaces since they have very similar business models. They might come together to collaborate in an example of horizontal federated learning.

In vertical federated learning, two companies providing different services (e.g. banking and e-commerce) but having a large intersection of clientele might find room to collaborate on the different feature spaces they own, leading to better outcomes for both.
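
The difference between the two configurations comes down to which axis of the data overlaps: the features (horizontal) or the customers (vertical). The sketch below illustrates that distinction with made-up parties; the function names, the 0.5 threshold, and the sample data are hypothetical. In practice the parties would not compare customer IDs in the clear, and would rely on a privacy-preserving matching step instead.

```python
def overlap(a, b):
    """Jaccard overlap between two collections, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_setting(party_a, party_b, threshold=0.5):
    """Guess which federated setting fits two parties.

    Horizontal: similar features, mostly different customers.
    Vertical: mostly the same customers, different features.
    The threshold is an arbitrary value chosen for this illustration.
    """
    feature_overlap = overlap(party_a["features"], party_b["features"])
    customer_overlap = overlap(party_a["customers"], party_b["customers"])
    if feature_overlap >= threshold and customer_overlap < threshold:
        return "horizontal"
    if customer_overlap >= threshold and feature_overlap < threshold:
        return "vertical"
    return "unclear"

# Two banks: same kind of features, non-overlapping clientele.
bank_a = {"features": {"balance", "income", "age"}, "customers": {1, 2, 3}}
bank_b = {"features": {"balance", "income", "age"}, "customers": {4, 5, 6}}

# An e-commerce site: different features, largely the same customers as bank_a.
shop = {"features": {"basket_value", "visits"}, "customers": {1, 2, 3, 7}}

banks_setting = suggest_setting(bank_a, bank_b)
bank_shop_setting = suggest_setting(bank_a, shop)
```

Here the two banks land in the horizontal case and the bank and shop pair in the vertical case, matching the examples above.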

In both cases, the data owners can collaborate without revealing their customers’ private data thanks to, for example, blockchain techniques. Both gain access to more data to improve their AI initiatives.

Right now, federated learning seems best suited to the healthcare and banking industries. In banking, we can imagine multiple banks using federated learning to train a common, powerful fraud detection model without sharing their sensitive customer data with each other. Hospitals and other healthcare institutions could likewise benefit if they agree to train models on patient data in a privacy-preserving manner.

Building Data Alliances

When I was tasked with building a data alliance around a federated learning framework, I noticed that companies are often extremely skeptical about data privacy guarantees. None of them wants to help the competition by sharing its data. This is the main challenge: how do you convince companies to open their data war chests and share them with others?

In my opinion, this new business model based on federated learning must be supported by an industrial data alliance, otherwise it is doomed to fail. The alliance may include several entities. By joining it, they can cooperate on data under a federated learning framework.

The data alliance I’m working on will look like this:

It will be a multi-party system composed of two or more organizations forming an alliance to train a shared model on their individual datasets through Federated Learning. Selected companies and organizations will be encouraged to join, and the alliance will have a clear incentive mechanism.

I believe that in order to fully commercialize federated learning among different organizations, a fair platform and incentive mechanisms need to be developed.

Members of the alliance enjoy rights and interests, and also have responsibilities to fulfill. In my opinion, the alliance must use blockchain to build consensus among all parties, record each party’s contribution in a permanent record, and reward parties that make outstanding contributions.
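
As a rough illustration of the "permanent record" idea, the sketch below keeps an append-only log in which each entry stores the hash of the previous one, so that any later change to a recorded contribution can be detected. This is only the hash-chain part of a blockchain, with no distributed consensus, and the class name, the fields, and the proportional reward rule are all assumptions of mine rather than a design taken from the project.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash for the first entry

def _digest(body):
    """Deterministic SHA-256 hash of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class ContributionLedger:
    """Append-only, hash-chained log of each party's contribution score."""

    def __init__(self):
        self.entries = []

    def record(self, party, round_id, score):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"party": party, "round": round_id,
                "score": score, "prev_hash": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def is_valid(self):
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True

    def reward_shares(self):
        """Split rewards in proportion to recorded scores (assumed rule)."""
        totals = {}
        for entry in self.entries:
            totals[entry["party"]] = totals.get(entry["party"], 0.0) + entry["score"]
        grand_total = sum(totals.values())
        if grand_total == 0:
            return {party: 0.0 for party in totals}
        return {party: score / grand_total for party, score in totals.items()}

ledger = ContributionLedger()
ledger.record("hospital_a", round_id=1, score=3.0)  # hypothetical members
ledger.record("hospital_b", round_id=1, score=1.0)
shares = ledger.reward_shares()
```

How the contribution score itself is measured (data volume, improvement in model accuracy, and so on) is the hard part of any incentive mechanism and is left open here.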

“Keeping data private is the major value addition of Federated Learning here for each of the participating entities to achieve a common goal.” (2)

I would recommend relying on a neutral third party. It could be in charge of “providing the infrastructure to aggregate model weights and establishes trust among the companies in the alliance”. (3)

Moreover, data structures and parameters are usually similar across members but need not be identical, so a lot of pre-processing is required at each client to standardize model inputs. A neutral third party is well placed to handle this part of the project.

Currently, data silos and the focus on data privacy are important challenges for artificial intelligence, but federated learning could be a solution. It could establish a united model for multiple organizations while the local and sensitive data is protected so that they could benefit together without having to worry about data privacy.

Challenges of Federated Learning

Transitioning federated learning from concept to production is not without challenges. Although a lot has been achieved on the efficiency and accuracy of federated learning, the more important challenges, in my opinion, are related to security.

The key requirement for federated learning is to preserve the privacy of the data. It appears that even when the actual data is not exposed, repeated model weight updates can be exploited to reveal properties that are not global to the data but specific to individual contributors. (4)

This inference can be performed on both the server side and the client side. A possible way to mitigate this risk is to use “differential privacy” techniques.
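
A common building block of differentially private training is to bound each update's norm and then add random noise before the update leaves the client. The sketch below shows only that clip-and-noise step, under assumed parameter names (`max_norm`, `noise_multiplier`); it does not compute the resulting privacy budget, which a real deployment would have to track with a proper accounting method.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize(update, max_norm=1.0, noise_multiplier=1.0, rng=random):
    """Clip an update, then add Gaussian noise to every coordinate.

    The noise standard deviation is `noise_multiplier * max_norm`, so a
    larger multiplier hides individual contributions better at the cost
    of a noisier model.
    """
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

# A hypothetical client update with norm 5, bounded to norm 1 before noising.
raw_update = [3.0, 4.0]
noisy_update = privatize(raw_update, max_norm=1.0, noise_multiplier=0.5,
                         rng=random.Random(42))
```

Clipping limits how much any single contributor can move the shared model, and the noise makes it harder to infer from repeated updates what a specific contributor's data looked like.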

Conclusion

Federated learning makes it easier, safer and cheaper to apply machine learning in regulated and competitive industries. Through FL, companies might improve their models and enhance their AI applications. In the medical field, FL could be synonymous with better treatments and faster drug discovery.

I believe that the current mindset of large firms, which aggregate data centrally and create silos for competitive advantage, will be a major obstacle to the adoption of Federated Learning. Most companies have only recently started their AI journey… Effective data protection policies, appropriate incentives and business models built around decentralized data will be needed to tackle these issues and develop the Federated AI ecosystem.

In the near future, I expect to see more industrial data alliances in many vertical markets. For example, the financial industry could form a financial data alliance, while the medical industry could form a medical data alliance. In the long term, we could also expect data alliances between companies from different industries that share the same AI vision.

If you are interested in having more technical details, I recommend this website.