Devops projects bring together many players: operations staff, developers, PMs, technical leads, and QA. Most of their activities are automated, and systems record their email, wiki, and chat conversations along with other notes about those activities. This gives the devops community the opportunity to learn from that data in order to improve future projects.

“We have access to a massive amount of data that can in total define what the process is that’s going on and define it in-flight in real-time,” says Aaron Cois, Researcher, CERT Division, Software Engineering Institute, Carnegie Mellon University. Through exploratory research, Cois is learning that by using machine learning tools, enterprises can analyze security activities in the development process based on data from issue trackers and commit logs.

Valleys in Topic Density Point to Software Development That is Missing the Mark

Natural Language Processing, a field that leans heavily on machine learning, processes unstructured text and can perform topic detection, that is, determining whether a portion of the text belongs to a given topic category. “In our work, we’ve taken text from issue trackers and used machine learning to classify each issue as security-related or not security-related,” says Cois.

This has practical applications for tracking the course of a project and steering it in the right direction. If the PM expects the developers to improve security in a given iteration of the software package, the PM can monitor the developers’ chatter about the issues they add to the issue tracker, or monitor the comments on the code.

“By using machine learning to classify issue tracking or source code commit messages as security-related, we can see a reasonable measure of the amount of work that’s being done to ensure the security of the software system,” says Cois.
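As a rough illustration of this kind of classification, the sketch below trains a small Naive Bayes text classifier on a handful of labelled messages. The algorithm choice, the class name NaiveBayesTextClassifier, and the training messages are all assumptions made for the example; the article does not say which technique or data Cois's team used, and a real classifier would need far more labelled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {}              # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)  # label -> number of training messages
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-labelled messages (illustrative only, not real project data).
TRAINING = [
    ("Fix SQL injection in login form", "security"),
    ("Sanitize user input to prevent XSS", "security"),
    ("Rotate leaked API token and patch auth check", "security"),
    ("Upgrade TLS library to fix vulnerability", "security"),
    ("Add dark mode to settings page", "other"),
    ("Refactor report export for readability", "other"),
    ("Fix typo in onboarding page", "other"),
    ("Improve dashboard loading speed", "other"),
]

texts, labels = zip(*TRAINING)
classifier = NaiveBayesTextClassifier().fit(texts, labels)

print(classifier.predict("Patch XSS vulnerability in comment form"))
print(classifier.predict("Add export button to dashboard page"))
```

With this toy training set the first message is labelled security and the second other. Counting how many messages land in each class per iteration gives the kind of measure Cois describes.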

If a shortage of security chatter shows the PM that little security work is actually going on, the PM can ask why and what the challenge is. “Is the security stuff more difficult? Did we do our due diligence to allow our developers to adequately address security?” asks Cois. Then the PM and the teams can work together to remove any roadblocks before the project suffers more scope creep or veers further off course.
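One way to picture a valley in topic density is as an iteration in which the share of security-related items drops below what the PM expected. The sketch below assumes each issue or commit message has already been labelled security-related or not (by a classifier or by hand). The function names, the sample data, and the 0.3 threshold are hypothetical choices for illustration.

```python
from collections import defaultdict


def security_density(items):
    """Return {iteration: fraction of items labelled security-related}."""
    totals = defaultdict(int)
    security = defaultdict(int)
    for iteration, is_security in items:
        totals[iteration] += 1
        if is_security:
            security[iteration] += 1
    return {it: security[it] / totals[it] for it in sorted(totals)}


def find_valleys(density, expected_minimum):
    """List the iterations whose security share falls below the expected minimum."""
    return [it for it, share in density.items() if share < expected_minimum]


# Hypothetical (iteration, is_security_related) pairs, one per issue or commit.
LABELLED_ITEMS = [
    (1, True), (1, False), (1, True), (1, False),
    (2, False), (2, False), (2, False), (2, True), (2, False),
    (3, True), (3, True), (3, False),
]

density = security_density(LABELLED_ITEMS)
valleys = find_valleys(density, expected_minimum=0.3)

print(density)
print(valleys)
```

Here iteration 2 is flagged because only one of its five items is security-related. A flag like this is a prompt for the conversation described above, not a verdict on the team.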

Machine Learning for Physical Modeling and Trends Analysis

If the industry can build statistical models of what a healthy software project looks like as it progresses, then it should be able to determine whether a given project is on track and what risks it may face along the way.

Issue tracking data, commit logs, wikis, chat server logs, build/deployment logs—every type of data that can lead to a full picture of project activities—would form the basis of these models, according to Cois.

Applied to this data, machine learning and Natural Language Processing can help determine how quickly issues are created and resolved, how often code is committed, and how fast the software project is progressing overall.
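Once issue records are in a structured form, the rates themselves come down to counting. The sketch below tallies issues opened and closed per ISO week. The (opened, closed) record layout and the sample dates are assumptions for illustration; a real issue tracker would need its own export step.

```python
from collections import Counter
from datetime import date


def iso_week(day):
    """Return (ISO year, ISO week number) for a date."""
    return tuple(day.isocalendar())[:2]


def weekly_rates(issues):
    """Count issues opened and closed per ISO week.

    `issues` is a list of (opened, closed) date pairs; `closed` is None
    while the issue is still open. Returns {week: (opened, closed)}.
    """
    opened = Counter()
    closed = Counter()
    for opened_on, closed_on in issues:
        opened[iso_week(opened_on)] += 1
        if closed_on is not None:
            closed[iso_week(closed_on)] += 1
    weeks = sorted(set(opened) | set(closed))
    return {week: (opened[week], closed[week]) for week in weeks}


# Hypothetical issue records (illustrative only).
ISSUES = [
    (date(2024, 1, 2), date(2024, 1, 4)),
    (date(2024, 1, 3), date(2024, 1, 10)),
    (date(2024, 1, 9), None),
    (date(2024, 1, 11), date(2024, 1, 16)),
    (date(2024, 1, 12), None),
]

rates = weekly_rates(ISSUES)
for week, (n_opened, n_closed) in rates.items():
    print(week, "opened:", n_opened, "closed:", n_closed)
```

A week in which far more issues open than close, as in the second week of this sample, is the sort of signal a model could compare against other projects.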

Machine learning can help PMs compare their project’s rates in these areas with the rates of similar past projects that developed certain problems or turned out in certain ways. If your project’s rates correlate strongly with those of projects whose outcomes are known, and those outcomes were undesirable, you may want to use that information to make corrections early on.
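A simple way to make such a comparison concrete is the Pearson correlation between the current project's weekly figures and those of past projects. The sketch below is a minimal example under stated assumptions: the project names, backlog numbers, and outcomes are invented, and Pearson correlation is only one of several similarity measures that could be used here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def most_similar(current, past_projects):
    """Return the best-correlated past project and all correlation scores."""
    scores = {
        name: pearson(current, project["backlog"])
        for name, project in past_projects.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical weekly counts of unresolved issues (illustrative only).
PAST_PROJECTS = {
    "project_a": {"backlog": [5, 9, 14, 20, 27], "outcome": "missed deadline"},
    "project_b": {"backlog": [12, 11, 9, 8, 6], "outcome": "shipped on time"},
}
current_backlog = [4, 7, 12, 16, 23]

best, scores = most_similar(current_backlog, PAST_PROJECTS)
print(best, PAST_PROJECTS[best]["outcome"])
```

In this sample the current backlog tracks project_a, whose recorded outcome was undesirable, which is the cue to look for corrections early. Note that Pearson correlation compares the shape of two trends, not their size, so a real model would likely combine it with other measures.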

Over time, as the models absorb data from more projects, their predictive value increases. Signals that were too weak to trust in early models can firm up as the data behind them grows denser, so strong correlations may emerge in areas where none appeared at first.

“If I have collected both activity and outcome data on thousands of projects, and a large number of them are highly similar to the current project, I’ll have much more confidence in my predictions about the potential outcomes of the current project given the behaviors I am observing,” says Cois.
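The statistical intuition behind this confidence can be shown with basic arithmetic: the uncertainty around an estimated share shrinks as the number of similar projects grows. The sketch below uses the standard error of a proportion, which is a textbook formula and not something the article attributes to Cois. The function name and the sample counts are hypothetical.

```python
import math


def outcome_estimate(outcomes):
    """Estimate the share of bad outcomes and its standard error.

    `outcomes` is a list of booleans, True meaning that a similar past
    project ended badly.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no similar projects to learn from")
    share = sum(outcomes) / n
    standard_error = math.sqrt(share * (1 - share) / n)
    return share, standard_error


# Hypothetical outcome records: the same 30% failure share, seen in
# 10 similar projects versus 1,000 similar projects.
few = [True] * 3 + [False] * 7
many = [True] * 300 + [False] * 700

share_few, se_few = outcome_estimate(few)
share_many, se_many = outcome_estimate(many)

print(f"10 projects:   {share_few:.2f} +/- {se_few:.3f}")
print(f"1000 projects: {share_many:.2f} +/- {se_many:.3f}")
```

Both samples suggest the same 30 percent share of bad outcomes, but with a hundred times as many similar projects the standard error is ten times smaller, so the estimate can be trusted far more.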