4 Reasons Why Machine Learning Is A Critical Component To Crowd-Sourced Security

Crowd-sourced security intelligence is a new trend that is augmenting traditional security intelligence gathering. Koodous is one of the crowd-sourced platforms currently operating in the mobile app security space, and it has already catalogued more than 21 million URLs and Android Packages (APKs). The platform performs automated analysis and surfaces the results to the hundreds of experts in the Koodous community, who rate each URL and APK as good or bad.

This approach has already helped Koodous grow its threat intelligence more rapidly and more fluidly than conventional antivirus programs. The next step for Koodous was to achieve consistency in its ratings across the board, which Mi3 Security has brought to the table with its advanced machine learning capabilities. We are strong believers in the power of Machine Learning to augment traditional automated and human research.

According to ZDNet, several factors are contributing to Machine Learning’s meteoric rise, including big data, software and hardware advances, and cloud business models. They suggest these factors are combining to create a tipping point where the use of Artificial Intelligence is becoming commonplace. Read on for our take on why Machine Learning is now a critical component of crowd-sourced security.

Exploding Data Availability

The United States alone now generates nearly 2.7 petabytes of data every minute (that’s roughly 2.7 million gigabytes)! Expand that scope to the world and the numbers grow even larger. Much of this data is being captured and can be mined for valuable intelligence, but there’s just too much of it to reasonably expect humans to keep up. The same is true of the mobile security market, where a plethora of threat data comes from vectors such as open source code, rapid application releases, breached credential databases, and man-in-the-middle attacks.

From a Machine Learning perspective this is a bonanza. We’ve now got more data than we’ve ever had before, which makes manual analysis impossible but also gives our machine learning algorithms more than enough data to identify trends and patterns. Machine Learning is the solution to the problem of too much data, which leads us to the next reason it is important.

Increasing Data Complexity

You could argue that there is more and more structured data in the world, which is true, but there’s also a mountain of unstructured data, and of data in a myriad of formats, because it comes from different systems, vendors, or parts of the world. We know there’s simply too much data for humans to analyze at this point, but even if there weren’t, the complexity has increased to the point where we can’t keep it all in our heads.

If you look at mobile application security, our RECON platform generates threat intelligence data at a 10:1 ratio to application size. The reason for the 10x factor is the sheer number of threat vectors that need to be analyzed, such as which marketplace an app is in, which open source libraries are used, whether plain-text passwords are embedded in the code, whether the correct signing certificates are used, what the permissions breakdown is in the application manifest, which Internet connections the app tries to make, and so on.
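
As a rough illustration of what analyzing these vectors can involve, the sketch below flattens the threat vectors listed above into a numeric feature vector that a model could consume. It is not RECON's actual data model: the class name `AppThreatFeatures`, its field names, and the encoding choices are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppThreatFeatures:
    """Hypothetical record of the threat vectors for a single app."""
    from_official_marketplace: bool
    open_source_libraries: List[str] = field(default_factory=list)
    has_plaintext_password: bool = False
    signing_certificate_valid: bool = True
    permissions: List[str] = field(default_factory=list)
    outbound_hosts: List[str] = field(default_factory=list)

    def to_vector(self, known_permissions: List[str]) -> List[float]:
        """Flatten the record into numbers (assumed encoding, for illustration)."""
        vector = [
            float(not self.from_official_marketplace),  # 1.0 if side-loaded
            float(len(self.open_source_libraries)),     # library count
            float(self.has_plaintext_password),         # 1.0 if a password was found
            float(not self.signing_certificate_valid),  # 1.0 if the signing is suspect
            float(len(self.outbound_hosts)),            # distinct hosts contacted
        ]
        # One-hot encode requested permissions against a fixed vocabulary.
        requested = set(self.permissions)
        vector.extend(1.0 if p in requested else 0.0 for p in known_permissions)
        return vector


app = AppThreatFeatures(
    from_official_marketplace=False,
    open_source_libraries=["libA", "libB"],
    has_plaintext_password=True,
    signing_certificate_valid=False,
    permissions=["android.permission.SEND_SMS"],
    outbound_hosts=["example.com"],
)
features = app.to_vector(
    ["android.permission.INTERNET", "android.permission.SEND_SMS"]
)
```

Even this toy record shows why the intelligence data can outgrow the app itself: every library, permission, and connection becomes another dimension to track.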

Machine Learning not only allows us to analyze all of the vectors, but to analyze them at speed and with ease, and even to find things we weren’t looking for.

Unexpected Pattern Identification

Possibly the most important reason Machine Learning is critical in crowd-sourced security is the fact that it can identify patterns that most humans would not think to look for. Because it can analyze complex data at scale, and intelligent mathematical algorithms underpin that analysis, it can produce findings that would be practically impossible for humans to produce.

The result is often unexpected patterns that help us more rapidly identify threats or risks in broad data sets, ensuring the time-to-discovery window is shortened. This means that it’s now possible to instrument your software build pipeline with application security analysis and have results in near real-time. This changes how software is developed.

In a crowd-sourced setting, Machine Learning significantly augments all the human eyeballs that are looking at applications and up-voting or down-voting based on their analysis. We’re not suggesting that the human factor can be replaced, at least not any time soon, but we are suggesting that any sourcing strategy that doesn’t include Machine Learning will be at a disadvantage.
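
One way to picture this augmentation is to treat a model's score as a prior that community votes gradually override. The sketch below is only an illustration under stated assumptions, not the method Koodous or Mi3 Security uses: the function name `combined_verdict`, the `prior_weight` parameter, and the idea that a down-vote means "malicious" are all hypothetical.

```python
def combined_verdict(upvotes: int, downvotes: int, model_score: float,
                     prior_weight: float = 5.0) -> float:
    """Blend a model's malicious-probability with community votes.

    model_score: the model's estimated probability (0..1) that the app is bad.
    prior_weight: how many pseudo-votes the model's opinion is worth
                  (an assumed tuning knob, not a value from any real system).
    Returns a blended probability that the app is bad.
    """
    if not 0.0 <= model_score <= 1.0:
        raise ValueError("model_score must be between 0 and 1")
    if upvotes < 0 or downvotes < 0:
        raise ValueError("vote counts must be non-negative")
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")

    total_votes = upvotes + downvotes
    # With no votes the result equals model_score; as votes accumulate,
    # it moves toward the fraction of down-votes.
    return (model_score * prior_weight + downvotes) / (prior_weight + total_votes)


no_votes = combined_verdict(0, 0, 0.9)        # the model alone decides
crowd_says_good = combined_verdict(10, 0, 0.5)
crowd_says_bad = combined_verdict(0, 10, 0.5)
```

With this kind of blend, a brand-new app still gets a rating from the model on day one, while a well-reviewed app is dominated by what the analysts say.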

Exponential Learning Effect

Finding unexpected patterns, and generally landing on Machine Learning algorithms that provide a high degree of accuracy for detecting threats, means that we can increase our intelligence and understanding almost on a daily basis. In a sense, this relates to the Kaizen methodology of continuous improvement. If every day we can hone and tweak Machine Learning algorithms, and those algorithms are finding patterns that humans would never find, we end up with an exponential benefit over time.

Today Machine Learning algorithms help our systems analyze data at speed, and detect with over 99% accuracy whether applications are malicious or have dangerous threat vectors. Every time we generate a new algorithm that produces a better result than traditional code, we replace that traditional code. Over time this will produce a system that is exponentially more capable than it was before the introduction of Machine Learning. Will we see the 1% daily improvement that some Kaizen experts discuss? Only time will tell, but we’re making incredible progress so far.
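
To put the 1% figure in perspective, the short calculation below compounds a fixed daily improvement over a year. It is plain arithmetic, not a measurement of any real system, and the function name `compounded_gain` is made up for this example.

```python
def compounded_gain(daily_rate: float, days: int) -> float:
    """Return the overall multiplier after compounding daily_rate for `days` days."""
    return (1.0 + daily_rate) ** days


one_year = compounded_gain(0.01, 365)
print(f"1% better every day for a year: about {one_year:.1f}x")
```

A steady 1% daily gain would compound to roughly a 38-fold improvement over a year, which is why even small, consistent wins from better algorithms add up quickly.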

Summary

Collaborative and crowd-sourced security intelligence is on a path to disrupt traditional research firms, and Machine Learning will play a big part in helping these crowd-sourced platforms scale with the size and complexity of threat data in the market.

The complementary approach we have taken with Koodous leverages cloud-based Machine Learning for mobile app risk analysis from Mi3 Security and the socially networked community of volunteer analysts from Koodous to achieve quick identification, reporting, and mitigation of a wide range of risks, including the Open Web Application Security Project (OWASP) Top 10 Mobile Risks for 2017.

One could argue that the melding of humans and machines has begun, and in our case it’s making our mobile applications much more secure.