
Unlocking Small Data with Machine Learning

Big data, small data and machine learning

Machine learning and big data have enabled us to see the big picture and build software solutions in ways that were unfathomable only a few decades ago. These are immensely powerful tools for propelling our technology into the future, from predictive algorithms in search engines, to self-driving cars, to orchestrating pandemic responses across entire nations. But big data has one critical problem holding it back: as its name implies, it’s data-hungry. In order to generate quality results, software developers rely on gargantuan quantities of data, and that comes with significant drawbacks. To name just one: the processing power required to churn through it all is not only expensive, it also leaves a large carbon footprint.

And that’s before considering that a lot of vital sources of big data are not here to stay. The rise of data privacy as both a political issue (such as GDPR regulation) and a desirable consumer product (such as Apple’s privacy initiative) is a good thing in the sense that individuals are becoming more empowered to take ownership of their own data. But it also means headaches for artificial intelligence developers as formerly reliable wells of actionable data dry up.

The answer to this problem might be counter-intuitive: small data.

Most of the data that big institutions have counts as small data. From accounting to contracts to survey responses, companies often allow reams and reams of useful information to collect dust simply because there isn’t enough of it for an AI starting at zero to learn from. But one plus one equals two, and getting the most out of all your smaller data sets will have the same impact as tackling one of your big ones—or, more likely, an even bigger impact.

That’s because transfer learning, the machine learning technique that unlocks insights from small data, tends to revolve around quality rather than quantity. It works by allowing an AI to carry the lessons it learned from one task over to a separate but related task, where it “fine-tunes” the skill. For example, an AI that learns to identify a common type of cancer from a large set of medical records can then adapt its methods and knowledge base to identify a different, much rarer type of cancer, one that is only present in a smaller data set. A rare enough cancer may simply not generate enough data for an AI to learn to recognize it from that information alone, so the key is to bring in an AI that already has some experience under its belt in the field of cancer studies more generally.
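
To make the fine-tuning step concrete, here is a minimal sketch in Python, assuming PyTorch and entirely made-up stand-in data (it does not reflect any particular medical dataset or production system): a small network is first trained on a large, related data set, its feature layers are then frozen, and a fresh final layer is fine-tuned on a much smaller one.

```python
# A hypothetical transfer-learning sketch: pretrain on a large, related task,
# then freeze the feature layers and fine-tune only a new head on small data.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in data: 10,000 labeled records for the common task, 200 for the rare one.
big_x, big_y = torch.randn(10_000, 32), torch.randint(0, 2, (10_000,))
small_x, small_y = torch.randn(200, 32), torch.randint(0, 2, (200,))

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),  # shared feature layers: the transferable "experience"
    nn.Linear(64, 2),              # task-specific head
)
loss_fn = nn.CrossEntropyLoss()

# 1) Pretrain on the large, related data set.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(big_x), big_y).backward()
    opt.step()

# 2) Freeze the learned feature layers so the small data set can't overwrite them.
for p in model[0].parameters():
    p.requires_grad = False

# 3) Swap in a fresh head and fine-tune only that on the small, rarer data set.
model[2] = nn.Linear(64, 2)
opt = torch.optim.Adam(model[2].parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(small_x), small_y).backward()
    opt.step()
```

The frozen layers are where the prior “experience” lives; the small data set only has to teach the new final layer, which is a far easier lesson.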

A lot has been made of the fact that neural networks are modeled on the human brain, so perhaps it goes without saying that this method of adapting prior skills to accomplish new tasks with limited resources is precisely how we learn, too. In fact, until the day that unsupervised transfer learning overtakes the industry, our ability to do this appears to be the primary thing that sets human intelligence apart from artificial intelligence.

So why hasn’t transfer learning completely revolutionized our technology yet? For one thing, it’s more hands-on. On a large enough data set, an AI can simply be unleashed, and the raw quantity of information will be enough for it to arrive at valuable answers. With small data, AIs are more like children who require guidance from a teacher; they rely on the outside expertise of humans to let them know when they’re on the right track and when they’re going astray.
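
What that guidance looks like in practice varies, but one common pattern is an active-learning loop, in which the model asks a human expert to label only the handful of examples it is least sure about. The sketch below (again in Python, with invented data and a simulated “expert”; an illustration, not a prescribed workflow) shows the basic shape of that back-and-forth.

```python
# A hypothetical human-in-the-loop (active learning) sketch: the model flags the
# records it is least confident about, and a human expert labels just those.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # a small pool of unlabeled records
expert_label = (X[:, 0] + X[:, 1] > 0).astype(int)  # stands in for the human teacher

# Seed the process with one expert-labeled example from each class.
labeled = [int(np.argmin(expert_label)), int(np.argmax(expert_label))]
model = LogisticRegression()

for _ in range(5):
    model.fit(X[labeled], expert_label[labeled])
    # Confidence is lowest where the predicted probability sits near 0.5...
    uncertainty = np.abs(model.predict_proba(X)[:, 1] - 0.5)
    candidates = [i for i in np.argsort(uncertainty) if i not in labeled]
    # ...so those are the records the model "asks the teacher" about next.
    labeled.extend(candidates[:5])
```

Each round, the human labels only a few carefully chosen records instead of thousands, which is exactly the kind of targeted expert involvement that small data rewards.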

Advantages of small data

Industry leaders might be put off by how labor-intensive small data is, but the savviest among them will recognize that this is a blessing in disguise: the advantages that arise from human-machine collaboration on small data sets outweigh the labor costs. Author and marketer Martin Lindstrom estimates that around 60-65% of the most important modern innovations resulted from small data insights, and he credits the indispensable human element for that.

An experiment published in Harvard Business Review found an even more interesting benefit: the process of teaching—even teaching an artificial intelligence—fosters learning for the teacher. When a human is an active participant in the process of machine learning, they become more thoughtful and curious about the subject matter and end up honing their own expertise in order to better guide the AI. So not only is a machine with human fingerprints on it better than one without, but the humans who put those fingerprints there become more valuable as workers for the experience. It has even been suggested that involvement in machine learning should not fall under the exclusive purview of professional coders, but would benefit practitioners in virtually any field, from emergency room nurses to law firms to marketing departments. While big data performs work that humans cannot, small data lets machines and humans grow together, with equal or better results.

At present, transfer learning and small data are still in their infancy. But here at PVM, we’re watching the industry closely, and we know better than anyone how powerful data can be. Whether you’re in the federal government or private industry, our software engineers can create solutions perfectly tailored to accomplishing your mission. Contact us with inquiries about tackling your most challenging data, data software or integration needs.