It’s an undergraduate class in the Department of Electrical Engineering and Computer Science (EECS) whose humble course catalog label, 6.036, belies its exponentially growing popularity. The subject, Introduction to Machine Learning, was first offered in 2013 and now attracts hundreds more students than can fit into a 500-seat lecture hall. In addition to enrolling droves of EECS students, 6.036 brings in registrants from nearly every discipline MIT offers, from architecture to management. The irresistible draw? A chance to get a jump on the most powerful driver of technology innovation since Moore’s Law.
If artificial intelligence is the rocket ship to which tech giants like Google, Apple, and Facebook have strapped their fortunes, machine learning (ML) is the rocket fuel, and boundaries between the two are increasingly blurred. While machine-learning techniques are intricate and various, the discipline differs from traditional computer programming in one foundational way: instead of writing out in advance all the rules that govern a piece of software’s behavior, machine learning attempts to equip computers with a means of inferring those rules automatically from the various inputs and outputs they encounter. Consider email spam filters (one of the earliest ML applications): it would be impossible to predict every possible instance of what counts as spam. So instead, spam filters learn directly from the data (and from the labels on those data that users provide), making the application more flexible, more automated, and more effective over time.
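To make the contrast with hand-written rules concrete, here is a minimal sketch of a filter that infers its behavior from labeled messages. It assumes a naive Bayes word-count model, which the article does not specify; the function names `train` and `predict` and the six toy messages are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label from (text, label) pairs; labels are 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common this label was in training.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical user-labeled messages (illustrative only).
examples = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("claim your free prize", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("project notes for tomorrow", "ham"),
]
model = train(examples)
print(predict(model, "free cash prize"))
print(predict(model, "notes for lunch tomorrow"))
```

No rule about the word "prize" appears anywhere in the code; the behavior comes entirely from the labeled examples, so adding more labeled messages changes what the filter does without anyone rewriting it.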
Building on strong underpinnings in computer science, optimization, and mathematics, and bolstered by a recent wave of new faculty hires, MIT has amassed “very good expertise in the foundations, theory, algorithms, and some applications of machine learning,” says Stefanie Jegelka, the X-Window Consortium Career Development Assistant Professor in EECS, herself one of those hires (she joined MIT in 2015). She adds that “the research activity here, and in the Boston area as a whole, fosters the kind of interdisciplinary research that increases the impact of machine learning” on applications like robotics, computer vision, and health care. But because machine learning has recently experienced an explosion in effectiveness, distilling reality from hype can be difficult. “Paradoxically, the public tends to both underestimate and overestimate machine learning capabilities today,” says Tommi Jaakkola PhD ’97, Thomas M. Siebel Professor of Electrical Engineering and Computer Science. “In the context of narrowly defined tasks such as image analysis and game playing, the potential of machine learning already exceeds public perception. But in open-ended tasks requiring flexible common-sense reasoning, or pulling together and combining disparate sources of information in a novel way, the imagined capabilities may reach somewhat beyond where we actually are.” The Introduction to Machine Learning course, which Jaakkola teaches alongside three other instructors, appeals to students as a means of accessing the ground truth of the field as a whole.
Machine learning and medicine
One of those 6.036 co-teachers is Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science, who has an acutely personal interest in figuring out the big questions in machine learning. In 2014, she was diagnosed with breast cancer, and a portion of her current work is focused on making ML more relevant to medicine, where it might someday unlock advances in cancer diagnosis and personalized treatment. Since so much of medical data is stored in the form of written records and other text-based data, Barzilay’s background in natural-language processing (NLP)—a computer-science discipline that focuses on engineering software that can interpret written and spoken language—provided a toehold for applying machine-learning techniques to medical problems. It also drives Barzilay’s concern about the “interpretability” of ML models when used in medical diagnoses. “Today the majority of the machine learning field focuses on making accurate predictions,” Barzilay explains. “But for medical predictions, accuracy isn’t enough—you really need to understand why the software is making a recommendation, and you need to make that process transparent. That means generating rationales that humans can inspect and understand.”
The rise of big data has been instrumental in driving advances in machine learning, because the software models—especially ones relying on a technique called deep learning—“are very data-hungry,” Barzilay says. For companies like Google or Facebook that serve more than a billion users, this appetite is easily sated. In the medical field, a similar surfeit of data exists. But because medical records are not standardized, ML models—which must be “trained” to recognize relevant features in data by feeding them thousands of hand-labeled examples—run into problems that social network recommendation engines and email spam filters don’t.
“Let’s say you are working with diabetes, and you labeled text-based diagnosis data for one hospital in great detail,” Barzilay explains. “Now, you go to another hospital and you still want to be able to predict the disease. Most likely, the performance of the machine-learning model will be degraded because they put the same kinds of data in a different format.” That means re-training the model again and again—an expensive and impractical prospect, especially for life-or-death medical matters. The key question, says Barzilay, is how to develop models that can transfer their initial learning to new data sets with much less “supervision,” as it’s technically termed, while retaining their predictive powers.
A new research paper by Barzilay, Jaakkola, and Barzilay’s PhD student, Yuan Zhang, offers the beginnings of an answer. They use a machine-learning technique called adversarial training, in which two systems learn about the same data by pursuing competing goals. In Barzilay and her collaborators’ work, one ML system learns how to classify data according to labeled examples—for example, patient pathology reports noting evidence of cancer. Meanwhile, another ML system learns how to discriminate between the cancer labels and another kind of evidence that may also be present in the reports, albeit less extensively labeled—say, evidence of lymphatic disease. By working in concert, the two ML systems teach each other how to correctly classify evidence of cancer and lymphatic disease, despite the dearth of training data for the latter. Since individual medical records almost always encode many different aspects of a patient’s health, such a model could offer a powerful way of automating disease detection.
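The idea of two systems pursuing competing goals on shared data can be sketched in a few lines. What follows is a generic domain-adversarial toy, not the method from the paper described above: a shared encoder feeds a label classifier and a discriminator that tries to guess which source a record came from, while the encoder is updated to help the first and hinder the second. The synthetic two-feature data, the linear encoder, and every name and hyperparameter (`make_toy_data`, `train_adversarial`, `lam`, and so on) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_toy_data(rng, n=200):
    """Hypothetical data: feature 0 carries the label signal; feature 1 only
    reveals which source (domain) a record came from."""
    y_src = rng.integers(0, 2, size=n)
    y_tgt = rng.integers(0, 2, size=n)  # held out; never used in training

    def features(y, offset):
        x0 = (2 * y - 1) * 2.0 + rng.normal(0, 0.5, size=n)
        x1 = offset + rng.normal(0, 0.5, size=n)
        return np.column_stack([x0, x1])

    return features(y_src, 0.0), y_src, features(y_tgt, 3.0), y_tgt

def train_adversarial(x_src, y_src, x_tgt, steps=300, lr=0.1, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.5, size=(2, 2))  # shared linear encoder
    u = rng.normal(0, 0.5, size=2)       # label classifier head
    v = rng.normal(0, 0.5, size=2)       # domain discriminator head
    x_all = np.vstack([x_src, x_tgt])
    d_all = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])
    for _ in range(steps):
        h_src, h_all = x_src @ W, x_all @ W
        # Gradients of the mean cross-entropy losses with respect to the logits.
        g_y = (sigmoid(h_src @ u) - y_src) / len(x_src)
        g_d = (sigmoid(h_all @ v) - d_all) / len(x_all)
        grad_u = h_src.T @ g_y
        grad_v = h_all.T @ g_d
        grad_W_label = x_src.T @ np.outer(g_y, u)
        grad_W_domain = x_all.T @ np.outer(g_d, v)
        u -= lr * grad_u  # classifier: minimize label loss
        v -= lr * grad_v  # discriminator: minimize domain loss
        # Encoder: minimize label loss while *increasing* domain loss
        # (the sign flip on the domain term is the adversarial part).
        W -= lr * (grad_W_label - lam * grad_W_domain)
    return W, u, v

def accuracy(W, u, x, y):
    preds = (sigmoid(x @ W @ u) > 0.5).astype(int)
    return float(np.mean(preds == y))

rng = np.random.default_rng(0)
x_src, y_src, x_tgt, y_tgt = make_toy_data(rng)
W, u, v = train_adversarial(x_src, y_src, x_tgt)
print("source accuracy:", accuracy(W, u, x_src, y_src))
print("target accuracy:", accuracy(W, u, x_tgt, y_tgt))
```

Only the source records carry labels during training; the target records contribute solely through the discriminator. The weight `lam` controls how hard the encoder works against the discriminator, and in practice it would be tuned rather than fixed at the value assumed here.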
Reverse-engineering intelligence
Addressing the “transfer” problem in machine learning from another angle is Josh Tenenbaum PhD ’99, an MIT professor of brain and cognitive sciences affiliated with the multi-institutional Center for Brains, Minds, and Machines (CBMM). “When you think of machine learning these days, you think of big data,” he says, “but when we talk about learning in school, it’s really generalization that we prize. How can you figure out how to take something you’ve learned from one instance and generalize it to situations that aren’t quite like the ones you’ve been in before?”
Tenenbaum’s Computational Cognitive Science group uses computational models of learning to investigate how the human mind pulls off this feat. “We’re reverse-engineering intelligence—which means doing science using the tools of engineering,” he says. “We feel that if we’ve understood something important about how the mind and the brain work in engineering terms, then we should be able to put that into a machine and have it exhibit more human-like intelligence.”
An area of machine learning called Bayesian program learning (BPL) has captured the interest of Tenenbaum and his collaborators as a means of implementing this more human-like learning capability in computers. Based on Bayesian statistics, a branch of mathematics dedicated to making precise inferences based on limited evidence, BPL has been shown to enable a computer to learn how to write unfamiliar letterforms (such as “A,” or a Chinese logogram) more accurately than a human can after just one training example. The research, done by Tenenbaum in collaboration with his former student Brenden Lake PhD ’14 as part of Lake’s MIT dissertation, made headlines in the popular press last year.
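The Bayesian reasoning underneath this can be shown with a single update. The sketch below is not BPL itself; it only applies Bayes' rule to three hypothetical candidate explanations for one observed example, with made-up prior and likelihood values, to show how a single observation shifts belief.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete set of hypotheses.

    priors[i]      : belief in hypothesis i before seeing the example
    likelihoods[i] : probability of the observed example under hypothesis i
    """
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnormalized)  # total probability of the observation
    return [w / evidence for w in unnormalized]

# Hypothetical numbers: three candidate explanations, equally plausible at first.
priors = [1 / 3, 1 / 3, 1 / 3]
# How well each candidate accounts for the one example that was observed.
likelihoods = [0.8, 0.5, 0.2]

beliefs = posterior(priors, likelihoods)
print(beliefs)  # roughly 0.53, 0.33, 0.13
```

After one example the first explanation is already about four times as credible as the third, which is the sense in which Bayesian methods can draw usable conclusions from very little data.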
Computers capable of this kind of one-shot learning—using models that more closely correspond to how our own minds work—would create a powerful complement to the capabilities already exhibited by deep-learning software, whose artificial reasoning can be inscrutable to human users. Tenenbaum’s collaborations with MIT neuroscientist and CBMM investigator Rebecca Saxe PhD ’03 focus on illuminating how social intelligence manifests in human beings, with the aim of implementing it in computers—relying on some of the same Bayesian mathematical frameworks that power one-shot machine learning. “We want to build machines that humans can interact with the way we interact with other people, machines that support ‘the 3 T’s,’” Tenenbaum says: “We can talk to them, teach them, and trust them.”
Long before machine learning reaches that apex, the discipline must coalesce from its current state—a cluster of multidisciplinary ad-hoc investigations and successes within narrow domains—into “a really systematic understanding of how to take these advances and put them in the hands of less sophisticated companies who don’t have armies of PhDs at their beck and call,” says EECS professor Sam Madden ’99, MNG ’99. As co-director (with Jaakkola) of the Systems That Learn initiative at the MIT Computer Science and Artificial Intelligence Laboratory, Madden hopes to achieve that very end—turning machine learning into a broadly understood computing infrastructure that anyone can leverage.
“I like to make an analogy with computer programming,” Madden says. “Today, using machine learning is like writing code in assembly language—very technical and low level. What I want is for it to be more like using Microsoft Excel. You don’t need a computer science degree to be able to use Excel effectively to do data analysis. It would be really cool if we could package up machine learning in a similar way.” Until then, MIT course 6.036 will likely continue on its current enrollment trend: standing room only.