Melanoma is by far the deadliest form of skin cancer, killing more than 7,000 people in the United States in 2019 alone. Early detection of the disease dramatically reduces the risk of death and the costs of treatment, but widespread melanoma screening is not currently feasible. There are about 12,000 practicing dermatologists in the US, and they would each need to see 27,416 patients per year to screen the entire population for suspicious pigmented lesions (SPLs) that can indicate cancer.
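The screening workload quoted above can be checked with a few lines of arithmetic. This is only a sanity check of the two figures in the text. The 250 working days per year is an assumption added for illustration and does not come from the study.

```python
# Figures quoted in the article.
dermatologists = 12_000
patients_per_dermatologist = 27_416

# Population implied by multiplying the two quoted figures.
implied_population = dermatologists * patients_per_dermatologist
print(f"Implied population: {implied_population:,}")

# ASSUMPTION (not from the article): roughly 250 working days per year.
working_days = 250
patients_per_day = patients_per_dermatologist / working_days
print(f"Patients per dermatologist per working day: {patients_per_day:.1f}")
```

Under that assumption, each dermatologist would have to screen more than a hundred patients every working day, which illustrates why population-wide screening by specialists alone is impractical.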
Computer-aided diagnosis (CAD) systems have been developed in recent years to try to solve this problem by analyzing images of skin lesions and automatically identifying SPLs, but so far have failed to meaningfully impact melanoma diagnosis. These CAD algorithms are trained to evaluate each skin lesion individually for suspicious features, but dermatologists compare multiple lesions from an individual patient to determine whether they are cancerous — a method commonly called the “ugly duckling” criteria. No CAD systems in dermatology, to date, have been designed to replicate this diagnosis process.
Now, that oversight has been corrected thanks to a new CAD system for skin lesions based on convolutional deep neural networks (CDNNs) developed by researchers at the Wyss Institute for Biologically Inspired Engineering at Harvard University and the Massachusetts Institute of Technology (MIT). The new system successfully distinguished SPLs from non-suspicious lesions in photos of patients’ skin with ~90% accuracy, and for the first time established an “ugly duckling” metric capable of matching the consensus of three dermatologists 88% of the time.
“We essentially provide a well-defined mathematical proxy for the deep intuition a dermatologist relies on when determining whether a skin lesion is suspicious enough to warrant closer examination,” said the study’s first author Luis Soenksen, Ph.D., a Postdoctoral Fellow at the Wyss Institute who is also a Venture Builder at MIT. “This innovation allows photos of patients’ skin to be quickly analyzed to identify lesions that should be evaluated by a dermatologist, allowing effective screening for melanoma at the population level.”
The technology is described in Science Translational Medicine, and the CDNN’s source code is openly available on GitHub (https://github.com/lrsoenksen/SPL_UD_DL).
Bringing ugly ducklings into focus
Melanoma is personal for Soenksen, who has watched several close friends and family members suffer from the disease. “It amazed me that people can die from melanoma simply because primary care doctors and patients currently don’t have the tools to find the ‘odd’ ones efficiently. I decided to take on that problem by leveraging many of the techniques I learned from my work in artificial intelligence at the Wyss and MIT,” he said.
Soenksen and his collaborators discovered that all the existing CAD systems created for identifying SPLs only analyzed lesions individually, completely omitting the ugly duckling criteria that dermatologists use to compare several of a patient’s moles during an exam. So they decided to build their own.
To ensure that their system could be used by people without specialized dermatology training, the team created a database of more than 33,000 “wide field” images of patients’ skin that included backgrounds and other non-skin objects, so that the CDNN would be able to use photos taken from consumer-grade cameras for diagnosis. The images contained both SPLs and non-suspicious skin lesions that were labeled and confirmed by a consensus of three board-certified dermatologists. After training on the database and subsequent refinement and testing, the system was able to distinguish suspicious from non-suspicious lesions with 90.3% sensitivity and 89.9% specificity, improving upon previously published systems.
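For readers unfamiliar with the two metrics, sensitivity is the share of truly suspicious lesions that a classifier flags, and specificity is the share of non-suspicious lesions that it correctly leaves unflagged. The sketch below shows how both are computed from labeled examples. The function name and the toy labels are hypothetical and are not taken from the study or its repository.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels, where 1 = suspicious."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(labels, predictions):
        if truth == 1 and predicted == 1:
            tp += 1  # suspicious lesion correctly flagged
        elif truth == 1:
            fn += 1  # suspicious lesion missed
        elif predicted == 0:
            tn += 1  # non-suspicious lesion correctly left alone
        else:
            fp += 1  # non-suspicious lesion flagged by mistake
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one suspicious and one non-suspicious lesion")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 suspicious and 6 non-suspicious lesions.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Reporting both numbers matters for screening: high sensitivity limits missed cancers, while high specificity limits unnecessary referrals.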
But this baseline system was still analyzing the features of individual lesions, rather than features across multiple lesions as dermatologists do. To add the ugly duckling criteria into their model, the team used the extracted features in a secondary stage to create a 3D “map” of all of the lesions in a given image, and calculated how far away from “typical” each lesion’s features were. The more “odd” a given lesion was compared to the others in an image, the further away it was from the center of the 3D space. This distance is the first quantifiable definition of the ugly duckling criteria, and serves as a gateway to leveraging deep learning networks to overcome the challenging and time-consuming task of identifying and scrutinizing the differences between all the pigmented lesions in a single patient.
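The idea of scoring a lesion by its distance from “typical” can be illustrated with a minimal sketch. It assumes each lesion has already been reduced to a point in a 3D feature space, and it uses plain Euclidean distance from the centroid of all lesions in the image. The choice of distance measure, the function name, and the coordinates are illustrative assumptions. The article does not specify how the published system computes the distance.

```python
import math


def oddness_scores(points):
    """Distance of each lesion from the centroid of all lesions in one image."""
    if not points:
        return []
    dims = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dims)]
    # ASSUMPTION: Euclidean distance; the real system may use another measure.
    return [math.dist(p, centroid) for p in points]


# Hypothetical 3D feature coordinates for five lesions on one patient.
lesions = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (5.0, 5.0, 5.0),  # sits far from the rest: the candidate "ugly duckling"
]

scores = oddness_scores(lesions)
ugliest = max(range(len(lesions)), key=scores.__getitem__)
print(f"Most atypical lesion: index {ugliest}, score {scores[ugliest]:.2f}")
```

Because the score is relative to the other lesions in the same image, the same mole could rank as ordinary on one patient and as an outlier on another, which is the point of the ugly duckling comparison.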
Deep learning vs. dermatologists
Their CDNN still had to pass one final test: performing as well as living, breathing dermatologists at the task of identifying SPLs from images of patients’ skin. Three dermatologists examined 135 wide-field photos from 68 patients, and assigned each lesion an “oddness” score that indicated how concerning it looked. The same images were analyzed and scored by the algorithm. When the assessments were compared, the researchers found that the algorithm agreed with the dermatologists’ consensus 88% of the time, and with the individual dermatologists 86% of the time.
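One simple way to picture such a comparison is sketched below. It assumes that each rater’s oddness score has been reduced to a yes/no flag per lesion, that the consensus is a majority vote of the three dermatologists, and that agreement is the fraction of lesions with matching flags. All three are illustrative assumptions, and the data are made up. The article does not describe the exact procedure used in the study.

```python
def majority_vote(flags):
    """Consensus of an odd number of raters: True if most flagged the lesion."""
    return sum(flags) > len(flags) / 2


def agreement_rate(reference, candidate):
    """Fraction of lesions on which two sets of flags match."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("flag lists must be non-empty and of equal length")
    return sum(r == c for r, c in zip(reference, candidate)) / len(reference)


# Hypothetical flags: one tuple per lesion, one entry per dermatologist.
dermatologist_flags = [
    (True, True, False),
    (False, False, False),
    (True, True, True),
    (False, True, False),
    (False, False, True),
]
algorithm_flags = [True, False, True, True, False]

# Agreement with the three-rater consensus.
consensus = [majority_vote(flags) for flags in dermatologist_flags]
consensus_agreement = agreement_rate(consensus, algorithm_flags)

# Agreement with each dermatologist taken individually.
per_rater = [
    agreement_rate([flags[i] for flags in dermatologist_flags], algorithm_flags)
    for i in range(3)
]

print(f"vs consensus: {consensus_agreement:.0%}")
print("vs individuals:", [f"{rate:.0%}" for rate in per_rater])
```

Agreement with individual raters can be lower than agreement with the consensus because a majority vote smooths out each rater’s idiosyncratic calls.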
“This high level of consensus between artificial intelligence and human clinicians is an important advance in this field, because dermatologists’ agreement with each other is typically very high, around 90%,” said co-author Jim Collins, Ph.D., a Core Faculty member of the Wyss Institute and co-leader of its Predictive Bioanalytics Initiative who is also the Termeer Professor of Medical Engineering and Science at MIT. “Essentially, we’ve been able to achieve dermatologist-level accuracy in diagnosing potential skin cancer lesions from images that can be taken by anybody with a smartphone, which opens up huge potential for finding and treating melanoma earlier.”
Recognizing that such a technology should be made available to as many people as possible for maximum benefit, the team has made their algorithm open-source on GitHub. They hope to partner with medical centers to launch clinical trials further demonstrating their system’s efficacy, and with industry to turn it into a product that could be used by primary care providers around the world. They also recognize that in order to be universally helpful, their algorithm needs to be able to function equally well across the full spectrum of human skin tones, which they plan to incorporate into future development.
“Allowing our scientists to pursue their passions and visions is key to the success of the Wyss Institute, and it’s wonderful to see this advance that can impact all of us in such a meaningful way emerge from a collaboration with our newly formed Predictive Bioanalytics Initiative,” said Wyss Founding Director Don Ingber, M.D., Ph.D., who is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and Boston Children’s Hospital, and Professor of Bioengineering at the Harvard John A. Paulson School of Engineering and Applied Sciences.
Additional authors of the paper include Regina Barzilay, Martha L. Gray, Timothy Kassis, Susan T. Conover, Berta Marti-Fuster, Judith S. Birkenfeld, Jason Tucker-Schwartz, and Asif Naseem from MIT, Robert R. Stavert from the Beth Israel Deaconess Medical Center, Caroline C. Kim from Tufts Medical Center, Maryanne M. Senna from Massachusetts General Hospital, and José Avilés-Izquierdo from Hospital General Universitario Gregorio Marañón.
This research was supported by the Abdul Latif Jameel Clinic for Machine Learning in Health, the Consejería de Educación, Juventud y Deportes de la Comunidad de Madrid through the Madrid-MIT M+Visión Consortium and the People Programme of the European Union’s Seventh Framework Programme, the Mexico CONACyT grant 342369/40897, and the US DOE training grant DE-SC0008430.