“We are in a data desert,” said Mary Bellard, principal innovation architect lead at Microsoft, who also oversees the AI for Accessibility program. “There’s a lot of passion and energy around doing really cool things with AI and people with disabilities, but we don’t have enough data.”
“It’s like we have the car and the car is packed and ready to go, but there’s no gas in it. We don’t have enough data to power these ideas.”
To begin to shrink that data desert, Microsoft researchers have been working for the past year and a half to investigate and suggest ways to make AI systems more inclusive of people with disabilities. The company is also funding and collaborating with AI for Accessibility grantees to create or use more representative training datasets, such as ORBIT and the Microsoft Ability Initiative with University of Texas at Austin researchers.
Today, Team Gleason announced it is partnering with Microsoft on Project Insight, which will create an open dataset of facial imagery of people living with ALS to help advance innovation in computer vision and train those AI models more inclusively.
It’s an industry-wide problem that won’t be solved by one project or organization alone, Microsoft says. But new collaborations are beginning to address the issue.
A research roadmap on AI Fairness and Disability published by Microsoft Research and a workshop on Disability, Bias and AI hosted last year with the AI Now Institute at New York University found a host of potential areas in which mainstream AI algorithms that aren’t trained on inclusive data either don’t work well for people with disabilities or can actively harm them.
If a self-driving car’s pedestrian detection algorithms haven’t been shown examples of people who use wheelchairs or whose posture or gait is different due to advanced age, for example, the algorithms may not correctly identify those people as objects to avoid, or may misjudge how much longer those people need to safely cross a street, researchers noted.
AI models used in hiring processes that try to read personalities or interpret sentiment from potential job candidates can misread cues and screen out qualified candidates who have autism or who emote differently. Algorithms that read handwriting may not be able to cope with examples from people who have Parkinson’s disease or tremors. Gesture recognition systems may be confused by people with amputated limbs or different body shapes.
It’s fairly common for some people with disabilities to be early adopters of intelligent technologies, yet they’ve often not been adequately represented in the data that informs how those systems work, researchers say.
“When technologies are so desired by a community, they’re often willing to tolerate a higher rate of errors,” said Meredith Ringel Morris, senior principal researcher who manages the Microsoft Research Ability Team. “So imperfect AI systems still have value, but they could provide so much more and work so much better if they were trained on more inclusive data.”
‘Pushing the state of the art’
Danna Gurari, an AI for Accessibility grantee and assistant professor at the University of Texas at Austin, had that goal in mind when she began developing the VizWiz datasets. They include tens of thousands of photographs and questions submitted by people who are blind or have low vision to an app originally developed by researchers at Carnegie Mellon University.
The questions run the gamut: What is the expiration date on this milk? What does this shirt say? Do my fingertips look blue? Do these clouds look stormy? Do the charcoal briquettes in this grill look ready? What does the picture on this birthday card look like?
The app originally crowdsourced answers from people across the internet, but Gurari wondered if she could use the data to improve how computer vision algorithms interpret photos taken by people who are blind.
Many of those questions require reading text, such as determining how much of an over-the-counter medicine is safe to take. Computer vision research has often treated reading text as a problem separate from, for example, recognizing objects or interpreting low-quality photos. But successfully describing real-world photos requires an integrated approach, Gurari said.
Moreover, computer vision algorithms typically learn from large image datasets of pictures downloaded from the internet. Most are taken by sighted people and reflect the photographer’s interest, with items that are centered and in focus.
But an algorithm that’s only been trained on perfect images is likely to perform poorly in describing what’s in a photo taken by a person who is blind, which may be blurry, off-center or backlit. And sometimes the thing that person wants to know hinges on a detail that a person who is sighted might not think to label, such as whether a shirt is clean or dirty.
“Often it’s not obvious what is meaningful to people, and that’s why it’s so important not just to design for — but design these technologies with — people who are in the blind and low vision community,” said Gurari, who also directs the School of Information’s Image and Video Computing Group at the University of Texas at Austin.
Her team undertook the massive task of cleaning up the original VizWiz dataset to make it usable for training machine learning algorithms: removing inappropriate images, sourcing new labels, scrubbing personal information and even transcribing audio questions into text to remove the possibility that someone’s voice could be recognized.
Working with Microsoft funding and researchers, Gurari’s team has developed a new public dataset to train, validate and test image captioning algorithms. It includes more than 39,000 images taken by blind and low vision participants and five possible captions for each. Her team is also working on algorithms that can recognize right off the bat when an image someone has submitted is too blurry, obscured or poorly lit and suggest how to try again.
Earlier this year, Microsoft sponsored an open challenge to other industry and academic researchers to test their image captioning algorithms on the VizWiz dataset. On one common evaluation metric, the top-performing algorithm posted a 33% improvement over the prior state of the art.
“This is really pushing the state of the art in captioning for the blind community forward,” said Seeing AI lead engineer Shaikh, who is working with AI for Accessibility grantees and their datasets to develop potential improvements for the app.