Editor’s Note: This is a guest post written by Todd Mozer, CEO of Sensory.
This month Orange decided to stop selling its Djingo smart speaker, and the decision didn’t get much notice. But I think it’s significant. It shows that even a $30 billion telecom giant can’t compete with the likes of Google and Amazon. (Incidentally, Orange was using SoundHound technology, which is about as good a cloud recognizer as any independent general-purpose assistant has attempted to use.) Add to that the failure of Bixby from Samsung, one of the world’s largest electronics conglomerates, and the issue starts to become clearer.
Here’s the issue: Google, Amazon and Apple are investing so much and collecting so much data that it will be very difficult for anyone else to effectively penetrate broad-domain uses of voice assistants. Part of it is an accuracy issue, part of it is infrastructure and ownership of the necessary cloud components, and part of it is the sheer force of thousands of employees working to gather more data, analyze failures, tackle new domains, and do everything else needed to make a general-purpose assistant work well. That includes selling hardware at near breakeven to build a user base and collect even more data.
Yeah, a lot of it is about data! Sensory did an interesting experiment recently. We used open-source techniques like Wav2Letter to generate state-of-the-art acoustic models and combined them with certain features from Sensory’s arsenal, like our language models. We then trained on some open-source data sets. We found our approach could compete head to head with Google in the data domain in which we trained. But when we switched to other domains (for example, by asking a few customers to test our alpha software), we fell apart. We always knew how data-specific training is, but this really highlighted it.
But all isn’t lost, because most companies don’t really need a general-purpose assistant. I already have an Alexa or Google Home or Apple HomePod in many of the rooms in my house, and I carry my phone around with me if I need access to information. I don’t need more generalist assistants. I need specialists that can help with specific problems or products. This is where the concept of domain-specific assistants comes into play. What’s important here is that it’s not so hard to match or exceed the accuracy of Google when you focus on a specific domain. A domain can be something as simple as “cooking,” or it can be the user interface of an app or product. Vocalize.ai proved this to us in a test of Sensory’s natural language microwave against Amazon and Google.
But to create domain-specific assistants you still need to collect a ton of data, right? Well, very soon the answer might be “no.” Recent advances in generating and training with synthetic data have enabled accurate models without the need for live data collection. The combination of domain-specific assistants and synthetic training data is opening up new opportunities, and it might be the path forward for companies that want to own their voice experience. We all know that innovation never stops, and this is particularly true in the voice technology field. So don’t give up on competing with the likes of Amazon, Google and Apple. “Your brand with your voice” is closer than you may think.