MOZILLA has released its huge open source speech data set, as it promised back in the summer.
Common Voice is an initiative to bring speech recognition to open source, and it has been busily collecting data all summer. In total, 400,000 recordings from 20,000 volunteers, adding up to 500 hours of speech, have been used to create the second-biggest publicly available voice data set in the history of the internet.
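To put those headline numbers in perspective, here is a quick back-of-the-envelope sketch. It assumes the rounded totals quoted above (400,000 clips, 20,000 volunteers, 500 hours) are exact, so the results are rough averages rather than real per-clip statistics from the data set.

```python
# Rough averages derived from the rounded Common Voice totals quoted above.
# These are illustrative only: the inputs are headline figures, not
# measurements taken from the released data.

TOTAL_RECORDINGS = 400_000
TOTAL_VOLUNTEERS = 20_000
TOTAL_HOURS = 500


def dataset_averages(recordings: int, volunteers: int, hours: float) -> dict:
    """Return the average clip length in seconds and clips per volunteer."""
    total_seconds = hours * 3600
    return {
        "seconds_per_clip": total_seconds / recordings,
        "clips_per_volunteer": recordings / volunteers,
    }


if __name__ == "__main__":
    stats = dataset_averages(TOTAL_RECORDINGS, TOTAL_VOLUNTEERS, TOTAL_HOURS)
    print(f"~{stats['seconds_per_clip']:.1f} s per clip")
    print(f"~{stats['clips_per_volunteer']:.0f} clips per volunteer")
```

In other words, the typical contribution is a short utterance of roughly four to five seconds, with each volunteer chipping in around 20 of them on average.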
The release is timed to coincide with the launch of Mozilla's first open source speech recognition model, which is based on the Deep Speech papers published by Chinese giant Baidu.
The project aimed for a word error rate of under 10 per cent, and the model currently manages just 6.5 per cent.
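Speech recognition accuracy is conventionally reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the model's transcript into the correct one, divided by the number of words in the correct transcript. A 6.5 per cent figure therefore means roughly 6.5 mistakes per 100 words. The sketch below assumes that standard definition and computes it as a word-level edit distance; the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed as the word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words; the DP table is kept one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(f"{word_error_rate(ref, 'the cat sat on a mat'):.1%}")  # one substitution
```

One wrong word in a six-word sentence already gives a WER of about 16.7 per cent, which is why single-digit figures over a large test set are considered respectable.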
We particularly like this bit: "Deep Speech is an end-to-end trainable, character-level, deep recurrent neural network (RNN). In less buzzwordy terms: it's a deep neural network with recurrent layers that gets audio features as input and outputs characters directly — the transcription of the audio.
"It can be trained using supervised learning from scratch, without any external 'sources of intelligence', like a grapheme-to-phoneme converter or forced alignment on the input."
Isn't "deep neural network" about as "buzzwordy" as can be anyway? Bless.
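How does a network "output characters directly"? Baidu's Deep Speech papers train the network with connectionist temporal classification (CTC), so for every slice of audio it emits a probability for each character plus a special "blank" symbol. The simplest way to turn that into text is greedy (best-path) decoding: pick the most likely symbol per frame, collapse repeats, then drop blanks. The sketch below is illustrative only; the five-symbol alphabet and the frame probabilities are hypothetical, and production decoders typically use beam search instead.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical CTC blank symbol
ALPHABET = [BLANK, " ", "a", "c", "t"]  # toy alphabet, not Deep Speech's real one

# Hypothetical per-frame probabilities over ALPHABET, as an acoustic model
# might emit them for a short clip of someone saying "cat".
FRAMES = [
    [0.1, 0.0, 0.1, 0.7, 0.1],    # c
    [0.2, 0.0, 0.1, 0.6, 0.1],    # c (repeat, will be collapsed)
    [0.8, 0.0, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.0, 0.8, 0.05, 0.05],  # a
    [0.1, 0.0, 0.1, 0.1, 0.7],    # t
    [0.3, 0.0, 0.0, 0.1, 0.6],    # t (repeat, will be collapsed)
]


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]], alphabet: List[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # A symbol is emitted only when it differs from the previous frame's
        # symbol, so a blank between two identical letters keeps both.
        if idx != prev and alphabet[idx] != BLANK:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    print(greedy_ctc_decode(FRAMES, ALPHABET))
```

The blank is what lets the model spell genuine double letters: "a, blank, a" decodes to "aa", whereas "a, a" collapses to a single "a".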
Eventually, the aim is to make it fast enough to run on a mobile device or a Raspberry Pi. For now, the model requires some serious CPU or GPU action (or, of course, access via the cloud).
Of the voice data, Mozilla explains: "When we look at today's voice ecosystem, we see many developers, makers, startups, and researchers who want to experiment with and build voice-enabled technologies. But most of us only have access to the fairly limited collection of voice data; an essential component for creating high-quality speech recognition engines.
"This voice data can cost upwards of tens of thousands of dollars and is insufficient in scale for creating speech recognition at a level people expect. By providing this new public dataset, we want to help overcome these barriers and make it easier to create new and better speech recognition systems."
For now, the project is limited to English, but there are plans to extend it to other languages in due course.
Find out more about Deep Speech or contribute to the project on GitHub, and download the speech data from the Common Voice site. And, of course, this is a living, breathing project, so you can always add your own voice, especially if you've got a distinctive accent or timbre. µ