In which I use Siri to create in-app third-party commands while taking a break from cooking for Thanksgiving

Earlier today, I had a brainstorm. Maybe I was just overthinking the whole custom Siri thing for third-party apps. I didn't want to install a proxy server or jailbreak my iPhone. I just wanted to use Siri dictation to issue commands to my apps, and I wanted to do that without having to display a keyboard that took up 50% of my iPhone screen.
So I chatted with Steven Troughton-Smith about replacing the keyboard by setting the text field's inputView to a custom toolbar, and then using a button to start and stop Siri dictation. This last bit, the custom UI (sorry, but I wanted pretty), is the only "hacking" per se involved here. Instead of overwhelming the screen with the full keyboard, he suggested I link directly to UIDictationController and tell it to start and stop dictation.
Originally, I was aiming for a custom view with no text at all that implemented UITextInput, but apparently there's some other protocol I'm missing. So I ended up reverting to the standard text field with the blinking cursor you see at the top of the video. I'll figure that bit out later.
So what I do is this: I start dictation on the button press, stop it on the button release, and then catch the interpreted text and compare it against four words: up, down, left, and right. If one of them is found, the app runs the matching animation.
The big idea is this: instead of having Siri interpret commands and return ACE objects that match tasks I want to accomplish (my original approach), I do the text matching and interpretation myself.
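The matching step can be sketched roughly as follows. This is not the app's actual code. It is a language-neutral illustration written in Python, and the function name match_commands and the animation names are assumptions made up for the example; only the four command words come from the post.

```python
import re

# Hypothetical command table: the four words come from the post,
# the animation names are invented for this sketch.
COMMANDS = {
    "up": "animate_up",
    "down": "animate_down",
    "left": "animate_left",
    "right": "animate_right",
}


def match_commands(dictated_text):
    """Return the action for every command word found, in spoken order."""
    # Lowercase and split into alphabetic words so "Up!" or "Left," still match,
    # while a longer word such as "upload" does not trigger "up".
    words = re.findall(r"[a-z]+", dictated_text.lower())
    return [COMMANDS[word] for word in words if word in COMMANDS]


print(match_commands("Move left, then up"))  # ['animate_left', 'animate_up']
```

Matching whole words rather than substrings keeps ordinary dictated speech from firing a command by accident, and anything outside the table is simply ignored.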
In the end, the solution is both low-rent and really easy to apply. You could even get this into the App Store if you were willing to show the entire keyboard, and not just the start/stop button.
Check it out in action here:
www.youtube.com/watch?v=a2iZ34lMAQk
November 28 2011 at 7:34 PM
This is very useful news. Thanks Erica. Is your Siri code using UIDictationController available for download? I did not find any docs at Apple about UIDictationController.
November 24 2011 at 11:13 AM
It's a private API, that's why she termed it "hacking".
November 24 2011 at 7:51 PM
Would be even more thankful for this if you published the code.
November 24 2011 at 10:10 AM
What about taking advantage of the "bluetooth keyboard mode," in which case the onscreen keyboard isn't visible when a bluetooth keyboard is hooked up?
November 24 2011 at 9:23 AM
Mmm, not to take the wind out of your sails, but although this program is kind of neat, it is entirely misleading to call it Siri, seeing as it is really just normal voice recognition (without the natural language processing component).
November 24 2011 at 7:51 AM
Actually, this is what Apple is doing as well... And YES, this is Siri. Can you develop an app that responds to your voice without Siri (or Nuance)?
Siri is used to create the text, then an API returns the results or acts on what the text says.
Apple's Siri interface is looking for specific keywords to act on, just like Erica is. The difference is the number of terms it is looking through, and the inferences between words.
Obviously, for a test, Erica is not going to develop a full-fledged natural language interpretation system.
Well then, it depends on what you mean by 'Siri', doesn't it? This is the same as using the Speech Input API on Android. There are plenty of ways to achieve speech recognition, that's why wiki has a long list of speech recognition programs. Apple markets Siri as being the combination of their voice recognition coupled with NLP, and this is what gives Siri an advantage. Most users wouldn't think of the microphone button on the iPhone keyboard as 'Siri', just dictation. Hence why 'Siri' is a little misleading: isn't the whole point of trying to get third-party Siri to leverage NLP at Apple's end? The sophistication of Siri comes down to NLP, not speech recognition, and the point of third-party Siri would be to get the benefits of NLP without having to do it yourself. Not saying this example isn't cool, or a good feature to have, just that calling it Siri is misleading :P
November 24 2011 at 4:17 PM
Nice shot, Erica.
We've also done something similar, but with the proxy.
Our implementation can also run on two different devices.
Check out our blog post for the video and implementation details.
http://blog.fastpdfkit.com/
Matteo