In which I use Siri to create in-app third-party commands while taking a break from cooking for Thanksgiving

Earlier today, I had a brainstorm. Maybe I was just overthinking the whole custom Siri thing for third-party apps. I didn't want to install a proxy server or jailbreak my iPhone. I just wanted to use Siri dictation to issue commands to my apps, and I wanted to do that without having to display a keyboard that took up 50% of my iPhone screen.

So I chatted with Steven Troughton-Smith about replacing the keyboard by setting the text field's inputView to a custom toolbar, and then using a button to start and stop Siri dictation. This last bit, the custom UI (sorry, but I wanted pretty), is the only "hacking" per se involved here. Instead of letting the keyboard overwhelm the screen, he suggested I link directly to UIDictationController and tell it to start and stop dictation.

Originally, I was aiming for a custom view with no text at all that implemented UITextInput, but apparently there's some other protocol I'm missing. So I ended up reverting to the standard text field with the blinking cursor you see at the top of the video. I'll figure that bit out later.

So what I do is this: I start dictation on the button press, stop it on the button release, and then catch the interpreted text and compare it against four words: up, down, left, and right. If one of them is found, the app runs the matching animation.
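
The matching step might look something like the sketch below. It is written in Python for brevity and is not taken from the app itself: `match_command` and `COMMANDS` are hypothetical names, and the dictation result is assumed to arrive as a plain string.

```python
import re

# The four direction words the post mentions.
COMMANDS = ("up", "down", "left", "right")


def match_command(dictated_text):
    """Return the first direction word found in the dictated text, or None."""
    # Lowercase and keep only runs of letters, so "Left." or "Go UP!" still match.
    words = re.findall(r"[a-z]+", dictated_text.lower())
    for word in words:
        if word in COMMANDS:
            return word
    return None
```

Matching on whole words keeps something like "upright" from triggering "up". The returned word would then pick which animation to run.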

The big idea is this: instead of having Siri interpret commands and return ACE objects that match tasks I want to accomplish (my original approach), I do the text matching and interpretation myself.

In the end, the solution is both low rent and really easy to apply. You could even get this into the App Store if you were willing to show the entire keyboard, and not just the start/stop button.

Check it out in action here:





9 Comments

Phil

www.youtube.com/watch?v=a2iZ34lMAQk

November 28 2011 at 7:34 PM
Markus Möller

This is very useful news. Thanks Erica. Is your Siri code using UIDictationController available for download? I did not find any docs at Apple about UIDictationController.

November 24 2011 at 11:13 AM
1 reply to Markus Möller's comment
gormster

It's a private API, that's why she termed it "hacking".

November 24 2011 at 7:51 PM
digitalsedition

Would be even more thankful for this if you published the code.

November 24 2011 at 10:10 AM
antonkay

What about taking advantage of the "bluetooth keyboard mode," in which case the onscreen keyboard isn't visible when a bluetooth keyboard is hooked up?

November 24 2011 at 9:23 AM
romesh

mmm not to take the wind out of your sails, but although this program is kind of neat, it is entirely misleading to call it Siri, seeing as it is really just normal voice recognition (without the natural language processing component)

November 24 2011 at 7:51 AM
1 reply to romesh's comment
houdini

Actually, this is what Apple is doing as well.... And YES, this is Siri. Can you develop an app that responds to your voice without Siri (or Nuance?).

Siri is used to create the text, then an API returns the results or acts on what the text says.

Apple's Siri interface is looking for specific keywords to act on, just like Erica is. The difference is the number of terms it is looking through, and the inferences between words.

Obviously, for a test, Erica is not going to develop a full fledged natural language interpretation system.

November 24 2011 at 10:47 AM
1 reply to houdini's comment
romesh

Well then it depends on what you mean by 'Siri', doesn't it? This is the same as using the Speech Input API on Android. There are plenty of ways to achieve speech recognition; that's why wiki has a long list of speech recognition programs. Apple markets Siri as being the combination of their voice recognition coupled with NLP, and this is what gives Siri an advantage. Most users wouldn't think of the microphone button on the iPhone keyboard as 'Siri', just dictation. Hence why 'Siri' is a little misleading: isn't the whole point of trying to get third-party Siri to leverage NLP at Apple's end? The sophistication of Siri comes down to NLP, not speech recognition, and the point of third-party Siri would be to get the benefits of NLP without having to do it yourself. Not saying this example isn't cool, or a good feature to have, just that calling it Siri is misleading :P

November 24 2011 at 4:17 PM
Matteo Gavagnin

Nice shot Erica.

We've also done something similar, but with the proxy.
Our implementation can also run on two different devices.

Check out our blog post for the video and implementation details.
http://blog.fastpdfkit.com/

Matteo

November 24 2011 at 2:54 AM

© 2012 AOL Inc. All Rights Reserved.