Part 1: Microsoft Voice Recognition Software
When my surgeon said that the pin in my finger had to stay in for another four weeks, rather than coming out 10 days post-op the way I had expected, I was upset beyond belief. I had stumbled through typing with nine fingers rather than the ten I’m used to using, but extended typing sessions made my hand cramp. In addition, my thinking process was massively impaired because I had to keep stopping to check and correct new-to-me typos (on the s, l, r, -, and n keys*), rather than letting my words flow into something that my spellchecker could recognize and fix for me. I had been able to put off a lot of my typing life for the first 10 post-operative days; I didn’t see how I could cope or make any income at all if I had to put as much off for four more weeks as I had postponed for 10 days. Later that afternoon, I complained to a friend. She said, “Why don’t you try voice recognition? I used it a long time ago on my Mac because I have arthritis in my hands and I can’t type very fluidly. I would still be using it if the software worked on my current PC.”
Some time ago, I had looked at Dragon Naturally Speaking voice recognition software and the reviews on Amazon were pretty stinking (for Dragon NS v.10 running on a Vista 64-bit PC). (Whether or not that is a good PC to have is an entirely different question.) I was astounded to learn that Microsoft Vista shipped with a free voice recognition application. I was thrilled! “Free” is a very good word when you’re in start-up.
I looked at a number of blogs and reviews of the Microsoft Voice Recognition product and most of them were pretty enthusiastic. People had been able to get it working. Most people pointed out that at least in the beginning, they could type faster than they could dictate and the learning curve for the product was reasonably steep. However, I could type much faster than I could dictate, too, until I couldn’t. I didn’t see any reviews from people who had come to voice recognition suddenly, through an accident that affected their ability to type.
I found an old headset and set to work. As I went along, it appeared that the program was learning to identify my voice. At times it was almost thrilling: I could speak and read and transcribe text, thoughts, books, my “getting things done” list, you name it. The words were flowing. I could create again; I could get back to work; and all was well with the universe.
For a while, that is. I was taking notes from a series of books, some by Dan Kennedy and one by Nassim Taleb (The Black Swan), when I started to observe the software behaving very erratically. Sometimes, the words would just flow. The program would recognize numerals, command keys such as the tab, and just about any text I wanted to throw at it. Unfortunately, “sometimes” was the operative word. The program wasn’t reliable. I could not figure out what made the difference between when the system wanted to recognize my voice, and when it decided it didn’t know who I was or what I was saying. There are only so many times that you can dictate the same paragraph only to look up and discover that the system has stalled on the first phrase and lost everything else you said after the comma.
For the record, I discovered that Microsoft’s voice recognition program knows virtually no invective. (For that matter, it transcribed the word “invective” as “ineffective,” which is an interesting and probably apt mistake.) I would include the program’s attempts to capture my irritation here if I didn’t think you would recognize what it was I was yelling at the time.
I have a very short fuse for technical trouble. I believe myself to be reasonably patient and calm under many circumstances, but when my PC acts up I am driven to computer rage in no time flat. I needed to work; I believed voice recognition was a tool that would help me do that work; the Microsoft voice recognition product let me know that the technology existed today; and this product wasn’t the one for me.
Note: none of the blogs and reviews I read discussed using the MS VR product over any length of time. I found one Q&A on the Microsoft support center where someone with a Vista 64-bit PC had trouble with the product suddenly failing to recognize her voice and going into verification mode with every other phrase; the solution MS suggested did not work for her and it did not work for me either. If the product had not worked at all, I might’ve given up on voice recognition altogether. However, it did work, some of the time, and when it did it was brilliant. I wanted the technology. I went to Nuance.com.
Part 2: Dragon Naturally Speaking, V. 10.1
I went to the Dragon Naturally Speaking site (nuance.com). I looked at the options, the reviews on Amazon, and called customer support. I asked them if they had a product for the 64-bit machines. They told me I needed 10.1. I asked about the different versions and the price made the decision for me: I’d love the Professional, which works with MS Outlook, but that’s $900. We’re talking about a dislocated pinkie finger… I can afford the Preferred option, and for now, I’ll simply compose in MS Word and then paste copy into email. If my finger never gets back to full functionality, I may have to upgrade. Then again, I may just discover I love this technology enough that a $600 upgrade would repay the investment.
The software and microphone arrived via UPS. First you have to install the software, then set up the microphone, then perform sound checks on the microphone. Finally, the system asks you to read a reasonable amount of text, perhaps 10 minutes or so. One of Scott Adams’ books is an option, as is a speech by John Kennedy. I haven’t tried the Kennedy speech yet. Setting up the program to this point took about an hour.
Before leaving for the day, I opened one of my documents to test the system. Was I ever disappointed! It didn’t appear that the system was running any better than the Microsoft version had done. I would dictate and go on and the system would start to recognize me, and then ask for verification of what I had just dictated, and stall and lose what I had said. Because it was close to the end of the workday, I left it, thinking that perhaps a reboot of the system would make a difference.
Indeed, a reboot appeared to make a slight difference, but the bigger improvement came from A) additional training, and B) syncing my training files with the user file created for my profile. It is not intuitively obvious that this needs to be done. Admittedly, I am not the most technically literate user on the planet. The software ships with a tutorial that may explain some of this. (1/2011 update: in general, I’ve found the software’s Help files to be pretty lacking.) When the files were synced, the system performed the way a $300 product should.
48 hours after the installation, I have to say, “I’m good to go.” I have dictated two multi-page blog posts, content which has been building up inside me for quite some time. I have unloaded notes from meetings into my tracking system with almost no effort whatsoever, much faster than I would have been able to do even with full-speed typing. At this point, the system doesn’t completely recognize all the words I use, including the name of my company, red tuxedo, but it’s learning. I plan on reading a number of the additional training modules and syncing my files and I expect that the accuracy of Dragon’s voice recognition will continue to improve over time.
In addition, I’m learning the little tricks that aren’t too obvious, such as the fact that Dragon NS won’t let you start dictating into a file that hasn’t been saved. There was no end of frustration with that one. It also appears that you might have to stop and restart the application in order to change microphones, as happened today when my Bluetooth went dead.
I will also have to learn a little bit more about editing and system commands than I know at the moment. Sometimes the system does what I ask it to do, and sometimes it types the command as dictation. The MS product was worse at this, however.
1/2011 update, about the Bluetooth microphone:
POS. Junk, if that acronym doesn’t jump to mind immediately. It’s a one-button, three function piece of plastic that has very complicated and hard-to-find instructions.
- You have to have the microphone in your ear to hear the tones that indicate it’s on, and then that it’s synced to your PC, and if you get it wrong, you are adjusting volume.
- Pressing an on-off-sync switch (1.5 seconds for sync, 3 seconds for Off) while it’s in your ear means you’re pushing hard plastic into the inside of your ear. This gets old fast.
- Every time you turn your laptop off, you have to resync. This system might work on a desktop, but it’s hell on a laptop that travels.
- After enough presses, the on-off-sync-volume button breaks in half, and your $100 investment is junk.
Part 3: on learning to dictate content
At least one person in the various reviews of voice recognition software commented that, “he was an introvert and he wasn’t able to think (compose) out loud,” that typing simply worked better for him.
Well, so am I (an introvert), and I’m learning (to compose out loud).
How my writing creativity will shift with voice recognition has yet to be seen. However, I’m more than a little bit intrigued by the process and the prospect. Alphabetic literacy is an inherently left-brain activity. It is sequential, it’s highly organized, and it flows in one direction. I know there is a difference in my creative thinking between handwriting notes and typing. I am able to pay attention to different things when I am handwriting, and I have met people using tablet PCs who believe the same to be true for themselves. Moving to a dictated form of creation would appear to engage a little bit less of the left brain.
I have my eyes closed as I speak these words and I am not in the least distracted by the typos and misrecognitions coming up on the screen, certainly not any more so than I would be if I were watching my own fingers create the content from the keyboard. I am also aware that I speak in very different rhythms, and with different content, than when I write. This is probably true for almost anybody. However, I happen to like what I say generally more than I like what I write. It will be interesting to see if I am able to be as engaging when I dictate to the page as I am when I speak. At the moment, the jury is out.
Having to verbally insert punctuation is a trick, but Dragon NS will learn to do this for me. Not there yet.
On the other hand, I have now created four pages of somewhat useful content in about half an hour. My hand is not exhausted. The ring finger is not cramping. I am not frustrated from repeatedly reaching for the “s” and hitting the “dash,” or reaching for the “l” and hitting “r” (Dvorak layout). I have heard other voice-recognition users complain about the time it takes to correct recognition mistakes and typos. Given that I am hand-disabled at the moment, I’m not sure I want to be doing a whole lot of keyboard revising. As I thought about editing one of these posts, it crossed my mind that it might be a good idea to simply make the corrections on the paper copy, and then re-dictate the entire thing. I’ll let you know. (Mixed results on the editing of this article; most was done on the keyboard. I do have comfortable use of the mouse.)
One thing I recognize for sure is that using dictation allows me to create, in the words of Anne Lamott, “really bad first drafts,” in a hurry. I understand that the word “writing” addresses an enormous universe of content generation. I am even amused that the idea of “speaking into a microphone” is considered “writing” when the result appears on the screen in the form of text. A few days ago, I discussed VR technology with a number of IP lawyers at an after-hours social event. They had a variety of responses to the technology. Some of them thought that they clearly composed better when they typed, and that their co-workers who dictated made mistakes that would have been caught at a keyboard. When it comes to IP briefs and opinions, that may be true. Certainly I don’t want to pay for “really bad first drafts” when I’m hiring an IP lawyer. On the other hand, blog content is hardly in the same class as IP legal opinion. (Although, anyone who wants to pay me IP legal rates to read my blog postings, feel free!)
Preliminary conclusion: voice-recognition software, when it works, can be a major productivity enhancement. When you come to voice recognition through some kind of traumatic event, as I did, it can be even more important to your life. In the time between when I began this post and when I published it, another friend of mine fell and seriously injured her hands. She can’t even hold a telephone. Voice recognition technology may allow her to work. (Incidentally, I have another post coming about postoperative productivity, which addresses responding to physical disability.)
Post-pin removal update
Part 4: Digital Voice Recorder and Dragon Naturally Speaking 11
Dragon contacted me about buying the upgrade to version 11, which is supposed to work very much better than 10. I bought it, and while I was on the phone with the salesperson, I also bought a Philips Digital Voice Recorder (DVR). I do a lot of note-taking from books and it crossed my mind that it would be more fun to dictate into a recorder while reading on the couch than to have to sit near my PC so I could see the screen while I read the relevant text out loud.
The results are mixed.
I trained Dragon to recognize my dictation voice for the DVR by reading a set text aloud. From this recording, it creates a profile for me. As long as I am reading text, either my words or someone else’s, Dragon does a pretty good job with transcription. It’s very good when I read my own handwritten notes into typed copy.
However, if I am dictating “free form,” such as you might do to create a blog post out of your head, it gets completely lost. The transcription is so bad I can barely discern what I was talking about. I experimented with what turned out to be 9 pages of dictation. Perhaps half is recognizable, and the other half gibberish.
It’s possible that if I planned my posts better, I would be able to speak with more conviction, and at a faster rate. I think Dragon uses context when it can’t completely identify a word, and when I’m freestyling, my words come out much more slowly, perhaps changing the context algorithms. Doesn’t matter. The dictation was not a success. It will take me as long to clean up the text as it would to type it out again, using the initial mind maps as a starting point.
I had hoped I’d be able to learn to speak my posts, in part because I think my spoken word is more fluid than what I write. When I look at this transcription, I’m not so sure I’m right. If the transcription is roughly “true” (which it rather has to be), I’m much more clear and straightforward when I write. It is possible that the recorder would work better if it was recording a rehearsed speech, something I’d delivered a number of times so that the words flowed almost as fast as I type. At this moment, I can’t think why I’d want to record a speech. YMMV.