Voice Recognition Software
Part 1: Microsoft Voice Recognition Software
When my surgeon said that the pin in my finger had to stay in for another four weeks, rather than coming out on a Friday afternoon (10 days post-op) as I had expected, I was upset beyond belief. I had stumbled through typing with nine fingers rather than the ten I’m used to using, but extended periods of typing made my hand cramp. In addition, my thinking process was massively impaired because I kept stopping to check and correct new-to-me typos (on the s, l, r, -, and n keys), rather than letting my words flow into something that my spellchecker could recognize and fix for me. I had managed to put off much of my typing for the first 10 post-operative days, but I didn’t see how I could cope, or earn any income at all, if I had to postpone that much work for four more weeks. Later that afternoon, I complained to a friend. She said, “Why don’t you try voice recognition? I used it a long time ago on my Mac because I have arthritis in my hands and I can’t type very fluidly. I would still be using it if the software worked on my current PC.”
Some time ago, I had looked at Dragon NaturallySpeaking, and the reviews on Amazon were pretty stinking for Dragon 10 running on a Vista 64-bit PC. (Whether or not that is a good PC to have is an entirely different question.) I was astounded to learn that Windows Vista shipped with a free voice recognition application. I was thrilled! “Free” is a very good word when you’re running a startup.
I looked at a number of blogs and reviews of the Microsoft product, and most of them were pretty enthusiastic. People had been able to get it working. Most people pointed out that, at least in the beginning, they could type faster than they could dictate, and that the learning curve for the product was reasonably steep. However, I could type much faster than I could dictate, too, until I couldn’t. I didn’t see any reviews from people who had come to voice recognition suddenly, through an accident that affected their ability to type.
I found an old headset and set to work. As I went along, the program appeared to be learning to identify my voice. At times it was almost thrilling: I could speak, read, and transcribe text, thoughts, books, my “getting things done” list, you name it. The words were flowing. I could create again; I could get back to work; and all was well with the universe.
For a while, that is. I was taking notes from a series of books, some by Dan Kennedy and one by Nassim Taleb (The Black Swan), when I began to notice the software behaving very erratically. Sometimes the words would just flow. The program would recognize numerals, command keys such as Tab, and just about any text I wanted to throw at it. Unfortunately, “sometimes” was the operative word. The program wasn’t reliable. I could not figure out what made the difference between when the system recognized my voice and when it decided it didn’t know who I was or what I was saying. There are only so many times you can dictate the same paragraph, only to look up and discover that the system has stalled on the first phrase and lost everything you said after the comma.
For the record, I discovered that Microsoft’s voice recognition program knows virtually no invective. (For that matter, it transcribed the word “invective” as “ineffective,” which is an interesting and probably apt mistake.) I would include the program’s attempts to capture my irritation here if I didn’t think you would recognize what I was yelling at the time.
I have a very short fuse for technical trouble. Although I believe myself to be reasonably patient and calm under many circumstances, when my PC acts up I am driven to computer rage in no time flat. I needed to work; I believed voice recognition was a tool that would help me do that work; Microsoft’s voice recognition product showed me that the technology existed today; and this product wasn’t the one for me.
Note: none of the blogs and reviews I read discussed using the MS VR product over any length of time. I found one Q&A on the Microsoft support center where someone with a Vista 64-bit PC had trouble with the product suddenly failing to recognize her voice and going into verification mode with every other phrase; the solution MS suggested did not work for her, and it did not work for me either. If the product had not worked at all, I might have given up on voice recognition altogether. However, it did work some of the time, and when it did, it was brilliant. I wanted the technology. I went to Dragon.
Part 2: Dragon NaturallySpeaking
I went to the Dragon NaturallySpeaking site (nuance.com). I looked at the options and the reviews on Amazon, and called customer support. I asked whether they had a product for 64-bit machines. They told me I needed version 10.1. I asked about the different editions, and the price made the decision for me: I’d love the Professional edition, which works with MS Outlook, but that’s $900. We’re talking about a dislocated pinkie finger… I can afford the Preferred edition, and for now I’ll simply compose in MS Word and then paste the copy into email. If my finger never gets back to full functionality, I may have to upgrade. Then again, I may discover that I love this technology enough that a $600 upgrade would repay the investment.
Dragon arrived in the afternoon via UPS. First, you install the software, then set up the microphone, then perform sound checks on the microphone. Finally, the system asks you to read a reasonable amount of text, perhaps 10 minutes’ worth. One of Scott Adams’ books is an option, as is a speech by John Kennedy. I haven’t tried the Kennedy speech yet. Setting up the program to this point took about an hour.
Before leaving for the day, I opened one of my documents to test the system. Was I ever disappointed! The system didn’t appear to be running any better than the Microsoft version had. I would dictate, the system would start to recognize me, and then it would ask for verification of what I had just dictated, stall, and lose what I’d said. Because it was close to the end of the workday, I left it, thinking that perhaps rebooting the system would make a difference.
Indeed, a reboot appeared to make a slight difference, but the bigger improvement came from A) additional training and B) syncing my training files with the user file created for my profile. It is not intuitively obvious that this needs to be done. Admittedly, I am not the most technically literate user on the planet, and Dragon NS provides some form of tutorial that may explain this. Once the files were synced, the system performed the way a $300 product should.
Forty-eight hours after the installation, I have to say, “I’m good to go.” I have dictated two multi-page blog posts, content that had been building up inside me for quite some time. I have unloaded notes from meetings into my tracking system with almost no effort whatsoever, much faster than I could have managed even typing at full speed. At this point, the system doesn’t recognize all the words I use, including the name of my company, red tuxedo, but it’s learning. I plan to read a number of the additional training modules and sync my files, and I expect that the accuracy of Dragon’s voice recognition will continue to improve over time.
In addition, I’m learning the little tricks that aren’t obvious, such as the fact that Dragon won’t let you start dictating into a file that hasn’t been saved. That one caused no end of frustration. It also appears that you may have to stop and restart the application to change microphones, as happened today when my Bluetooth microphone went dead.
I will also have to learn a little more about editing and system commands than I know at the moment. Sometimes the system does what I ask it to do, and sometimes it types the command as dictation. The MS product was worse at this, however.
Part 3: On learning to dictate content
At least one person in the various reviews of voice recognition software commented that he was an introvert and wasn’t able to think (compose) out loud, and that typing simply worked better for him.
Well, so am I (an introvert), and I’m learning (to compose out loud).
How my writing creativity will shift with voice recognition has yet to be seen. However, I’m more than a little intrigued by the process and the prospect. Alphabetic literacy is an inherently left-brain activity. It is sequential, it’s highly organized, and it flows in one direction. I know there is a difference in my creative thinking between handwriting notes and typing. I am able to pay attention to different things when I am handwriting, and I have met people using tablet PCs who believe the same to be true for themselves. Moving to a dictated form of creation would appear to engage a little less of the left brain.
I have my eyes closed as I speak these words, and I am not in the least distracted by the typos and misrecognitions coming up on the screen, certainly no more than I would be if I were watching my own fingers create the content at the keyboard. I am also aware that my speech has very different rhythms and content from my writing. This is probably true for almost anybody. However, I generally like what I say more than I like what I write. It will be interesting to see whether I can be as engaging when I dictate to the page as I am when I speak. At the moment, the jury is out.
Having to verbally insert punctuation is a trick, but Dragon NS will learn to do this for me. Not there yet.
On the other hand, I have now created four pages of somewhat useful content in about half an hour. My hand is not exhausted. The ring finger is not cramping. I am not frustrated by repeatedly reaching for the “s” and hitting the “dash,” or reaching for the “l” and hitting “r” (Dvorak layout). I have heard other voice-recognition users complain about the time it takes to correct recognition mistakes and typos. Given that I am hand-disabled at the moment, I’m not sure I want to be doing a whole lot of keyboard revising. As I thought about editing one of these posts, it crossed my mind that it might be a good idea to simply make the corrections on the paper copy and then re-dictate the entire thing. I’ll let you know. (Mixed results on the editing of this article; most was done on the keyboard. I do have comfortable use of the mouse.)
One thing I recognize for sure is that using dictation allows me to create, in the words of Anne Lamott, “really bad first drafts,” in a hurry. I understand that the word “writing” addresses an enormous universe of content generation. I am even amused that the idea of “speaking into a microphone” is considered “writing” when the result appears on the screen in the form of text. A few days ago, I discussed VR technology with a number of IP lawyers at an after-hours social event. They had a variety of responses to the technology. Some of them thought that they clearly composed better when they typed, and that their co-workers who dictated made mistakes that would have been caught at a keyboard. When it comes to IP briefs and opinions, that may be true. Certainly I don’t want to pay for “really bad first drafts” when I’m hiring an IP lawyer. On the other hand, blog content is hardly in the same class as IP legal opinion. (Although, anyone who wants to pay me IP legal rates to read my blog postings, feel free!)
Preliminary conclusion: voice-recognition software, when it works, can be a major productivity enhancement. When you come to voice recognition through some kind of traumatic event, as I did, it can be even more important to your life. Between the time I began this post and the time I published it, another friend of mine fell and seriously injured her hands. She can’t even hold a telephone. Voice recognition technology may allow her to work. (Incidentally, I have another post coming about postoperative productivity, which addresses responding to physical disability.)
2 comments
Excellent blog. I was just searching for a blog or a solution for transferring my user profile from the computer I use at home to the one I use in my office, and a Google search directed me to your blog. You have narrated your experiences and apprehensions very nicely. I too have started using Dragon 10, and I find it much more accurate and speedier than the earlier version I tried about 6-7 years back. I am from India, and we speak English with a different accent. While browsing a website, I came to know that Dragon has come out with an Indian English version of its software. I tried it and am finding it very effective. Like you, I am also planning to use it for writing articles and books, but I have apprehensions about whether I will be able to dictate creatively and effectively in English, which is not my mother tongue. Because I am not that fluent in English, I take a lot of time thinking in English and dictating. I get stuck time and again in the middle of sentences while dictating. I find Dragon has not yet fully recognised my speech and pronunciation. But I do hope that after repeated practice, I will be able to dictate more effectively.
By the way, you talked about syncing. What is that?
With all good wishes and my heartfelt compliments to you for such a good and encouraging write-up on the use of speech recognition software.
SKJ
Thank you!
“Syncing” is (my term for) the link between the Dragon database and what you have recently dictated: the process by which Dragon learns to recognize YOUR voice, separate from what its files think you are saying.
In my application, it happens under the Accuracy Center option under Tools. You may already be doing this if you know it as Running the … Optimizer, the way Dragon labels it.