CONTINUOUS SPEECH RECOGNITION
Zadinksy et al., in their paper entitled “Speech Technologies And
Its Impact on Medical Decision”, presented at the 2002 Annual HIMSS
Conference, defined speech recognition as “The conversion of spoken words into
computer text. Speech is digitised and matched against a dictionary of coded
waveforms. The matches are converted into text as if the words are typed on the
keyboard.”
The earliest work on speech recognition was
carried out in the 1940s by the US Department of
Defense. In the 1950s, research was funded by the
Defense Advanced Research Projects Agency (DARPA), and in 1952
Bell Laboratories developed an automatic speech recognition system that
could successfully identify the digits 0 to 9 spoken to it over the
telephone.
Subsequent improvements were few and far between; the technology was mainly
restricted to laboratories and required serious computing power. The
introduction of continuous speech recognition (CSR) and high-speed processor
chips saw the technology move on to the personal computer (PC).
There are currently two types of speech recognition: discrete
and continuous. Discrete speech recognition processes speech word by word, which
makes dictation slower. Early speech recognition was based on this
model. Continuous speech recognition processes speech by phrases and takes
context into account. It is faster but less accurate if the phrases are broken.
Modern speech technology applications are based on this model.
The “moderate success” of continuous speech recognition
applications has been restricted mainly to clinical domains where
dictation is required, such as autopsy and radiology
reports and operating rooms. They are also used
in medical transcription. Electronic medical record (EMR)
vendors such as McKesson have added the
technology as an optional feature of their products.
Many have tried to champion CSR as a viable technology but
have been met, firmly and probably rightly, by some opposition. Opponents
argue that “out of box” applications have an error rate of 5 to 10% and at best
improve to a 3% error rate after training, in which the computer learns to
recognise the user’s voice. Documents therefore have to be corrected for
errors, either during creation or afterwards. Some argue that the correction process
can sometimes take as long as typing out the whole document on a
keyboard.
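The error rates quoted above can be made concrete with a small calculation. The article does not say how the figures were measured; the sketch below assumes they are word error rates, that is, the number of word substitutions, insertions and deletions needed to turn the recognised text into the intended text, divided by the number of intended words. The function names and the sample sentence are illustrative only and do not come from any vendor's product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: "error rate" means word error rate; the source article
    does not define the metric behind its 5 to 10% and 3% figures.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra recognised word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def expected_errors(word_count: int, error_rate: float) -> float:
    """Rough number of words needing correction in a dictated document."""
    return word_count * error_rate


if __name__ == "__main__":
    dictated = "the patient has a fracture of the left distal radius"
    recognised = "the patient has a fracture of the left distant radius"
    print(word_error_rate(dictated, recognised))  # one wrong word in ten

    # Hypothetical 500-word report at the error rates quoted in the article.
    for rate in (0.10, 0.05, 0.03):
        print(rate, expected_errors(500, rate))
```

On these assumptions, even the trained 3% figure leaves roughly 15 words to find and fix in a 500-word report, which gives a sense of why the correction step weighs so heavily in the argument against CSR.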
It has also been noted that, to ensure the quality of the
recordings, ambient noise must be kept to a minimum. This usually means
finding a quiet room with few interruptions (which is not always
practicable) and using high-quality or special headsets and microphones.
Commercial software organisations that produce
CSR applications have gone through some changes in the past few years. In
1996, there were four major players in the market: Dragon Systems,
IBM, Lernout & Hauspie (L&H) and Philips. However, in
2000 Dragon Systems was bought by L&H, which 18 months later declared
bankruptcy and was bought by ScanSoft. This
might be an indication of how difficult these firms are finding it to sell their
products.
It is not all doom and gloom for CSR, as there is an
increasing number of success stories about the adoption of voice recognition
telephony by firms outside of the healthcare sector. These firms are beginning
to see reasonable returns on their investments.
Continuing research, improvements in the accuracy of recognising natural-sounding
“continuous speech”, and the involvement of established technology
firms such as Microsoft,
Apple and IBM mean
that it should not be long before applications achieve widely acceptable
error rates within the healthcare sector and CSR moves towards becoming a
key healthcare technology.