Following the tip from Frank68, a look at the Nuance shop page shows that Dragon No. 15 is on its way to Germany, initially in the general edition, Dragon Professional Individual. Allow me to quote from the "What's new" section of the page: "With its next-generation speech recognition engine, Dragon Professional Individual 15 extends the possibilities of speech recognition and once again increases its accuracy compared with previous versions. Dragon recognizes dictation extremely accurately and uses Deep Learning and an adaptation technique to continuously adjust to your voice and to changing ambient conditions, even during dictation. The benefits of Dragon 15:
- Up to 10% more accuracy right from the first use, without voice training
- Continuously learns, even during dictation, making it even more pleasant to use
- Optimal recognition even for speakers with an accent or with background noise (open-plan office)"
http://shop.nuance.de/store/nuanceeu/de_...utm_content=dpi
Whatever you think of the marketing speak: this "Deep Learning" and the "adaptation technique" do seem to be something other than the familiar OUFA. I'm curious about the discussion and the first experience reports. Because of the edition I use, I will certainly have to be patient a while longer....
OK, the discussion is already under way elsewhere. For once, instead of many words, I'll allow myself to point to a sometimes brainless forum, especially since co-moderator Phil is taking part in the linked discussion thread, and he is decidedly "brainy":
I follow the technology for what is called "cognitive" or "neural network" computing very closely, because my systems integration company also implements both rules-based and cognitive systems (usually natural language processing with written input, not speech input) for very large organizations.
It doesn't take much effort to figure out that Nuance has been working on this for a while. There are press releases, conference proceedings and academic presentations by their named research staff, as well as the Nuance R&D Notes websites, which disclose quite a bit of information. Plus, anybody who wants to take the time to do a Google search for the combined terms "Nuance Nvidia" or "IBM Nuance" will find quite a bit of additional information. Finally, anyone who attends the technical conferences in these areas would have heard even more detail.
Now, none of this general technical information, in the abstract, has anything to do with whether or not Nuance has been able to successfully repackage their algorithms from their GPU neural net R&D processors into a single-core x86 architecture processor for commercial release. There is absolutely no reason to doubt that the engineering description in the Nuance announcement is reasonably accurate, at least at the usual level of product announcements. However, this is less of a technical challenge than creating a true, real-time, adaptive learning system for speech recognition that doesn't randomly break recognition over extended use.
I'm looking forward to doing black-box testing on the new release, both out of my own intellectual curiosity and to see whether or not this is really going to be a better version for expert users with superb dictation styles.
I will have the English version of the software available to me as soon as an unnamed vendor can release it. Fortunately, because I have free time up to the American holiday of Labor Day, Monday, September 5, I should have two weeks to do pretty much continuous black-box testing. I should have a decent review posted by then.
After that date I need to go heads down to test and finish up a cognitive/adaptive offering from my company that aims at replacing human tech support at Level I and most of Level II for a very complex security/gateway/API product. I usually don't mention my business on the forums, but I'm very excited that this work was considered significant enough that I will be presenting and demonstrating live at the upcoming IBM World of Watson conference in October.
Quote from phils: but I'm very excited that this work was considered significant enough that I will be presenting and demonstrating live at the upcoming IBM World of Watson conference in October
That's great, Phil, and you deserve it. Do keep us updated.
_______________________________________
Dragon Professional 16 on Windows 10 Pro and Windows 11; SpeechMike Premium (LFH3500); Office 2019 Pro + Office 365 (monthly subscription); HP ZBook Fury 17 G8 - i7-11800H - 24 MB SmartCache - 32 GB RAM - 1 TB SSD
Recognition accuracy and adaptability are one thing, stability and compatibility are another. Does anyone already have concrete experience to report on how crash-proof and compatible the 15th Dragon is in combination with exotic but highly functional software such as LaTeX / TeXnicCenter?
Nuance is under intense competition across its various medical/business products, and its balance sheet is financially very weak. I would bet large sums of money that exotic programs will never be supported. For those of us, like myself, who must work hands-free and use exotic programs, it is necessary to use some combination of a head mouse, Dragon scripts, other third-party macro products and/or UI mapping interfaces (VoicePower or SpeechStart+ Show Numbers).
For this release, Nuance did not go with an extensive beta program including many of the Dragon VARs. Therefore, no one has had direct experience with it.
Phil, it's always a pleasure and honor to hear from you over here. Thanks so much for sharing your insights with us. And congratulations on your invitation to World of Watson. I do humbly concur with you regarding future support of exotic programs. It would seem to me that Nuance did concentrate on implementing the new approach to the recognition engine described in their blog (an approach which is totally beyond my grasp, by the way, unlike HMM).
On the Mac side of things, they have announced that Dragon Professional for Mac 6 will provide so-called full text control in more applications than before, but that relates to Outlook 2016, Apple Pages and the like. Also, these announcements have to be taken with pounds of salt, judging from experience with Word for Mac. Regarding the new speech engine, I can hardly wait to read about your findings on whether it is ready for prime time.
There is likely to be a very significant change in the perceived behavior, by expert users, with the new speech engine in DPI15.
For those of us who have exotic custom vocabularies, the improvements since DNS 10 have made significant reductions in recognition errors. I could never get much above 95-96% with my technobabble vocabularies in version 10 or earlier. Now it's pretty easy for me to consistently get 99% after inputting my 1 million+ words of manicured training texts and my trained custom words/phrases followed by a couple of sessions of acoustic optimization.
For me, 99% is functionally "perfect": listening to recordings of my own speech, I make mistakes even on a good day, so my speech production itself has a statistical accuracy of only a little over 99%, and I need to make corrections anyway.
My concern, based on significant professional experience with "neural net" and "adaptive cognitive systems" is that the folks at Nuance may have improved the experience for the casual user, but, the unintended offset could be that the "Deep Learning" might possibly reduce accuracy for specialist vocabulary users like PG and myself.
We will see soon enough, and I am making some new recordings to accurately measure any differences.
Quote from phils: We will see soon enough and I am making some new recordings to accurately measure any differences.
Phil, while you are at it, I would like to ask you to also include a recording of less than perfect quality (choppy phrases, background noise, etc.).
Quote from phils: My concern, based on significant professional experience with "neural net" and "adaptive cognitive systems" is that the folks at Nuance may have improved the experience for the casual user, but, the unintended offset could be that the "Deep Learning" might possibly reduce accuracy for specialist vocabulary users
Phil, I'd love to know what makes you suspect that. Is it that the neural approach relies on a higher degree of "pre-manufacturing" of speech models using the powerful arrays of GPUs at the server farm?
Earlier this year, when I had a week off, I used only the Genius microphone and was never able to get above 95%, which I guess for most people is acceptable. It was terrible at distinguishing plosives ("Verschlusslaut" I think), which is made more difficult by my tendency towards a Midwestern mumble. Speaking of which, I'm going to prepare a text that I will read several times:
- my best dictation voice
- my not-paying-attention Midwestern mumble (my lovely wife always complains about this, so I'll have her do quality control to make sure that my mumble is pretty bad)
- parallel dictation on the Airline77 and the Genius microphone
- parallel dictation with repeatable noise on the Airline77 and the Genius microphone
Phil, I'd love to know what makes you suspect that. Is it that the neural approach relies on a higher degree of "pre-manufacturing" of speech models using the powerful arrays of GPUs at the server farm?
At a summary level that's pretty much it. My specific concern is that Nuance has already said that they don't need BestMatch V anymore because the second pass was only required in less than 10% of the cases. With my custom vocabulary I had about a 0.5% improvement in recognition between BestMatch IV and BestMatch V.
Again, at the summary level: Nuance's final engineering for the adaptive algorithms will have code that determines when the adaptation stops and is "good enough". I'm sure there will be an improvement for sloppy colloquial dictation, but I will need to see the results with expert dictation combined with custom vocabularies.
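To make the "when does adaptation stop" question concrete: Nuance has not published how its adaptive engine decides it is done, so the following is only a generic plateau rule of the kind often used for online learning, not Nuance's logic. The class name `AdaptationMonitor` and the `window`/`min_gain` parameters are hypothetical. It freezes adaptation once the average per-session error reduction over the last few sessions falls below a threshold, which illustrates Phil's worry: after freezing, later gains from careful expert dictation would no longer be picked up.

```python
from collections import deque


class AdaptationMonitor:
    """Hypothetical plateau-based stopping rule for online adaptation (illustration only)."""

    def __init__(self, window: int = 3, min_gain: float = 0.001):
        self.window = window          # number of sessions to average the improvement over
        self.min_gain = min_gain      # minimum average error reduction per session
        self.history = deque(maxlen=window + 1)
        self.frozen = False

    def update(self, error_rate: float) -> bool:
        """Record one session's error rate; return True while adaptation should continue."""
        if self.frozen:
            return False              # once "good enough", the model stops adapting
        self.history.append(error_rate)
        if len(self.history) <= self.window:
            return True               # too little data to judge convergence yet
        # average improvement per session across the window
        gain = (self.history[0] - self.history[-1]) / self.window
        if gain < self.min_gain:
            self.frozen = True
        return not self.frozen


if __name__ == "__main__":
    monitor = AdaptationMonitor(window=3, min_gain=0.001)
    for err in [0.06, 0.03, 0.02, 0.015, 0.0145, 0.0144, 0.0143, 0.01]:
        print(err, monitor.update(err))
```

With these example error rates, adaptation continues through the sixth session and freezes at the seventh; the final, clearly better session is ignored because the model is already frozen.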
I will also be interested in seeing what kind of results the German version provides to some of the professional linguists on this forum. I believe I once read that there was someone, who specialized in the German language over the centuries, who had tens of thousands of custom words.
Quote from phils: I will also be interested in seeing what kind of results the German version provides to some of the professional linguists on this forum. I believe I once read that there was someone, who specialized in the German language over the centuries, who had tens of thousands of custom words.
You are probably thinking of the man who had a particular interest in citations from the Bible, due to his profession, but has withdrawn from the forum for some time after he retired.
Anyway, one thing I would like to investigate specifically in the new version is how importing custom vocabulary will be handled. In all versions prior to 13, all it took was simply importing the words from a file, with no additional training required except on the odd occasion. That seems to have stopped, however, starting with version 13, where you would have to train virtually every item to make sure that it is recognized.
Quite frankly, I skipped DPI 14 altogether after testing it only very briefly, and couldn't really justify the investment at the company I work for, but I will definitely give DPI 15 a shot.
Quote from R.Wilke: ...That seems to have stopped, however, starting with version 13, where you would have to train virtually every item to make sure that it is recognized.
Yes I had to train my custom vocabulary to get correct recognition. However once I did that I got superb recognition.
I created my first technobabble test recording yesterday, approximately 5000 words long. The text was free dictation in the style of my typical technical reports or whitepapers. It provides a summary explanation of a complete end-to-end enterprise IT technology stack, from:
- mobile applications into cloud-managed APIs and security
- Cloud SaaS analytics and cognitive computing technologies (including Watson)
- back through a Cloud Connector into the complex infrastructure of the enterprise's on-premise data center.
This is a common textual style for my work, and each sentence contains one or two very unusual words. The document is in a rather formal style, at least by present-day standards.
I created the text using my usual free dictation workflow: I look at the deployment drawings on one screen while dictating into a text editor on the other screen. In this case, I also had the Windows Recorder running at the same time, so that I had a real-time recording of the free dictation that I can play into multiple Dragon configurations. I was very careful with my enunciation, since I wanted to come up with a best-case recording first. I made the recording in segments, each one covering part of the overall technical infrastructure. Apart from the careful dictation style, it was pretty much a normal workflow for me.
After listening to the recording and hand-correcting the reference text, I put the recording and the reference text through your (Rüdiger's) accuracy comparison tool. I used my current profile, which has the trained custom vocabulary and over 1 million words of technical documents analyzed by Dragon, as well as my business email replies, analyzed on a monthly basis. Plus, I run acoustic optimization on my work profile every week, along with an export of the profile to an archive immediately before and immediately after acoustic optimization. The resulting accuracy for this technobabble recording, according to your tool, was 99.3%. My normal dictation technique at work is excellent, but by being scrupulous with my dictation I gained an additional 0.3% in accuracy.
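For readers who want to reproduce this kind of measurement: the thread does not say how Rüdiger's tool computes its figure, so the following is only a sketch of the common textbook approach, assuming accuracy is defined as 1 minus the word error rate (WER), i.e. 1 - (substitutions + deletions + insertions) / reference words, found by a word-level edit-distance alignment. The function names are hypothetical, and lowercasing and dropping punctuation are simplifying assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into words, ignoring punctuation (a simplifying assumption)."""
    return re.findall(r"[\w'-]+", text.lower())


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word not recognized
                curr[j - 1] + 1,         # insertion: extra word in Dragon's output
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, recognized: str) -> float:
    """Accuracy = 1 - WER, with WER = errors / number of reference words."""
    ref, hyp = tokenize(reference), tokenize(recognized)
    if not ref:
        raise ValueError("reference text is empty")
    return 1.0 - word_errors(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "Requests pass through the Cloud Connector into the data center."
    recognized = "Requests pass through the cloud connecter into the data center"
    print(f"{word_accuracy(reference, recognized):.1%}")  # prints 90.0%
```

Note that under this definition accuracy can drop below zero if Dragon inserts many extra words, and other tools that ignore insertions will report somewhat higher figures for the same recording.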
Using a brand-new user profile, the accuracy for the same recording was 94.5% which is what would be expected looking at the frequency of unusual technobabble words. I'll use this recording to run tests on DPI 15 as to the impacts on custom vocabulary, document analysis and whatever adaptive learning I might be able to detect.
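To put the two figures in perspective, assuming the tool reports accuracy $a$ as the share of correctly recognized words, the number of words needing correction in an $N$-word recording is roughly:

$$
E \approx (1 - a)\,N
$$

$$
N \approx 5000:\qquad E_{\text{trained}} \approx (1 - 0.993)\times 5000 = 35, \qquad E_{\text{new}} \approx (1 - 0.945)\times 5000 = 275
$$

So on this test recording the trained profile leaves about 35 words to correct, while the brand-new profile leaves about 275, close to eight times as many.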
This weekend I'm also dictating a document that is more like the response to a technical RFP. I'm calling my recording the "Technobabble Challenge". The frequency of the specialized vocabulary will end up somewhere between two and three times that of the whitepaper style. My guess is that out-of-the-box Dragon will only get about 90% recognition due to the far higher frequency of technobabble terms. This recording will give me a better baseline for detecting any incremental improvements in recognition with DPI 15.
Quote from phils: Yes I had to train my custom vocabulary to get correct recognition. However once I did that I got superb recognition.
Yes, but one shouldn't have to do that, and one didn't have to, up to and including version 12. In my opinion, the whole purpose of exporting and importing custom vocabulary is to have it readily available right from the get-go, without any additional training. My assumption, based merely on observation, is that along with a number of changes to the algorithms feeding into the language model (some such modification has surely underpinned every new version in one way or another), a mistake was made in adjusting the probability weight for custom words added via import, as long as they are not trained. Oddly enough, in version 13 at least, untrained imported words don't even show up in the correction menu, although they are present in the vocabulary.
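The hypothesis about a mis-set probability weight can be illustrated with a toy recognizer that ranks candidates by acoustic log-score plus language-model log-probability. Everything here is a made-up illustration, not Dragon internals: the words, the scores, the function name `best_candidates`, and the assumption that the correction menu shows the top `n_best` candidates. Even when the imported word ("Kerberos") is the best acoustic match, a very low prior keeps it out of the top candidates entirely; raising its prior (as training might) lets it win.

```python
import math


def best_candidates(acoustic: dict[str, float], lm_prob: dict[str, float],
                    n_best: int = 2) -> list[str]:
    """Rank words by acoustic log-score plus language-model log-probability."""
    scores = {w: acoustic[w] + math.log(lm_prob[w]) for w in acoustic}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]


# Hypothetical toy values: "Kerberos" sounds closest to what was said.
acoustic = {"Kerberos": -1.0, "Cerberus": -1.2, "carburetors": -3.0}

# Imported but untrained custom word gets a tiny prior (the suspected mistake).
untrained = {"Kerberos": 1e-9, "Cerberus": 1e-6, "carburetors": 1e-5}
# After training, the custom word's prior is raised.
trained = {**untrained, "Kerberos": 1e-5}

if __name__ == "__main__":
    print(best_candidates(acoustic, untrained))  # custom word missing from the "menu"
    print(best_candidates(acoustic, trained))    # custom word now recognized first
```

The acoustic scores are identical in both runs; only the prior changes, which is exactly the kind of effect R.Wilke suspects.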
I can't speak for DPI 14 and how custom words were treated there, but hopefully, DPI 15 will do better in this area.
As regards your preparation of the black-box testing, in my experience a 5000-word recording is just about the appropriate sample size. Keep up the good work; we're all intrigued to read about the results in the not-too-distant future. I think I've mentioned this before, but Nuance should actually pay you for your efforts.