Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rate boost support for SAPI5 voices #17610

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

gexgd0419
Copy link
Contributor

@gexgd0419 gexgd0419 commented Jan 11, 2025

Link to issue number:

Closes #17606

Summary of the issue:

SAPI5 voices do not support rate boosting, but some of the SAPI5 voices are not fast enough even at the highest rate for some experienced users.

Description of user facing changes

The "rate boost" option will be available to users when using SAPI5 voices, which supports rates ranging from 0.5x to 6x.

If rate boost is disabled, the behavior will be the same as before.

Description of development approach

The Sonic library, which is also used by eSpeak NG, is used to change the speed when rate boost is enabled.

When rate boost is enabled, to preserve quality, the SAPI5 voice is set to output at its original speed (1x), and then Sonic is used to change the speed of the original audio. When rate boost is disabled, Sonic is no longer used to change the speed, and the rate of the SAPI5 voice itself is set instead to preserve the previous behavior.

As Sonic is used by eSpeak NG, it has already been included in the NVDA repo as a submodule (/include/sonic/). However, in eSpeak NG, it is compiled as a static library, which cannot be easily reused. So some build steps are changed to build Sonic as a DLL, sonic.dll, instead, which is installed in the synthDrivers folder. eSpeak-NG is also changed to dynamically link to sonic.dll. As importing functions from DLL needs __declspec(dllimport), the header file sonic.h is copied to nvdaHelper/eSpeak and then have __declspec(dllimport) added to functions, which replaces the original sonic.h file when compiling eSpeak.

A new file _sonic.py is created inside synthDrivers to handle the interoperation with sonic.dll. There's initialize() to load the Sonic DLL which is called in speech.initialize(), and there's a wrapper class SonicStream for the Sonic stream mode functions.

The SAPI5 synthesizer now passes the audio through a SonicStream first, before sending the audio to the WavePlayer. To speed up audio processing in Sonic, which uses 16-bit integer wave format internally, we explicitly choose a 16-bit wave format for the SAPI5 voice and the WavePlayer to avoid unnecessary format conversion.

This is the approach I chose currently. The implementation details are open for discussion. Other ways I've thought of:

  • Move the Sonic library inside nvdaHelperLocal, and process the audio with Sonic in WasapiPlayer before feeding the data to the device. Then add some functions such as getRate and setRate to the WavePlayer.
  • Implement some kind of "audio plugin" system to allow easy modification to audio streams.

Testing strategy:

Seemed to work on my system.

Known issues with pull request:

None

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

@gexgd0419 gexgd0419 changed the title Sapi5 rate boost Add rate boost support for SAPI5 voices Jan 11, 2025
@gexgd0419 gexgd0419 marked this pull request as ready for review January 12, 2025 12:48
@gexgd0419 gexgd0419 requested a review from a team as a code owner January 12, 2025 12:48
@SaschaCowley
Copy link
Member

@gexgd0419 do you know if it would be possible to switch eSpeak-ng to use the dynamically linked Sonic as well?

@gexgd0419
Copy link
Contributor Author

Sonic itself is not prepared to be exported as a DLL. There isn't any __declspec(dllexport) or __declspec(dllimport) used in the header file, so I added a .def file to export its functions.

When importing functions from a DLL, typically __declspec(dllimport) should be used, which is missing in sonic.h. We can choose to use our own copy of the header that have __declspec(dllimport) added to the functions, but then we'll have to maintain this every time Sonic is updated.

We can also just use the original header file without __declspec(dllimport). It can still work, but the compiler won't apply some optimizations for imported functions. See also this SO question and this blog post.

Should we change the header file or not?

@gexgd0419
Copy link
Contributor Author

Here eSpeak is changed to use sonic.dll, and the sonic.dll is moved to the synthDrivers directory.

To add __declspec(dllimport) to the functions, I put a modified copy of sonic.h inside nvdaHelper/eSpeak.

@SaschaCowley Is this a better way? If so, I will change the development approach above.

You can use the original header file without __declspec(dllimport) (reverting the last commit), and it can still work. It's just that calls to Sonic functions go through an extra jump.

@AppVeyorBot
Copy link

See test results for failed build of commit c7d0ad8f83

@LeonarddeR
Copy link
Collaborator

I really like your contributions @gexgd0419!
The only concern I have, is how indexing is supposed to work when Sonic is active. Is that still accurate?

@SaschaCowley
Copy link
Member

@gexgd0419 my main concern was having 2 copies of Sonic in NVDA: a statically linked one for eSpeak, and a dynamically linked one for SAPI5 (and potentially other synths in future). It sounds like maintaining the dllexport and dllimport headers for eSpeak on our side will be unmaintainable, and dynamically linking eSpeak without them will come at a performance penalty, which is the opposite of what we're trying to achieve here. Apart from an increase in built size, is there any disadvantage to having 2 copies of sonic? Is creating the dllexport and dllimports upstream in sonic something you'd be willing to either do, or open an issue for? @seanbudd what are your thoughts here?

@seanbudd seanbudd added the blocked/needs-product-decision A product decision needs to be made. Decisions about NVDA UX or supported use-cases. label Jan 13, 2025
@gexgd0419
Copy link
Contributor Author

There is a PR in sonic waywardgeek/sonic#27 that does something similar, but it hasn't been merged for years.

@gexgd0419
Copy link
Contributor Author

gexgd0419 commented Jan 14, 2025

sonic.dll is 87.5 KiB. eSpeak using static sonic is 637 KiB, while using sonic DLL is 632 KiB, which only decreases 5 KiB in size after splitting out the Sonic part. So it will be larger if we decide to build Sonic as a standalone DLL.

The "performance penalty" for not using dllimport is that all calls to Sonic functions will go to a thunk first, then jump to the actual target stored in the import table. If dllimport is used, the thunk will be inlined. I guess that the performance penalty would be small compared to the transferring and calculation of audio data.

If whole-program optimization is enabled, the linker will be able to figure out that the Sonic functions are actually imported from a DLL, and apply the optimization even without dllimport, so there would be no "performance penalty". However, whole-program optimization is intentionally turned off for eSpeak.

# Whole-program optimization causes eSpeak to distort and warble with its Klatt4 voice
# Therefore specifically force it off
"/GL-",

I'm not sure what exactly the reason would be, but turning on whole-program optimization can benefit a lot. For example, calls to statically linked Sonic functions can be inlined, and the eSpeak DLL size dropped to 624 KiB even with a statically linked Sonic inside. It's unfortunate that some bugs prevented us from enabling whole-program optimization.

Edit: The "disable whole-program optimization" part was actually introduced in cf0443b which was 10 years ago! The issue was caused by MSVC 2012, but now we are using MSVC 2022. It is possible that this issue got fixed later.

@LeonarddeR
Copy link
Collaborator

@gexgd0419 You might have missed my question or may be you just didn't have time to answer it.
Could you please elaborate on how you tested/warranted whether indexing still works correctly with rate boost?

@gexgd0419 gexgd0419 marked this pull request as draft January 17, 2025 03:07
@gexgd0419
Copy link
Contributor Author

@LeonarddeR From my manual tests, it seemed that this rate boost method doesn't affect the accuracy of indexing much.

From the Sonic project page:

The other way to use libsonic is in stream mode. This is more complex, but allows sonic to be inserted into a sound stream with fairly low latency. The current maximum latency in sonic is 31 milliseconds, which is enough to process two pitch periods of voice as low as 65 Hz. In general, the latency is equal to two pitch periods, which is typically closer to 20 milliseconds.

Here the stream mode is used to minimize latency. According to that, the maximum latency is 31ms, so theoretically the position of an index could be off by at most 31ms. But I couldn't find the difference in index accuracy during testing. (I added a beep sound when a bookmark is reached to test this)

Also, in commit cf0443b the whole-program optimization of eSpeak was turned off due to a bug when compiling with MSVC 2012. Could you help me check whether the bug has been fixed? I tried but couldn't find the difference in audio output between whole-program optimization on and off, so maybe it has been fixed already. If this is the case, I can turn the whole-program optimization on, revert the "include __declspec(dllimport)" part, use the original header of Sonic and let this PR continue.

@seanbudd seanbudd added conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. and removed blocked/needs-product-decision A product decision needs to be made. Decisions about NVDA UX or supported use-cases. labels Jan 20, 2025
@gexgd0419 gexgd0419 marked this pull request as ready for review January 21, 2025 10:02
@gexgd0419
Copy link
Contributor Author

@SaschaCowley The dllimport part has been reverted. Now sonic will be compiled using its original header, which is fine if #17631 can be merged.

@AppVeyorBot
Copy link

  • PASS: Translation comments check.
  • PASS: License check.
  • PASS: Unit tests.
  • PASS: Lint check.
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit cb12a2a861

nvdaHelper/espeak/sconscript Outdated Show resolved Hide resolved
nvdaHelper/espeak/sonic.def Show resolved Hide resolved
nvdaHelper/espeak/sonic.def Show resolved Hide resolved
source/synthDrivers/_sonic.py Outdated Show resolved Hide resolved
source/synthDrivers/_sonic.py Outdated Show resolved Hide resolved
source/synthDrivers/_sonic.py Outdated Show resolved Hide resolved
source/speech/__init__.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
source/synthDrivers/sapi5.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add rate boost support for SAPI5 voices
5 participants