Bugzilla – Bug 485
Sliding Time Scale quality improvements
Last modified: 2018-08-20 11:46:05 UTC
Created attachment 243 [details] same as previously tested patch The sbsms TimeScale effect creates an audible discontinuity at the beginning of the processed selection. This is because it zero-pads the beginning of the input instead of padding it with existing audio. I have changes which address this, and add further quality improvements in sbsms2_TimeScale.patch (attached, but unchanged from previous, tested submission)
Please document issues clearly :=). As I understand it, these are r11214 / r11215 which were previously committed then reverted. I added the Nabble url. Changes: * r11214: "Sliding Time Scale / Pitch Shift gui now has more control over the rate/pitch slide. Significant quality improvements to time-stretching including multi-window,multi-pass sinusoid tracking & synthesis, stereo phase matching & much finer tuning. Sounds good, even with transient low frequency material & dense polyphony. The stereo matching is far from perfect, but better than before." * r11215: adds mixed radix fft / spectrogram window adjustment menu items "Narrow Spectrum" CTRL+[ and "Broaden Spectrum" CTRL+] to show finer window size increments than you can get from Spectrograms Preferences. Much more code than 11214. Clayton also has a multithreaded implementation which is currently disabled by default. I already objected in the -devel thread that the new menu items shouldn't appear unless a spectrogram wave track display is active. Questions: * Does this patch still compile against current HEAD? * Are the Sliding Time Scale Improvements completely separable from the new mixed radix fft changes? If so, the latter should be a separate issue and patch IMO.
(In reply to comment #0) The patch builds OK on svn Head though there are a few line off-sets: patching file mac/Audacity.xcodeproj/project.pbxproj Hunk #1 succeeded at 1150 (offset 1 line). Hunk #2 succeeded at 1173 (offset 1 line). Hunk #3 succeeded at 2818 (offset 2 lines). Hunk #4 succeeded at 2850 (offset 2 lines). Hunk #5 succeeded at 2868 (offset 2 lines). Hunk #6 succeeded at 5219 (offset 3 lines). Hunk #7 succeeded at 5242 (offset 3 lines). Hunk #8 succeeded at 5768 (offset 3 lines). Hunk #9 succeeded at 6191 (offset 3 lines). Hunk #10 succeeded at 7128 (offset 4 lines). Hunk #11 succeeded at 7244 (offset 4 lines). Hunk #12 succeeded at 7598 (offset 4 lines). Hunk #13 succeeded at 7790 (offset 4 lines). Hunk #14 succeeded at 8113 (offset 4 lines). Hunk #15 succeeded at 8735 (offset 4 lines). Hunk #16 succeeded at 8782 (offset 4 lines). Hunk #17 succeeded at 8803 (offset 4 lines). Hunk #18 succeeded at 8824 (offset 4 lines). Hunk #19 succeeded at 8845 (offset 4 lines). The Preview feature does not work correctly. If the effect has been previously applied, the Preview will use the last settings applied rather than the current settings. The sound quality improvements are terrific. The discontinuity at the beginning of the processed selection is a substantially improved though some discontinuity may occur with some settings (which may be unavoidable). The interface is a lot more complex - perhaps too complex?
This patch is separate from the mixed_radix_fft changes. The new dialog just introduces different slide interpolation types. Most of the time people won't care, but it can make a difference if you're trying to do something particular and the options should cover any reasonable situation. I could revert it to the old interface, or is there something else I could do? Shall I create a new enhancement bug for the mixed_radix_fft changes?
(In reply to comment #3) > This patch is separate from the mixed_radix_fft changes. Thanks for clarifying, title adjusted. > Shall I create a new enhancement bug for the mixed_radix_fft changes? Please do, titled "Add mixed-radix-fft implementation with menu item to adjust spectrogram window". About disabling the new menu items unless a relevant track display is visible, people can have one track as a waveform and another as a spectrogram, so I think it means only show if at least one track view is a spectrogram or pitch? Can you clarify this in the new bug? As we have bug 271 which now has a better patch for Sliding Time Scale preview, can you remove the non-functional preview code from this patch?
(In reply to comment #3) Would it be possible to split the quality enhancements from the GUI changes? My personal view is that the quality improvements, once tested on all platforms should be committed as a very worthwhile enhancement but the GUI changes may need some discussion before finalising. I agree that the new GUI will cover "any reasonable situation", but I feel that many users will be discouraged from using the effect due to its apparent complexity. Perhaps a wiki page would be appropriate to discus a balance between functionality and usability. I note that there are a couple of items in the wiki feature requests page that could be relevant to such considerations.
Created attachment 247 [details] same patch without gui updates This patch is the same as the previous patch, updating to sbsms2, without the gui changes.
Created attachment 248 [details] corrected sbsms2_TimeScale_nogui.patch the old patch was diff'd against the wrong source.
Since attached patch now excludes GUI changes, title changed from "Sliding TimeScale GUI/quality improvements" to "Sliding Time Scale quality improvements".
(In reply to comment #7) Tested on Debian Squeeze, with svn head built against Wx 2.8.10. The patch built without any problems, though there were a few few line off-sets. The results are imho very impressive. The fix for the glitch at the start of the selection works very well and the audio quality improvement is quite noticeable on both test tones and real-world audio recordings.
Shall I commit the changes now?
OK, I'll wait. The speed improvements have to wait until 2.1, and it turns out there's some bugs I need to work through before submitting anything.
Whoops, to clarify, that last out-of-the-blue comment was in response to a non bugzilla comment from steve.
@Clayton - QA people will want to test on the various platforms and then the issue be keyworded "patch_ready" whereupon a developer will review it. What are the known bugs you have identified - are they to do with this patch? If so should we wait for a new patch? Can you provide steps to repro to test the fix for "the sbsms TimeScale effect creates an audible discontinuity at the beginning of the processed selection." Can you please start a new enhancement issue "Add mixed-radix-fft implementation with menu item to adjust spectrogram window" so we don't lose sight of that (with a modification so the menu item is greyed if there are only wave tracks).
The bugs are in the newer version I'm working on; they aren't present in this patch. The audible discontinuity can be observed by generating a tone, and applying any stretch. The patched version does not guarantee the effected audio will have no discontinuity for complicated waveforms, but it should be smooth for a sine tone.
(In reply to comment #14) > The audible discontinuity can be observed by generating a tone, and applying > any stretch. In particular, select a section of a generated tone, e.g. with a 10 second tone, select from 2 seconds to 5 seconds, then apply a time stretch. The discontinuity at the beginning using the current version of Audacity is really obvious whereas with the patched version the join at the beginning is virtually perfect. An easy way to demonstrate the audio quality improvement is to apply to a 0.5 amplitude sawtooth tone.
Confirming that the patch does give audibly improved performance on a generated tone. It all compiles here (32-bit win-7, Visual Studio C++ 2008 Express) except for the nested /* */ in TimeScale.cpp lines 293-308 - perhaps better to delete or comment out line-by-line? Easily fixed. So we have confirmation (from me) that this patch is ok on win, from Steve's build on Linux and from Bill's test (reported on -devel) of Clayton's Mac build. It looks ready to commit to me! HTH Martyn
(In reply to comment #16) > Confirming that the patch does give audibly improved performance on a > generated tone. Thanks for testing, Martyn. OK I tested on Win 7 x64 (not with this patch, but using http://withunder.org/misc/Audacity_TimeScale_GUI_spectrogram_Win.zip). Tested with tones and fairly subtle real music (soprano/mezzo/piano/violin/winds). Listened in Sennheiser headphones. Using steps to reproduce, clearly the selected area of the tone is much less visibly mangled now (especially sawtooth). Either with initial tempo unchanged and final tempo changed or vice-versa, the audible discontinuity at the start of the selection has gone; the audible discontinuity at the end seems only marginally improved (even if the final tempo is unchanged). Was there a reason not to address the ending discontinuity? Or should it be a new bug? In the tested "musio" I found a modest but worthwhile reduction in artifacts on held vocal notes. I consistently found processing of the effect ~10% slower than in the unpatched Win Nightly with the old effect (a 30 seconds selection from a three minute track took 11 or 12 seconds instead of 10, the whole track took ~90 seconds instead of 80 seconds). Current machine is 2.2 GHz dual core, 6 GB RAM. I assume Martyn's positive is just for this patch. +1 if so, with the reservation about the ending discontinuity. I have not tested much else yet in the "withunder" Win build, though I don't think the GUI changes in the effect are ready yet. I will get to testing the bug 482 fix (64-bit Linux) soon. ==== This has not changed from the previous behaviour, but as a "stupid user comment", if I have initial tempo = no change and final tempo 50% or "150", I would kind of expect a progressive speed up from just after the start of the selection. For example, my music test was a selection of 30 seconds containing two phrases of 15 seconds each. The first phase is AFAICT not speeded up at all. The second phrase audibly starts speeding up about a quarter of the way through. It looks as if this will happen from the waveform too. Anyway, there is no one-pass way to get what I would expect to do with this effect on that 30 seconds selection.
(In reply to comment #16) The version that I tested was the one before#; "corrected sbsms2_TimeScale_nogui.patch (932.07 KB, patch) 2012-03-27 06:00 EDT" I am currently testing this version and there may be a problem. More to follow.
(In reply to comment #18) The patch fails to build after r11653 release-2.8 -I/usr/include/wx-2.8 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -D__WXGTK__ -pthread -I../lib-src/FileDialog -I/home/steve/sourcecode/audacity/lib-src/lib-widget-extra -I../lib-src/libresample/include -I../lib-src/sbsms/include -I/usr/include/soundtouch -I../lib-src/libnyquist -I../lib-src/portsmf -I../lib-src/portaudio-v19/include -fno-strict-aliasing -I./include -I. -DLIBDIR=\"/usr/local//lib\" -D__STDC_CONSTANT_MACROS -Wall -pthread -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pango-1.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include effects/TimeScale.cpp -o effects/TimeScale.o effects/TimeScale.cpp:299:10: warning: "/*" within comment effects/TimeScale.cpp:305: error: expected constructor, destructor, or type conversion before ‘.’ token effects/TimeScale.cpp:306: error: expected declaration before ‘}’ token make[1]: *** [effects/TimeScale.o] Error 1
(In reply to comment #19) The problem is the block comment in lines 307 to 309 of TimeScale.cpp Replacing the block comment with individual line comments (or just deleting the comment) fixes the problem and the "corrected sbsms2_TimeScale_nogui.patch" builds correctly on SVN r11683
Steve: Individual line comments in place of /* some stuff */ will break some versions of gettext when getting the internationalisation hint. If there are nested comments to deal with, use #ifdef OLD_CODE #endif to comment out old code without actually removing it.
(In reply to comment #21) James wrote: > If there are nested comments to deal with, use > > #ifdef OLD_CODE > > #endif Thanks James, that fixes it. Gale wrote: > I consistently found processing of the effect ~10% slower than in the unpatched > Win Nightly with the old effect (a 30 seconds selection from a three minute > track took 11 or 12 seconds instead of 10, the whole track took ~90 seconds > instead of 80 seconds). Current machine is 2.2 GHz dual core, 6 GB RAM. I'm finding a consistent but very marginal speed increase with patch. For 98 seconds from 100 seconds mono 44.1 kHz 32-bit float: Patched - 22 seconds Unpatched - 23 seconds Machine: 2 GHz dual core 3 GB RAM, Debian Linux Martyn wrote: > It looks ready to commit to me! From a user perspective I agree.
(In reply to comment #22) >> Martyn wrote: >> It looks ready to commit to me! > From a user perspective I agree. Sorry for ignorance, but is the pop at the end of the selection in the tone unavoidable? Certainly I don't see a visual discontinuity at the end, where pre-patch there was such at the start of the modified selection.
(In reply to comment #23) Zooming in close enough to see the individual waveform cycles it can be seen that the waveform at the end of the processed section does not match the waveform following the selection. If this is what you mean then I think that is probably inevitable. In practice the selection would normally be made such that it starts and ends at silence (or at a very low level) level, in which case there is no noticeable glitch. The glitch is certainly much less severe than when using Change Tempo. The workaround for (rare) cases where it may be necessary to end the selection part way through a note would be to process a duplicate copy of the sound and then make short cross-fades between the processed and unprocessed tracks. I don't know if it would be possible for the effect to automatically make a short cross-fade with any audio following the processed section, but I'd expect that could be problematic, for example if the selection ended immediately before a transient such as a drum beat.
The original issue as entered by Clayton read "The sbsms TimeScale effect creates an audible discontinuity at the beginning of the processed selection." It did this with a tone (even if the selection was zero crossed before applying the effect) but it rarely happened with music. Pre-patch there was an easily visible discontinuity at the start of the selection (even without even zooming in), but not at the end. There is a clearly audible discontinuity at the end of the processed selection when applying the effect to a tone, which has not been addressed. If for some reason no-one cares about this, fair enough; it was not noted in the original bug report so I can't claim it was not fixed. Marked as "patch_ready" (certainly better than we had before).
The endpoint discontinuity may be present with almost any effect. One idea is to make a new effects preference for automatic crossfading at the beginning and/or end of the selection, which would default to 0ms. The fade would not need samples outside of the selection. It would just crossfade the end of the processed audio with the end of the unprocessed selection. This is another discussion, though... I intend to commit the patch tonight. I'll just delete the double commented code, since the pre-analyze option is no more. Thanks everybody!
(In reply to comment #26) Hi Clayton Please go ahead with the commit, since all looks well. Thanks Martyn
committed revision 11696 Regarding Comment 17: I changed the default slide type to reference the output audio, rather arbitrarily. I should probably change that back to what it was before...
(In reply to comment #28) Moved to DEVEL - FIX MADE because committed. > Regarding Comment 17: I changed the default slide type to reference the output > audio, rather arbitrarily. I should probably change that back to what it was > before.. I have not compiled HEAD yet, but please explain exactly what you have done and what we should be testing for.
(In reply to comment #29) I added various different slide interpolation modes in the new version, and inadvertently changed the default used in the TimeScale effect. One result is what Gale noticed in comment 17, that the slide does not appear gradual. I'll change this back to the old pre-patch behavior unless somebody objects.
(In reply to comment #30) > I added various different slide interpolation modes in the new version, and > inadvertently changed the default used in the TimeScale effect. One result is > what Gale noticed in comment 17, that the slide does not appear gradual. I'll > change this back to the old pre-patch behavior unless somebody objects. I find the exact same "problem" (tempo change only seems to start after half way through the selection) pre-patch e.g. in a build from Sep 2011. So you may be describing something more subtle than I am.