Audacity Bug Summary
    Audacity 3.0.3 development began 19th April 2021

Audacity Bugzilla



Bug 453 - Nyquist receives Unicode characters from Audacity producing platform-inconsistent behavior
Status: CLOSED WONTFIX
Product: Audacity
Classification: Unclassified
Component: Nyquist
Version: 1.3.14 alpha
Hardware: Per OS All
: P4 RepeatableAll
Assigned To: Default Assignee for New Bugs
http://audacity.238276.n2.nabble.com/...
Keywords: nyquist
Depends on:
Blocks:
Reported: 2011-09-26 17:18 UTC by Steve Daulton
Modified: 2022-01-05 11:42 UTC (History)

See Also:
Steps To Reproduce:
1. In the Nyquist Prompt enter:
(setq f (open "E:/eeee/äöü.txt" :direction :input))
(setq text (read-line f))
(close f)
(print text)
2. Click OK.

Observe: The printed file name does not have the desired accented characters (and see comment 1 for variations on this theme).
Release Note:
First Git SHA:
Group: ---
Workaround:
Closed: 2022-01-05 00:00:00


Attachments
Debug log (6.91 KB, image/png)
2021-02-02 12:46 UTC, Peter Sampson

Description Steve Daulton 2011-09-26 17:18:48 UTC
Nyquist is limited to single-byte ASCII characters, but the Unicode build of Audacity will pass Unicode characters with multi-byte encoding to Nyquist from string input widgets.

This produces inconsistent behaviour according to whether an ANSI or Unicode build is being used and which OS.

My suggestion (though I don't fully understand the issues involved):
Convert characters to a consistent extended ASCII encoding (for example ISO 8859-1) before they are passed to Nyquist, and remove any characters that cannot be converted to printable characters.
This would provide limited, but consistent support for non-English characters in string input which could then be documented. It may also be necessary to provide the reverse conversion for all string values that are returned from Nyquist.
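
A minimal sketch of the conversion suggested above, assuming the widget text is available as UTF-8. The function name Utf8ToLatin1Lossy is hypothetical and is not an existing Audacity or wxWidgets function; it only illustrates "convert to ISO 8859-1 and drop whatever cannot be converted".

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Audacity code): convert UTF-8 text to ISO 8859-1,
// dropping every character that has no printable Latin-1 equivalent.
std::string Utf8ToLatin1Lossy(const std::string &utf8)
{
   std::string out;
   std::size_t i = 0;
   while (i < utf8.size()) {
      const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
      if (b0 < 0x80) {
         // Plain ASCII: keep printable characters (32..126) only
         if (b0 >= 0x20 && b0 <= 0x7E)
            out += static_cast<char>(b0);
         ++i;
      }
      else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size() &&
               (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
         // Two-byte sequence encoding U+0080..U+00FF
         const unsigned cp = ((b0 & 0x1Fu) << 6) |
            (static_cast<unsigned char>(utf8[i + 1]) & 0x3Fu);
         // U+0080..U+009F are control codes in ISO 8859-1; drop them
         if (cp >= 0xA0)
            out += static_cast<char>(cp);
         i += 2;
      }
      else {
         // Lead byte of a character above U+00FF, or malformed input:
         // skip it together with any continuation bytes that follow
         ++i;
         while (i < utf8.size() &&
                (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80)
            ++i;
      }
   }
   return out;
}
```

With this sketch, the UTF-8 text "äöü.txt" becomes the bytes 0xE4 0xF6 0xFC followed by ".txt", which are the same values as the \344\366\374 octal escapes used in comment 1. A character such as the euro sign, which has no ISO 8859-1 code, is silently removed.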
Comment 1 Gale Andrews 2011-09-27 13:26:50 UTC
Examples:

In the Windows ANSI build of Audacity this reads the text in "E:\eeee\äöü" when run from Nyquist Prompt:

(setq f (open "E:/eeee/\344\366\374"
:direction :input))
(setq text (read-line f))
(close f)
(print text)

as does:

(setq f (open "E:/eeee/äöü.txt"
:direction :input))
(setq text (read-line f))
(close f)
(print text)

but the same commands do not produce the text in Windows Unicode Release. 

On Linux these commands produce the text:

(setq f (open "C:/Documents and Settings/Steve/Desktop/\344\366\374"
:direction :input))
(setq text (read-line f))
(close f)
(print text)
 
(setq fname "/home/steve/Desktop/äöü")
(setq f (open fname :direction :input))
(setq text (read-line f))
(close f)
(print text)

but this does not:

(setq fname "C:/Documents and Settings/Steve/Desktop/\344\366\374/\344\366\374")
(setq f (open fname :direction :input))
(setq text (read-line f))
(close f)
(print text)

It is believed the behaviour on Mac is similar to Linux. 
   
Because of these inconsistencies,  http://wiki.audacityteam.org/wiki/Nyquist_Plug-ins_Reference recommends sticking to ASCII 32 to 126 in 
Nyquist plug-ins meant for public distribution. 

Edgar Franke notes:
> It's Windows itself that causes the problems. Different Windows versions use 
> different encoding tables. So it's not even for sure that a program compiled 
> on Win7 will display correct Unicode on older Windows versions. Windows is 
> _not_ backwards-compatible to itself.  
> But UTF-8 has very good chances to become the future Unicode standard. Most 
> Unices, Linux, Mac OS X and the most recent Windows versions already support 
> UTF-8, so it's probably only a question of time until these problems 
> disappear.

Gale asks:
> Is it possible to "test" a character in a lookup table, and use an
> alternative if one fails or is not found?

Edgar replies:
> I have no idea how Win7 resolves the Unicode encoding, the Audacity
> Windows developers should know this.
>
> I will try to find somebody who has better knowledge than me about
> Windows 7 and Unicode issues, but I can't promise too much yet.
Comment 2 James Crook 2018-09-27 05:43:00 UTC
Demoted to P5 and no longer 'Review'
Comment 3 Steve Daulton 2018-09-27 07:39:17 UTC
A subset of this issue:

In the Nyquist Prompt, enter:
(format t "~A" "ü")
then click the Debug button.

The expected result is that some representation of the character ü is printed to the debug window (though not the actual UTF8 character because ~A treats the data as ASCII).

What actually happens (only tested in debug build) is:

"An assertion failed
../src/common/unichar.cpp(52): assert "Assert failure" failed in FromHi8bit(): invalid multibyte character"

The assert error occurs twice, then the debug window displays two question marks "??"


The assert can be avoided by changing:

void NyquistEffect::OutputCallback(int c)
{
   // Always collect Nyquist error messages for normal plug-ins
   if (!mRedirectOutput) {
      mDebugOutput += (char)c;
      return;
   }

to:

void NyquistEffect::OutputCallback(int c)
{
   // Always collect Nyquist error messages for normal plug-ins
   if (!mRedirectOutput) {
      // 'c' could be part of a multi-byte char
      mDebugOutput += wxString::Format(wxT("%c"), c);
      return;
   }

or more simply:

void NyquistEffect::OutputCallback(int c)
{
   // Always collect Nyquist error messages for normal plug-ins
   if (!mRedirectOutput) {
      mDebugOutput += (wchar_t)c;
      return;
   }


These alternatives do not give the correct UTF8 character (we still have the underlying problem that Nyquist does not handle multi-byte characters). They both actually print the character "ü" but if we really wanted the two byte character we should have used:
(format t "~S" "ü")


The above two fixes / workarounds have only been tested on 64-bit Ubuntu.
I don't know if there's a better cross-platform solution.
Comment 4 Peter Sampson 2019-05-26 07:55:31 UTC
Testing (using the code in the Steps) I get no "printout"

When I run in debug mode I get an error:

>error: bad argument type - NIL
>Function: #<Subr-CLOSE: #dca56a8>
>Arguments:
>  NIL
>1> NIL
>NIL
>1>
Comment 5 Steve Daulton 2019-05-26 09:14:13 UTC
(In reply to Peter Sampson from comment #4)
The "steps to reproduce" don't make sense. It looks like some steps are missing.

With the steps as written, I would expect to see the error reported in comment #4.

As I wrote in comment #3, the underlying problem is that Nyquist does not handle multi-byte characters. I don't think that the problem described in comment #1 can be fixed unless / until Nyquist becomes Unicode (which may never happen).

It would be good to fix the ASSERT described in comment #3 as this could easily occur in Nyquist plug-ins that have text input.
Comment 6 Leland Lucius 2021-02-02 06:57:45 UTC
(In reply to Steve Daulton from comment #5)
> (In reply to Peter Sampson from comment #4)
> The "steps to reproduce" don't make sense. It looks like some steps are
> missing.
> 
> With the steps as written, I would expect to see the error reported in
> comment #4.
> 
> As I wrote in comment #3, the underlying problem is that Nyquist does not
> handle multi-byte characters. I don't think that the problem described in
> comment #1 can be fixed unless / until Nyquist becomes Unicode (which may
> never happen).
> 

I second that opinion.  This part of the bug is probably a WONTFIX.

> It would be good to fix the ASSERT described in comment #3 as this could
> easily occur in Nyquist plug-ins that have text input.

The assert no longer occurs (probably due to a wx upgrade).
Comment 7 Leland Lucius 2021-02-02 07:00:13 UTC
(In reply to Steve Daulton from comment #3)
> 

...

> void NyquistEffect::OutputCallback(int c)
> {
>    // Always collect Nyquist error messages for normal plug-ins
>    if (!mRedirectOutput) {
>       mDebugOutput += (wchar_t)c;
>       return;
>    }
> 
>
I've committed this as it makes the output appear the same between (at least) Windows and Linux:

ü
Comment 8 Leland Lucius 2021-02-02 09:01:55 UTC
Fix in:

https://github.com/audacity/audacity/commit/392360a
Comment 9 Peter Sampson 2021-02-02 12:46:09 UTC
Created attachment 1078 [details]
Debug log

Testing on W10 with Audacity 3.0.0 392360a

a) when I use the OK button I get no output

b) when I use the Debug button I get the attached error message

Accordingly this will be REOPENED
Comment 10 Roger Dannenberg 2021-02-02 15:56:10 UTC
Would it make sense to change Nyquist Prompt to be an ASCII editor? It really doesn't make sense to prompt for a Unicode string and then pass it to Nyquist as if it were ASCII.

It would be relatively easy to convert all characters and strings in Nyquist to wide characters, but to be honest, I'm pretty ignorant about unicode details. Complications that scare me include: String comparison seems to be very complicated, byte-level interfaces to files interact strangely with Unicode, file I/O has to deal with multiple types: binary, ascii, utf-8 and others, string indexing is no longer simple if characters are not all the same size, and I just read C++11 has new character types as alternatives to wchar_t, so I guess even the experts feel they got it wrong the first time around. Is there a language that "gets it right" anyone would recommend?
Comment 11 Steve Daulton 2021-02-02 16:16:40 UTC
(In reply to Peter Sampson from comment #9)
As I wrote in comment #5, that error is expected if the file "E:/eeee/äöü.txt" does not exist.

A better, but more complex test script:

;type tool
(setf dir (get '*system-dir* 'documents))
(setf dir (string-right-trim (string *file-separator*) dir))
(setf filename (format nil "~a~a~a"
    dir  *file-separator* "äöü.txt"))
(let ((fp (open filename :direction :output)))
  (format fp "Hello World")
  (close fp))
(let ((fp (open filename :direction :input)))
  (setf text (read-line fp))
  (close fp))
(print text)


That should print "Hello World", and in a perfect world it would create a file "äöü.txt" in the default "documents" directory. That file should contain: "Hello World". 

Depending on what platform you use, you may get a file whose name shows mis-encoded characters instead of "äöü.txt", but we probably can't fix that. Whatever the file is called, it should contain "Hello World".
Comment 12 Steve Daulton 2021-02-02 16:30:39 UTC
(In reply to Roger Dannenberg from comment #10)
The problem is not only with the Nyquist Prompt.
Some Nyquist plug-ins have text input widgets:

;control val "Left text" string "Right text" "Default text"

The "string" widget has been available for Nyquist plug-ins for very many years.

Of course we can't control what users write in their own custom plug-ins either.

I think the best we can do is to tell people that Nyquist is ASCII only (which we already do in the docs), and prevent "bad things" (such as crashes and Assert errors) from occurring.
From my testing, I think we now avoid "bad things" from happening.

It's just unfortunate that Charles Babbage never got round to inventing UTF-8 ;-)
Comment 13 Roger Dannenberg 2022-01-04 17:23:26 UTC
I think this prompted me to add Unicode to another language, Serpent. It was not easy! I concluded that UTF-8 is the best way to go, but that means characters are variable sized, which means indexing is not trivial, which (maybe) means using a cache to store previous mappings from index to address since that allows stepping through a string character-by-character to be constant-time-per-access.
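
To illustrate why indexing is not trivial with UTF-8, here is a minimal sketch (not code from Serpent or Nyquist; the helper names are hypothetical). Finding the n-th character needs a scan from the start of the string, which is the cost that a cache of previous index-to-address mappings would avoid. The sketch assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx)
inline bool IsContinuationByte(char ch)
{
   return (static_cast<unsigned char>(ch) & 0xC0) == 0x80;
}

// Hypothetical helper: byte offset of the code point at position 'index',
// or std::string::npos if the string has fewer code points.
// Assumes 's' is well-formed UTF-8.
std::size_t Utf8OffsetOfChar(const std::string &s, std::size_t index)
{
   std::size_t seen = 0; // code points seen so far
   for (std::size_t i = 0; i < s.size(); ++i) {
      if (IsContinuationByte(s[i]))
         continue; // still inside the previous code point
      if (seen == index)
         return i;
      ++seen;
   }
   return std::string::npos;
}

// Number of code points, as opposed to s.size(), which counts bytes
std::size_t Utf8Length(const std::string &s)
{
   std::size_t n = 0;
   for (char ch : s)
      if (!IsContinuationByte(ch))
         ++n;
   return n;
}
```

For the UTF-8 string "äöü.txt", s.size() is 10 bytes while Utf8Length gives 7 characters, and the "." at character index 3 sits at byte offset 6.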

For Audacity and Nyquist, there are multiple possibilities:

(1) restrict everything going in to be 8-bit ASCII (or some extended ASCII), and raise an error if a code point cannot be mapped into ASCII. (This will mean, e.g., some files cannot be named).

(2) Same as (1) but have some convention for unmappable characters, e.g. translate to '?'.

(3) Translate everything to UTF-8 going into Nyquist, but warn that the UTF-8 will be handled as if it is just a string of 8-bit characters. This will mean files with Unicode names can be opened, but string processing may break up multi-byte code points and give strange results. Also, it looks like Nyquist output captured by Audacity must be reassembled into Unicode and not processed 1-byte-at-a-time. Maybe this is the policy we have now except that UTF-8 is not reassembled when returned from Nyquist to Audacity.

(4) Update Nyquist to UTF-8. This is significant work, involving a change in string representation, change to all string manipulation functions since strings are not directly index-able, and updates to a lot of I/O functions. I decided the gains are not worth it, but I'd be happy to share my experience, help design Unicode string type, and provide UTF-8 code to get started if someone is determined to do it.
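
The difference between processing output one byte at a time and reassembling it, as mentioned in option (3), can be sketched as follows. This is standard C++ only, not the Audacity output callback; the function names are hypothetical, and the decoder is simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <string>

// Replacement character used for malformed input
constexpr char32_t kReplacement = 0xFFFD;

// Per-byte widening: every byte becomes its own character, so a
// multi-byte UTF-8 sequence turns into several unrelated characters.
std::u32string WidenEachByte(const std::string &bytes)
{
   std::u32string out;
   for (char ch : bytes)
      out += static_cast<char32_t>(static_cast<unsigned char>(ch));
   return out;
}

// Reassembly: decode complete UTF-8 sequences into code points.
// Malformed or truncated sequences become U+FFFD.
std::u32string DecodeUtf8(const std::string &bytes)
{
   std::u32string out;
   std::size_t i = 0;
   while (i < bytes.size()) {
      const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
      std::size_t extra = 0; // continuation bytes expected
      char32_t cp = 0;
      if (b0 < 0x80)                { extra = 0; cp = b0; }
      else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
      else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
      else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
      else {
         // Stray continuation byte or invalid lead byte
         out += kReplacement;
         ++i;
         continue;
      }
      ++i;
      bool ok = true;
      for (std::size_t k = 0; k < extra; ++k) {
         if (i >= bytes.size() ||
             (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
            ok = false; // sequence ended early
            break;
         }
         cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
         ++i;
      }
      out += ok ? cp : kReplacement;
   }
   return out;
}
```

For the two bytes 0xC3 0xBC (UTF-8 for "ü"), WidenEachByte yields the two characters U+00C3 and U+00BC, while DecodeUtf8 yields the single character U+00FC. Reassembly therefore needs the whole byte sequence to be collected before conversion.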
Comment 14 Peter Sampson 2022-01-04 17:30:43 UTC
(In reply to Roger Dannenberg from comment #13)
Roger,

Muse do not really read Bugzilla or pay any attention to it.

Should this discussion be transferred to GitHub where there is a better chance of them seeing it?
Comment 15 Steve Daulton 2022-01-04 21:07:17 UTC
The behaviour now is much less bad than when this bug was first logged, though the main issue (no Unicode support) remains.

Previously, even a Unicode character in a comment could cause a plug-in to fail!
That doesn't happen now.
Unicode characters can now be used in comments, even in ";control" widgets (but not as variable names). For example, this works fine:

;control test "E:/eeee/äöü.txt" int "" 5 0 10
(print test)


It is useful to be able to use Unicode in comments, for example, in the ";author" header (Robert Hänggi was pleased to be able to use his real name ;-)
Comment 16 Steve Daulton 2022-01-04 21:19:28 UTC
(In reply to Roger Dannenberg from comment #13)
> (4) Update Nyquist to UTF-8...
> I decided the gains are not worth it.

I agree.

One important use case in Audacity was creating labels. This works now:

;type analyze
;; Add a label
(list (list 1 2 "äöü"))


Though, not surprisingly, this doesn't:

(setf ä 24)
(* ä  2)

but it does give a reasonable error message:

"error: illegal character - -61"
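
The value -61 in that message is consistent with the first byte of the UTF-8 encoding of "ä" (0xC3) being read as a signed 8-bit character. That reading is an assumption, not something confirmed in this thread; the arithmetic itself can be checked with a small sketch (the names are hypothetical):

```cpp
// First byte of the UTF-8 encoding of 'ä' (U+00E4 is encoded as 0xC3 0xA4)
constexpr unsigned char kLeadByte = 0xC3;

// Interpret the same 8 bits as a signed value, as a signed 'char' would
constexpr int AsSignedByte(unsigned char byte)
{
   return byte < 128 ? byte : static_cast<int>(byte) - 256;
}

// 0xC3 is 195 unsigned, and 195 - 256 = -61
static_assert(AsSignedByte(kLeadByte) == -61, "0xC3 is -61 as a signed byte");
```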


Unless anyone is really keen to add full UTF-8 support to Nyquist, I'm happy for this bug to be closed. As I wrote in comment #12, we are no longer seeing "bad things" (such as crashes and asserts) happening.
Comment 17 Peter Sampson 2022-01-05 11:42:49 UTC
(In reply to Steve Daulton from comment #16)
>Unless anyone is really keen to add full UTF-8 support to Nyquist, 
>I'm happy for this bug to be closed.

Well I'm not keen - and judging by Roger's Comment #13 he understandably doesn't seem keen.  And I can't imagine Muse seeming keen about this.

Accordingly I shall close this bug as WONTFIX (albeit ameliorated)