Miscellaneous

Images

If a terminal emulator supports some image format, images must not be flipped by default in RTL mode.

(Authors of such image protocols might introduce explicit flags such as “flip” or “flip for RTL” (and then perhaps “flip for LTR” too, for full symmetry, which probably no one would ever use). These would be marginally useful for the rare case when an image is known to depict an arrow or something similar as part of an app’s UI. The same effect is probably more easily achieved by having the app print the mirrored image in the first place. I can’t think of a use case in the terminal world.)

If such an image protocol has an option controlling the alignment of the image, it should be reviewed, and updated if necessary, with RTL in mind. In paragraphs of RTL direction, the overall layout should by default be the mirror image of the LTR one.

Inline images should be handled according to UAX #9 section 3.2:

“For the purpose of the Bidirectional Algorithm, inline objects (such as graphics) are treated as if they are an U+FFFC OBJECT REPLACEMENT CHARACTER.”
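As a minimal sketch of what this rule means for an emulator, the string handed to the BiDi algorithm could be built by substituting U+FFFC for each inline image. The InlineImage type and the bidi_input() helper are hypothetical names used for illustration only, and the sketch assumes one replacement character per image regardless of how many cells the image occupies.

```python
from dataclasses import dataclass
from typing import List, Union

OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


@dataclass
class InlineImage:
    """Hypothetical stand-in for an inline image placed within a line."""
    name: str


def bidi_input(items: List[Union[str, InlineImage]]) -> str:
    """Build the string passed to the BiDi algorithm for one paragraph.

    Text is kept as-is; every inline image is represented by U+FFFC.
    """
    return "".join(
        OBJECT_REPLACEMENT if isinstance(item, InlineImage) else item
        for item in items
    )
```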

Chars disappearing at resize

Some terminal emulators remember the characters that disappear at the end edge when the window is made narrower, and those characters reappear as the window is widened back. Konsole, PuTTY, and VTE with rewrapping disabled are some examples.

Interestingly, both Konsole and PuTTY cut the logical data to the corresponding length and run the BiDi algorithm on this fragment. The result is different from a simple visual cut: narrowing and widening the window may swallow and reintroduce characters visually somewhere in the middle of the line.
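The difference between the two kinds of cut can be illustrated with a toy model. This is not the real BiDi algorithm: toy_visual() is a hypothetical stand-in that assumes an LTR paragraph, treats uppercase letters as RTL letters, and reverses each run of uppercase words.

```python
import re


def toy_visual(logical: str) -> str:
    """Toy stand-in for the BiDi algorithm in an LTR paragraph.

    Uppercase letters play the role of RTL letters. Each maximal run of
    uppercase words (including the spaces between them) is reversed.
    """
    return re.sub(
        r"[A-Z](?:[A-Z ]*[A-Z])?",
        lambda m: m.group(0)[::-1],
        logical,
    )


def logical_cut(logical: str, width: int) -> str:
    """Truncate the logical data first, then reorder the fragment."""
    return toy_visual(logical[:width])


def visual_cut(logical: str, width: int) -> str:
    """Reorder the full line first, then crop what is displayed."""
    return toy_visual(logical)[:width]


line = "abc DEF GHI"
print(repr(toy_visual(line)))      # 'abc IHG FED'
print(repr(logical_cut(line, 8)))  # 'abc FED '
print(repr(visual_cut(line, 8)))   # 'abc IHG '
```

With the logical cut, widening from 8 to 11 columns turns “abc FED ” into “abc IHG FED”, so the new characters show up in the middle of the line rather than at its end.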

The other approach, i.e. cropping visually, might have bigger downsides. The cursor could easily end up offscreen or, conversely, a mouse click could report a column number greater than the current number of columns.

Also, emulators that chop off those characters for good (as many do, e.g. mlterm) necessarily have to truncate the logical data (in implicit mode), so Konsole’s and PuTTY’s current behavior is the same as mlterm’s (as long as you make the window narrower only).

It’s a corner case where the behavior of terminal emulators differs considerably even in LTR-only mode. I think it’s fine to leave it unspecified. I’m absolutely okay with PuTTY’s and Konsole’s approach, which might not be the most user friendly one, but is definitely less problematic technically than the other one. (The most user friendly behavior is to rewrap the lines anyway.)

Mirrored box drawing

Box drawing (U+2500 .. U+257F), block element (U+2580 .. U+259F) and presumably many other characters that would make sense to mirror are actually not to be mirrored in RTL, meaning that any BiDi-aware app in RTL mode should replace them by their counterparts or find some alternate solutions.

As such, ncurses’s box drawing methods will by default break in RTL.

Is it fair to say that the required changes to ncurses-based apps are quite trivial? E.g. fix the corresponding arguments to border(), wborder(), border_set(), wborder_set() to explicitly specify the characters instead of relying on the defaults, and avoid the box() and box_set() wrappers? Are any other methods affected? Hopefully ncurses will eventually offer a more convenient way, e.g. a global or per-window RTL property that changes the defaults of these methods.

Or an LD_PRELOAD library could intercept these calls and “fix” them?

Or shall we introduce another per-paragraph BiDi property for ncurses workaround, to mirror the U+2500 .. U+257F glyphs in RTL? (Shall we further limit it to the characters actually used by ncurses? Or shall we even extend to many other glyphs where it makes sense? The latter is especially arbitrary, plus needs further adjustment at every new Unicode version, doesn’t sound ideal.) Note that UAX #9 allows this at bullet point HL6:

“Certain characters that do not have the Bidi_Mirrored property can also be depicted by a mirrored glyph in specialized contexts.”

VTE’s current work-in-progress implementation adds DECSET/DECRST 2500 to mirror the U+2500 .. U+257F characters in RTL context. This is experimental, subject to change.
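For an app that chooses to replace the characters itself, the swap could look roughly like the sketch below. It only covers the light corner and tee characters, which is an assumption about what a typical frame needs; swap_for_rtl() is a hypothetical helper, not part of ncurses or any terminal.

```python
# Light box drawing characters that are not left-right symmetric,
# paired with their horizontally mirrored counterparts.
_PAIRS = {
    "\u250c": "\u2510",  # ┌ <-> ┐
    "\u2514": "\u2518",  # └ <-> ┘
    "\u251c": "\u2524",  # ├ <-> ┤
}

# Translation table covering both directions of every pair.
_SWAP = str.maketrans({**_PAIRS, **{v: k for k, v in _PAIRS.items()}})


def swap_for_rtl(text: str) -> str:
    """Replace each asymmetric box drawing character by its counterpart.

    Symmetric characters such as ─, │, ┬, ┴ and ┼ are left untouched.
    The order of the characters is not changed; only the glyphs are swapped.
    """
    return text.translate(_SWAP)


print(swap_for_rtl("\u250c\u2500\u2510"))  # ┐─┌
```

An app printing the top edge of a frame in an RTL paragraph would emit the swapped string, so that after the terminal lays it out from right to left the corners point the right way again. In an ncurses-based app, the swapped corner characters would be the ones passed explicitly to border() and friends.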

Terminal title and friends

For externally displayed data, such as the window title, icon title, desktop notification etc., I think we can be loose and just pass the received string as-is to whoever handles that data.

Browsers are pretty loose here, too. In both Firefox and Chromium, if the document’s title is the logical string HEBREW english, the visual window title shown in my window manager’s title bar becomes english - BrowserName WERBEH; they don’t even care about proper BiDi separation of the title from the browser name.

In case a terminal wants to be more pedantic, it could look at the current direction and autodetection mode when the escape sequence is received, and force formatting accordingly (e.g. embed the string in a corresponding BiDi isolate block).

For even more pedantic mode, it could even respect explicit mode, and force visual order then (e.g. embed in a corresponding BiDi override block).
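The two pedantic variants could be sketched as follows. The format_title() function and its parameters are hypothetical. The mapping is also an assumption: FSI is used when autodetection is on (note that FSI itself falls back to LTR if the string has no strong character), LRI or RLI otherwise, and LRO or RLO for explicit mode.

```python
# Unicode BiDi formatting characters.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"  # isolates
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"                 # overrides


def format_title(title: str, rtl: bool, autodetect: bool,
                 explicit: bool = False) -> str:
    """Wrap a title according to the emulator's mode at the time it arrived.

    rtl, autodetect and explicit describe the (hypothetical) emulator state
    when the title-setting escape sequence was received.
    """
    if explicit:
        # Force visual order: the receiver must not reorder the string.
        return (RLO if rtl else LRO) + title + PDF
    if autodetect:
        # Let the receiver detect the direction from the title itself.
        return FSI + title + PDI
    # Fixed direction, isolated from whatever surrounds the title.
    return (RLI if rtl else LRI) + title + PDI
```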

Some contexts might have inherent LTR base direction, and inherently require logical order rather than visual. These should always be handled and displayed as such, ignoring the emulator’s current mode. An example is the target of explicit hyperlinks (OSC 8), if displayed to the user.

Scrollback context lines

In implicit mode, the visual contents cannot be constructed from the data inside a single line of the emulator. The BiDi algorithm needs to be run on the entire paragraph, including those lines that are outside of the current view of the terminal emulator. With extremely long paragraphs even just locating the boundaries of the paragraph might already become an expensive operation.

In order to avoid DoS attacks, emulators might set up reasonable safety caps. For example, they might decide to render paragraphs over 500 lines as if they were in explicit mode, still respecting the LTR vs. RTL property.
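A bounded search for the paragraph boundaries might look like the sketch below. The data model is an assumption made for illustration: soft_wrapped[i] is True when line i continues on line i + 1, and the function gives up as soon as the paragraph is known to exceed the cap, so the work per call stays bounded.

```python
from typing import List, Optional, Tuple

MAX_PARAGRAPH_LINES = 500  # the example cap mentioned in the text


def paragraph_bounds(soft_wrapped: List[bool], row: int,
                     cap: int = MAX_PARAGRAPH_LINES) -> Optional[Tuple[int, int]]:
    """Return (first, last) line of the paragraph containing `row`.

    Returns None if the paragraph spans more than `cap` lines; the caller
    would then render it as if it were in explicit mode, still respecting
    its LTR vs. RTL property.
    """
    first = last = row
    # Walk upwards while the previous line continues into this one.
    while first > 0 and soft_wrapped[first - 1]:
        first -= 1
        if last - first + 1 > cap:
            return None
    # Walk downwards while this line continues into the next one.
    while last + 1 < len(soft_wrapped) and soft_wrapped[last]:
        last += 1
        if last - first + 1 > cap:
            return None
    return first, last
```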

Finite scrollback

A tricky situation arises as data disappears at the top of the scrollback buffer. Chopping off the beginning of the logical data in implicit paragraphs might result in a different visual rendering on the rest. This can become especially prominent if there’s no scrollback at all: as data scrolls out, the onscreen contents can break.

Probably the only reasonable approach is simply to remember more scrollback lines than configured. For example, if a user requests a scrollback of 10000 lines, the emulator could remember 10500. The topmost 500 lines would never be revealed, but would be used for running the BiDi algorithm. If this additional number is at least as large as the maximum supported paragraph size (as per the previous section), then users won’t notice any problems here.
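A minimal sketch of such a buffer, assuming a simple line-based model. The Scrollback class and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class Scrollback:
    """Keeps `context_lines` extra lines that are never shown to the user."""

    def __init__(self, visible_lines: int, context_lines: int = 500) -> None:
        self.visible_lines = visible_lines
        # The deque silently drops the oldest line once it is full.
        self._lines: Deque[str] = deque(maxlen=visible_lines + context_lines)

    def push(self, line: str) -> None:
        """Append a line that has just scrolled out of the screen."""
        self._lines.append(line)

    def stored(self) -> List[str]:
        """Everything remembered, including the hidden context lines.

        This is what the BiDi algorithm would be allowed to look at.
        """
        return list(self._lines)

    def revealed(self) -> List[str]:
        """Only the lines the user can actually scroll back to."""
        if self.visible_lines == 0:
            return []
        return list(self._lines)[-self.visible_lines:]
```

With Scrollback(10000), up to 10500 lines are stored and at most the newest 10000 are ever revealed.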

Of course terminal emulators can come up with more sophisticated approaches, but they would probably require digging into the details of the BiDi algorithm rather than using it as an out-of-the-box method. That’s unlikely to be a good use of developer effort.

Or we can just accept that things might slightly break near the top of the scrollback.

Efficiency

In implicit modes, receiving just a single character requires rerunning the BiDi algorithm on the entire paragraph and redisplaying everything. This obviously shouldn’t be done immediately upon receiving the character, only when it comes to updating the screen.
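One way to defer the work is to merely mark the paragraph as dirty when data arrives and to run the algorithm at redraw time. The sketch below assumes a hypothetical reorder callable standing in for the emulator's BiDi implementation; the Screen class and the integer paragraph ids are illustrative only.

```python
from typing import Callable, Dict, Set


class Screen:
    def __init__(self, reorder: Callable[[str], str]) -> None:
        self._reorder = reorder             # hypothetical BiDi entry point
        self._logical: Dict[int, str] = {}  # paragraph id -> logical text
        self._visual: Dict[int, str] = {}   # paragraph id -> last visual text
        self._dirty: Set[int] = set()       # paragraphs changed since last redraw

    def receive(self, paragraph: int, char: str) -> None:
        """Store the incoming character; do not run BiDi yet."""
        self._logical[paragraph] = self._logical.get(paragraph, "") + char
        self._dirty.add(paragraph)

    def render(self) -> Dict[int, str]:
        """Run BiDi once per changed paragraph, then return the visual text."""
        for paragraph in self._dirty:
            self._visual[paragraph] = self._reorder(self._logical[paragraph])
        self._dirty.clear()
        return dict(self._visual)
```

Receiving a burst of characters between two redraws thus costs a single run of the algorithm per affected paragraph.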

The BiDi algorithm seems to be a relatively cheap one to me, and it only needs to run when the contents are displayed. The emulation logic hardly requires any change. So I don’t expect any emulator to suffer from significant performance regression by implementing BiDi.

Emulators might cache the BiDi algorithm’s result, so that they don’t need to run it again if they need to refresh their display but the underlying data hasn’t changed. They might also special case the most frequent situation, namely LTR-only paragraphs; or come up with other ideas for performance improvement.
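The LTR-only special case could be detected with a cheap scan over the characters' BiDi classes, as sketched here using Python's unicodedata module. The sketch assumes the paragraph direction is LTR, and full_reorder is a hypothetical callable standing in for the full algorithm.

```python
import unicodedata
from typing import Callable

# BiDi classes that can introduce a right-to-left run.
_RTL_CLASSES = {"R", "AL", "AN", "RLE", "RLO", "RLI"}


def is_ltr_only(text: str) -> bool:
    """True if no character can start or belong to an RTL run."""
    return not any(unicodedata.bidirectional(ch) in _RTL_CLASSES for ch in text)


def reorder_with_fast_path(text: str,
                           full_reorder: Callable[[str], str]) -> str:
    """Skip the BiDi algorithm for paragraphs that cannot need reordering.

    Only valid for paragraphs whose direction is LTR: there, an LTR-only
    paragraph is displayed in its logical order.
    """
    if is_ltr_only(text):
        return text
    return full_reorder(text)
```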

Scrollbar, padding

Most terminal emulators display a scrollbar on one side. There’s no feasible way the scrollbar could change its location upon receiving escape sequences, especially when these escape sequences apply to paragraphs and not the entire terminal session. Moreover, a jumping scrollbar would be much more irritating than helpful.

Hence the placing of the scrollbar should remain solely the terminal emulator’s decision (based on UI guidelines of the desktop it is best integrated with, the locale, user settings, etc.), unrelated to BiDi handling.

The same goes for the extra padding, i.e. the leftover space when the window’s width is not grid-aligned.

BiDi algorithm tweaks

Terminals often work with specialized data, such as filenames. With possible RTL or even BiDi path components, the implicit BiDi rendering of the entire pathname might not retain the / character as a visual separator between individual components; fragments of text might get shuffled across slashes.

Once implicit mode level 2 is designed and implemented, a utility might prevent this from happening by embedding each component in isolates. (Copy-pasting might provide two modes: one where such BiDi control characters are retained and one where they are dropped.)
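What such a utility might do can be sketched as follows. This assumes that a future level 2 honors isolates, and that FSI ... PDI is the wrapping of choice; both helper names are hypothetical. The second function corresponds to the copy-paste mode in which the control characters are dropped.

```python
FSI, PDI = "\u2068", "\u2069"  # FIRST STRONG ISOLATE, POP DIRECTIONAL ISOLATE

# Explicit BiDi formatting characters: U+202A..U+202E and U+2066..U+2069.
# Mapping a code point to None makes str.translate() delete it.
_BIDI_CONTROLS = dict.fromkeys(
    list(range(0x202A, 0x202F)) + list(range(0x2066, 0x206A))
)


def isolate_components(path: str) -> str:
    """Wrap every non-empty path component in its own isolate.

    The slashes stay outside the isolates, so text cannot be shuffled
    across them by the BiDi algorithm.
    """
    return "/".join(
        FSI + part + PDI if part else part
        for part in path.split("/")
    )


def strip_bidi_controls(text: str) -> str:
    """Drop the explicit BiDi formatting characters, e.g. when copying."""
    return text.translate(_BIDI_CONTROLS)
```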

Another possibility which doesn’t need level 2 and doesn’t need help from utilities is to tweak the BiDi algorithm, e.g. specify a set of “stop characters”, or stop where the background color changes, etc. Such modes raise several questions, e.g. how to autodetect the direction (probably still on the entire paragraph), and whether then to apply autodetection on each fragment individually too or make them always follow the paragraph direction.

Such tweaks not only go against the BiDi algorithm, but would also probably be hard and expensive for emulators to track, at least if the exact mode (set of stop characters or even intervals) is to be remembered separately for each paragraph.

At this point I do not intend to introduce escape sequences to control this behavior or make any such nonstandard behavior part of this specification; however, terminals are free to offer such tweaks as nondefault user settings.