Why terminals are a truly special story
Word processors, browsers, etc. see the entire document and display only a part of it. Other parts of the text might be scrolled out of view or obscured by a popup; this causes no problem. The visual layout of the entire text can still be constructed, and only the relevant parts end up being displayed.
A terminal emulator, on the other hand, is only aware of the part it displays, not of what’s “outside” or “underneath”. For example, if a line of a text file contains 100 characters, the text viewer or editor is in “unwrap” mode, and the emulator window is 80 characters wide, then the emulator knows nothing about the remaining 20 characters of that line.
Cropping is essential in most terminal-based applications. Unlike graphical apps, they cannot enforce a minimum window size, so they have to crop text to squeeze it into a smaller viewport. A text viewer or editor needs to crop every line that doesn’t currently fit in the view (e.g. the 100-character line to an 80-character string in the previous example), a file manager needs to crop long filenames, and so on. With BiDi in play, cropping gets tricky. Cropping the in-memory (logical order) string and then applying the BiDi algorithm produces a faulty visual layout; cropping has to be performed on the BiDi algorithm’s result. The BiDi algorithm cannot be run on partial data: it must be run on the entire paragraph, and hence only by a component that is aware of the entire paragraph. The only such component is the application.
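To see why the order of cropping and reordering matters, here is a minimal Python sketch. It does not implement the real Unicode Bidirectional Algorithm (UAX #9); `toy_bidi_reorder` is a hypothetical toy stand-in that assumes the common documentation convention where uppercase ASCII letters represent RTL characters, everything else is LTR, and the paragraph direction is LTR. Under those assumptions, reordering amounts to reversing each maximal run of RTL characters.

```python
import re


def toy_bidi_reorder(logical: str) -> str:
    """Toy stand-in for the BiDi algorithm, NOT the real UAX #9.

    Assumptions: uppercase ASCII letters are RTL, everything else is LTR,
    the paragraph is LTR, and there are no numbers or explicit embeddings.
    Then the visual order is the logical order with every RTL run reversed.
    """
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], logical)


def crop_then_bidi(logical: str, width: int) -> str:
    # Wrong order: crop the logical (in-memory) string, then reorder it.
    return toy_bidi_reorder(logical[:width])


def bidi_then_crop(logical: str, width: int) -> str:
    # Right order: reorder the whole paragraph, then crop the visual result.
    return toy_bidi_reorder(logical)[:width]


line = "abc DEFGHI"
print(crop_then_bidi(line, 7))  # abc FED
print(bidi_then_crop(line, 7))  # abc IHG
```

Only the second result matches the leftmost 7 columns of what a wide enough window would show (`abc IHGFED`). Producing it requires seeing the whole paragraph, which the terminal emulator cannot do once the application has already cropped the line.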
The same applies to shaping. Shaping depends on the neighboring characters, which the terminal emulator does not always know. Hence it can only be done by the emitting application, either by printing the desired “presentation form” variants, or by inventing other means of sending the result to the terminal emulator.
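A similarly simplified sketch for shaping: an Arabic letter takes a different presentation form depending on whether its neighbors join to it. The hypothetical `shape` function below handles only ARABIC LETTER BEH (U+0628), a dual-joining letter, and its four contextual forms from the Arabic Presentation Forms-B block; every other character is assumed to be non-joining. It compares shaping the whole word before cropping (what the application can do) with shaping only what survived the crop (all a terminal emulator could do).

```python
BEH = "\u0628"  # ARABIC LETTER BEH, a dual-joining letter

# Contextual presentation forms of BEH (Arabic Presentation Forms-B block).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def shape(text: str) -> str:
    """Toy shaper over logical-order text: only BEH is shaped."""
    out = []
    for i, ch in enumerate(text):
        if ch != BEH:
            out.append(ch)  # assumed non-joining, left unchanged
            continue
        joins_prev = i > 0 and text[i - 1] == BEH
        joins_next = i + 1 < len(text) and text[i + 1] == BEH
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        out.append(BEH_FORMS[form])
    return "".join(out)


word = BEH * 3
by_app = shape(word)[:2]       # shaped with full context, then cropped
by_terminal = shape(word[:2])  # the terminal only sees the first two letters
print([hex(ord(c)) for c in by_app])       # ['0xfe91', '0xfe92']
print([hex(ord(c)) for c in by_terminal])  # ['0xfe91', '0xfe90']
```

The second letter should stay medial because the word continues past the edge of the viewport; lacking that context, the terminal can only render it as final.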
In many other cases, however, a simple utility producing some output neither wants to nor can reasonably deal with BiDi; it just wants the terminal emulator to do it. Take the simplest utilities, like echo or cat, and imagine that they (or their BiDi-aware counterparts) had to check whether their output goes to a terminal, query its width, run the BiDi algorithm, run the shaping algorithm, wrap the text (taking care of TAB characters and friends), align the text… and the result would still break as soon as the terminal is resized. Clearly not a feasible approach.
No single mode of operation can cover all use cases: as we’ve just seen, we’ll need two substantially different modes.
With graphical applications, BiDi rendering is the responsibility of one single application, i.e. converting the external data it handles (e.g. a document or web page), along with its own UI, into the pixel-by-pixel user-visible representation. In the case of terminal emulators, it’s the joint responsibility of two components: the emulator and the application running inside it. The exact responsibility of each party, and the interface between them, need to be well thought out.
Split responsibility is fragile: the two parties have to agree exactly on the details. What if, for example, they implement different versions of the BiDi algorithm, or use different versions of the Unicode Character Database? Having the app do some of the BiDi work while the terminal does other parts of it at the same time sounds like bad design. Ideally, at any point in time the entire responsibility should be in one place: either completely at the terminal, or completely at the app.
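As a concrete illustration of a database mismatch, Python’s standard `unicodedata` module ships both the Unicode Character Database version Python was built with and a frozen Unicode 3.2.0 snapshot (`unicodedata.ucd_3_2_0`). The sketch below assumes Python 3.7 or later (whose database is Unicode 11.0 or newer) and looks up the bidi class of U+0860 SYRIAC LETTER MALAYALAM NGA, a letter added in Unicode 10.0. An app and a terminal built against these two databases would disagree on whether this character is a right-to-left letter at all.

```python
import unicodedata

ch = "\u0860"  # SYRIAC LETTER MALAYALAM NGA, added in Unicode 10.0

# Bidi class according to the UCD version Python was built with.
current = unicodedata.bidirectional(ch)
# Bidi class according to the frozen Unicode 3.2.0 snapshot; Python reports
# an empty string for characters that were unassigned in that version.
old = unicodedata.ucd_3_2_0.bidirectional(ch)

print(repr(current))  # 'AL' (Arabic Letter, a strong RTL class)
print(repr(old))      # '' (no bidi class: unassigned in Unicode 3.2.0)
```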
Web pages and graphical applications handle non-text elements (an overall RTL UI is achieved by reversing the order of child elements) differently from text (which is laid out as the BiDi algorithm dictates). Terminal emulators have to flatten these two into a single level, as terminal-based apps can only implement their UI using text. This one level has to fulfill the requirements of both.
Terminal-based apps often have a pretty dense UI, where adjacent UI elements are sometimes separated only by different attributes (typically a different background color), without a space between them. Think of the bottom bar of htop or mc, where e.g. 9PullDn10Quit reads as a single “word”. So the problem with any approach relying solely on the resolved embedding levels (as we’ll see later) is a real issue.
Terminals carry a legacy of more than 50 years, with additional layers (screen handling libraries etc.) written along the way, none of which had BiDi or shaping in mind. We need to add BiDi support in a backwards compatible way, without redesigning everything from the ground up.
We must be careful not to end up with a standard that no one implements for decades (as happened with ECMA TR/53), but with one that’s simple enough and good enough for many terminal emulators and applications to adopt.