Windows Command-Line: Introducing the Windows Pseudo Console (ConPTY)

Rich

August 2nd, 2018

In this, the fourth post in the Windows Command-Line series, we’ll discuss the new Windows Pseudo Console (ConPTY) infrastructure and API – why we built it, what it’s for, how it works, how to use it, and more.

In the previous post in this series, we started to explore the internals of the Windows Console and Windows’ Command-Line infrastructure. We also discussed many of Console’s strengths and outlined its key weaknesses.

One of those weaknesses is that Windows tries to be “helpful” but gets in the way of alternative and 3rd party Console developers, service developers, etc. When building a Console or service, developers need to be able to access/supply the communication pipes through which their Terminal/service communicates with command-line applications. In the *NIX world, this isn’t a problem because *NIX provides a “Pseudo Terminal” (PTY) infrastructure which makes it easy to build the communication plumbing for a Console or service, but Windows does not …

… until now!

From TTY to PTY

Before we dig into what we’ve done, let’s briefly revisit how Terminals evolved:

In the beginning was the TTY

As discussed in the first ‘backgrounder’ post in this series, in the early days of computing, users operated computers via electromechanical Teletype (TTY) devices connected to a computer via some form of serial communications link (typically a 20mA current loop).

Ken Thompson and Dennis Ritchie (standing) working on a DEC PDP-11 via teletype (notice no electronic display)

Rise of the Terminals

Teletype devices were replaced by computerized Terminals with electronic display devices (usually CRT screens). Terminals were generally very simple devices (hence the term “dumb terminal”) containing only the electronics and compute-power required to:

- Accept text input via the keyboard
- Buffer input text one line at a time (enabling local editing before sending)
- Send/receive text via serial communications (usually via the once ubiquitous RS-232 interface)
- Display received text on the Terminal’s display

Despite their simplicity (or perhaps because of it), Terminals rapidly became the primary devices used to operate mini, mainframe, and server computers: Most data entry clerks, computer operators, system administrators, scientists, researchers, software developers, and industry luminaries earned their digital-stripes by pounding away on Terminals from DEC, IBM, Wyse, and many others.

Admiral Grace Hopper in her office with a DEC VT220 Terminal on her desk

The Rise and Rise of Software Terminals

Starting in the mid 1980’s, dedicated Terminal devices gradually started to be replaced by general purpose computers that were rapidly becoming more affordable, popular, and powerful. Many early PCs and other computers of the ’80s had Terminal applications that could open a connection to the PC’s RS-232 serial port and exchange data with whatever was listening on the other end of the connection.

As general-purpose computers grew in sophistication, the Graphical User Interface (GUI) arrived and introduced a whole new world of simultaneously running applications, including Terminal applications.

But a problem arose: How would a Terminal application speak to another Command-Line application running on the same machine? And how would you attach a physical serial cable between the two apps running on the same computer?

Enter, the Pseudo Terminal (PTY)

In the *NIX world, this problem was solved by the introduction of the Pseudo Terminal (PTY).

The PTY virtualizes a computer’s serial communications hardware, exposing “master” and “slave” pseudo-devices: Terminal apps connect to a master pseudo-device; Command-Line applications (e.g. shells like Cmd, PowerShell, and bash) connect to a slave pseudo-device. When the Terminal client sends text and/or control commands (encoded as text) to the master, the text is relayed along to the associated “slave”. Text emitted by the application is sent to the slave and is then routed back to the master and thus to the Terminal. Data is always sent/received asynchronously.

[Diagram: Terminal <-> PTY <-> App/Shell]

Importantly, the “slave” pseudo-device emulates the behavior of a physical Terminal device and converts command characters into POSIX signals. For example, if a user types CTRL+C into the Terminal, the ASCII value of CTRL+C (0x03) is sent via the master. When received by the slave, the 0x03 value is removed from the input stream and a SIGINT signal is generated.
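The CTRL+C handling described above can be sketched in a few lines of portable C++. This is only a toy model of a line discipline, not how any real PTY is implemented: the `FilteredInput` struct and the `filterControlChars` function are hypothetical names introduced for illustration, and a real slave device would raise SIGINT on the foreground process group rather than increment a counter.

```cpp
#include <cassert>
#include <string>

// Result of passing raw Terminal input through a toy line discipline.
struct FilteredInput {
    std::string text;        // bytes forwarded to the application
    int interruptCount = 0;  // how many times SIGINT would have been raised
};

// Hypothetical helper: removes ETX (0x03, the byte CTRL+C produces) from the
// input stream and counts it as an interrupt request. All other bytes are
// forwarded unchanged.
FilteredInput filterControlChars(const std::string& raw) {
    constexpr char kEtx = '\x03';
    FilteredInput out;
    for (char c : raw) {
        if (c == kEtx) {
            // A real PTY would signal the foreground process group here.
            ++out.interruptCount;
        } else {
            out.text.push_back(c);
        }
    }
    return out;
}
```

Feeding the bytes `l`, `s`, 0x03, `p`, `w`, `d` through `filterControlChars` forwards only the printable characters and records one interrupt request.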

This PTY infrastructure is used extensively by *NIX Terminal applications, text pane managers (like screen, tmux), etc. Such apps call openpty() which returns a pair of file descriptors (fd) for the PTY’s master and slave. The app can then fork/exec the child Command-Line application (e.g. bash), which uses its slave fds to listen and return text to the attached Terminal.
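As a rough illustration of the master/slave pair described above, here is a minimal POSIX-only sketch. Note that it uses the standard `posix_openpt()` / `grantpt()` / `unlockpt()` / `ptsname()` sequence rather than `openpty()`, because those calls need no extra library on most systems; the `PtyPair` struct and the two helper functions are names made up for this example. A real Terminal would go on to fork/exec a shell with the slave descriptor as its stdin/stdout/stderr.

```cpp
#include <cassert>
#include <stdlib.h>   // posix_openpt, grantpt, unlockpt, ptsname
#include <fcntl.h>    // open, O_RDWR, O_NOCTTY
#include <unistd.h>   // close

// File descriptors for both ends of a pseudo terminal (-1 means "not open").
struct PtyPair {
    int master = -1;
    int slave = -1;
};

// Opens a new PTY and returns both ends; on any failure both fds stay -1.
PtyPair openPtyPair() {
    PtyPair pair;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0) {
        return pair;
    }
    // Grant access to, and unlock, the slave side before opening it.
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return pair;
    }
    const char* slaveName = ptsname(master);
    if (slaveName == nullptr) {
        close(master);
        return pair;
    }
    int slave = open(slaveName, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return pair;
    }
    pair.master = master;
    pair.slave = slave;
    return pair;
}

// Closes whichever ends are open and resets them to -1.
void closePtyPair(PtyPair& pair) {
    if (pair.slave >= 0) close(pair.slave);
    if (pair.master >= 0) close(pair.master);
    pair.slave = -1;
    pair.master = -1;
}
```

The Terminal keeps the master descriptor for itself; whatever it writes there shows up as input on the slave, and whatever the child writes to the slave can be read back from the master.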

This mechanism allows Terminal applications to “talk” directly to Command-Line applications running locally in the same way as the Terminal would talk with a remote Computer via a serial/network connection.

What, no Windows Pseudo Console?

So, as we discussed in the previous post in this series – Inside the Windows Console – while the Windows Console is conceptually similar to the traditional *NIX Terminal, it differs in several key ways, especially at its lowest-levels which can cause problems for developers of Windows Command-Line apps, 3rd-party Terminals/Consoles, and server apps:

- Windows lacks a PTY infrastructure: When the user launches a Command-Line app (e.g. Cmd, PowerShell, wsl, ipconfig, etc.), Windows itself “hooks up” a new or existing Console instance to the app
- Windows obstructs 3rd party Consoles and Server Apps: Windows (currently) does not provide Terminals a way to supply the communication pipes via which it wants to communicate with a Command-Line app. 3rd party Terminals are forced to create an off-screen Console, and to send it user-input and scrape its output, redrawing the output on the 3rd party Console’s own display!
- Only Windows has a Console API: Windows Command-Line apps rely on the Win32 Console API which reduces code portability because every other platform “speaks text/VT” rather than calling APIs
- Windows Command-Line Remoting is substandard: Windows’ Command-Line apps’ dependence on Console API significantly impedes interop & remoting scenarios

What to do?

We’ve heard from many, many developers, who’ve frequently requested a PTY-like mechanism in Windows – especially those who created and/or work on ConEmu/Cmder, Console2/ConsoleZ, Hyper, VSCode, Visual Studio, WSL, Docker, and OpenSSH.

Well, we finally did it: We created a Pseudo Console for Windows:

Welcome, to the Windows Pseudo Console (ConPTY)

Since taking ownership of the Console ~4 years ago, the Console Team has been busy overhauling the Windows Console & Command-Line internals. While doing so, we regularly and carefully considered the issues described above and many other related asks and issues. But the internals weren’t in the right shape to make a Pseudo Console feasible … until now!

Windows’ new Pseudo Console (ConPTY) infrastructure, API, and several other supporting changes will remedy/facilitate an entire class of issues … without breaking or damaging backward compatibility for existing command-line applications!

The new Win32 ConPTY API (formal docs to follow soon) is now available in recent Windows 10 Insider builds and corresponding Windows 10 Insider Preview SDK, and will ship in the next major release of Windows 10 (due sometime in fall/winter 2018).

Console/ConHost’s Architecture

To understand ConPTY, we have to revisit the architecture of Windows Console … or more accurately … ConHost!

It’s important to understand that while ConHost implements what you see and know as the Windows Console application itself, ConHost also contains and implements most of Windows’ Command-Line infrastructure! From now on, however, ConHost also becomes a true “Console Host”, supporting all Command-Line applications and/or GUI applications that communicate with Command-Line applications!

How? Why? What? Let’s dig in some more:

Here’s a high-level view of the internal architecture of Console/ConHost:

Compared to the architecture we outlined in the previous “Console Internals” post in this series, ConHost now contains a few additional modules for handling VT and a new ConPTY module that implements the public API:

- ConPTY API: The new Win32 ConPTY API provides a mechanism that is similar to the POSIX PTY model, but in a Windows-relevant manner
- VT Interactivity: Receives incoming UTF-8 encoded text, converts each displayable text character into the corresponding INPUT_RECORD, and stores them in the Input Buffer. It also handles control sequences such as 0x03 (CTRL+C), converting them into KEY_EVENT_RECORDs that will effect the corresponding control action
- VT Renderer: Generates the VT sequences necessary to move the cursor and render the text and styling in regions of the Output Buffer that have changed since the previous frame
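To make the VT Renderer's job a little more concrete, here is a heavily simplified sketch of "render only what changed". It is an assumption-laden illustration, not ConHost's actual algorithm: the `Frame` alias and the `renderChanges` function are invented for this example, frames are plain ASCII rows of equal size, and styling is ignored. The only VT sequence used is the standard cursor-position sequence `ESC [ row ; col H` (1-based).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One string per row of the (toy) output buffer; ASCII only in this sketch.
using Frame = std::vector<std::string>;

// Compares two frames and emits, for each run of changed cells, a cursor
// move to the start of the run followed by the new text.
std::string renderChanges(const Frame& prev, const Frame& next) {
    std::string vt;
    for (std::size_t row = 0; row < next.size(); ++row) {
        const std::string& newRow = next[row];
        const std::string oldRow = row < prev.size() ? prev[row] : std::string();
        auto differs = [&](std::size_t c) {
            return c >= oldRow.size() || oldRow[c] != newRow[c];
        };
        std::size_t col = 0;
        while (col < newRow.size()) {
            if (!differs(col)) {
                ++col;
                continue;
            }
            // Found the start of a changed run; extend it while cells differ.
            const std::size_t start = col;
            while (col < newRow.size() && differs(col)) {
                ++col;
            }
            // Cursor position (1-based row and column), then the new text.
            vt += "\x1b[" + std::to_string(row + 1) + ";" +
                  std::to_string(start + 1) + "H";
            vt += newRow.substr(start, col - start);
        }
    }
    return vt;
}
```

If only two characters on the second row change, the output contains just two short cursor-move-plus-text fragments rather than a full repaint, which is the property that keeps the rendered stream small.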

Okay, but what does this actually mean?

How Do Windows Command-Line Applications Work?

To better understand the impact of the new ConPTY infrastructure, let’s consider for a moment how Windows Console and Command-Line applications have worked up until now.

Whenever a user launches a Command-Line application like Cmd, PowerShell, or ssh, Windows creates a new Win32 process into which it loads the app’s executable binary file, and any dependencies (resources or libraries).

The newly created process usually inherits the stdin and stdout handles from its parent. If the parent was a Windows GUI process, there are no stdin and stdout handles and so Windows will spin up and attach the new app to a new Console instance. Communications between Command-Line apps and their Console are transported via ConDrv.

For example, if launched from a non-elevated PowerShell instance, the new app process will inherit its parent’s stdin/stdout handles and will therefore receive input from and emit output to the same Console as the parent.

There is a little “hand-waving” going on here as there are cases where Command-Line apps are launched attached to a new Console instance, especially for security reasons, but the cases described above are generally true. Ultimately, when a Command-Line app/shell is launched, Windows connects it to a Console (ConHost.exe) instance via ConDrv:

How does ConHost work?

Whenever a Command-Line application is executed, Windows will connect the app to a new or existing ConHost instance. The app and its Console instance are connected via the kernel-mode Console Driver (ConDrv) which sends/receives IOCTL messages containing serialized API call requests and/or text data.

As outlined in prior posts, ConHost’s job has historically been a relatively simple one:

1. The user generates input via keyboard/mouse/pen/touch, which is converted into KEY_EVENT_RECORD or MOUSE_EVENT_RECORD structures and stored in the Input Buffer
2. The Input Buffer is drained one record at a time and the requested input action is performed (e.g. draw text on screen, move cursor, copy/paste text). Many of these actions change the Output Buffer’s contents; the changed regions are recorded by ConHost’s state engine
3. Each frame, the Console renders the Output Buffer’s changed regions to the display

When a Command-Line app calls Windows Console APIs, the API calls are serialized into IOCTL messages and sent via the ConDrv driver. ConDrv then delivers the IOCTL messages to the attached Console, which decodes and executes the requested API call. Return/output values are serialized back into an IOCTL message and sent back to the app via ConDrv.

ConHost – Investing in yesterday for tomorrow

Microsoft tries, wherever possible, to maintain backward compatibility with existing apps/tools. This is especially true in the Command-Line world. In fact, 32-bit editions of Windows 10 can still run many/most “Win16” 16-bit Windows apps and executables!

As mentioned above, one of ConHost’s key roles is to provide services to Command-Line apps that it hosts, especially legacy apps that call and rely on the Win32 Console API. ConHost now offers some new services:

- Seamlessly provides PTY-like infrastructure for communication with modern Consoles and Terminals
- Modernizes legacy/traditional Command-Line apps:
  - Receives & converts UTF-8 encoded text/VT into input records (as if typed by the user)
  - Executes Console API calls for the app it’s hosting, updating its “Output Buffer” accordingly
  - Renders changed regions of the output buffer as UTF-8 encoded text/VT



Below is an example of a modern Console app talking via a ConPTY ConHost to a Command-Line app

In this new model:

- Console:
  - Creates its own communication pipes
  - Calls the ConPTY API to create a ConPTY, causing Windows to spin up a ConHost instance connected to the other end of the pipes
  - Creates an instance of the Command-Line app (e.g. PowerShell) attached to ConHost as usual
- ConHost:
  - Reads UTF-8 encoded text/VT input and converts it into INPUT_RECORDs that are sent to the Command-Line app
  - Executes API calls from the Command-Line app which may modify the contents of the Output Buffer
  - Renders changes in its Output Buffer as UTF-8 encoded text/VT and sends the resulting text to its Console
- Command-Line app:
  - Runs as usual, reading input and calling Console APIs without any knowledge that its ConPTY ConHost is translating its input/output from/to UTF-8 encoded text/VT!
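The input half of ConHost's translation work can be pictured with a small sketch. Everything here is hypothetical and much simpler than the real thing: `Key`, `KeyEvent`, and `parseVtInput` are names invented for this example (the real structures are the Win32 INPUT_RECORD family), only ASCII and the four arrow-key sequences are recognized, and any unrecognized escape sequence is passed through as ordinary characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A toy stand-in for a Console input record.
enum class Key { Character, Enter, CtrlC, Up, Down, Right, Left };

struct KeyEvent {
    Key key;
    char ch;  // only meaningful when key == Key::Character
};

// Converts a stream of text/VT input into key events, "as if typed".
std::vector<KeyEvent> parseVtInput(const std::string& input) {
    std::vector<KeyEvent> events;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        // ESC [ A/B/C/D are the standard cursor-key sequences.
        if (c == '\x1b' && i + 2 < input.size() && input[i + 1] == '[') {
            Key arrow = Key::Up;
            bool known = true;
            switch (input[i + 2]) {
                case 'A': arrow = Key::Up; break;
                case 'B': arrow = Key::Down; break;
                case 'C': arrow = Key::Right; break;
                case 'D': arrow = Key::Left; break;
                default: known = false; break;
            }
            if (known) {
                events.push_back({arrow, '\0'});
                i += 2;  // skip the '[' and the final letter
                continue;
            }
        }
        if (c == '\x03') {
            events.push_back({Key::CtrlC, '\0'});
        } else if (c == '\r') {
            events.push_back({Key::Enter, '\0'});
        } else {
            events.push_back({Key::Character, c});
        }
    }
    return events;
}
```

Given the bytes for `ls`, a carriage return, an up-arrow sequence, and 0x03, the function yields five events: two characters, Enter, Up, and Ctrl+C. The hosted app only ever sees the events, never the escape sequences.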

The latter point is important! When a legacy Command-Line app uses a Console API like WriteConsoleOutput(...) , the specified text is written to the attached ConHost’s Output Buffer. Periodically, ConHost renders changed areas of the Output Buffer as text/VT which is sent via stdout back to the Console.

Ultimately, this means that even traditional Command-Line apps “speak text/VT” externally, without requiring any changes!

Using the new ConPTY infrastructure, 3rd party Consoles can now communicate directly with modern and traditional Command-Line applications, and speak text/VT with all of them.

Remoting Windows Command-Line Applications

The mechanism above works great on a single machine, but also helps when you interact with, for example, a PowerShell instance running on a remote Windows machine or in a container.

Running Command-Line applications remotely (i.e. on remote machines, servers, or in containers) poses a challenge: Command-Line apps running on a remote machine communicate with a ConHost instance on that same machine, because IOCTL messages are not designed for use over a network connection. So how does input from a Console running on your client machine get to the remote machine, and how does output from the app running on the remote machine get back to your client Console? Further, what if your client is a Linux or Mac machine that has Terminals, but no Windows-compatible Consoles, and knows nothing about how Windows Console works?

So, to remotely operate a Windows machine we need a communications broker of some kind – one that can transparently serialize data across some form of network connection and manage app instance lifetime, etc.

Something like ssh, perhaps?

Thankfully, OpenSSH was recently ported to Windows and added as a Windows 10 optional feature. PowerShell Core has also adopted ssh as one of its supported PowerShell Core Remoting protocols. And for those who’ve invested in Windows PowerShell, Windows PowerShell Remoting is still a viable option.

Let’s consider how OpenSSH for Windows allows us to remotely operate Windows Command-Line shells and apps today:

Currently, OpenSSH involves some unwanted convolutions:

- The user:
  - Runs the ssh client and Windows attaches a Console instance as usual
  - Types into the Console which sends keystrokes to the ssh client
- The ssh client:
  - Reads input as bytes of text data
  - Sends the text data via the network to the listening sshd service
- The sshd service has to jump through several hoops:
  - Launches the default shell (i.e. Cmd) which causes Windows to spawn & connect a new Console instance
  - Finds & attaches itself to the Cmd instance’s Console
  - Moves the Console off-screen (and/or hides it)
  - Sends input data received from the ssh client to the off-screen Console as input
- The cmd instance operates as it always has:
  - Gathers input delivered by the sshd service
  - Does work
  - Calls Console APIs to emit/style text, move the cursor, etc.
- The attached [off-screen] Console:
  - Executes the API calls, updating its output buffer
- The sshd service:
  - Scrapes the off-screen Console’s output buffer, finds differences, encodes them into text/VT and sends them back to …
- The ssh client, which sends the text back to …
- The Console, which draws the text on the screen

Fun, right? No, it’s not! There’s a lot that can and does go wrong, especially in the process of simulating and sending user-input and scraping the output buffer to/from the off-screen Console. This results in instability, crashes, data corruption, excessive power consumption, etc. Further, not all apps do the work to scrape text properties as well as text itself, which results in text formatting being lost, and remoted applications’ text being “monochromatized”!

Remoting Using Modern ConHost and ConPTY

Surely we can do better than this? Yes, yes we can – let’s make a few architectural changes and use our new ConPTY:

In the diagram above, we can see:

- The user:
  - Runs the ssh client and Windows attaches a Console instance as usual
  - Types into the Console which sends keystrokes to the ssh client
- The ssh client:
  - Reads input as bytes of text data
  - Sends the text data via the network to the listening sshd service
- The sshd service:
  - Creates stdin/stdout pipes
  - Calls the ConPTY API to create a ConPTY
  - Launches an instance of Cmd attached to the other end of the ConPTY. Windows spins up and attaches a new ConHost instance
- The cmd instance operates as it always has:
  - Gathers input
  - Does work
  - Calls Console APIs to emit/style text, move the cursor, etc.
- The ConPTY ConHost instance:
  - Executes the API calls, updating its output buffer
  - Renders changed regions of the output buffer as UTF-8 encoded text/VT which is sent back to the Console/Terminal via ssh

The ConPTY-enabled approach above is clearly much cleaner and simpler for the sshd service. All Windows Console API calls are now executed entirely within the Command-Line app’s ConHost instance, which converts any visible changes into text/VT: nothing ConHost is connected to needs to know that the app it’s hosting calls Console APIs rather than generating text/VT itself!

We think you’ll agree that this new ConPTY remoting mechanism results in an elegant, consistent, simpler architecture. Combined with the powerful features built into ConHost, supporting legacy apps, and rendering changes caused by apps calling Console APIs into text/VT, the new ConHost and ConPTY infrastructure helps us carry the past into the future.

The ConPTY API and how to use it

The ConPTY API is available in the current Windows 10 Insider Preview SDK. By now, I am sure you’re itching to see some code 😉

Let’s take a look at the API declarations:

    // Creates a "Pseudo Console" (ConPTY).
    HRESULT WINAPI CreatePseudoConsole(
        _In_ COORD size,        // ConPty Dimensions
        _In_ HANDLE hInput,     // ConPty Input
        _In_ HANDLE hOutput,    // ConPty Output
        _In_ DWORD dwFlags,     // ConPty Flags
        _Out_ HPCON* phPC);     // ConPty Reference

    // Resizes the given ConPTY to the specified size, in characters.
    HRESULT WINAPI ResizePseudoConsole(_In_ HPCON hPC, _In_ COORD size);

    // Closes the ConPTY and all associated handles. Client applications attached
    // to the ConPTY will also be terminated.
    VOID WINAPI ClosePseudoConsole(_In_ HPCON hPC);

The ConPTY API above essentially exposes three new functions:

- CreatePseudoConsole(size, hInput, hOutput, dwFlags, phPC): Creates a pty with dimensions of w columns and h rows of characters, using pipes created by the caller:
  - size: Width and Height (in chars) of the ConPTY buffer
  - hInput: For writing input to the PTY, encoded as UTF-8 text/VT sequences
  - hOutput: For reading the output from the PTY, encoded as UTF-8 text/VT sequences
  - dwFlags: Possible values:
    - PSEUDOCONSOLE_INHERIT_CURSOR: The created ConPTY will attempt to inherit the cursor position of the parent Terminal application
  - phPC: Handle to a Console reference for the created ConPty
  - Returns: Success/failure. On success, phPC contains a handle to the new ConPty
- ResizePseudoConsole(hPC, size): Resizes the given ConPTY’s internal buffer to represent a display of the specified character width and height
- ClosePseudoConsole(hPC): Closes the ConPTY and all associated handles. Client applications attached to the ConPTY will also be terminated, as if they were running in a console window that was closed



Using the ConPTY API

Below is a small code example of how to call the ConPTY API to create a Pseudo Console and attach a Command-Line application to the created ConPTY.

Note: A simple sample illustrating how to use the Pseudo Console API is available here.

    // Note: Most error checking removed for brevity.

    // ...

    // Initializes the specified startup info struct with the required properties and
    // updates its thread attribute list with the specified ConPTY handle.
    // The attribute list is heap-allocated because it must outlive this function;
    // release it later with DeleteProcThreadAttributeList() and HeapFree().
    HRESULT InitializeStartupInfoAttachedToConPTY(STARTUPINFOEX* siEx, HPCON hPC)
    {
        SIZE_T size = 0;
        siEx->StartupInfo.cb = sizeof(STARTUPINFOEX);

        // Query the size required for a list holding one attribute
        InitializeProcThreadAttributeList(NULL, 1, 0, &size);

        // Allocate the attribute list and attach it to the startup info
        siEx->lpAttributeList = reinterpret_cast<PPROC_THREAD_ATTRIBUTE_LIST>(
            HeapAlloc(GetProcessHeap(), 0, size));
        if (!siEx->lpAttributeList)
        {
            return E_OUTOFMEMORY;
        }

        // Initialize the list, then set its Pseudo Console attribute to the specified ConPTY
        BOOL fSuccess = InitializeProcThreadAttributeList(
            siEx->lpAttributeList, 1, 0, &size);
        if (fSuccess)
        {
            fSuccess = UpdateProcThreadAttribute(
                siEx->lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE,
                hPC, sizeof(HPCON), NULL, NULL);
        }
        return fSuccess ? S_OK : HRESULT_FROM_WIN32(GetLastError());
    }

    // ...

    HANDLE outPipeOurSide, inPipeOurSide;
    HANDLE outPipePseudoConsoleSide, inPipePseudoConsoleSide;
    HPCON hPC = 0;

    // Create the in/out pipes:
    CreatePipe(&inPipePseudoConsoleSide, &inPipeOurSide, NULL, 0);
    CreatePipe(&outPipeOurSide, &outPipePseudoConsoleSide, NULL, 0);

    // Our ends of the pipes: write input to hIn, read output from hOut
    HANDLE hIn = inPipeOurSide;
    HANDLE hOut = outPipeOurSide;

    // Create the Pseudo Console, using the pipes
    CreatePseudoConsole(
        {80, 32}, inPipePseudoConsoleSide, outPipePseudoConsoleSide, 0, &hPC);

    // Prepare the StartupInfoEx structure attached to the ConPTY.
    STARTUPINFOEX siEx{};
    InitializeStartupInfoAttachedToConPTY(&siEx, hPC);

    // Create the client application, using startup info containing ConPTY info.
    // CreateProcessW may modify the command line, so it must be a writable buffer.
    wchar_t commandline[] = L"c:\\windows\\system32\\cmd.exe";
    PROCESS_INFORMATION piClient{};
    BOOL fSuccess = CreateProcessW(
        nullptr, commandline, nullptr, nullptr, TRUE,
        EXTENDED_STARTUPINFO_PRESENT, nullptr, nullptr,
        &siEx.StartupInfo, &piClient);

    // ...

At this point, cmd.exe is running connected to the ConPTY created by CreatePseudoConsole() . The caller uses the ConPTY’s handles that it created to write and read to/from the Cmd instance. The Pseudo Console can be resized by calling ResizePseudoConsole() , and can be closed by calling ClosePseudoConsole() :

Writing to Pseudo Console

Writing input to the ConPTY is simple:

    // Input "echo Hello, World!", press enter to have cmd process the command,
    // input an up arrow (to get the previous command), and enter again to execute.
    std::string helloWorld = "echo Hello, World!\n\x1b[A\n";
    DWORD dwWritten;
    WriteFile(hIn, helloWorld.c_str(), (DWORD)helloWorld.length(), &dwWritten, nullptr);

Resizing the Pseudo Console

This scenario shows how to resize the ConPTY:

    // Suppose some other async callback triggered us to resize.
    // This call will update the Terminal with the size we received.
    HRESULT hr = ResizePseudoConsole(hPC, {120, 30});

Closing the Pseudo Console

Closing the ConPTY couldn’t be simpler:

ClosePseudoConsole(hPC);

Note: Closing the ConPTY will terminate the associated ConHost and any attached clients.

Call To Action!

The introduction of the ConPTY API is perhaps one of the most fundamental, and liberating, changes that’s happened to the Windows Command-Line in several years … if not decades!

We, the Console team, have already ported some of Microsoft’s tools to use the ConPTY API. We’re also working with several teams inside Microsoft (Windows Subsystem for Linux (WSL), Windows Containers, VSCode, Visual Studio, etc.), and with several external parties including @ConEmuMaximus5 – creator of the awesome ConEmu 3rd party Console for Windows.

But we need your help to raise awareness of, and to start adopting the new ConPTY API:

Command-Line Application Developers

If you own and/or maintain an existing traditional Windows Command-Line application, you’re largely off the hook: ConHost will do all the work for you. You can continue to depend upon and call the Console APIs as you always have, and your app will work just as it always has, while also benefiting from an improved, higher-fidelity experience when operated remotely 😃

If you’d like to, you can also introduce VT support gradually or in new feature areas – the decision is yours.

If, on the other hand, you’re writing or planning to write new Windows Command-Line applications, we strongly encourage you to consider simply emitting UTF-8 encoded text/VT instead of calling the Windows Console API: “Speaking VT” will give you access to many features that will not be available via the Windows Console API (e.g. 16M RGB True Color support).
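As a small taste of what “speaking VT” looks like, the sketch below wraps text in the widely supported 24-bit foreground color sequence (`ESC [ 38 ; 2 ; r ; g ; b m`) and resets attributes afterwards (`ESC [ 0 m`). The `colorize` function name is just an illustrative choice; the escape sequences themselves are standard SGR sequences.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Wraps `text` in a 24-bit ("True Color") foreground color sequence,
// followed by a reset so later output is unaffected.
std::string colorize(const std::string& text,
                     std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return "\x1b[38;2;" + std::to_string(r) + ";" + std::to_string(g) + ";" +
           std::to_string(b) + "m" + text + "\x1b[0m";
}
```

Writing `colorize("warning", 255, 128, 0)` to stdout should show orange text in any Terminal that understands 24-bit color. When running directly in the Windows Console, an app typically also needs to enable `ENABLE_VIRTUAL_TERMINAL_PROCESSING` on its output handle via `SetConsoleMode()` before such sequences are interpreted.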

3rd Party Console/Service Developers

If you’re a developer working on a stand-alone Console/Terminal app, or are integrating a Console inside of an application, then we strongly encourage you to explore and adopt the new ConPTY API at your earliest convenience: Adopting the ConPTY API (rather than the older off-screen Console mechanism) is likely to eliminate several classes of bugs, while increasing stability, reliability, and performance.

As an example, the VSCode team currently maintains an issue (GitHub #45693) that tracks several issues caused by Windows’ current lack of a Pseudo Console.

Detecting the ConPTY API

The new ConPTY API will be available for the first time in the Autumn/Fall 2018 release of Windows 10.

If you need to support earlier versions of Windows, you’ll likely need to test at runtime whether the currently running version of Windows supports ConPTY. As with most Win32 APIs, an effective way to test whether an API is present is Runtime Dynamic Linking: call LoadLibrary() and GetProcAddress() and check whether the function is found.

If the currently running version of Windows supports ConPTY, your app can find and call the new ConPTY APIs. If not, you’ll have to revert to the convoluted mechanisms used until now, as outlined above.
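A runtime check along these lines might look like the following sketch. The function name `isConPtyAvailable` is made up for this example; it assumes the ConPTY functions are exported from kernel32.dll, and the non-Windows branch exists only so the snippet compiles everywhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true if CreatePseudoConsole can be resolved at runtime.
bool isConPtyAvailable() {
#ifdef _WIN32
    // kernel32.dll is always loaded; LoadLibraryW just bumps its ref count.
    HMODULE kernel32 = LoadLibraryW(L"kernel32.dll");
    if (kernel32 == nullptr) {
        return false;
    }
    const bool found =
        GetProcAddress(kernel32, "CreatePseudoConsole") != nullptr;
    FreeLibrary(kernel32);
    return found;
#else
    // ConPTY is a Windows-only API.
    return false;
#endif
}
```

An app would call this once at startup and then choose between the ConPTY path and its existing off-screen Console fallback. To actually call the API on a system where it may be absent, keep the pointer returned by `GetProcAddress()` and invoke the function through it rather than linking against it directly.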

So, where are we?

Another long post … this is getting to be a habit! Once again, if you’ve read and followed the post this far, THANK YOU! 😃

There’s a lot to unpack from the information above, but we feel it is important to understand why we make changes and improvements such as introducing a Pseudo Console API, as well as what we built. Our goals here are to eradicate an entire class of issues and limitations for developers of Console and server apps, and to make developing code for the Windows Command Line infrastructure more powerful, consistent, and fun.

We look forward to hearing from you via Feedback Hub. For more complex problems, please file issues on our Windows Console GitHub Repo. And if you have questions, please ping me on Twitter.

We can’t wait to hear about what you build atop the new Pseudo Console API.

Rich & the Windows Console Team