Key Traits of the Coming Delphi For Linux Compiler

Embarcadero is about to release a new Delphi compiler for the Linux platform. Here are some of the key technical elements of this compiler, and the few differences compared to Delphi compilers for other platforms.

Linux Intel 64-bit

Before we get to language-specific features, let me clarify once more the target platform, as "Delphi for Linux" is a bit vague. The compiler produces Intel 64-bit executables for Linux. This is a key difference compared, for example, to the old Kylix project compiler, which was 32-bit. The new compiler does not target Linux ARM platforms, which we are considering for the future.

Another related element is that the compiler is based on the LLVM compiler architecture, like all the most recent Delphi compilers (iOS 32-bit, Android 32-bit, and iOS 64-bit). The advantage is that it will provide some significant optimization of the generated code. The disadvantage is that compiling and linking an application takes considerably more time than with the Windows compilers.

In the rare case you need platform-specific code, for example when calling platform APIs, you can use the {$IFDEF LINUX64} directive.
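
As a minimal sketch of how such a conditional block might look (the program and function names are made up for illustration), the Linux-only branch is compiled only when the LINUX64 symbol is defined:

```pascal
program PlatformCheck;

{$APPTYPE CONSOLE}

// Hypothetical helper: returns a label for the platform this binary targets.
function PlatformName: string;
begin
{$IFDEF LINUX64}
  // Compiled only by the Linux 64-bit compiler
  Result := 'Linux 64-bit';
{$ELSE}
  // Every other target (Windows, macOS, mobile)
  Result := 'another platform';
{$ENDIF}
end;

begin
  Writeln('Compiled for: ', PlatformName);
end.
```

Keeping the conditional code inside a small function like this limits the number of places where platform checks appear.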

Object Pascal Language Compatibility

Getting to the language specifics, the level of language compatibility is going to be very high. Almost all of the classic Pascal language features, OOP features, RAD support capabilities, and modern Pascal features (generics, anonymous methods, reflection, attributes) are going to work the same. Some beta testers have been able to port significantly complex libraries fairly smoothly.

Where you might find a little more trouble is in porting "older" code, such as code that is not Unicode-enabled or relies heavily on Windows-isms. Below are some of the specific differences. The only area that is not meant to be fully compatible is memory management, given the new compiler is based on Automatic Reference Counting, as explained later.

Core Data Types and LongWord Blues

I'm not going to list all of the core data types that remain the same, as the list is very long, but let's look at what's specific to this compiler. Being a 64-bit compiler, all pointers are going to be 64-bit, while Integer stays 32-bit -- this is the behavior of all other Delphi 64-bit compilers (and most other programming languages, BTW).

The only caveat is for the LongWord type. This is a data type often used when making operating system calls, so the decision that was taken some time ago was to keep it matching the underlying OS. So, for example, on iOS the same API declaration with LongWord compiles to a 32-bit or 64-bit data type depending on the compiler you are using. On Windows, however, Microsoft made a non-standard decision to keep LongWord the same size as an Integer. This implies that the Windows 64-bit platform works differently from the Linux 64-bit platform with regard to this data type. For reference, among other sources, see the long type in C language on different platforms at https://en.wikipedia.org/wiki/Integer_(computer_science)#Long_integer and the first answer at http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows.

You might have to revisit your code using LongWord and decide to keep that data type or use a different one (Integer, UInt32, NativeUInt...) depending on your goal. We have done and are still doing a significant revision of the RTL to make sure we are not misusing this type. In some cases, however, we are going to keep code that behaves differently depending on the platform, particularly when changing core RTL classes would cause a lot of legitimate Windows code that has worked for 20 years not to compile any more.
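
A quick way to see where your code might be affected is to print the size of each type on every target you build for. This is only a sketch (the program name is made up), and the sizes mentioned in the comments are what the description above leads me to expect, not measured output:

```pascal
program TypeSizes;

{$APPTYPE CONSOLE}

begin
  // Pointer follows the platform: 8 bytes on any 64-bit target
  Writeln('Pointer:    ', SizeOf(Pointer));
  // Integer stays 4 bytes on every Delphi compiler
  Writeln('Integer:    ', SizeOf(Integer));
  // LongWord is the one to watch: per the text above it matches the
  // OS "long", so it should differ between Win64 and Linux64
  Writeln('LongWord:   ', SizeOf(LongWord));
  // UInt32 is always 4 bytes: use it when you need a fixed 32-bit value
  Writeln('UInt32:     ', SizeOf(UInt32));
  // NativeUInt always matches the pointer size
  Writeln('NativeUInt: ', SizeOf(NativeUInt));
end.
```

If a LongWord field is part of a file format or a network record, switching it to UInt32 keeps the layout identical across platforms.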

Strings and Encodings

Since Delphi 2009 the Object Pascal language string type has defaulted to UTF-16 Unicode and a 2-byte Char data type. Needless to say, the Linux compiler follows the same path. Since 10.1 Berlin, all compilers (including the mobile ones) have had full support for the UTF8String type and also (for direct low-level processing) the RawByteString type. The Linux compiler includes these data types, and in fact UTF8String was added to mobile mostly because we anticipated it as a key requirement for Linux. A significant part of HTTP-based traffic uses UTF-8, and supporting this representation as a native type -- besides supporting encoding to it -- was considered a requirement for the Linux project.
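
To give an idea of how the two string types interact, here is a small sketch (the program and variable names are mine) that converts a UTF-16 string to UTF8String and back using explicit casts:

```pascal
program Utf8Demo;

{$APPTYPE CONSOLE}

var
  Source: string;        // UTF-16, 2 bytes per Char
  Utf8: UTF8String;      // UTF-8, 1 byte per element
  RoundTrip: string;
begin
  // 'Caff' followed by U+00E8 (e with grave accent), written as a
  // character code so the source file encoding does not matter
  Source := 'Caff' + #$00E8;

  // The explicit cast converts UTF-16 to UTF-8
  Utf8 := UTF8String(Source);

  // And the reverse cast converts back to UTF-16
  RoundTrip := string(Utf8);

  // Length counts elements: Chars for string, bytes for UTF8String.
  // The accented letter takes one Char in UTF-16 but two bytes in UTF-8.
  Writeln('UTF-16 chars: ', Length(Source));
  Writeln('UTF-8 bytes:  ', Length(Utf8));
  Writeln('Round trip OK: ', RoundTrip = Source);
end.
```

The explicit casts avoid the implicit string conversion warnings the compiler would otherwise raise on a direct assignment.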

It is true, however, that some other string types like AnsiString are not supported. This is mostly a "Windows-centric" data type. If you are still using strings and PChar for managing generic data structures, it is really time to move to TBytes and PByte instead -- or enable pointer math for all data structures. Support for the old Pascal ShortString type is also limited: declaring a string[20] variable on Linux will fail. The other string type that is not supported is WideString. This is the old pre-Unicode non-reference counted UTF-16 type used for Windows COM platform integration. In fact, any COM-specific type and feature is missing on Linux, like on all other non-Windows Delphi platforms.
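
As a sketch of the TBytes and PByte approach for generic binary data (the Checksum function is a made-up example, not an RTL routine), a buffer can be walked with a byte pointer rather than a PChar:

```pascal
program BytesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;  // declares TBytes

// Hypothetical helper: sums all the bytes of a buffer.
function Checksum(const Data: TBytes): Cardinal;
var
  P: PByte;
  I: Integer;
begin
  Result := 0;
  if Length(Data) = 0 then
    Exit;
  // Point at the first byte, then advance one byte at a time
  P := @Data[0];
  for I := 0 to Length(Data) - 1 do
  begin
    Inc(Result, P^);
    Inc(P);
  end;
end;

begin
  Writeln(Checksum(TBytes.Create(1, 2, 3, 4)));
end.
```

Because the element type is Byte, the code makes no assumption about character size or encoding, which is the point of moving away from string-based buffers.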

Notice that the TEncoding support is available, so you can read and write text files in any format you want. What you are not directly able to do is process an AnsiString in memory with the standard language support. But you can have an array of bytes (TBytes) representing text in any format in memory, read and write it on disk, or receive and send it via a socket connection, and you can use the TEncoding support for conversions.
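
Here is a minimal sketch of this kind of conversion, assuming the bytes arrive as UTF-8 (for example from a socket). It uses only the standard TEncoding.UTF8 and TEncoding.Unicode instances; the program and variable names are mine:

```pascal
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;  // declares TBytes and TEncoding

var
  Incoming: TBytes;
  Decoded: string;
  Utf16Bytes: TBytes;
begin
  // "Caf" plus e-acute encoded as UTF-8: the last letter takes two bytes
  Incoming := TBytes.Create($43, $61, $66, $C3, $A9);

  // Bytes in a known encoding -> native UTF-16 string
  Decoded := TEncoding.UTF8.GetString(Incoming);

  // Native string -> bytes in another encoding (UTF-16 little endian)
  Utf16Bytes := TEncoding.Unicode.GetBytes(Decoded);

  Writeln('UTF-8 bytes in:   ', Length(Incoming));
  Writeln('Characters:       ', Length(Decoded));
  Writeln('UTF-16 bytes out: ', Length(Utf16Bytes));
end.
```

The same GetString and GetBytes calls apply to any other TEncoding instance you obtain, so the in-memory data never needs to be an AnsiString.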

Linux Defaults to 1-Based String Access

What about string access via the [] operator? As you might know, there is a compiler default you can change per-project, per-unit, or even per-code fragment that determines if the compiler treats the string access operator with a 1-based Pascal-classic notation or the 0-based notation most programming languages use. While mobile compilers default to 0-based, for Linux we decided to stick with the traditional Windows model, on the grounds that developers are most likely to migrate existing Windows server-side code to Linux. The recommendation is to try to use clean, agnostic code, but if you prefer forcing a given string access model for all of your Delphi code, just use the $ZEROBASEDSTRINGS directive in your projects. Just as a reminder, all RTL string functions and the newer string helper methods stay the same regardless of the platform and this setting. The first group uses a 1-based logic, the second a 0-based logic. Your pick.
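
To illustrate what "agnostic" code can look like, here is a small sketch (CountSpaces is a made-up function) that loops with Low and High instead of hard-coded bounds, followed by the two families of string routines side by side:

```pascal
program StringIndexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;  // brings in the string helper methods

// Hypothetical helper: works with either $ZEROBASEDSTRINGS setting,
// because Low(S) and High(S) follow the active setting.
function CountSpaces(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := Low(S) to High(S) do
    if S[I] = ' ' then
      Inc(Result);
end;

var
  Sample: string;
begin
  Sample := 'Delphi for Linux';

  Writeln(CountSpaces(Sample));      // 2

  // Classic RTL function: always 1-based
  Writeln(Pos('for', Sample));       // 8

  // String helper method: always 0-based
  Writeln(Sample.IndexOf('for'));    // 7
end.
```

If you do want to force one model, placing {$ZEROBASEDSTRINGS ON} or {$ZEROBASEDSTRINGS OFF} at the top of a unit affects only direct [] access in that unit, not the results of Pos or IndexOf.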

Here Comes ARC

The other notable change from the Windows compiler is that on the Linux platform (as on any new platform) we have decided to use the Automatic Reference Counting (ARC) model for memory management. This is the model Delphi uses for all mobile compilers plus the iOS simulator one. The long term plan is to shift the entire Delphi ecosystem in that direction -- probably keeping the VCL world on the traditional memory model. This is the reason not picking ARC for Linux would have been very confusing: given that you need to test your code anyway when adopting a new platform, this is the least disruptive moment for such a transition.
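
As a rough sketch of why existing code tends to migrate smoothly (my own illustration, assuming the ARC behavior of the mobile compilers, where calling Free only clears the reference), the classic try/finally pattern compiles and behaves sensibly under both memory models:

```pascal
program ArcDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;  // declares TStringList

// Hypothetical example routine using a short-lived object.
procedure FillAndPrint;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('one');
    List.Add('two');
    Writeln(List.Count);
  finally
    // Traditional compilers: destroys the object right here.
    // ARC compilers (assumption stated above): just releases this
    // reference, and the object is destroyed when no references remain.
    List.Free;
  end;
end;

begin
  FillAndPrint;
end.
```

Under ARC the try/finally block is no longer strictly needed for a local object like this one, but leaving it in place lets the same source compile for Windows as well.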

Feedback from beta testers has been fairly positive on this, and migration of existing code and libraries has not bumped into big hiccups. Now I don't have room in this blog post to revisit the best practices for ARC migration, but I'll try to have some more extensive material on this in the future.

Shameless plug: My Object Pascal Handbook (and particularly the Berlin revised edition) has some good material on this.

More Information? Delphi Linux BootCamp is Coming!

For more information, sign up for the boot camp (which is actually a one-hour webinar) scheduled for March 1st in 3 time zones. For details and to sign up, see https://community.embarcadero.com/blogs/entry/delphi-linux-boot-camp