
PHP RFC: Numeric Literal Separator

Introduction

The human eye is not optimized for quickly parsing long sequences of digits. Thus, a lack of visual separators makes it take longer to read and debug code, and can lead to unintended mistakes.

    1000000000;   // Is this a billion? 100 million? 10 billion?
    107925284.88; // What scale or power of 10 is this?

Additionally, without a visual separator numeric literals fail to convey any additional information, such as whether a financial quantity is stored in cents:

    $discount = 13500; // Is this 13,500? Or 135, because it's in cents?

Proposal

Enable improved code readability by supporting an underscore in numeric literals to visually separate groups of digits.

    $threshold = 1_000_000_000;  // a billion!
    $testValue = 107_925_284.88; // scale is hundreds of millions
    $discount = 135_00;          // $135, stored as cents

Underscore separators can be used in all numeric literal notations supported by PHP:

    6.674_083e-11; // float
    299_792_458;   // decimal
    0xCAFE_F00D;   // hexadecimal
    0b0101_1111;   // binary
    0137_041;      // octal
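As a quick sanity check of the notations listed above, the following sketch compares each separated literal with its unseparated form. It assumes PHP 7.4 or later, where this proposal is available; the variable names `$pairs` and `$results` are purely illustrative.

```php
<?php
// Assumes PHP 7.4+; on earlier versions this file fails to parse.
// Each entry pairs a separated literal with the same literal written plainly.
$pairs = [
    'float'       => [6.674_083e-11, 6.674083e-11],
    'decimal'     => [299_792_458, 299792458],
    'hexadecimal' => [0xCAFE_F00D, 0xCAFEF00D],
    'binary'      => [0b0101_1111, 0b01011111],
    'octal'       => [0137_041, 0137041],
];

// Strict comparison: same type and same value for every notation.
$results = array_map(fn(array $pair): bool => $pair[0] === $pair[1], $pairs);

var_dump($results);
```

Every entry in `$results` should be `true`, since the separators only affect how the source is written.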

Restrictions

The only restriction is that each underscore in a numeric literal must be directly between two digits. This rule means that none of the following usages are valid numeric literals:

    _100; // already a valid constant name

    // these all produce "Parse error: syntax error":
    100_;       // trailing
    1__1;       // next to underscore
    1_.0; 1._0; // next to decimal point
    0x_123;     // next to x
    0b_101;     // next to b
    1_e2; 1e_2; // next to e
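The "directly between two digits" rule can be sketched as a small check over the literal's source text. This is an illustration only, not how PHP's lexer is implemented: `underscoresAreWellPlaced()` is a hypothetical helper, and it tests nothing but the placement of underscores (it does not verify that the string is otherwise a valid literal).

```php
<?php
// Hypothetical helper, not part of PHP: checks only the
// "each underscore sits directly between two digits" rule.
function underscoresAreWellPlaced(string $literal): bool
{
    // Hex literals also count a-f as digits; other notations use 0-9.
    $digit = stripos($literal, '0x') === 0 ? '[0-9a-fA-F]' : '[0-9]';

    // Match any underscore lacking a digit immediately before or after it.
    $misplaced = "/(?<!$digit)_|_(?!$digit)/";

    return preg_match($misplaced, $literal) === 0;
}

var_dump(underscoresAreWellPlaced('1_000_000'));   // bool(true)
var_dump(underscoresAreWellPlaced('0xCAFE_F00D')); // bool(true)
var_dump(underscoresAreWellPlaced('1__1'));        // bool(false)
var_dump(underscoresAreWellPlaced('0x_123'));      // bool(false)
```

Expressing the rule as "no underscore without a digit on both sides" covers the leading, trailing, doubled, and prefix-adjacent cases with a single pattern.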

Unaffected PHP Functionality

Adding an underscore between digits in a numeric literal will not change its value. The underscores are stripped out during the lexing stage, so the runtime is not affected.

    var_dump(1_000_000); // int(1000000)

This RFC does not change the behavior of string to number conversion. Numeric separators are intended to improve code readability, not alter how input is processed.
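The difference between a literal in source code and a string converted at runtime can be seen in a short sketch. It assumes PHP 7.4 or later; the variable names are illustrative.

```php
<?php
// Assumes PHP 7.4+ for the separated literal below.
$fromLiteral  = 1_000_000;               // separators are stripped by the lexer
$fromString   = (int) '1_000_000';       // conversion stops at the first underscore
$looksNumeric = is_numeric('1_000_000'); // an underscore is not part of a numeric string

var_dump($fromLiteral);  // int(1000000)
var_dump($fromString);   // int(1)
var_dump($looksNumeric); // bool(false)
```

Input that contains separators therefore still has to be cleaned explicitly before conversion.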

Backward Incompatible Changes

None.

Discussion

Use cases

Digit separators make possible the cognitive process of subitizing. That is, accurately and confidently “telling at a glance” the number of digits, rather than having to count them. This measurably lessens the time to correctly read numbers longer than four digits.

Large numeric literals are commonly used for business logic constants, unit test values, and performing data conversions. For example:

Composer's retry delay when removing a file:

    usleep(350000);  // without separator
    usleep(350_000); // with separator

Conversion of an Active Directory timestamp (the number of 100-nanosecond intervals since January 1, 1601) to a Unix timestamp:

    $time = (int) ($adTime / 10000000 - 11644473600);       // without separator
    $time = (int) ($adTime / 10_000_000 - 11_644_473_600);  // with separator

Working with scientific constants:

    const ASTRONOMICAL_UNIT = 149597870700;     // without separator
    const ASTRONOMICAL_UNIT = 149_597_870_700;  // with separator

Separating bytes in a binary or hex literal:

    0b01010100011010000110010101101111;     // without separator
    0b01010100_01101000_01100101_01101111;  // with separator

    0x42726F776E;      // without separator
    0x42_72_6F_77_6E;  // with separator
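The Active Directory conversion above can be wrapped in a small function to check it against a known value. This is a sketch under stated assumptions: `adTimestampToUnix()` is a hypothetical name, and the code assumes PHP 7.4 or later on a 64-bit build so the timestamps fit in an integer.

```php
<?php
// Hypothetical helper name; assumes PHP 7.4+ on a 64-bit build.
function adTimestampToUnix(int $adTime): int
{
    // 10_000_000 intervals of 100 ns make one second;
    // 11_644_473_600 seconds separate 1601-01-01 from 1970-01-01.
    return (int) ($adTime / 10_000_000 - 11_644_473_600);
}

// The AD timestamp of the Unix epoch maps back to zero.
var_dump(adTimestampToUnix(116_444_736_000_000_000)); // int(0)
```

With the separators in place, both constants can be checked by eye: seven zeros for the interval count and an eleven-digit offset in seconds.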

Use cases to avoid

It may be tempting to use integers for storing data such as phone, credit card, and social security numbers since these values appear numeric. However, this is almost always a bad idea, since such numbers often have prefixes and leading digits that are significant. A good rule of thumb is that if it doesn't make sense to use mathematical operators on a value (e.g. adding it, multiplying it, etc.), then an integer probably isn't the best way to store it.

    // don't do this:
    $phoneNumber = 345_6789;
    $creditCard = 231_6547_9081_2543;
    $socialSecurity = 111_11_1111;

Will it be harder to search for numbers?

A concern that has been raised is whether numeric literal separators will make it more difficult to search for numbers, since the same value can be written in more than one way. This is already possible, however. The same number can be written in binary, octal, decimal, hexadecimal, or exponential notation. In practice, this isn't problematic as long as a codebase is consistent. Furthermore, separators can sometimes make it easier to find numbers. To use an earlier example, 13_500 and 135_00 could be differentiated in a find/replace. Another example would be separated bytes in a hex literal, which allows searching for a value like “_6F_” to find only the numbers containing that specific byte.

Should it be the role of an IDE to group digits?

It has been suggested that numeric literal separators aren't needed for better readability, since IDEs could be updated to automatically display large numbers in groups of three digits. However, it isn't always desirable to group numbers the same way. For example, a programmer may write 10050000 differently depending on whether or not it represents a financial quantity stored as cents:

    $total = 100_500_00; // represents $100,500.00 stored as cents
    $total = 10_050_000; // represents $10,050,000

Binary and hex literals may also be grouped by a varying number of digits to reflect how they are used (e.g. bits may be separated into nibbles, bytes, or words). An IDE cannot do this automatically without knowing the programmer's intent for each numeric literal.

Why resurrect this proposal?

The previous RFC was originally voted on over three years ago (January 2016). While a majority of voters supported it, it did not reach the required 2/3 threshold for acceptance. Based on reading the discussion at the time, it didn't receive enough positive votes because there weren't many good use cases put forward for it. Also, the RFC had a short voting period of only 1 week. Since that time, the ability to use underscores in numeric literals has been implemented in additional popular languages (e.g. Python, JavaScript, and TypeScript), and a stronger case can be made for the feature than was made before.

Should I vote for this feature?

Andrea Faulds summarized the considerations as follows: This feature offers some benefit in some cases. It doesn't introduce much new complexity. There's no new syntax or tokens, it just modifies the form of the existing number tokens. It fits in well [with] what's already there, consistently applying to all number literals. It follows established convention in other languages. Its appearance at least hints that values with these separators are not constants or identifiers, but numbers, reducing potential for confusion. It limits its own application to prevent abuse (no leading, trailing, or repeated separators). And it's relatively intuitive.

Comparison to other languages

Numeric literal separators are widely supported in other programming languages.

Ada: single, between digits [1]

C#: multiple, between digits [2]

C++: single, between digits (single quote used as separator) [3]

Java: multiple, between digits [4]

JavaScript and TypeScript: single, between digits [5]

Julia: single, between digits [6]

Kotlin: multiple, between digits [7]

Perl: single, between digits [8]

Python: single, between digits [9]

Ruby: single, between digits [10]

Rust: multiple, anywhere [11]

Swift: multiple, between digits [12]

Vote

Voting started 2019-05-30 and ended 2019-06-13.

Question: Support numeric literal separator in PHP 7.4?

Voters: ajf, ashnazg, bishop, brzuchal, bwoebi, chregu, cmb, colinodell, danack, derick, diegopires, dmitry, duncan3dc, galvao, gasolwu, girgias, jasny, jbnahan, jhdxr, kalle, kelunik, kguest, kinncj, krakjoe, lstrojny, mike, nikic, ocramius, peehaa, petk, pmmaga, ramsey, rasmus, reywob, rogeriopradoj, rtheunissen, salathe, sammyk, sergey, stas, thekid, tpunt, trowski, yunosh

Final result: 33 Yes, 11 No. This poll has been closed.

References