Closed
Changes from 3 commits
24 commits
bf47cb7
Union types proposal
nikic Sep 2, 2019
16a72d9
Fix typos
nikic Sep 4, 2019
8839f65
Fix another typo
nikic Sep 4, 2019
cf76009
false is a subtype of bool
nikic Sep 4, 2019
cf59fa5
Mention alternatives for handling of type coercions
nikic Sep 4, 2019
256535b
Note methods that are inherited from ReflectionTypes
nikic Sep 4, 2019
5bdd11d
Add reflection examples
nikic Sep 4, 2019
1d13bf1
Clarify that "use" is part of duplicate detection
nikic Sep 5, 2019
452daa5
Clarify that future scope is future scope...
nikic Sep 5, 2019
12eb8e4
Update RFC header
nikic Sep 5, 2019
f8bee21
Fix typos in reflection examples
nikic Sep 5, 2019
698cbd3
Also mention object|T as a redundant type pair
nikic Sep 6, 2019
0a2782b
Use T1|T2|null instead of ?(T1|T2)
nikic Sep 6, 2019
931afdd
Add type conversion table
nikic Sep 6, 2019
60fc4d0
Fix typo in variance description
nikic Sep 6, 2019
3622f8c
s/int/float in coercive typing alternative example
nikic Sep 6, 2019
edcc2df
Also mention iterable + array/Traversable as redundant types
nikic Sep 6, 2019
7829b45
Remove obsolete note about parentheses
nikic Sep 18, 2019
76736e5
Swap sections on nullable types and false types
nikic Sep 18, 2019
36dcf38
Clarify that "false|null" and friends are also not allowed
nikic Sep 18, 2019
fc3b6a5
Clarify that no implicit coercions to "false" occur
nikic Sep 18, 2019
6b29bdf
Slightly expand literal types section
nikic Sep 18, 2019
19a12d0
Update reflection output for recent master change
nikic Oct 25, 2019
edf6ac2
Add link to final RFC document
nikic Nov 8, 2019
388 changes: 388 additions & 0 deletions rfcs/0000-union-types-v2.md
@@ -0,0 +1,388 @@
* Name: `union_types_v2`
* Date: 2019-09-02
* Author: Nikita Popov <nikic@php.net>
* Proposed Version: PHP 8.0
* RFC PR: [php-src/rfcs#0000](https://github.com/php-src/rfcs/pull/0000)

# Introduction

A "union type" accepts values of multiple different types, rather than a single one. PHP already supports two special union types:

* `Type` or `null`, using the special `?Type` syntax.
* `array` or `Traversable`, using the special `iterable` type.

However, arbitrary union types are currently not supported by the language. Instead, phpdoc annotations have to be used, such as in the following example:

```php
class Number {
    /**
     * @var int|float $number
     */
    private $number;

    /**
     * @param int|float $number
     */
    public function setNumber($number) {
        $this->number = $number;
    }

    /**
     * @return int|float
     */
    public function getNumber() {
        return $this->number;
    }
}
```

The [statistics section](#statistics) shows that the use of union types is indeed pervasive in the open-source ecosystem, as well as in PHP's own standard library.

Supporting union types in the language allows us to move more type information from phpdoc into function signatures, with the usual advantages this brings:

* Types are actually enforced, so mistakes can be caught early.
* Because they are enforced, type information is less likely to become outdated or miss edge-cases.
* Types are checked during inheritance, enforcing the Liskov Substitution Principle.
* Types are available through Reflection.
* The syntax is a lot less boilerplate-y than phpdoc.

After generics, union types are currently the largest "hole" in our type declaration system.

# Proposal

Union types are specified using the syntax `T1|T2|...` and can be used in all positions where types are currently accepted:

```php
class Number {
    private int|float $number;

    public function setNumber(int|float $number): void {
        $this->number = $number;
    }

    public function getNumber(): int|float {
        return $this->number;
    }
}
```
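To illustrate the enforcement aspect, here is a minimal sketch of how such a class would behave at runtime, assuming the proposal is implemented as described. The class name `NumberBox` and the default value are hypothetical and only serve the example.

```php
<?php
// Hypothetical class, modeled on the Number example above.
final class NumberBox {
    private int|float $number = 0;

    public function setNumber(int|float $number): void {
        $this->number = $number;
    }

    public function getNumber(): int|float {
        return $this->number;
    }
}

$box = new NumberBox();
$box->setNumber(1.5); // float is a member of the union
$box->setNumber(2);   // int is a member of the union

try {
    // A non-numeric string is neither int nor float and cannot be coerced.
    $box->setNumber("not a number");
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

The rejected call leaves the previously stored value untouched, which is the practical difference from a phpdoc-only annotation.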

## Supported Types

Union types support all types currently supported by PHP, with some caveats outlined in the following.

### `void` type

The `void` type can never be part of a union. As such, types like `T|void` are illegal in all positions, including return types.

The `void` type indicates that the function has no return value, and enforces that argument-less `return;` is used to return from the function. It is fundamentally incompatible with non-void return types.

What is likely intended instead is `?T`, which allows returning either `T` or `null`.
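As a small sketch of that alternative, a function that might have been annotated as `string|void` in phpdoc can be declared with `?string` and an explicit `return null;`. The function name `findName` is hypothetical, and `str_starts_with()` assumes PHP 8.

```php
<?php
// Hypothetical lookup helper: returns the first matching name, or null.
function findName(array $names, string $prefix): ?string {
    foreach ($names as $name) {
        if (str_starts_with($name, $prefix)) {
            return $name;
        }
    }
    return null; // explicit null instead of an argument-less `return;`
}
```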

### `false` pseudo-type
@dbrekelmans dbrekelmans Sep 4, 2019

IMO adding the false pseudo-type because of library implementation details is the wrong choice. Adding a language-level feature to cope with a historical design flaw is only adding to the pile. It's legacy as soon as it's implemented.

I absolutely understand your reasoning; I myself would prefer to know a function cannot return true. However, if typehinting |bool over |false is such an issue, it should be fixed at the root (return null instead of false) instead of adding a band-aid.


I see what you're getting at, but rather than "legacy as soon as it's implemented" it could also be seen as a pre-cursor for a more general feature, where you can specify a union of values rather than just types. See "Literal Types" in the Future Scope section for a brief discussion of this.


Due to BC constraints, changes to return type like that have a devastating potential to break uncountable (read infinite) amount of code on the internet, affecting adoption rate for the upgrade. I don't think you are wrong to say it's legacy the moment it's implemented, but it's part of coping with the 20 years baggage of PHP.
I think discussing the merits of BC breaks is not worth anybody's time at this RFC as Internals have been debating that for several months already (that I know of).

@dbrekelmans dbrekelmans Sep 4, 2019

@deleugpn I'm certainly not proposing such a BC break and I agree let's avoid the BC discussion. I'd rather leave the wound exposed (typehint |bool where false is returned) than cover it up with this pseudo-type band-aid.

@IMSoP Looking at it that way, it makes more sense. :)

@mattacosta mattacosta Sep 5, 2019

The problem with false is really one of meaning as opposed to technical requirement. With all the "standard" types, developers adopt a general definition of what they represent. The value false is generally defined similarly to:

// The two values of a logical condition.
enum Boolean {
  False,
  True,
}

As we can see, false was never meant to represent an error condition much in the same way that the string "apple" or the number 1234 don't generally represent error conditions either. Including false as a pseudo-type simply perpetuates this misrepresentation.

Furthermore, I would argue that it may actually be detrimental to developers, as its usage in this manner typically indicates an error.

  • In the case of a function or method that should return a value, what was likely meant was null or another type similar to Rust's Option<T> or Result<T, E> which are meant to represent error conditions.
    This also covers literal types, because they are generally used to represent a subset of some other type. Since true is invalid in this context and false should not be combined with anything (its supertype does not have any other values), false can no longer represent its intended meaning, one of two values.
  • In the case of a function or method that should not return a value, the correct result should actually be a thrown error instead. Using false in this second case is especially bad, as it may also prevent static analysis from narrowing the intended type.
class Map {
  function has(key: TKey): bool {}
  function get(key: TKey): T | false {}
}

function foo(): T | null {
  $map = new Map();
  if ($map->has("bar")) {
    // Here we expect `get()` to always return `T`, however because of our poor
    // design decision to not throw an exception, the type cannot be narrowed,
    // and an error would be reported here because `T | false` != `T | null`.
    return $map->get("bar");
  }
  return null;
}


> I see what you're getting at, but rather than "legacy as soon as it's implemented" it could also be seen as a pre-cursor for a more general feature, where you can specify a union of values rather than just types. See "Literal Types" in the Future Scope section for a brief discussion of this.

Maybe it would be better to leave false out of this proposal but flesh it out more as a part of literal types? If someone wants to use that paradigm, allowing type false = false would allow the end user to pull off that sort of shenanigans without encouraging poor practices by specifically implementing them in code.


While we nowadays encourage the use of `null` over `false` as an error or absence return value, for historical reasons many internal functions continue to use `false` instead. As shown in the [statistics section](#statistics), the vast majority of union return types for internal functions include `false`.

A classical example is the `strpos()` family of functions, which returns `int|false`.

While it would be possible to model this less accurately as `int|bool`, this gives the false impression that the function can also return a `true` value, which makes this type information significantly less useful to humans and static analyzers both.

For this reason, support for the `false` pseudo-type is included in this proposal. A `true` pseudo-type is *not* part of the proposal, because similar historical reasons for its necessity do not exist.
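A minimal sketch of the pattern this is meant to describe, assuming the syntax proposed here. The wrapper name `findOffset` is hypothetical; it only forwards to `strpos()`.

```php
<?php
// Hypothetical wrapper that mirrors the strpos() return convention.
function findOffset(string $haystack, string $needle): int|false {
    return strpos($haystack, $needle);
}

$hit   = findOffset("union types", "types"); // offset of the match
$start = findOffset("union types", "union"); // match at offset 0
$miss  = findOffset("union types", "xyz");   // no match

// Callers still need a strict comparison, because offset 0 is falsy.
$found = ($start !== false);
```

The declared type tells readers and static analyzers that `true` is never returned, which `int|bool` would not.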

The `false` pseudo-type cannot be used as a standalone type; it can only be used as part of a union.


I think some clarity on why true is not included as a pseudo-type is needed here.

@asgrim asgrim Sep 4, 2019

> A true pseudo-type is not part of the proposal, because similar historical reasons for its necessity do not exist.

it's already there ;)


Fair enough. A bit weak, IMO, so if there isn't much complexity involved, I'd suggest introducing it to reduce the edge cases.


Agree that true should be included as a pseudo type. Without reading the RFC one would assume it's included. It helps documentation & tutorials since no effort is needed to explain why it's not included.


> Fair enough. A bit weak, IMO, so if there isn't much complexity involved, I'd suggest introducing it to reduce the edge cases.

@mnapoli I noticed your downvote. Care to elaborate? (Personally, I value your opinion.)


> @mnapoli I noticed your downvote. Care to elaborate?

@joshdifabio it's easier to simply downvote what you find on the internet rather than write a proper argument, I went with the lazy way 😄

Nikita summed it up nicely already, I don't have a strong opinion on this and I don't think my opinion matters much here, but: false would exist for legacy reasons. true would have no real reason to exist except consistency, and I would favor aiming for what's essential over consistency or completeness.


The logic here is sound; my one concern is that allowing false as an explicit return type seems to indicate that it's an OK/good thing to do. I think most agree these days that returning something|false is bad design, but PHP (and sadly much of userspace) is full of that bad design.

Is there some way, perhaps just in documentation, that we could call out that returning foo|false is supported but a bad idea? Or at least avoid giving the implication that it's a good idea?

Member

I think people should just fall back to Foo|bool for now.


Especially now that the syntax for nullable type has switched to T|null, it should be made clear that not only false, but also both ?false and false|null, are unsupported.


Adding support for false in favor of bool without proper support for literal types helps to promote bad design decisions

### Nullable union types

Nullable types in PHP use the syntax `?T`. Nullable union types are required to be written as `?(T1|T2)`, while variations such as `?T1|T2`, `T1|?T2` or even `?T1|?T2` are disallowed. This avoids issues of "symmetry" where the nullability is syntactically associated with a specific type, while in reality it could equally apply to all.

An alternative would be to allow an explicit `null` type for use in unions only, such that the above example could be written `T1|T2|null`, and the existing syntax `?T` could be written `T|null`. While this might have been a good option if union types were introduced earlier, I think it is preferable not to introduce a second type nullability syntax at this point in time.

### Duplicate types

Each literal type may only occur once inside a union. As such, types like `int|string|INT` are not permitted. Additionally, only one of `false` and `bool` may be used.

This is a purely syntactical restriction that is intended to catch simple bugs in the type specification. It does not ensure that the union type is in some sense "minimal".

For example, if `A` and `B` are class aliases, then `A|B` remains a legal union type, even though it could be reduced to either `A` or `B`. Similarly, if `class B extends A {}`, then `A|B` is also a legal union type, even though it could be reduced to just `A`. Detecting these cases would require loading all types at the point of declaration.

### Type grammar

Excluding the special `void` type, PHP's type syntax may now be described by the following grammar:

```
type: simple_type
| union_type
| "?" simple_type
| "?" "(" union_type ")"
;

union_type: simple_type "|" simple_type
| union_type "|" simple_type
;

simple_type: "false" # only legal in unions
| "bool"
| "int"
| "float"
| "string"
| "array"
| "object"
| "iterable"
| "callable" # not legal in property types
| "self"
| "parent"
| namespaced_name
;
```

At this point in time, parentheses in types are only allowed in the one case where they are necessary, which is the `?(T1|T2|...)` syntax. With further extensions to the type system (such as intersection types) it may make sense to allow parentheses in more arbitrary positions.

## Variance

Union types follow the existing variance rules:

* Return types are covariant (child must be subtype).
* Parameter types are contravariant (child must be supertype).
* Property types are invariant (child must be subtype and supertype).

The only change is in how union types interact with subtyping. A union `U_1|...|U_n` is a subtype of `V_1|...|V_m` if for each `U_i` there exists a `V_j` such that `U_i` is a subtype of `V_j`.

Additionally, the `iterable` type is considered to be the same (i.e. both subtype and supertype) as `array|Traversable`.
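The subtyping rule for unions can be sketched in userland as two nested loops. This is only an illustration of the rule, not how the engine implements it: the helper names `isSimpleSubtype` and `isUnionSubtype` are hypothetical, types are passed as plain strings, and the single-type check only covers identical names, class inheritance, and `false` as a subtype of `bool`. The `iterable` equivalence is deliberately left out.

```php
<?php
// Hypothetical helper: is the single type $u a subtype of the single type $v?
function isSimpleSubtype(string $u, string $v): bool {
    if (strcasecmp($u, $v) === 0) {
        return true; // same type name
    }
    if ($u === 'false' && $v === 'bool') {
        return true; // false is treated as a subtype of bool
    }
    // Class or interface inheritance; scalar names never reach is_a().
    return class_exists($u)
        && (class_exists($v) || interface_exists($v))
        && is_a($u, $v, true);
}

// U_1|...|U_n is a subtype of V_1|...|V_m if every U_i has some V_j
// with U_i a subtype of V_j.
function isUnionSubtype(array $u, array $v): bool {
    foreach ($u as $ui) {
        $covered = false;
        foreach ($v as $vj) {
            if (isSimpleSubtype($ui, $vj)) {
                $covered = true;
                break;
            }
        }
        if (!$covered) {
            return false;
        }
    }
    return true;
}

class A {}
class B extends A {}

$narrowing = isUnionSubtype(['int'], ['int', 'float']); // int is covered
$widening  = isUnionSubtype(['int', 'float'], ['int']); // float is not covered
```

With this check, a covariant return type is legal when the child type is a union subtype of the parent type, and a contravariant parameter type is legal in the opposite direction.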

In the following, some examples of what is allowed and what isn't are given.

### Property types

Property types are invariant, which means that types must stay the same during inheritance. However, the "same" type may be expressed in different ways. Prior to union types, one such possibility was to have two aliased classes `A` and `B`, in which case a property type may legally change from `A` to `B` or vice versa.
Member

Just because I had to think this through for a moment, I'd suggest explaining why property types are invariant during inheritance. Specifically that setting a property would allow for contravariance, but that reading would only allow for covariance. The intersection of those rules is invariance.


Union types expand the possibilities in this area: For example `int|string` and `string|int` represent the same type. The following example shows a more complex case:

```php
class A {}
class B extends A {}

class Test {
public A|B $prop;

Perhaps this should not be allowed at all? Since A type already includes B as well, using A|B type has no added value over just A.


As noted elsewhere in the RFC, this can't be known in the general case at compile-time, because classes A and B may not have been defined yet.


Yes and type analysers can do the complaining for you (though at least in my case it'll just treat as public A $prop).


I see, makes sense. Yeah, I'm sure PHPStan will handle this.


This should probably be moved to a separate section: Known problems or Limitations because while it should work as @enumag says, the engine doesn't make it possible ATM.


I can see there would be some value if the language was able to point this out, but there's plenty of ways of writing redundant code, so I don't see it as a "limitation" particularly. For instance, you can already write if ( $x instanceof A || $x instanceof B ), or even if ( $x instanceof A || $x instanceof A ), which is clearly redundant. A tool like PHPStan may well spot this for you, and can probably spot redundant union types as well, because it can analyse your whole project.

}
class Test2 extends Test {
public A $prop;
}
```

In this example, the union `A|B` actually represents the same type as just `A`, and this inheritance is legal, despite the type not being syntactically the same.

Formally, we arrive at this result as follows: First, `A` is a subtype of `A|B`, because it is a subtype of `A`. Second, `A|B` is a subtype of `A`, because `A` is a subtype of `A` and `B` is a subtype of `A`.

### Adding and removing union types

It is legal to remove union types in return position and add union types in parameter position:

```php
class Test {
    public function param1(int $param) {}
    public function param2(int|float $param) {}

    public function return1(): int|float {}
    public function return2(): int {}
}

class Test2 extends Test {
    public function param1(int|float $param) {} // Allowed: Adding extra param type
    public function param2(int $param) {}       // FORBIDDEN: Removing param type

    public function return1(): int {}       // Allowed: Removing return type
    public function return2(): int|float {} // FORBIDDEN: Adding extra return type
}
```

### Variance of individual union members

Similarly, it is possible to restrict a union member in return position, or widen a union member in parameter position:

```php
class A {}
class B extends A {}

class Test {
    public function param1(B|string $param) {}
    public function param2(A|string $param) {}

    public function return1(): A|string {}
    public function return2(): B|string {}
}

class Test2 extends Test {
    public function param1(A|string $param) {} // Allowed: Widening union member B -> A
    public function param2(B|string $param) {} // FORBIDDEN: Restricting union member A -> B

    public function return1(): B|string {} // Allowed: Restricting union member A -> B
    public function return2(): A|string {} // FORBIDDEN: Widening union member B -> A
}
```

Of course, the same can also be done with multiple union members at a time, and be combined with the addition/removal of types mentioned previously.

## Coercive typing mode

When `strict_types` is not enabled, scalar type declarations are subject to limited implicit type coercions. These are problematic in conjunction with union types, because it is not always obvious which type the input should be converted to. For example, when passing a boolean to an `int|string` argument, both `0` and `""` would be viable coercion candidates.

If the exact type of the value is not part of the union, then the target type is chosen in the following order of preference:

1. `int`
2. `float`
3. `string`
4. `bool`

If the type both exists in the union, and the value can be coerced to the type under PHP's existing type checking semantics, then the type is chosen. Otherwise the next type is tried.

As an exception, if the value is a string and both `int` and `float` are part of the union, the preferred type is determined by the existing "numeric string" semantics. For example, for `"42"` we choose `int`, while for `"42.0"` we choose `float`.

### Examples

```php
// int|string
42 --> 42 // exact type
"42" --> "42" // exact type
new ObjectWithToString --> "Result of __toString()"
// object never compatible with int, fall back to string
42.0 --> 42 // float compatible with int
42.1 --> 42 // float compatible with int
1e100 --> "1.0E+100" // float too large for int type, fall back to string
INF --> "INF" // float too large for int type, fall back to string
true --> 1 // bool compatible with int
[] --> TypeError // array not compatible with int or string

// int|float|bool
"45" --> 45 // int numeric string
"45.0" --> 45.0 // float numeric string
"45X" --> 45 + Notice: Non well formed numeric string
// int numeric string
"" --> false // not numeric string, fall back to bool


I don't see an explicit specification of what this does for casting to union types containing false, e.g. int|float|false.

My preference would be that "X" -> int|float|false would become a thrown TypeError, and "" would become false

If that was the intended behavior, the above section should also be amended to something along the below lines:

- 4. `bool`
+ 4. `bool` (or `false` if the `false` is in the union type and the value would be false when cast to `bool`)

Member Author

The behavior for false is currently specified by omission, but should be explicitly mentioned: All types that are not explicitly listed here are not eligible target types for implicit coercions. This includes null, false, class types, etc. In other words, the only valid value for the false type is false.

Given the intended use, I don't believe that there is any use-case for an implicit coercion to false (it is essentially used as an alternative null for legacy reasons and null is never subject to coercions either). We may wish to revisit this question if/when we introduce literal types and define general coercion semantics (or possibly non-coercion semantics...) at that point. For now restricting it to non-coercing is the conservative choice.

This is also one more point in favor of not supporting a true type at this point in time. That would bring up the question of how true|false relates to bool and whether those types are considered equal. If they are, this would essentially force the coercion semantics you suggest on the true and false literal types. That's certainly a possibility, but not necessarily what we actually want, especially when it comes to the pseudo-enum use-case (e.g. if you have type Weekday = 1|2|3|4|5|6|7, should this really accept true as a value because that happens to coerce to 1?)

"X" --> true // not numeric string, fall back to bool
[] --> TypeError // array not compatible with int, float or bool
```
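A few of the rows above can be reproduced with a short script, assuming an engine that implements the rules as described and a file without `declare(strict_types=1)`. The function names `toIntOrString` and `toNumberOrBool` are hypothetical, and only rows with unambiguous behavior are exercised here.

```php
<?php
// Coercive mode: this file intentionally does not enable strict_types.

// Hypothetical pass-through helpers; the parameter type performs the coercion.
function toIntOrString(int|string $value): int|string {
    return $value;
}

function toNumberOrBool(int|float|bool $value): int|float|bool {
    return $value;
}

$exactString = toIntOrString("42");   // exact type, stays a string
$fromBool    = toIntOrString(true);   // bool is compatible with int
$intString   = toNumberOrBool("45");  // int numeric string
$floatString = toNumberOrBool("45.0"); // float numeric string
$emptyString = toNumberOrBool("");    // not numeric, falls back to bool
$otherString = toNumberOrBool("X");   // not numeric, falls back to bool

try {
    toIntOrString([]); // array is not compatible with int or string
    $arrayRejected = false;
} catch (TypeError $e) {
    $arrayRejected = true;
}
```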

## Property types and references

References to typed properties with union types follow the semantics outlined in the [typed properties RFC](https://wiki.php.net/rfc/typed_properties_v2#general_semantics):

> If typed properties are part of the reference set, then the value is checked against each property type. If a type check fails, a TypeError is generated and the value of the reference remains unchanged.
>
> There is one additional caveat: If a type check requires a coercion of the assigned value, it may happen that all type checks succeed, but result in different coerced values. As a reference can only have a single value, this situation also leads to a TypeError.

The [interaction with union types](https://wiki.php.net/rfc/typed_properties_v2#future_interaction_with_union_types) was already considered at the time, because it impacts the detailed reference semantics. Repeating the example given there:

```php
class Test {
    public int|string $x;
    public float|string $y;
}
$test = new Test;
$r = "foobar";
$test->x =& $r;
$test->y =& $r;

// Reference set: { $r, $test->x, $test->y }
// Types: { mixed, int|string, float|string }

$r = 42; // TypeError
```

The basic issue is that the final assigned value (after type coercions have been performed) must be compatible with all types that are part of the reference set. However, in this case the coerced value will be `int(42)` for property `Test::$x`, while it will be `float(42.0)` for property `Test::$y`. Because these values are not the same, this is considered illegal and a `TypeError` is thrown.

An alternative approach would be to cast the value to the only common type `string` instead, with the major disadvantage that this matches *neither* of the values you would get from a direct property assignment.
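The chosen behavior can be observed with a runnable variant of the example above that catches the error. This is a sketch assuming the semantics described in this section; the variable `$threw` is only there for illustration.

```php
<?php
class Test {
    public int|string $x;
    public float|string $y;
}

$test = new Test;
$r = "foobar";
$test->x =& $r; // reference set now includes Test::$x
$test->y =& $r; // ... and Test::$y

try {
    // int(42) is accepted as-is by int|string, but would have to become
    // float(42.0) for float|string, so the coerced values disagree.
    $r = 42;
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}
```

After the failed assignment the reference still holds its previous value, matching the quoted rule that the value remains unchanged.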

## Reflection

To support union types, a new class `ReflectionUnionType` is added:

```php
class ReflectionUnionType extends ReflectionType {


Is it possible to put all stuff into the ReflectionType itself to prevent violation of Liskov principle by checking instance of given type:

$type = (new ReflectionMethod(SomeClass::class, 'method'))->getReturnType();
if ($type instanceof ReflectionUnionType) {
    // ... ok, we have a union type here
    $types = $type->getTypes();
} else {
    // not a union type here
} 


Maybe provide list of simple methods in the ReflectionType, like isUnion(): bool, getSubTypes(): array?


What do you mean by violation of LSP? Checking for instances doesn't violate LSP, composite types only extend behaviour of generic ReflectionType. (Also see my comment about ReflectionCompositeType above).

Besides, getSubTypes() doesn't make sense on generic ReflectionType if the type is simple.


Ok, it's not about LSP, but we need to write class-dependent code and this is considered a bad smell. Ideally, there should be one single ReflectionNamedType or ReflectionType class that is responsible for providing all information about the type. Otherwise we will have several similar classes with minor differences in implementation...


I see your point. The issue is that ReflectionType was just a naive assumption. With today's needs, it should've been an interface, implemented by ReflectionNamedType and extended by ReflectionCompositeType interface. ReflectionCompositeType would then be implemented by ReflectionUnionType and ReflectionIntersectionType.
Something like this:

                  <interface> ReflectionType
                          /           \         
                         /             \
       ReflectionNamedType     <interface> ReflectionCompositeType
                                            /   \
                                           /     \
                         ReflectionUnionType     ReflectionIntersectionType

Member Author

ReflectionType was indeed a naive assumption, which is why 7.1 added ReflectionNamedType specifically in preparation for future type-system extensions. As of PHP 8, ReflectionType is an abstract class, though we could also turn it into an interface -- it's just a question of shared implementation.

Having a class hierarchy is not a code smell -- trying to fit everything into a single class is the code smell. We are representing something akin to a (type) AST here, and doing that with a polymorphic object hierarchy is the standard way of going about it (in an OO language without ADTs).

Especially when you keep future type-system extensions like callable types with signatures in mind, it should be obvious that all of those different types cannot be part of the same interface without it turning into a convoluted API where some methods are only usable for some types.

Member Author

Because this probably never really came across when it was introduced, the way that ReflectionNamedType is supposed to be used is something like this...

$type = $r->getType();
if ($type instanceof ReflectionNamedType) {
    // Use $type->getName() + $type->isBuiltin() here
} else if (...) {
    // Check more type kinds here in the future
} else {
    throw new Exception("Ooops, I don't understand this type!");
}

Of course, because ReflectionNamedType is the only ReflectionType right now, what people actually end up doing is just

$type = $r->getType();
// Let's just assume that this is ReflectionNamedType!
// Use $type->getName() + $type->isBuiltin() here

which will of course break if new kinds of types are added in the future. (Though the previous variant will of course also break, just more gracefully.)

/** @return ReflectionType[] */
public function getTypes();

@Majkl578 Majkl578 Sep 4, 2019

Any chance of splitting this method off into a separate abstract ReflectionCompositeType? It could later be reused by ReflectionIntersectionType without moving the method around, and it would already allow forward-compatible code to be written.


/** @return bool */
public function allowsNull();

/** @return string */
public function __toString();
}
```

The `getTypes()` method returns an array of `ReflectionType`s that are part of the union. The types may be returned in an arbitrary order that does not match the original type declaration. The types may also be subject to equivalence transformations.

For example, the type `int|string` may return types in the order `["string", "int"]` instead. The type `iterable|array|string` might be canonicalized to `iterable|string` or `Traversable|array|string`. The only requirement on the Reflection API is that the ultimately represented type is equivalent.
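Because the order returned by `getTypes()` is unspecified, code that compares or caches union types should normalize the members first. A minimal sketch, assuming a PHP version that implements `ReflectionUnionType` as described above; the helper `normalizedTypeNames()` and the function `example()` are hypothetical names used only for illustration:

```php
/**
 * Returns the member type names in sorted order, so that the result
 * does not depend on the order chosen by getTypes().
 */
function normalizedTypeNames(ReflectionType $type): array
{
    // A non-union type is treated as a union with a single member.
    $members = $type instanceof ReflectionUnionType ? $type->getTypes() : [$type];

    $names = array_map(
        static function (ReflectionType $member): string {
            return (string) $member;
        },
        $members
    );
    sort($names);

    return $names;
}

function example(int|string $value): void {}

$type = (new ReflectionFunction('example'))->getParameters()[0]->getType();
$names = normalizedTypeNames($type); // ['int', 'string'], whatever order getTypes() used
```

Sorting only removes the dependence on ordering. It does not undo equivalence transformations such as `iterable|array|string` being reported as `iterable|string`, so two spellings of the same type may still normalize differently.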

The `allowsNull()` method returns whether the union additionally contains the type `null`. A possible alternative would be to introduce a separate `ReflectionNullableType` to represent the `?T` wrapper explicitly. I would prefer this, but we would probably not be able to use it for non-union types for backwards compatibility reasons.

The `__toString()` method returns a string representation of the type that constitutes a valid code representation of the type in a non-namespaced context. It is not necessarily the same as what was used in the original code. Notably this *will* contain the leading `?` for nullable types and not be bug-compatible with `ReflectionNamedType`.

# Backwards Incompatible Changes

This RFC does not contain any backwards incompatible changes. However, existing `ReflectionType`-based code will have to be adjusted in order to support processing of code that uses union types.

# Future Scope

## Intersection Types

Intersection types are the logical dual of union types. Instead of requiring that (at least) one of the type constraints is satisfied, all of them must be.

For example `Traversable|Countable` requires that the passed value is either `Traversable` or `Countable`, while `Traversable&Countable` requires that it is both.
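The difference can be spelled out with runtime checks. The sketch below emulates both constraints with `instanceof`; it only illustrates the semantics and is not proposed syntax. The helper functions and the `OnlyCountable` class are hypothetical:

```php
/** Emulates the union Traversable|Countable: at least one constraint holds. */
function satisfiesUnion($value): bool
{
    return $value instanceof Traversable || $value instanceof Countable;
}

/** Emulates the intersection Traversable&Countable: every constraint holds. */
function satisfiesIntersection($value): bool
{
    return $value instanceof Traversable && $value instanceof Countable;
}

/** Countable, but not Traversable. */
final class OnlyCountable implements Countable
{
    public function count(): int
    {
        return 0;
    }
}

$both = new ArrayIterator([1, 2, 3]); // ArrayIterator is both Traversable and Countable
$onlyCountable = new OnlyCountable();

var_dump(satisfiesUnion($both));                 // bool(true)
var_dump(satisfiesIntersection($both));          // bool(true)
var_dump(satisfiesUnion($onlyCountable));        // bool(true)
var_dump(satisfiesIntersection($onlyCountable)); // bool(false)
var_dump(satisfiesUnion('neither'));             // bool(false)
```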

## Mixed Type

The `mixed` type allows explicitly annotating that any value is acceptable. While specifying no type has the same behavior on the surface, it does not make clear whether the type is simply missing (because nobody bothered adding it yet, or because it can't be added for backwards compatibility reasons), or whether genuinely any value is acceptable.

We've held off on adding a `mixed` type out of fear that it would be used in cases where a more specific union could have been specified. Once union types are supported, it would probably also make sense to add the `mixed` type.

## Literal Types

The `false` pseudo-type introduced in this RFC is a special case of a "literal type", such as those supported by [TypeScript](https://www.typescriptlang.org/docs/handbook/advanced-types.html#string-literal-types). Literal types allow specifying enum-like types, which are limited to specific values.

```php
type ArrayFilterFlags = 0|ARRAY_FILTER_USE_KEY|ARRAY_FILTER_USE_BOTH;
array_filter(array $array, callable $callback, ArrayFilterFlags $flag): array;
```

Proper enums are likely a better solution to this problem space, though depending on the implementation they may not be retrofitted to existing functions for backwards-compatibility reasons.
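Without literal types or enums, a constraint like this can only be enforced at runtime. A minimal sketch of the check that `ArrayFilterFlags` would express, using a hypothetical helper `assertArrayFilterFlag()` that is not part of this proposal:

```php
/** Accepts only the values that the ArrayFilterFlags literal type would allow. */
function assertArrayFilterFlag(int $flag): int
{
    $allowed = [0, ARRAY_FILTER_USE_KEY, ARRAY_FILTER_USE_BOTH];
    if (!in_array($flag, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported array_filter() flag: $flag");
    }

    return $flag;
}

// Keep only the elements with even keys.
$evenKeys = array_filter(
    ['a', 'b', 'c', 'd'],
    static function (int $key): bool {
        return $key % 2 === 0;
    },
    assertArrayFilterFlag(ARRAY_FILTER_USE_KEY)
);
```

A literal type would move this check into the signature, where it is visible to reflection and static analysis instead of being hidden in the function body.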

I would really love to see both custom types and enums, as separate language features, so hope we keep the door open to both. In particular, I am strongly of the opinion that an enum value should not be coercible to a plain type, because that leads to nonsense like MONDAY == JANUARY; as you say, that would rule out using them for existing function signatures. Custom types, meanwhile, could have more complex definitions - e.g. combining unions, intersections, constrained callables, and domains such as int{>=0,<100}.


## Type Aliases

As types become increasingly complex, it may be worthwhile to allow reusing type declarations. There are two general ways in which this could work. One is a local alias, such as:

```php
use int|float as number;

function foo(number $x) {}
```

In this case `number` is a symbol that is only visible locally and will be resolved to the original `int|float` type during compilation.
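Under this hypothetical alias syntax, the declaration would compile as if the union had been written out in place. A sketch of the resolved form, assuming a PHP version with union type support as proposed in this RFC; the `return` statement is added only so the behavior can be observed:

```php
// What `function foo(number $x) {}` would resolve to after alias expansion.
function foo(int|float $x)
{
    return $x;
}

var_dump(foo(1));   // int(1)
var_dump(foo(1.5)); // float(1.5)

try {
    foo('abc');     // a non-numeric string matches neither int nor float
} catch (TypeError $e) {
    echo "Rejected: not an int|float\n";
}
```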

The second possibility is an exported typedef:

```php
namespace Foo;
type number = int|float;

// Usable as \Foo\number from elsewhere
```

# Proposed Voting Choices

Simple yes/no vote.

# Statistics

To illustrate how union types are used in the wild, their occurrence in `@param` and `@return` annotations in phpdoc comments has been analyzed.

In the top two thousand composer packages there are:

* 25k parameter union types: [Full JSON data](https://gist.github.com/nikic/64ff90c5038522606643eac1259a9dae#file-param_union_types-json)
* 14k return union types: [Full JSON data](https://gist.github.com/nikic/64ff90c5038522606643eac1259a9dae#file-return_union_types-json)

In the PHP stubs for internal functions (these are incomplete right now, so the actual numbers should be at least twice as large) there are:

* 336 union return types
  * of which 312 include `false` as a value

This illustrates that the `false` pseudo-type in unions is necessary to express the return type of many existing internal functions.