| 1 | +# Design: WP_CSS_Token_Processor |
| 2 | + |
| 3 | +**Date:** 2026-03-06 |
| 4 | +**Status:** Approved |
| 5 | +**Related:** https://github.com/WordPress/wordpress-develop/pull/11104, https://core.trac.wordpress.org/ticket/64771 |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Background |
| 10 | + |
| 11 | +When a user without `unfiltered_html` (e.g. Author role, or site admins on some multisite configurations) saves a post containing block-level custom CSS (`attrs.style.css`) with `&` or `>` characters, the `filter_block_content()` pipeline corrupts the CSS through a three-step mangling chain: |
| 12 | + |
| 13 | +1. `parse_blocks()` / `json_decode()`: `\u0026` becomes `&` |
| 14 | +2. `filter_block_kses_value()` / `wp_kses()`: `&` becomes `&amp;`, `>` becomes `&gt;` (KSES treats CSS as HTML) |
| 15 | +3. `serialize_block_attributes()` / `json_encode()`: `&amp;` becomes `\u0026amp;` |
| 16 | + |
| 17 | +Each subsequent save compounds the corruption. The root cause is that `wp_kses()` is an HTML sanitizer being applied to CSS — the wrong tool for the job. This class is the right tool. |
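
The chain above can be approximated outside WordPress with plain PHP. This is only a sketch: `simulate_save()` is a hypothetical helper, `htmlspecialchars()` is a stand-in for the entity escaping that `wp_kses()` performs, and `JSON_HEX_AMP` is a stand-in for the `&` escaping done during block serialization. Details such as how already-encoded entities are treated may differ in core.

```php
<?php
// Hypothetical helper: one save cycle, approximated with plain PHP functions.
function simulate_save( string $stored_json ): string {
	// Step 1: json_decode() turns \u0026 back into a literal "&".
	$css = json_decode( $stored_json );

	// Step 2 (stand-in for wp_kses()): escape the CSS as if it were HTML.
	$css = htmlspecialchars( $css, ENT_NOQUOTES, 'UTF-8' );

	// Step 3: json_encode() re-escapes every "&" as \u0026.
	return json_encode( $css, JSON_HEX_AMP );
}

// The attribute as it might sit in serialized block markup.
$stored = '"\u0026 > p { margin: 0; }"';

$once  = simulate_save( $stored );
$twice = simulate_save( $once );

echo $once, "\n";
echo $twice, "\n";
```

In this approximation the second pass escapes the `&` that the first pass introduced, which is the compounding effect described above.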
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Scope |
| 22 | + |
| 23 | +### In scope (this session) |
| 24 | + |
| 25 | +- `WP_CSS_Token_Processor` class — streaming CSS tokenizer |
| 26 | +- `sanitize()` instance method — strips unsafe tokens/rules, returns safe CSS string |
| 27 | +- `validate()` instance method — returns `true|WP_Error` |
| 28 | +- `get_updated_css()` instance method — reconstruct CSS after manual token modifications |
| 29 | +- `get_removed_tokens()` instance method — inspection after `sanitize()` |
| 30 | +- Low-level navigation and modification methods |
| 31 | +- Full inline PHPDoc |
| 32 | +- `README.md` in `src/wp-includes/css-api/` |
| 33 | +- Full test suite |
| 34 | + |
| 35 | +### Out of scope (follow-on sessions) |
| 36 | + |
| 37 | +- Integration with `filter_block_kses_value()` in `blocks.php` |
| 38 | +- `WP_CSS_Processor` — rule/declaration-aware layer (v2) |
| 39 | +- Replacing `process_blocks_custom_css()` in `WP_Theme_JSON` |
| 40 | +- CSS selector query engine (TODO in `class-wp-block.php:385`) |
| 41 | +- Customizer CSS and Global Styles CSS pipeline adoption |
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +## Architecture |
| 46 | + |
| 47 | +### Directory structure |
| 48 | + |
| 49 | +``` |
| 50 | +src/wp-includes/ |
| 51 | +└── css-api/ |
| 52 | + ├── class-wp-css-token-processor.php |
| 53 | + └── README.md |
| 54 | + |
| 55 | +tests/phpunit/tests/ |
| 56 | +└── css-api/ |
| 57 | + ├── WpCssTokenProcessorTest.php |
| 58 | + ├── WpCssTokenSanitizeTest.php |
| 59 | + └── WpCssTokenValidateTest.php |
| 60 | +``` |
| 61 | + |
| 62 | +### Component map |
| 63 | + |
| 64 | +``` |
| 65 | +WP_CSS_Token_Processor — tokenizes a CSS string into a typed token stream |
| 66 | + | |
| 67 | + | sanitize(): string — strips unsafe tokens/rules, returns safe CSS |
| 68 | + | validate(): true|WP_Error — returns true, or WP_Error with reason code |
| 69 | + | get_updated_css(): string — reconstruct after manual token modifications |
| 70 | +``` |
| 71 | + |
| 72 | +The integration point (`filter_block_kses_value()` dispatching to `sanitize()` for `['style','css']` paths) is a follow-on PR and is not part of this session. |
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +## `WP_CSS_Token_Processor` |
| 77 | + |
| 78 | +### Design principles |
| 79 | + |
| 80 | +- **Spec-inspired, safety-first** — follows the CSS Syntax Level 3 token vocabulary and structure, but prioritises correctness on security-relevant tokens over completeness. Gaps cause rejection/stripping rather than silent pass-through. |
| 81 | +- **Forward-only streaming** — like `WP_HTML_Tag_Processor`, the processor advances a cursor through the input. No backtracking except via bookmarks (v2). |
| 82 | +- **Non-destructive modification** — operates on the original string buffer and applies edits on output via `get_updated_css()`. |
| 83 | +- **Instance-based API** — consistent with `WP_HTML_Tag_Processor`. Create an instance, call methods, retrieve output. |
| 84 | + |
| 85 | +### Token types |
| 86 | + |
| 87 | +#### Security-critical (must be correct) |
| 88 | + |
| 89 | +| Constant | Examples | Notes | |
| 90 | +|---|---|---| |
| 91 | +| `WP_CSS_Token_Processor::URL_TOKEN` | `url(foo.png)` | Protocol-filtered against `wp_allowed_protocols()` | |
| 92 | +| `WP_CSS_Token_Processor::BAD_URL_TOKEN` | `url(foo bar)` | Malformed URL — stripped | |
| 93 | +| `WP_CSS_Token_Processor::STRING_TOKEN` | `"hello"`, `'world'` | Quoted strings | |
| 94 | +| `WP_CSS_Token_Processor::BAD_STRING_TOKEN` | Unterminated string | Stripped | |
| 95 | +| `WP_CSS_Token_Processor::AT_KEYWORD_TOKEN` | `@media`, `@import` | At-rule allowlist enforced in `sanitize()` | |
| 96 | +| `WP_CSS_Token_Processor::OPEN_CURLY_TOKEN` | `{` | Block depth tracking | |
| 97 | +| `WP_CSS_Token_Processor::CLOSE_CURLY_TOKEN` | `}` | Block depth tracking | |
| 98 | + |
| 99 | +#### Structurally important |
| 100 | + |
| 101 | +| Constant | Examples | |
| 102 | +|---|---| |
| 103 | +| `WP_CSS_Token_Processor::IDENT_TOKEN` | `color`, `red`, `sans-serif` | |
| 104 | +| `WP_CSS_Token_Processor::FUNCTION_TOKEN` | `calc(`, `var(`, `rgb(` | |
| 105 | +| `WP_CSS_Token_Processor::DELIM_TOKEN` | `&`, `>`, `+`, `~`, `*` | |
| 106 | +| `WP_CSS_Token_Processor::DIMENSION_TOKEN` | `16px`, `1.5rem`, `100vh` | |
| 107 | +| `WP_CSS_Token_Processor::PERCENTAGE_TOKEN` | `50%` | |
| 108 | +| `WP_CSS_Token_Processor::NUMBER_TOKEN` | `42`, `1.5` | |
| 109 | +| `WP_CSS_Token_Processor::HASH_TOKEN` | `#ff0000`, `#my-id` | |
| 110 | +| `WP_CSS_Token_Processor::WHITESPACE_TOKEN` | Preserved in output | |
| 111 | +| `WP_CSS_Token_Processor::SEMICOLON_TOKEN` | `;` | |
| 112 | +| `WP_CSS_Token_Processor::COLON_TOKEN` | `:` | |
| 113 | +| `WP_CSS_Token_Processor::COMMA_TOKEN` | `,` | |
| 114 | + |
| 115 | +#### Stripped unconditionally |
| 116 | + |
| 117 | +| Constant | Reason | |
| 118 | +|---|---| |
| 119 | +| `WP_CSS_Token_Processor::CDO_TOKEN` | `<!--` — HTML comments have no place in CSS | |
| 120 | +| `WP_CSS_Token_Processor::CDC_TOKEN` | `-->` — HTML comments have no place in CSS | |
| 121 | +| Null bytes | Stripped in preprocessing, before tokenization | |
| 122 | +| `</style` sequence | Injection guard — `sanitize()` returns `''`, `validate()` returns `WP_Error` | |
| 123 | + |
| 124 | +#### Out of scope for v1 (documented gaps — treated as unknown, stripped) |
| 125 | + |
| 126 | +- Unicode range tokens (`U+`) |
| 127 | +- Surrogate pair edge cases beyond basic UTF-8 |
| 128 | + |
| 129 | +### API surface |
| 130 | + |
| 131 | +#### Construction |
| 132 | + |
| 133 | +```php |
| 134 | +$processor = new WP_CSS_Token_Processor( string $css ); |
| 135 | +``` |
| 136 | + |
| 137 | +#### Low-level navigation |
| 138 | + |
| 139 | +```php |
| 140 | +$processor->next_token(): bool // Advance cursor. Returns false at EOF. |
| 141 | +$processor->get_token_type(): string // Token type constant for current token. |
| 142 | +$processor->get_token_value(): string // Raw value of current token. |
| 143 | +$processor->get_block_depth(): int // Current { } nesting depth. |
| 144 | +``` |
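
To make the forward-only cursor model concrete, here is a toy sketch. `Toy_CSS_Cursor` is hypothetical and is not the proposed class: it recognises only a handful of token types, uses made-up type names, works byte by byte, and does not follow the CSS Syntax Level 3 algorithm. It only illustrates the shape of the `next_token()` loop and how block depth can be tracked as braces are consumed.

```php
<?php
// Toy forward-only cursor. NOT the proposed class: it knows only a few token types.
final class Toy_CSS_Cursor {
	private string $css;
	private int $pos      = 0;
	private int $depth    = 0;
	private string $type  = '';
	private string $value = '';

	public function __construct( string $css ) {
		$this->css = $css;
	}

	// Advance to the next token. Returns false at end of input.
	public function next_token(): bool {
		if ( $this->pos >= strlen( $this->css ) ) {
			return false;
		}
		$rest  = substr( $this->css, $this->pos );
		$punct = array(
			'{' => 'open-curly',
			'}' => 'close-curly',
			';' => 'semicolon',
			':' => 'colon',
			',' => 'comma',
		);
		if ( preg_match( '/^\s+/', $rest, $m ) ) {
			$this->type = 'whitespace';
		} elseif ( preg_match( '/^-?[A-Za-z_][A-Za-z0-9_-]*/', $rest, $m ) ) {
			$this->type = 'ident';
		} elseif ( preg_match( '/^\d+(?:\.\d+)?/', $rest, $m ) ) {
			$this->type = 'number';
		} else {
			// Any other single byte: known punctuation, otherwise a delimiter.
			$m          = array( $rest[0] );
			$this->type = $punct[ $rest[0] ] ?? 'delim';
		}
		$this->value = $m[0];
		$this->pos  += strlen( $m[0] );

		// Depth is updated as the brace itself is consumed.
		if ( 'open-curly' === $this->type ) {
			++$this->depth;
		} elseif ( 'close-curly' === $this->type && $this->depth > 0 ) {
			--$this->depth;
		}
		return true;
	}

	public function get_token_type(): string {
		return $this->type;
	}

	public function get_token_value(): string {
		return $this->value;
	}

	public function get_block_depth(): int {
		return $this->depth;
	}
}

$cursor    = new Toy_CSS_Cursor( '& > p { color: red; }' );
$delims    = array();
$rebuilt   = '';
$max_depth = 0;
while ( $cursor->next_token() ) {
	$rebuilt  .= $cursor->get_token_value();
	$max_depth = max( $max_depth, $cursor->get_block_depth() );
	if ( 'delim' === $cursor->get_token_type() ) {
		$delims[] = $cursor->get_token_value();
	}
}
```

Concatenating every token value reproduces the input, and `&` and `>` surface as ordinary delimiter tokens rather than as HTML-significant characters.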
| 145 | + |
| 146 | +#### Low-level modification |
| 147 | + |
| 148 | +```php |
| 149 | +$processor->set_token_value( string $value ): bool // Replace current token's value. |
| 150 | +$processor->remove_token(): bool // Remove current token from output. |
| 151 | +``` |
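
One way to read "non-destructive modification" is that `set_token_value()` and `remove_token()` only queue edits against the original buffer, and the edits are applied when the output is requested. The sketch below assumes that reading. `apply_edits()` and the edit array shape (`start`, `length`, `text`) are hypothetical and are not part of the proposed API.

```php
<?php
// Hypothetical helper: applies queued edits to the untouched original buffer.
function apply_edits( string $original, array $edits ): string {
	// Sort by start offset so the output can be assembled in one forward pass.
	usort(
		$edits,
		static function ( array $a, array $b ): int {
			return $a['start'] <=> $b['start'];
		}
	);

	$output = '';
	$cursor = 0;
	foreach ( $edits as $edit ) {
		// Copy the unmodified bytes between the previous edit and this one.
		$output .= substr( $original, $cursor, $edit['start'] - $cursor );
		$output .= $edit['text'];
		$cursor  = $edit['start'] + $edit['length'];
	}

	// Copy whatever follows the last edit.
	return $output . substr( $original, $cursor );
}

$css   = 'p { color: red; margin: 0; }';
$edits = array(
	// Replacement, as set_token_value() might queue it.
	array(
		'start'  => strpos( $css, 'red' ),
		'length' => strlen( 'red' ),
		'text'   => 'blue',
	),
	// Removal: an empty replacement drops the span (several tokens here, for readability).
	array(
		'start'  => strpos( $css, ' margin: 0;' ),
		'length' => strlen( ' margin: 0;' ),
		'text'   => '',
	),
);

$updated = apply_edits( $css, $edits );
```

The original string is never mutated, so token offsets recorded during the scan stay valid until the output is built.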
| 152 | + |
| 153 | +#### High-level consumers (primary public API) |
| 154 | + |
| 155 | +```php |
| 156 | +$processor->sanitize(): string // Strip unsafe tokens/rules. Returns safe CSS string. |
| 157 | +$processor->validate(): true|WP_Error // true if safe, WP_Error with code if not. |
| 158 | +$processor->get_updated_css(): string // Reconstruct CSS after manual token modifications. |
| 159 | +$processor->get_removed_tokens(): array // Log of what was stripped and why, after sanitize(). |
| 160 | +``` |
| 161 | + |
| 162 | +--- |
| 163 | + |
| 164 | +## Security Policy |
| 165 | + |
| 166 | +### `sanitize()` — token-level rules |
| 167 | + |
| 168 | +Applied during tokenization, before structural analysis: |
| 169 | + |
| 170 | +| Condition | Action | |
| 171 | +|---|---| |
| 172 | +| `</style` anywhere in input | Return `''` immediately — do not continue | |
| 173 | +| Null bytes | Strip in preprocessing | |
| 174 | +| `bad-url-token`, `bad-string-token` | Strip token | |
| 175 | +| `CDO-token`, `CDC-token` | Strip token | |
| 176 | +| `url-token` with `javascript:` or `data:` | Strip token entirely | |
| 177 | +| `url-token` with other disallowed protocol | Replace URL value with `''`, preserve `url()` wrapper | |
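
The two `url-token` rows can be sketched as a three-way decision. Everything here is an assumption made for illustration: `sketch_url_action()` and its return labels are hypothetical, `SKETCH_ALLOWED_PROTOCOLS` is a short stand-in for `wp_allowed_protocols()`, and treating scheme-less URLs as relative and keeping them is a guess at the intended behaviour. The sketch also ignores CSS escapes and other obfuscation inside the URL, which a real implementation would have to decode first.

```php
<?php
// Hypothetical stand-in for wp_allowed_protocols(); the real list is longer.
const SKETCH_ALLOWED_PROTOCOLS = array( 'http', 'https' );

// Returns 'keep', 'strip-token', or 'empty-url' for the contents of a url() token.
function sketch_url_action( string $url ): string {
	$url = trim( $url );

	// No scheme: assumed to be a relative URL, which is kept.
	if ( ! preg_match( '/^([A-Za-z][A-Za-z0-9+.-]*):/', $url, $m ) ) {
		return 'keep';
	}

	// URL schemes are case-insensitive.
	$scheme = strtolower( $m[1] );

	// javascript: and data: remove the whole token.
	if ( in_array( $scheme, array( 'javascript', 'data' ), true ) ) {
		return 'strip-token';
	}

	// Any other disallowed protocol empties the URL but keeps the url() wrapper.
	return in_array( $scheme, SKETCH_ALLOWED_PROTOCOLS, true ) ? 'keep' : 'empty-url';
}
```

For example, `sketch_url_action( 'gopher://example.com/' )` yields `'empty-url'`, which corresponds to emitting `url()` with an empty value.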
| 178 | + |
| 179 | +### `sanitize()` — rule-level rules |
| 180 | + |
| 181 | +Applied during structural traversal, after tokenization: |
| 182 | + |
| 183 | +**At-rule allowlist:** |
| 184 | + |
| 185 | +``` |
| 186 | +Allowed: @media, @supports, @keyframes, @layer, @container, @font-face |
| 187 | +Blocked: @import, @charset, @namespace |
| 188 | +Unknown: stripped (safety-first — gaps reject, not pass-through) |
| 189 | +``` |
| 190 | + |
| 191 | +Stripping is as narrow as possible: a failing declaration is dropped on its own, a failing rule is dropped as a whole, and the rest of the CSS is preserved. |
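
A minimal sketch of the allowlist lookup, assuming the name is compared case-insensitively (at-keywords are ASCII case-insensitive in CSS). `sketch_at_rule_status()` and its return labels are hypothetical. Blocked and unknown at-rules are both stripped, so the distinction would only matter for reporting.

```php
<?php
// Hypothetical helper: classifies an at-keyword against the lists above.
function sketch_at_rule_status( string $at_keyword ): string {
	$allowed = array( 'media', 'supports', 'keyframes', 'layer', 'container', 'font-face' );
	$blocked = array( 'import', 'charset', 'namespace' );

	// Drop the leading "@" and normalise case before comparing.
	$name = strtolower( ltrim( $at_keyword, '@' ) );

	if ( in_array( $name, $allowed, true ) ) {
		return 'allowed';
	}
	if ( in_array( $name, $blocked, true ) ) {
		return 'blocked';
	}

	// Anything not explicitly listed is unknown and therefore stripped.
	return 'unknown';
}
```

Under this sketch a vendor-prefixed keyword such as `@-webkit-keyframes` is classified as unknown, which matches the safety-first rule that gaps reject rather than pass through.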
| 192 | + |
| 193 | +### `validate()` rules |
| 194 | + |
| 195 | +Returns `WP_Error` if any of the following are present: |
| 196 | + |
| 197 | +| Condition | Error code | |
| 198 | +|---|---| |
| 199 | +| `</style` sequence | `css_injection` | |
| 200 | +| `bad-url-token` or `bad-string-token` | `css_malformed_token` | |
| 201 | +| Disallowed `url()` protocol | `css_unsafe_url` | |
| 202 | +| Blocked or unknown at-rule | `css_disallowed_at_rule` | |
| 203 | +| Null bytes | `css_null_byte` | |
| 204 | +| `CDO-token` / `CDC-token` | `css_html_comment` | |
| 205 | + |
| 206 | +If `validate()` returns `true`, `sanitize()` is guaranteed to return the same input unchanged. |
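
Two of the conditions in the table can be checked on the raw string before any tokenization. The sketch below covers only those two and returns the error code as a plain string instead of a `WP_Error`, so that it runs without WordPress. `sketch_precheck()` is hypothetical, and both the order of the checks and the case-insensitive match on `</style` are assumptions (HTML tag names are case-insensitive, so a case-sensitive guard would miss `</STYLE`).

```php
<?php
// Hypothetical helper: returns an error code from the table above, or null if neither check fires.
function sketch_precheck( string $css ): ?string {
	// Injection guard: match "</style" regardless of letter case.
	if ( false !== stripos( $css, '</style' ) ) {
		return 'css_injection';
	}

	// "\0" is a single NUL byte in a double-quoted PHP string.
	if ( false !== strpos( $css, "\0" ) ) {
		return 'css_null_byte';
	}

	return null;
}
```

The remaining codes depend on token types, so they would be produced during or after tokenization rather than in a string-level check like this one.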
| 207 | + |
| 208 | +### What the security policy explicitly does NOT do |
| 209 | + |
| 210 | +- Does not validate property names or values — authoring intent, not a security concern |
| 211 | +- Does not restrict CSS nesting depth |
| 212 | +- Does not filter `var()` or custom properties — cannot execute code |
| 213 | +- Does not block `expression()` — IE-era only, not worth the complexity |
| 214 | + |
| 215 | +### Idempotency guarantee |
| 216 | + |
| 217 | +`sanitize()` must be idempotent: |
| 218 | + |
| 219 | +``` |
| 220 | +sanitize( sanitize( $css ) ) === sanitize( $css ) |
| 221 | +``` |
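
A sketch of how the property could be checked over a fixture set. `find_idempotency_failures()` is a hypothetical helper, and the two closures are stand-ins rather than the proposed `sanitize()`: one only removes null bytes, the other escapes the input as HTML to imitate the buggy behaviour.

```php
<?php
// Hypothetical helper: returns the fixtures for which $sanitize is not idempotent.
function find_idempotency_failures( callable $sanitize, array $fixtures ): array {
	$failures = array();
	foreach ( $fixtures as $css ) {
		$once = $sanitize( $css );
		// Idempotent means a second pass changes nothing.
		if ( $sanitize( $once ) !== $once ) {
			$failures[] = $css;
		}
	}
	return $failures;
}

// Stand-in 1: removing null bytes is idempotent.
$strip_null_bytes = static function ( string $css ): string {
	return str_replace( "\0", '', $css );
};

// Stand-in 2: HTML-escaping is not, because each pass re-escapes "&".
$escape_like_html = static function ( string $css ): string {
	return htmlspecialchars( $css, ENT_NOQUOTES, 'UTF-8' );
};

$fixtures = array(
	'color: blue; & p { color: red; }',
	'& > p { margin: 0; }',
	"p { margin:\0 0; }",
);

$ok_failures  = find_idempotency_failures( $strip_null_bytes, $fixtures );
$bad_failures = find_idempotency_failures( $escape_like_html, $fixtures );
```

The HTML-escaping stand-in fails only on the fixtures that contain `&` or `>`, which are the same characters involved in the original bug.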
| 222 | + |
| 223 | +This is a hard requirement enforced by the test suite. It directly addresses the compounding corruption bug in PR #11104. |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## Documentation |
| 228 | + |
| 229 | +### Inline PHPDoc |
| 230 | + |
| 231 | +- Every public method: `@since`, `@param`, `@return`, usage example |
| 232 | +- Class docblock: purpose, what it is not, spec reference, usage examples, known gaps |
| 233 | +- Security decisions commented with *why*, not just *what* |
| 234 | + |
| 235 | +### README.md |
| 236 | + |
| 237 | +Located at `src/wp-includes/css-api/README.md`. Covers: |
| 238 | + |
| 239 | +- Purpose and scope |
| 240 | +- Quick usage examples for `sanitize()` and `validate()` |
| 241 | +- Token type reference |
| 242 | +- Security policy summary |
| 243 | +- Known gaps and future work |
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Testing |
| 248 | + |
| 249 | +### Test files |
| 250 | + |
| 251 | +``` |
| 252 | +tests/phpunit/tests/css-api/ |
| 253 | +├── WpCssTokenProcessorTest.php — tokenizer unit tests |
| 254 | +├── WpCssTokenSanitizeTest.php — sanitize() tests |
| 255 | +└── WpCssTokenValidateTest.php — validate() tests |
| 256 | +``` |
| 257 | + |
| 258 | +### Test categories |
| 259 | + |
| 260 | +#### Tokenizer unit tests (`WpCssTokenProcessorTest.php`) |
| 261 | + |
| 262 | +- Each token type in isolation: correct `get_token_type()` and `get_token_value()` |
| 263 | +- Token sequences: declaration, qualified rule, nested rule |
| 264 | +- Block depth tracking via `get_block_depth()` |
| 265 | +- Edge cases: empty input, whitespace-only, single character |
| 266 | +- Manual modification: `set_token_value()`, `remove_token()`, `get_updated_css()` |
| 267 | + |
| 268 | +#### Sanitize tests (`WpCssTokenSanitizeTest.php`) |
| 269 | + |
| 270 | +- CSS nesting selectors (`&`, `& > p`, `& + span`) survive unchanged |
| 271 | +- Child combinator (`>`) survives unchanged |
| 272 | +- Valid at-rules (`@media`, `@supports`, `@keyframes`) survive unchanged |
| 273 | +- Blocked at-rule (`@import`) is stripped entirely |
| 274 | +- Unknown at-rule is stripped |
| 275 | +- `url()` with allowed protocol survives |
| 276 | +- `url()` with `javascript:` is stripped entirely |
| 277 | +- `url()` with `data:` is stripped entirely |
| 278 | +- `bad-url-token` is stripped |
| 279 | +- `bad-string-token` is stripped |
| 280 | +- `</style` input returns `''` |
| 281 | +- Null bytes are stripped |
| 282 | +- `CDO` / `CDC` tokens are stripped |
| 283 | +- `get_removed_tokens()` is populated after stripping |
| 284 | +- `get_removed_tokens()` is empty when nothing is stripped |
| 285 | +- **Idempotency**: `sanitize(sanitize($css)) === sanitize($css)` over a broad fixture set |
| 286 | +- **Regression fixtures from PR #11104**: |
| 287 | + - `color: blue; & p { color: red; }` survives unchanged |
| 288 | + - `& > p { margin: 0; }` survives unchanged |
| 289 | + - Repeated saves do not compound corruption |
| 290 | + |
| 291 | +#### Validate tests (`WpCssTokenValidateTest.php`) |
| 292 | + |
| 293 | +- Valid CSS returns `true` |
| 294 | +- Each blocked condition returns `WP_Error` with the correct error code |
| 295 | +- `validate()` passing guarantees `sanitize()` is a no-op (tested over fixture set) |
| 296 | + |
| 297 | +--- |
| 298 | + |
| 299 | +## Open questions (deferred) |
| 300 | + |
| 301 | +- Should `get_removed_tokens()` be structured (array of `['token' => ..., 'reason' => ...]`) or flat? TBD during implementation. |
| 302 | +- Should the at-rule allowlist be filterable via a WordPress filter hook (like `safe_style_css`)? Likely yes, deferred to implementation. |
| 303 | +- Exact `@since` version tag — placeholder `X.X.0` during development. |