Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
/vendor/
/.phpcs-cache
/.phpunit.result.cache
+/.phpunit.cache/
9 changes: 9 additions & 0 deletions .markdownlint.json
@@ -0,0 +1,9 @@
{
"default": true,
"MD013": false,
"MD014": false,
"MD024": false,
"MD028": false,
"MD031": { "list_items": false },
"MD034": false
}
52 changes: 28 additions & 24 deletions README.md
@@ -4,15 +4,15 @@

Utilities for generating PHP code.


## Normalizers

The normalizers generate readable PHP labels (class names, namespaces, property names, etc) from valid UTF-8 strings,
[transliterating] them to ASCII and spelling out any invalid characters.

### Usage

The following code (forgive the Japanese - a certain translation tool tells me it means "Pet Store"):

```php
<?php

@@ -24,11 +24,13 @@ echo $namespace;
```

outputs:

```text
Petto\Shoppu
```

and:

```php
<?php

@@ -40,47 +42,48 @@ echo $property;
```

outputs:

```text
twoDollarBill
```

See the [tests] for more examples.

### Why?

You must **never** run code generated from untrusted user input. But there are a few cases where you do want to
_output_ code generated from (mostly) trusted input.

In my case, I need to generate classes and properties from an OpenAPI specification. There are no hard-and-fast rules
on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever
they are.

### How?

Each normalizer uses `ext-intl`'s [Transliterator] to turn the UTF-8 string into Latin-ASCII. Where a character has no
equivalent in ASCII (the "€" symbol is a good example), it uses the [Unicode name] of the character to spell it out (to
`Euro`, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own spell
outs. For instance, a backtick "&#96;" becomes `Backtick`.
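
As a rough illustration of the spell-out step, the sketch below uses `IntlChar::charName()` from `ext-intl` to look up a character's Unicode name and turn it into a label fragment. The `spellOutCharacter()` helper and its clean-up rule (dropping a trailing `SIGN`) are assumptions made for this example, not the library's actual code:

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of the library: spell out a single
// character using its Unicode name. Requires ext-intl.
function spellOutCharacter(string $char): string
{
    // IntlChar::charName() returns e.g. "EURO SIGN" for "€".
    $name = IntlChar::charName($char) ?? '';

    // Assumed clean-up rule: drop a trailing " SIGN".
    $name = (string) preg_replace('/ SIGN$/', '', $name);

    // Convert the upper-case, space-separated name to PascalCase.
    $words = explode(' ', strtolower(str_replace('-', ' ', $name)));

    return implode('', array_map('ucfirst', $words));
}

echo spellOutCharacter('€'), PHP_EOL; // Euro
```

With this naive approach a backtick would come out as `GraveAccent`, its Unicode name, whereas the library's own spell-out gives `Backtick`.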

Initial digits are also spelt out: "123foo" becomes `OneTwoThreeFoo`. Finally reserved words are suffixed with a
user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would
become `ClassController`.
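
The two rules above can be sketched in plain PHP. This is only an illustration: `toClassLabel()`, `DIGIT_WORDS` and `RESERVED_SAMPLE` are hypothetical names, and the reserved-word list is a tiny sample rather than PHP's full list:

```php
<?php

declare(strict_types=1);

// Hypothetical sketch, not the library's implementation.
const DIGIT_WORDS = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine'];

// Deliberately tiny sample; PHP's real list of reserved words is much longer.
const RESERVED_SAMPLE = ['class', 'function', 'interface', 'return'];

function toClassLabel(string $input, string $reservedSuffix): string
{
    // Split the input into its leading ASCII digits and the remainder.
    preg_match('/^(\d*)(.*)$/s', $input, $matches);
    [, $digits, $rest] = $matches;

    // Spell out each leading digit: "123" becomes "OneTwoThree".
    $spelt = '';
    $digitList = $digits === '' ? [] : str_split($digits);
    foreach ($digitList as $digit) {
        $spelt .= DIGIT_WORDS[(int) $digit];
    }

    $label = $spelt . ucfirst($rest);

    // Suffix reserved words with the user-supplied string.
    if (in_array(strtolower($label), RESERVED_SAMPLE, true)) {
        $label .= $reservedSuffix;
    }

    return $label;
}

echo toClassLabel('123foo', 'Controller'), PHP_EOL; // OneTwoThreeFoo
echo toClassLabel('class', 'Controller'), PHP_EOL;  // ClassController
```

Only digits at the start of the input are spelt out, since digits elsewhere in a PHP label are already valid.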

The results may not be pretty. If for some mad reason your input contains `͖` - put your glasses on! - the label will
contain `CombiningRightArrowheadAndUpArrowheadBelow`. But it _is_ valid PHP, and stands a chance of being as unique as
the original. Which brings me to...


## Unique labelers

The normalization process reduces around a million Unicode code points down to just 162 ASCII characters. Then it
mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or
whatever your programming preference. It's gonna be lossy - nothing we can do about that.

The unique labelers' job is to add back lost uniqueness, using a `UniqueStrategyInterface` to decorate any non-unique
class names in the list it is given.

To guarantee uniqueness within a set of class name labels, use the `UniqueClassLabeler`:

```php
<?php

@@ -96,7 +99,8 @@ var_dump($unique);
```

outputs:

```text
array(3) {
'Déjà vu' =>
string(7) "DejaVu1"
@@ -107,10 +111,11 @@ array(3) {
}
```
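
To make the idea concrete, here is a simplified, hypothetical take on a number-suffix strategy. The `addNumberSuffixes()` function and the sample input are assumptions for illustration and do not reflect the library's `UniqueStrategyInterface` API:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: append 1, 2, 3... to labels that occur more than once.
 *
 * @param array<string, string> $labels original string => normalized label
 * @return array<string, string>
 */
function addNumberSuffixes(array $labels): array
{
    // How often does each normalized label occur?
    $counts = array_count_values($labels);
    $seen   = [];
    $unique = [];

    foreach ($labels as $original => $label) {
        if ($counts[$label] === 1) {
            // Already unique: leave untouched.
            $unique[$original] = $label;
            continue;
        }

        // Number duplicates in the order they are encountered.
        $seen[$label]      = ($seen[$label] ?? 0) + 1;
        $unique[$original] = $label . $seen[$label];
    }

    return $unique;
}

var_dump(addNumberSuffixes([
    'Déjà vu' => 'DejaVu',
    'foo'     => 'Foo',
    'deja vu' => 'DejaVu',
]));
```

This sketch does not check whether a suffixed label such as `DejaVu1` collides with a label that was already in the list. A real strategy would need to handle that case.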

There are labelers for each of the normalizers: `UniqueClassLabeler`, `UniqueConstantLabeler`, `UniquePropertyLabeler`
and `UniqueVariableLabeler`. Along with the `NumberSuffix` implementation of `UniqueStrategyInterface`, we provide a
`SpellOutOrdinalPrefix` strategy. Using that instead of `NumberSuffix` above would output:

```text
array(3) {
'Déjà vu' =>
string(11) "FirstDejaVu"
@@ -123,8 +128,7 @@ array(3) {

Kinda cute, but a bit verbose for my taste.


[transliterating]: https://unicode-org.github.io/icu/userguide/transforms/general/#script-transliteration
[tests]: ./test/AbstractNormalizerTest.php
[Transliterator]: https://www.php.net/manual/en/class.transliterator.php
[Unicode name]: https://unicode.org/charts/charindex.html
9 changes: 5 additions & 4 deletions composer.json
@@ -22,13 +22,14 @@
"sort-packages": true
},
"require": {
-"php": "~8.1 || ~8.2",
+"php": "~8.2.0 || ~8.3.0",
"ext-intl": "*"
},
"require-dev": {
-"laminas/laminas-coding-standard": "^2.4",
-"phpunit/phpunit": "^9.5",
-"vimeo/psalm": "^4.27"
+"laminas/laminas-coding-standard": "^3.0",
+"phpunit/phpunit": "^10.5.37",
+"psalm/plugin-phpunit": "^0.19.0",
+"vimeo/psalm": "^5.26"
},
"autoload": {
"psr-4": {