Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PI processing support for common phrases #173

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Girgias
Copy link
Member

@Girgias Girgias commented Oct 29, 2024

Initial support for generic sentences via PI instead of forcing translations (and English docs) to write the same thing over and over again, also improves consistency.

See: php/doc-en#3965

@Girgias
Copy link
Member Author

Girgias commented Oct 29, 2024

@sy-records @mumumu @devnexen @leonardolara @cmb69 this should help to improve the consistency of wording in the documentation and give less work to translators/more automatic updates.

Not exactly sure how or where to store the message fragments, but I whipped this out in 2h, so any suggestion is appreciated.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea

continue;
}
// Don't care about generic phpdoc PI target
if ($pi->target === 'phpdoc') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplication with str_starts_with ?

Copy link
Member Author

@Girgias Girgias Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because we have a bunch of PI instruction that are just phpdoc and those that will generate documentation node have more complicated names, so the target is effectively the instruction and the data are values to pass to the corresponding function.

Maybe this design is crap, but I didn't want to figure out how to parse the function to call

configure.php Outdated Show resolved Hide resolved
configure.php Outdated Show resolved Hide resolved
@@ -0,0 +1,46 @@
<?php

function error_section_value_error_between(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some duplication between these functions which would be nice to somehow get rid of; at least it should be possible to move the import to the caller. Note that there's also an appendXML method you can use on DOMElement.

Copy link
Member

@kamil-tekiela kamil-tekiela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea, but how often are these entries so regular for this to see much use?

pi_processing/processors.php Show resolved Hide resolved
@Girgias
Copy link
Member Author

Girgias commented Oct 29, 2024

I think this is a good idea, but how often are these entries so regular for this to see much use?

The idea is to create more generic sentence, like ValueError for empty strings, which would persist longer than changelogs.

I also think our XML entity usage is somewhat annoying/problematic as it doesn't allow us to render validate/render XML files individually. Using PIs solves that problem too, and allow to basically have entities with parameters.

The number of times I wish we had this for the "This parameter is now nullable" which I needed to write and translate countless times for PHP 8.0.

Considering that conditions that trigger ValueErrors are still poorly documented from the 7.4 to 8.0 mass conversion, having this to easily add (and have translation automatically have it too if we automate the process) is something that I think is important to have.

@alfsb
Copy link
Member

alfsb commented Oct 29, 2024

Initial support for generic sentences via PI instead of forcing translations (and English docs) to write the same thing over and over again, also improves consistency.

Some overall ideas, to facilitate future developments:

  • Instead of match ($pi->target) and fixed cases, having a global $piFuncs[], where the array keys are the PI name, and values are functions. So the general /pi_processing/langs/en.php first fills this array, and any /pi_processing/langs/$LANG.php overwrites it.

New PIs are deployed only on /pi_processing/langs/, without need to change configure.php each time.

See below for an alternative.

  • Do not explode(' ', $pi->data), pass it entirely. So for any future complication can be transparently handled by PI functions.
  • Instead of $pi->parentNode->insertBefore, pass the entire $pi in functions, so they can further process if necessary. For example, to search the PI context.

The common code of simple cases can be, in turn, free stating functions deployed on en.php or any specific $LANG.php.

Not exactly sure how or where to store the message fragments, but I whipped this out in 2h, so any suggestion is appreciated.

Lets create an entities/ in each translation, with this conventions:

  • .ent files are XML entities
  • .txt files are simple text entities
  • .php files are PI files

The en/entities/ and possibly $LANG/entities/ are scanned in order by configure.php, filing two arrays.

An global $entities[], keys are basename without extension, values are the full filenames, so it would be possible to consolidate all .ent files from doc-base, en/ and code in one place, and from this auto generate an global.ent with, that in turn in imported on manual.in.

And a global $piFuncs as above. The kick is, then, that any PI function can access any message fragment, be it a simple text or XML formatted, already "translated" or en/ defaulted.

@cmb69
Copy link
Member

cmb69 commented Oct 29, 2024

I very much like the idea! Thank you!

I'm not too happy with the notation, though. We could go all in:

<?php value_error('timeout', '0', 'PHP_INT_MAX')?>

That would also allow to use optional and named parameters, and parsing would be trivial.

Or, since these are basically printf()s with only string arguments:

<?pf value-error, timeout, 0, PHP_INT_MAX?>

Always call sprintf(), but look up the first argument somewhere.

@alfsb
Copy link
Member

alfsb commented Oct 29, 2024

Going for simple cases, without fixing any syntax could be like:

<?gen_error_section_value_error_between parameter 20 30 ?>

On en/entities/pi/gen_error_section_value_error_between.php:

$PI["gen_error_section_value_error_between"] = function( DOMProcessingInstruction $pi )
{
    $args = explode( trim( $pi->value ) , ' ' );
    $text = "<tag>text {$ENT['entity']} {$args[1]} text</tag>";
    $node = DOMDocument::loadXML( $text ); // catch local errors early!
    $pi->parentNode->replaceNode( $node , $pi );
};

If the four lines above are common enough, a generic function declared on configure.php can be called, for simple "templated" generators.

Then any $LANG/entities/pi/gen_error_section_value_error_between.php may have:

$PI["gen_error_section_value_error_between"] = function( DOMProcessingInstruction $pi )
{
    pidocgen_simple_template( $pi , "entity name" );
};

And the driver on configure.php searches for PI nodes, and calls the respective $PI[] function keyed by PI's name.

@alfsb
Copy link
Member

alfsb commented Oct 29, 2024

Alternative, for (most of) simple cases. On en/entities/gen_error_section_value_error_between.php:

$PI["gen_error_section_value_error_between"] = function( DOMProcessingInstruction $pi )
{
    $args = "PARAMETER MIN MAX";
    $text = <<<'TEXT'
<simpara xmlns="http://docbook.org/ns/docbook">
 Throws a <exceptionname>ValueError</exceptionname> if <parameter>PARAMETER</parameter>
 is less than MIN or greater than MAX.
</simpara>
TEXT;
    // 1. Extracts and associates positional strings of $pi->value into a keyed array and $args
    // 2. Replaces the text or textual payloads (3th argument is `string | DOMNode` )
    // 3. Replaces the PI node with the interpolated text or interpolated argument node
    pidocgen_interpolate_replace( $pi , $args , $text );
};

So:

  • There is no freestanding functions, polluting the global namespace;
  • Texts are nearby their positional mapping names;
  • "Overloading" by array, translation may overwrite only here
  • These en/entities/*.php files are tracked on revcheck.php, as any pure textual file.

@Girgias
Copy link
Member Author

Girgias commented Oct 30, 2024

I don't really understand what benefit there is in forcing translations to deal with the functions that handle the PI nodes.

I agree that the text should be in a new folder of each translation, but having it in doc-base allows me to demonstrate the idea without needing to create a bunch of stuff in different files.

However, I would rather we move away from XML entities then have them be useful for this new mechanism, as they are very annoying.
Reason being it requires building the whole documentation, and cannot just use tooling (e.g. to verify the RelaxNG schema) on a single file, as that requires mangling the entities which defeats the point in a lot of cases.

A lot of the current entities in language-snippets could be replaced with an XInclude to a "master" version.
For example, a parameter description for a resource object could XInclude the first function that describes it (well this requires figuring out the fallback for translations, but that seems quite doable).

I also don't really see the problem with "polluting" the global namespace, if that's the problem we can just make it a class with static methods or use a namespace.

@cmb69
I don't want to use <?php as I don't really want to run PHP on the generated manual file which is not going to be a fun debugging process for people that are going to write docs.

But I quite like the idea of having the "function" to call as the first param.

@cmb69
Copy link
Member

cmb69 commented Oct 30, 2024

I don't want to use <?php as I don't really want to run PHP on the generated manual file which is not going to be a fun debugging process for people that are going to write docs.

I didn't mean to suggest to actually pre-process with PHP. That could also be <?func or <?foo. The main point was to evaluate the contents of the PI as PHP code (so there would be functions, instead of constants), and insert the resulting string (like it's currently done). But that appears to be overkill anyway.

@Girgias
Copy link
Member Author

Girgias commented Oct 30, 2024

I didn't want to use eval() either :D

@alfsb
Copy link
Member

alfsb commented Oct 30, 2024

I don't really understand what benefit there is in forcing translations to deal with the functions that handle the PI nodes.

I can think of two benefits. First, this would make possible for translations have their own PIs, by simple having an <?phpdocpi func args?> in XML and some $LANG/entities/func.php on repository, without needing any modification on doc-base code.

Second, ut would make this mechanism more general, for the cases where function arguments may need to be more complex (inline JSON text for example) or the code need be different on some translations (more/less arguments, because of language issues).

Also, the functions can be minimal. For an simple macro text injection can be as minimal as:

// 17a5268812b06c23073609c4e8ab976ebfceb61b
$PI["error_section_value_error_between"] = function( DOMProcessingInstruction $pi )
{
    $args = "PARAMETER MIN MAX";
    $text = <<<TEXT
<simpara>
 Throws a <exceptionname>ValueError</exceptionname> if <parameter>PARAMETER</parameter>
 is less than MIN or greater than MAX.
</simpara>
TEXT;
    phpdocpi_interpolate_replace( $pi , $args , $text );
};

No entities used and no namespacing issues, XML or PHP.

I agree that the text should be in a new folder of each translation, but having it in doc-base allows me to demonstrate the idea without needing to create a bunch of stuff in different files.

For demonstrations, yes. But for day by day translation work, having all translatable texts on each doc repo is far easier to debug. This is the reason why I'm arguing to have an entities/ directory, to have all "entity" usage in one place, be it a text fragment, an XML fragment or (now new) code that inject XML fragments, that is entities does.

However, I would rather we move away from XML entities then have them be useful for this new mechanism, as they are very annoying.
Reason being it requires building the whole documentation, and cannot just use tooling (e.g. to verify the RelaxNG schema) on a single file, as that requires mangling the entities which defeats the point in a lot of cases.

The example above refers to no entity.

And it do not spread a function name in three places (possibly, four): configure.php, XML sources, include/en/file.php (possibly include/$lang/file.php).

An global $PI[] on configure would spread a function name only in XML sources and en|$lang/entities/func.php. No modifications on configure.php for new functions, be it on en/ or $lang.

A lot of the current entities in language-snippets could be replaced with an XInclude to a "master" version.
For example, a parameter description for a resource object could XInclude the first function that describes it (well this requires figuring out the fallback for translations, but that seems quite doable).

I think the most simple solution is having any copyable portion of manual having an xml:id, and XIncluding it with <xi:include xpointer="id"/> or <xi:include href="file.xml" xpointer="id"/>. As no attributes in xml: namespace are not copied, this would not generate duplicated xml:is.

But I quite like the idea of having the "function" to call as the first param.

This can be simple as having something like. On XML:

<?phpdocpi error_section_value_error_between parameter min max?>

and on configure.php:

foreach ( $xpath->query('//processing-instruction()') as $pi )
{
    if ( $pi->target != 'phpdocpi')
        continue;

    $func = explode(' ', $pi->data)[0];
    if ( ! isset( $PI[$func] ) ) {
        // warning, continue
    }

    $data = $PI[$func]( $pi ); // Dynamic call, no eval()
    $node = null;

    // some PIs may work directly on XML, not returning anything

    if ( $data instanceof DOMNode )
        $node = $data;
    if ( is_string( $data )
        // loads text into DOM, extract DOM fragment
    if ( $node != null )
        $pi->parentNode->replaceNode( $node, $pi );
}

@alfsb
Copy link
Member

alfsb commented Oct 31, 2024

I don't really understand what benefit there is in forcing translations to deal with the functions that handle the PI nodes

With support on configure.php to treat PI's return strings as text to add or replace source PIs, the functions can be as minimal as:

// hash
$PI["error_section_value_error_between"] = function( DOMProcessingInstruction $pi )
{
    extract( phpdocpi_extract_args ( "PARAMETER MIN MAX" ) );
    return <<<TEXT
<simpara>
 Throws a <exceptionname>ValueError</exceptionname> if <parameter>$PARAMETER</parameter>
 is less than $MIN or greater than $MAX.
</simpara>
TEXT;
};

without loss of generality.

@cmb69
Copy link
Member

cmb69 commented Oct 31, 2024

With support on configure.php to treat PI's return strings as text to add or replace source PIs, the functions can be as minimal as:

Yeah, but will we ever need this generality? Can't we keep it simpler. I mean, all we need are printf style strings; these can be stored in INI files (or whereever; maybe even declaring "post-processable entities" might be an option). So we basically would have:

<simpara>
 Throws a <exceptionname>ValueError</exceptionname> if <parameter>%1s</parameter>
 is less than %2s or greater than %3s.
</simpara>

That may not be as expressive as other solutions, but looks good enough[TM] to me.

@alfsb
Copy link
Member

alfsb commented Oct 31, 2024

Yeah, but will we ever need this generality? Can't we keep it simpler. I mean, all we need are printf style strings; these can be stored in INI files

I think generality may be necessary in the future, and that this solution is simple, by the virtue of all data being in one place, instead of being spread over various files and various repositories, but I realize I'm not being convincing, and that's ok.

Note that using positional syntax instead of named syntax enables the self contained aspect.

For example, the call site

<?phpdocpi error_section_value_error_between $length 10 20 ?>

and the companion injected file, passed through printf $lang/entities/error_section_value_error_between.xml:

<simpara>
 Throws a <exceptionname>ValueError</exceptionname> if <parameter>[%1s]</parameter>
 is less than [%2s] or greater than [%3s].
</simpara>

is self contained, in the sense that the only modifications necessary to "run" these injections are the call and file sites, and don't need code or INI configurations each time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants