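serialize_blocks breaking HTML tags in content

Date: 2020-08-13 Author: michael

I'm trying to inject some data into a block via PHP, but I'm running into trouble because parse_blocks/serialize_blocks mangle my content.

I'm using the default Twenty Twenty theme with no plugins installed.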

add_action('wp', function() {
    $oPost = get_post(119);

    printf("<h1>Post Content</h1><p>%s</p>", var_dump($oPost->post_content));

    $aBlocks = parse_blocks($oPost->post_content);


    printf("<h1>Parsed Blocks</h1><pre>%s</pre>", print_r($aBlocks, true));

    $sSerialisedBlocks = serialize_blocks($aBlocks);

    printf("<h1>Serialised Blocks</h1><p>%s</p>", var_dump($sSerialisedBlocks));
}, PHP_INT_MAX);
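
The first print (just outputting the post content) contains the following text...

<h3>What types of accommodation are available in xxxx?<\/h3>

The second (after parsing into blocks) contains this...

<h3>What types of accommodation are available in xxxx?</h3>

But after re-serializing the blocks, I get this...

\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e

Can anyone tell me what I'm doing wrong?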

EDIT

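OK, I followed the source code of serialize_blocks, and it seems this is intentional: serialize_block_attributes explicitly converts certain characters.

My question, then, is why these characters show up in the WYSIWYG editor instead of being converted back properly.

1 Answer
Answer by Tom J Nowell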

This happens in serialize_block_attributes; its docblock explains why:

/**
...
 * The serialized result is a JSON-encoded string, with unicode escape sequence
 * substitution for characters which might otherwise interfere with embedding
 * the result in an HTML comment.
...
 */

So this is an encoding measure that stops attribute values from accidentally closing the HTML comment that holds the block delimiter and breaking the format of the document.

Without it, an HTML comment (or any "-->" sequence) inside a block attribute would end the block's delimiter comment early, breaking that block and all the content after it.
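To see why the substitution matters, here is a minimal standalone sketch in plain PHP (no WordPress loaded). The helper sketch_serialize_attrs and the block name example/block are hypothetical: the helper only approximates the escaping the docblock describes and is not the WordPress source.

```php
<?php
// Hypothetical helper: approximates the escaping described in the
// serialize_block_attributes docblock. A sketch, not the WordPress source.
function sketch_serialize_attrs( array $attrs ) : string {
    $json = json_encode( $attrs, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE );

    // Swap out sequences that could end or confuse an HTML comment.
    // strtr() with an array tries longer keys first and never re-scans
    // replaced text, so the inserted backslashes are left alone.
    return strtr( $json, array(
        '--' => '\u002d\u002d',
        '<'  => '\u003c',
        '>'  => '\u003e',
        '&'  => '\u0026',
    ) );
}

$attrs     = array( 'content' => '<h3>Title</h3><!-- note -->' );
$delimiter = '<!-- wp:example/block ' . sketch_serialize_attrs( $attrs ) . ' /-->';

// The only "-->" left in the delimiter is the real closing one at the end.
echo $delimiter, PHP_EOL;
```

Without the strtr() step, the "-->" inside the attribute value would terminate the delimiter comment halfway through the JSON, which is exactly the breakage described above.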

But How Do I Stop The Mangling?!!!

It isn't mangled. Certain characters are simply encoded by replacing them with their unicode escape sequences to prevent breakage.

Proof 1

Let's take the original code block from the question and make the following fixes:

  • Wrap all output in <pre> tags
  • Use esc_html so we can see the tags properly
  • Fix the printf calls by dropping var_dump (which prints its output and returns nothing) and using var_export with true as the second parameter, so it returns the string instead of printing it
  • Add a final test case that re-parses and re-serializes the content 10 times, then compares the result with the original

function reparse_reserialize( string $content, int $loops = 10 ) : string {
    $final_content = $content;
    // Parse and re-serialize exactly $loops times.
    for ( $x = 0; $x < $loops; $x++ ) {
        $blocks        = parse_blocks( $final_content );
        $final_content = serialize_blocks( $blocks );
    }
    return $final_content;
}

add_action(
    'wp',
    function() {
        $p = get_post( 1 );

        echo '<p>Original content:</p>';
        echo '<pre>' . esc_html( var_export( $p->post_content, true ) ) . '</pre>';

        $final = reparse_reserialize( $p->post_content );

        echo '<p>10 parse and serialize loops later:</p>';
        echo '<pre>' . esc_html( var_export( $final, true ) ) . '</pre>';
        echo '<hr/>';
    },
    PHP_INT_MAX
);

Running that, we see that the content survives being parsed and re-serialized 10 times. If mangling were occurring, it would compound with every pass and the final output would drift further and further from the original.
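The same fixed-point argument can be checked without WordPress. This is a sketch under a stated substitution: instead of parse_blocks/serialize_blocks it uses PHP's own json_decode and json_encode with the JSON_HEX_TAG and JSON_HEX_AMP flags (which likewise escape <, > and & as unicode sequences). The helper names encode_attrs and roundtrip are hypothetical.

```php
<?php
// Stand-in encoder: JSON_HEX_TAG / JSON_HEX_AMP escape < > & as unicode
// sequences. This is NOT serialize_block_attributes, just a comparable scheme.
function encode_attrs( array $attrs ) : string {
    return json_encode( $attrs, JSON_HEX_TAG | JSON_HEX_AMP | JSON_UNESCAPED_SLASHES );
}

// Decode and re-encode repeatedly, mirroring the parse/serialize loop above.
function roundtrip( string $encoded, int $loops = 10 ) : string {
    for ( $i = 0; $i < $loops; $i++ ) {
        $encoded = encode_attrs( json_decode( $encoded, true ) );
    }
    return $encoded;
}

$first = encode_attrs( array( 'content' => '<h3>What types of accommodation are available in xxxx?</h3>' ) );
$after = roundtrip( $first );

var_dump( $first === $after ); // bool(true): no compounding changes
```

Because decoding and re-encoding are exact inverses on this data, the escaped form is a fixed point: repeating the cycle any number of times yields the same string.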

Proof 2

Take the supposedly mangled markup:

\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e

Wrap it in quotes to turn it into a JSON string, then decode it:

$json = '"\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e"';
echo '<pre>' . esc_html( json_decode( $json ) ) . '</pre>';

We get the original HTML:

<h3>What types of accommodation are available in xxxx?</h3>

So no mangling has taken place.

Summary

There is no mangling or corruption. The < and > characters are just encoded to prevent breakage, and JSON processors handle the unicode escape sequences just fine.
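As a small plain-PHP check of that last point (no WordPress needed; the variable names are illustrative only), the escaped string from the question and the raw HTML decode to the identical value:

```php
<?php
// The escaped form from the question and the raw HTML, both as JSON strings.
// In single-quoted PHP strings, \u and \/ are kept literally for json_decode.
$escaped = '"\u003ch3\u003eWhat types of accommodation are available in xxxx?\u003c\/h3\u003e"';
$raw     = '"<h3>What types of accommodation are available in xxxx?</h3>"';

// A conforming JSON decoder maps both to the same string value.
var_dump( json_decode( $escaped ) === json_decode( $raw ) ); // bool(true)
```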

If you are seeing these encoded characters in the block editor, then that is a bug, either in the block or in the ACF plugin, and you should report it as such.
