Skip to content

Incorrect table column width with multibyte text.#109

Closed
miya0001 wants to merge 4 commits intowp-cli:masterfrom
miya0001:wrong-column-width
Closed

Incorrect table column width with multibyte text.#109
miya0001 wants to merge 4 commits intowp-cli:masterfrom
miya0001:wrong-column-width

Conversation

@miya0001
Copy link
Member

Incorrect table column width with multibyte text.
I added a test for this problem so it will be failed.

The output is like following.

array(8) {
  [0]=>
  string(79) "+----------------------------------------------------------------+------------+"
  [1]=>
  string(79) "| Field                                                          | Value      |"
  [2]=>
  string(79) "+----------------------------------------------------------------+------------+"
  [3]=>
  string(208) "| この文章はダミーです。文字の大きさ、量、字間、行間等を確認するために入れています。この文章はダミーです。文字の大きさ、量、字 | こんにちは |"
  [4]=>
  string(131) "| 間、行間等を確認するために入れています。この文章はダミーです。文字の大きさ、 |            |"
  [5]=>
  string(79) "| Lorem Ipsum is simply dummy text of the printing and typesetti | Hello      |"
  [6]=>
  string(79) "| ng industry.                                                   |            |"
  [7]=>
  string(79) "+----------------------------------------------------------------+------------+"
}

I am trying to fix this problem, but I haven't find a cause for now.
Are there any hints?

@miya0001
Copy link
Member Author

The result of the phpunit is following.

$ phpunit 
PHPUnit 5.6.0 by Sebastian Bergmann and contributors.

...................................F                              36 / 36 (100%)

Time: 60 ms, Memory: 4.00MB

There was 1 failure:

1) Test_Table::test_column_value_too_long_with_multibytes
Failed asserting that 142 matches expected 80.

/Users/miyauchi/www/php-cli-tools/tests/test-table.php:52

FAILURES!
Tests: 36, Assertions: 98, Failures: 1.

@miya0001 miya0001 changed the title add test for table width with multibyte text Incorrect table column width with multibyte text. Jul 23, 2017
@miya0001
Copy link
Member Author

miya0001 commented Jul 24, 2017

I wrote a sample test for investigating this problem.
I found that str_pad() isn't working as expected for multibyte text.
I am going to ask people for the solution of this problem.

	public function test_strpad() {
		$max = 'みなさんこんにちは!';
		$texts = array(
			'こんにちは',
			'Hello',
		);

		foreach ( $texts as $text ) {
			$new[] = str_pad( $text, mb_strwidth( $max ) );
		}

		// The text width has to be 19.
		$this->assertSame( mb_strwidth( $max ), mb_strwidth( $new[0] ) ); // Fail.
		$this->assertSame( mb_strwidth( $max ), mb_strwidth( $new[1] ) );
	}

Result:

1) Test_Table::test_strpad
Failed asserting that 14 is identical to 19.

@danielbachhuber
Copy link
Member

👌 thanks @miya0001

@miya0001
Copy link
Member Author

Oh ...orz
Splitting the text by mb_substr() is not working as expected in this case.

mb_substr( 'こんにちは', 0, 2 ) => "こん"
mb_substr( 'Hello', 0, 2 )     => "He"

@schlessera
Copy link
Member

@miya0001 Did you test whether the problem is still present with the changes that @gitlost had done in #107 ?

@miya0001
Copy link
Member Author

I am trying with this master branch, I think it hasn't solved.

@miya0001
Copy link
Member Author

I saw the problem with following test.

	public function test_column_value_too_long_with_multibytes() {

		$constraint_width = 80;

		$table = new cli\Table;
		$renderer = new cli\Table\Ascii;
		$renderer->setConstraintWidth( $constraint_width );
		$table->setRenderer( $renderer );
		$table->setHeaders( array( 'Field', 'Value' ) );
		$table->addRow( array( 'この文章はダミーです。文字の大きさ、量、字間、行間等を確認するために入れています。この文章はダミーです。文字の大きさ、量、字間、行間等を確認するために入れています。この文章はダミーです。文字の大きさ、', 'こんにちは' ) );
		$table->addRow( array( 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.', 'Hello' ) );

		$out = $table->getDisplayLines();
		print_r( $out );
		for ( $i = 0; $i < count( $out ); $i++ ) {
			$this->assertEquals( $constraint_width, mb_strwidth( $out[$i] ) + 1 );
		}
	}

@miya0001
Copy link
Member Author

I am trying to find a solution to split multibyte text as fixed length for now.

gitlost added a commit to gitlost/php-cli-tools that referenced this pull request Jul 24, 2017
@gitlost
Copy link
Contributor

gitlost commented Jul 24, 2017

Hi @miya0001 , I see what's happening - my fault - Ascii::row() is passing the width of the column to safe_substr() which is doubled for East Asian chars.

I was working on fixes related to #106 but am trying to fix this now - I'm thinking that maybe add a flag to safe_substr() to indicate the $length is a width, which is what I'm trying in gitlost#3 (which I accidentally pushed here), but although it's working locally it's failing on Travis so it's not the full story - maybe you can look at it...

@gitlost
Copy link
Contributor

gitlost commented Jul 24, 2017

Ok gitlost#3 is now passing Travis but it's not a great solution - it doubly loads unicode/regex.php for a start.

I'll see if there's a better solution (probably have to refactor the eaw_regex stuff I think) or let us know if you have one - perhaps there should be a safe_str_split() function or similar that does all the grimy stuff Ascii::row() is now doing...

@miya0001
Copy link
Member Author

Hi @gitlost 😄

Hmm, table looks broken on your test result.
I couldn't found the reason that why test is passed...

I see str_pad is not working for multibyte text, so I wrote a simple function to solve this problem.

https://github.com/miya0001/php-cli-tools/blob/4339c5906bc0543cb14a3fcc88cec755cb0ab7fd/lib/cli/cli.php#L257-L261

Following is a test for it. It is passed.

https://github.com/miya0001/php-cli-tools/blob/4339c5906bc0543cb14a3fcc88cec755cb0ab7fd/tests/test-cli.php#L47-L53

Thanks!

@miya0001
Copy link
Member Author

Oh! Sorry, it is just a problem of the CSS!

It looks nice!
I want to learn how did you fix this problem. 😄

Thanks!

@miya0001
Copy link
Member Author

I close this PR. :)

@miya0001 miya0001 closed this Jul 24, 2017
@gitlost
Copy link
Contributor

gitlost commented Jul 24, 2017

Hi no problem - I was wondering about that and didn't realize it was the CSS.

The fix isn't very elegant and is slow at the moment - it breaks the string into UTF-8 characters into an array and then UTF-8 character by UTF-8 character exams it to see if it's an East Asian char and then adjusts the length by 2 or 1 accordingly - there must be a better way!

@miya0001
Copy link
Member Author

I couldn't found the solution to split multibyte text as fixed length, so your solution is only way for now.
I am going to write a blog that introduces your solution. 😄

I am happy to see solving this problem.
Thanks for great work!

@gitlost
Copy link
Contributor

gitlost commented Jul 24, 2017

Thank you @miya0001! - I'll keep thinking about it though (sleeping on things is usually a good approach!) and see if there's something nicer...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants