Simplify getting Encoding in `TranscriptionOption.FlushContentToDisk` #13910

Gimly · 2020-10-27T22:09:51Z

PR Summary

Simplify getting encoding of existing file in the FlushContentToDisk method of the TranscriptionOption class called when the transcription file is flushed to disk.

PR Context

The FlushContentToDisk method was calling a complex GetEncoding utils that was trying to read the file encoding "by hand". Through a discussion with @iSazonov in #13677 and #13899 it was decided that this could be simplified by using a StreamReader and taking advantage of the CurrentEncoding property.

This allows to reduce code complexity and should also improve compatibility with different encoding. This PR should have no functional impact except improve recognition of encoding of existing files in some cases not covered by the existing code.

PR Checklist

Utils.GetEncoding was using a complicated implementation that could be easily replaced by StreamReader.GetCurrentEncoding. Used that implementation in the FlushContentToDisk method

Gimly · 2020-10-27T22:14:08Z

@iSazonov I think that's what you had in mind. I didn't remove the GetEncoding method from Utils because I thought it was still cleaner to have that code outside of the FlushContentToDisk method, but it's greatly simplified.

I've removed the Encoding property and simply call the GetEncoding method at the start of the FlushContentToDisk, I don't think that should have an impact on performances and I think it's a bit cleaner than opening a file in the setter of a property.

I decided to open a separate PR from the #13899 because I thought this, while related, is maybe a bit more open for discussion, hope that's OK for you.

src/System.Management.Automation/engine/Utils.cs

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

src/System.Management.Automation/engine/Utils.cs

Remove the GetEncoding from Utils and two related static readonly fields. Add the default to utf8NoBom in the stream reader Put the stream reader in the try catch.

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

rjmholt · 2020-10-30T22:28:37Z

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

+                            Encoding currentEncoding;
+
+                            using (StreamReader reader = new StreamReader(this.Path, Utils.utf8NoBom, detectEncodingFromByteOrderMarks: true))
+                            {
+                                _ = reader.Read();
+                                currentEncoding = reader.CurrentEncoding;
+                            }
+


It seems like this is actually interspersing two pieces of logic:

Determining the encoding we want to use

Creating the file stream writer

If that's the case, it might be nicer to factor the first into its own method, similar to the pre-existing code:

private Encoding GetPathEncoding() { using (var reader = new StreamReader(Path, Utils.utf8NoBom, detectEncodingFromByteOrderMarks: true)) { _ = reader.Read(); return reader.CurrentEncoding; } } internal void FlushContentToDisk() { lock (OutputBeingLogged) { try { // Does this dispose check need to be within the lock at all? if (disposed) { return; } if (_contentWriter != null) { return; } try { _contentWriter = new StreamWriter( new FileStream(Path, FileMode.OpenOrCreate, FileAccess.ReadWrite), GetPathEncoding()); // Worth adding a comment here to explain what's happening _contentWriter.BaseStream.Seek(0, SeekOrigin.End); } catch (IOException) { _contentWriter = new StreamWriter( new FileStream(Path, FileMode.Append, FileAccess.Write, FileShare.Read), Utils.utf8NoBom); } } finally { OutputBeingLogger.Clear(); } } }

That's what I had initially, with the GetPathEncoding defined in the Utils class, but I think @iSazonov didn't like the method there.

In the end, it will be used only in this FlushContentToDisk, maybe it should be a local function? This way it is factored, but still available only in this method?

Otherwise I can put it back into Utils, in the end it's still valid, even more so if we just let the IOException bubble if there's an issue reading the file.

If a short method is used once I don't see a point to have it (especially since it should be in
try-catch block). We could add a comment before the code block.

@rjmholt what do you think?

Well I don't like making you go around in circles. So I'm happy to leave the code as it is.

@iSazonov the current PowerShell code is full of long tracts of statements, and breaking things into methods is an easy way to get readability and also improves reusability later on. Honestly I just find it a useful tool to simplify the process for writing and reviewing code. That's where my preference comes from. And given the complexity of some parts of our codebase, I don't see much argument against it.

especially since it should be in try-catch block

This is the kind of clarity that putting something into a method provides. As is, it's not clear what the full intent and purpose of the try/catch is (why do we fall back to a totally different setting in the catch block?). Once you separate the logic into two pieces and must give them a human-readable name, it's easier to assess what each operation's error semantics should be.

I don't mind changing things, in the end the goal was to make the code more readable / simple, and I tend to agree that it would probably be clearer in a method, which makes it clear why we're creating this StreamReader here.

Since what seems to bother @iSazonov is the fact that the method would be used only in this method, I think it's a good candidate for a local function, it groups the logic while keeping it "inside" the method and usable only there. Do you think this would be an acceptable proposal?

@rjmholt Thanks for clarify!
I see your point and agree that a method with an informative name adds readability. But in this case it seemed to me unnecessary. We have a request to create a cmdlet like Get-FileEncoding but I don't think this piece of code is worth worrying about reuse.
As for this try-catch block, I was afraid that we could change something in the behavior and get regression now or in the future.

So, as discussed I did a test with a local function. I think it's a good middle ground between your two opinions, we have the code factored and with an understandable name and at the same time it's available only in the FlushContentToDisk method.
Waiting on your remarks.

I think a local function sounds ideal 🙂

Use a local function to factor the path encoding code.

iSazonov · 2020-11-11T03:38:35Z

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

                OutputBeingLogged.Clear();
            }
+
+            Encoding getPathEncoding()


Please moved the local function to start of the method.
Also please use Pascal name case - local function is de-facto a method and could be move in another scope.

Done 😄. Interesting that you prefer the local function at the start. I haven't seen any recommendation for local functions yet. Do you happen to have coding conventions available for them?

In my team we prefer to put it at the bottom, this way you have "pseudo-code" at the beginning with only the calls and you can just scroll down to see the implementation details.

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

Co-authored-by: Ilya <darpa@yandex.ru>

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs

Co-authored-by: Ilya <darpa@yandex.ru>

ghost · 2020-11-21T02:00:08Z

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

iSazonov · 2020-12-09T20:05:17Z

@Gimly Thanks for your contribution!

ghost · 2020-12-15T21:36:09Z

🎉v7.2.0-preview.2 has been released which incorporates this pull request.:tada:

Handy links:

Release Notes

Gimly added 3 commits October 27, 2020 22:57

Remove Encoding property in TranscriptionOption

4f8f6c1

Simplify Utils.GetEncoding to use StreamReader

ca6123d

Utils.GetEncoding was using a complicated implementation that could be easily replaced by StreamReader.GetCurrentEncoding. Used that implementation in the FlushContentToDisk method

Catch a more specific exception

a5849ea

iSazonov reviewed Oct 28, 2020

View reviewed changes

src/System.Management.Automation/engine/Utils.cs Outdated Show resolved Hide resolved

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs Outdated Show resolved Hide resolved

iSazonov reviewed Oct 28, 2020

View reviewed changes

src/System.Management.Automation/engine/Utils.cs Outdated Show resolved Hide resolved

Further cleanup and fix PR comment

4390247

Remove the GetEncoding from Utils and two related static readonly fields. Add the default to utf8NoBom in the stream reader Put the stream reader in the try catch.

Gimly force-pushed the simplify_encoding_flushContentToDisk branch from a928560 to 4390247 Compare October 28, 2020 11:47

iSazonov reviewed Oct 28, 2020

View reviewed changes

Implement PR suggestions

3185744

iSazonov approved these changes Oct 28, 2020

View reviewed changes

iSazonov added the CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log label Oct 28, 2020

There is no need to initialize the variable

6c42ba2

iSazonov reviewed Oct 28, 2020

View reviewed changes

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs Outdated Show resolved Hide resolved

Gimly force-pushed the simplify_encoding_flushContentToDisk branch from baae118 to 6c42ba2 Compare October 29, 2020 21:55

iSazonov requested a review from JamesWTruher October 30, 2020 06:08

iSazonov assigned rjmholt Oct 30, 2020

rjmholt reviewed Oct 30, 2020

View reviewed changes

Factor get path encoding

64c1407

Use a local function to factor the path encoding code.

Gimly force-pushed the simplify_encoding_flushContentToDisk branch from 208271c to 64c1407 Compare November 10, 2020 21:20

iSazonov reviewed Nov 11, 2020

View reviewed changes

Moved local function to start of method and PascalCase

6c6bbf4

iSazonov approved these changes Nov 12, 2020

View reviewed changes

iSazonov reviewed Nov 13, 2020

View reviewed changes

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs Outdated Show resolved Hide resolved

Make GetPathEncoding static and use inline using

6c48338

Co-authored-by: Ilya <darpa@yandex.ru>

iSazonov reviewed Nov 13, 2020

View reviewed changes

src/System.Management.Automation/engine/hostifaces/MshHostUserInterface.cs Outdated Show resolved Hide resolved

Add back missing parameter to GetPathEncoding Path

fe64bf9

Co-authored-by: Ilya <darpa@yandex.ru>

ghost added the Review - Needed The PR is being reviewed label Nov 21, 2020

rjmholt merged commit 971c428 into PowerShell:master Dec 9, 2020

ghost removed the Review - Needed The PR is being reviewed label Dec 9, 2020

iSazonov added this to the 7.2.0-preview.2 milestone Dec 9, 2020

Simplify getting Encoding in TranscriptionOption.FlushContentToDisk #13910

Simplify getting Encoding in TranscriptionOption.FlushContentToDisk #13910

Uh oh!

Conversation

Gimly commented Oct 27, 2020

PR Summary

PR Context

PR Checklist

Uh oh!

Gimly commented Oct 27, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rjmholt Nov 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

iSazonov Nov 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Gimly Nov 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ghost commented Nov 21, 2020

Uh oh!

iSazonov commented Dec 9, 2020

Uh oh!

ghost commented Dec 15, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Simplify getting Encoding in `TranscriptionOption.FlushContentToDisk` #13910

Simplify getting Encoding in `TranscriptionOption.FlushContentToDisk` #13910

rjmholt Nov 3, 2020 •

edited

Loading

iSazonov Nov 11, 2020 •

edited

Loading

Gimly Nov 11, 2020 •

edited

Loading