Skip to content

Conversation

@Gimly
Copy link
Contributor

@Gimly Gimly commented Oct 27, 2020

PR Summary

Simplify getting encoding of existing file in the FlushContentToDisk method of the TranscriptionOption class called when the transcription file is flushed to disk.

PR Context

The FlushContentToDisk method was calling a complex GetEncoding utils that was trying to read the file encoding "by hand". Through a discussion with @iSazonov in #13677 and #13899 it was decided that this could be simplified by using a StreamReader and taking advantage of the CurrentEncoding property.

This allows to reduce code complexity and should also improve compatibility with different encoding. This PR should have no functional impact except improve recognition of encoding of existing files in some cases not covered by the existing code.

PR Checklist

Gimly added 3 commits October 27, 2020 22:57
Utils.GetEncoding was using a complicated implementation that could be easily replaced
by StreamReader.GetCurrentEncoding.

Used that implementation in the FlushContentToDisk method
@Gimly
Copy link
Contributor Author

Gimly commented Oct 27, 2020

@iSazonov I think that's what you had in mind. I didn't remove the GetEncoding method from Utils because I thought it was still cleaner to have that code outside of the FlushContentToDisk method, but it's greatly simplified.

I've removed the Encoding property and simply call the GetEncoding method at the start of the FlushContentToDisk, I don't think that should have an impact on performances and I think it's a bit cleaner than opening a file in the setter of a property.

I decided to open a separate PR from the #13899 because I thought this, while related, is maybe a bit more open for discussion, hope that's OK for you.

Remove the GetEncoding from Utils and two related static readonly fields.

Add the default to utf8NoBom in the stream reader
Put the stream reader in the try catch.
@Gimly Gimly force-pushed the simplify_encoding_flushContentToDisk branch from a928560 to 4390247 Compare October 28, 2020 11:47
@iSazonov iSazonov added the CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log label Oct 28, 2020
@Gimly Gimly force-pushed the simplify_encoding_flushContentToDisk branch from baae118 to 6c42ba2 Compare October 29, 2020 21:55
@iSazonov iSazonov requested a review from JamesWTruher October 30, 2020 06:08
Comment on lines 1104 to 1111
Encoding currentEncoding;

using (StreamReader reader = new StreamReader(this.Path, Utils.utf8NoBom, detectEncodingFromByteOrderMarks: true))
{
_ = reader.Read();
currentEncoding = reader.CurrentEncoding;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is actually interspersing two pieces of logic:

  • Determining the encoding we want to use
  • Creating the file stream writer

If that's the case, it might be nicer to factor the first into its own method, similar to the pre-existing code:

private Encoding GetPathEncoding()
{
    using (var reader = new StreamReader(Path, Utils.utf8NoBom, detectEncodingFromByteOrderMarks: true))
    {
        _ = reader.Read();
       return reader.CurrentEncoding;
    }
}

internal void FlushContentToDisk()
{
    lock (OutputBeingLogged)
    {
        try
        {
            // Does this dispose check need to be within the lock at all?
            if (disposed)
            {
                return;
            }

            if (_contentWriter != null)
            {
                return;
            }

            try
            {
                _contentWriter = new StreamWriter(
                    new FileStream(Path, FileMode.OpenOrCreate, FileAccess.ReadWrite),
                    GetPathEncoding());

                // Worth adding a comment here to explain what's happening
                _contentWriter.BaseStream.Seek(0, SeekOrigin.End);
            }
            catch (IOException)
            {
                _contentWriter = new StreamWriter(
                    new FileStream(Path, FileMode.Append, FileAccess.Write, FileShare.Read),
                    Utils.utf8NoBom);
            }
        }
        finally
        {
            OutputBeingLogger.Clear();
        }
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I had initially, with the GetPathEncoding defined in the Utils class, but I think @iSazonov didn't like the method there.

In the end, it will be used only in this FlushContentToDisk, maybe it should be a local function? This way it is factored, but still available only in this method?

Otherwise I can put it back into Utils, in the end it's still valid, even more so if we just let the IOException bubble if there's an issue reading the file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a short method is used once I don't see a point to have it (especially since it should be in
try-catch block). We could add a comment before the code block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjmholt what do you think?

Copy link
Collaborator

@rjmholt rjmholt Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I don't like making you go around in circles. So I'm happy to leave the code as it is.

@iSazonov the current PowerShell code is full of long tracts of statements, and breaking things into methods is an easy way to get readability and also improves reusability later on. Honestly I just find it a useful tool to simplify the process for writing and reviewing code. That's where my preference comes from. And given the complexity of some parts of our codebase, I don't see much argument against it.

especially since it should be in try-catch block

This is the kind of clarity that putting something into a method provides. As is, it's not clear what the full intent and purpose of the try/catch is (why do we fall back to a totally different setting in the catch block?). Once you separate the logic into two pieces and must give them a human-readable name, it's easier to assess what each operation's error semantics should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind changing things, in the end the goal was to make the code more readable / simple, and I tend to agree that it would probably be clearer in a method, which makes it clear why we're creating this StreamReader here.

Since what seems to bother @iSazonov is the fact that the method would be used only in this method, I think it's a good candidate for a local function, it groups the logic while keeping it "inside" the method and usable only there. Do you think this would be an acceptable proposal?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjmholt Thanks for clarify!
I see your point and agree that a method with an informative name adds readability. But in this case it seemed to me unnecessary. We have a request to create a cmdlet like Get-FileEncoding but I don't think this piece of code is worth worrying about reuse.
As for this try-catch block, I was afraid that we could change something in the behavior and get regression now or in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as discussed I did a test with a local function. I think it's a good middle ground between your two opinions, we have the code factored and with an understandable name and at the same time it's available only in the FlushContentToDisk method.
Waiting on your remarks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a local function sounds ideal 🙂

Use a local function to factor the path encoding code.
@Gimly Gimly force-pushed the simplify_encoding_flushContentToDisk branch from 208271c to 64c1407 Compare November 10, 2020 21:20
OutputBeingLogged.Clear();
}

Encoding getPathEncoding()
Copy link
Collaborator

@iSazonov iSazonov Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please moved the local function to start of the method.
Also please use Pascal name case - local function is de-facto a method and could be move in another scope.

Copy link
Contributor Author

@Gimly Gimly Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 😄. Interesting that you prefer the local function at the start. I haven't seen any recommendation for local functions yet. Do you happen to have coding conventions available for them?

In my team we prefer to put it at the bottom, this way you have "pseudo-code" at the beginning with only the calls and you can just scroll down to see the implementation details.

Co-authored-by: Ilya <darpa@yandex.ru>
Co-authored-by: Ilya <darpa@yandex.ru>
@ghost ghost added the Review - Needed The PR is being reviewed label Nov 21, 2020
@ghost
Copy link

ghost commented Nov 21, 2020

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@rjmholt rjmholt merged commit 971c428 into PowerShell:master Dec 9, 2020
@ghost ghost removed the Review - Needed The PR is being reviewed label Dec 9, 2020
@iSazonov iSazonov added this to the 7.2.0-preview.2 milestone Dec 9, 2020
@iSazonov
Copy link
Collaborator

iSazonov commented Dec 9, 2020

@Gimly Thanks for your contribution!

@ghost
Copy link

ghost commented Dec 15, 2020

🎉v7.2.0-preview.2 has been released which incorporates this pull request.:tada:

Handy links:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants