Skip to content

Conversation

@xtqqczze
Copy link
Contributor

@xtqqczze xtqqczze commented Aug 12, 2020

PR Summary

  • Use string.Create instead of concatenation in method ConnectToDataSource
  • Use for loop instead of foreach to avoid allocating an enumerator

PR Context

#13410 (comment)

PR Checklist

@ghost ghost assigned rjmholt Aug 12, 2020
@rjmholt

This comment has been minimized.

@rjmholt
Copy link
Collaborator

rjmholt commented Aug 13, 2020

Sorry, just realised I for some reason assumed the backing data structure was an array.

Here's an updated benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

namespace CSharpPlayground
{
    [MemoryDiagnoser]
    public class LoopAllocationTest
    {
        public static void RunProfile()
        {
            BenchmarkRunner.Run<LoopAllocationTest>();
        }

        private const int Size = 1000000;

        private string[] _stringArray;
        private StringCollection _stringCollection;
        private List<string> _stringList;

        [GlobalSetup]
        public void Setup()
        {

            _stringArray = new string[Size];
            _stringCollection = new StringCollection();
            _stringList = new List<string>();
            var random = new Random();
            for (int i = 0; i < Size; i++)
            {
                string s = random.NextDouble().ToString();
                _stringArray[i] = s;
                _stringCollection.Add(s);
                _stringList.Add(s);
            }
        }

        [Benchmark(Baseline = true)]
        public void Run_For_Array()
        {
            for (int i = 0; i < _stringArray.Length; i++)
            {
                string s = _stringArray[i];
            }
        }

        [Benchmark]
        public void Run_ForEach_Array()
        {
            foreach (string s in _stringArray)
            {
                ;
            }
        }

        [Benchmark]
        public void Run_For_StrCollection()
        {
            for (int i = 0; i < _stringCollection.Count; i++)
            {
                string s = _stringCollection[i];
            }
        }

        [Benchmark]
        public void Run_ForEach_StrCollection()
        {
            foreach (string s in _stringCollection)
            {
                ;
            }
        }

        [Benchmark]
        public void Run_For_List()
        {
            for (int i = 0; i < _stringList.Count; i++)
            {
                string s = _stringList[i];
            }
        }

        [Benchmark]
        public void Run_ForEach_List()
        {
            foreach (string s in _stringList)
            {
                ;
            }
        }
    }
}

Results:


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.4.20258.7
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.25106, CoreFX 5.0.20.25106), X64 RyuJIT


|                    Method |       Mean |     Error |    StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------- |-----------:|----------:|----------:|------:|--------:|------:|------:|------:|----------:|
|             Run_For_Array |   232.6 us |   3.81 us |   3.38 us |  1.00 |    0.00 |     - |     - |     - |         - |
|         Run_ForEach_Array |   231.3 us |   3.00 us |   2.81 us |  1.00 |    0.01 |     - |     - |     - |         - |
|     Run_For_StrCollection | 3,917.4 us |  15.95 us |  12.45 us | 16.81 |    0.28 |     - |     - |     - |         - |
| Run_ForEach_StrCollection | 5,395.3 us | 107.08 us | 131.50 us | 23.37 |    0.53 |     - |     - |     - |      80 B |
|              Run_For_List |   687.9 us |   3.39 us |   3.00 us |  2.96 |    0.05 |     - |     - |     - |         - |
|          Run_ForEach_List | 2,978.8 us |  15.77 us |  13.98 us | 12.81 |    0.22 |     - |     - |     - |         - |

So it seems to be worth switching to List<string> and perhaps also keeping the for loop for such a dramatic speedup.

@iSazonov

This comment has been minimized.

@iSazonov
Copy link
Collaborator

And it is not hot path - I don't see a value from the PR.

@rjmholt
Copy link
Collaborator

rjmholt commented Aug 13, 2020

We can not change public API.

It's on an internal class, no? If it's exposed publicly, then best not to change it. But this being on a native helper class, it feels like we can make that change.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Aug 13, 2020

And it is not hot path - I don't see a value from the PR.

I had assumed this method was a public API so I ignored the fact there are no references to ConnectToDataSource(StringCollection blgFileNames) in our codebase.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Aug 13, 2020

So it seems to be worth switching to List<string> and perhaps also keeping the for loop for such a dramatic speedup.

@rjmholt Given that this method appears to be unused, the performance I wouldn't think changing to List<string> justifies the hypothetical performance increase.

Thanks for your help with benchmarking though, it is good to confirm StringCollection performance is much worse then generic List<T>.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Aug 14, 2020

@rjmholt I have written #13431 to replace StringCollection with generic List in general

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Aug 14, 2020

We can not change public API.

It's on an internal class, no? If it's exposed publicly, then best not to change it. But this being on a native helper class, it feels like we can make that change.

@rjmholt Would there be value in a general PR to change method visibility from public to internal on internal class to avoid confusion like this? I have draft PR with these changes: #13436

@rjmholt
Copy link
Collaborator

rjmholt commented Aug 14, 2020

Would there be value in a general PR to change method visibility from public to internal on internal class to avoid confusion like this? I have draft PR with these changes: #13436

Generally I personally prefer public methods on internal and private classes; a class/object has a public API and private methods, internal implies a secondary API and muddies thinking around the cohesion and contract around a class. However, in PowerShell there is a consideration that once you have an object, any public member on it is visible to PowerShell (even if the class itself is internal), which drives a lot of the other classes that have internal members on everything.

Comment on lines 545 to 559
int capacity = blgFileNames.Count + 1;
for (int i = 0; i < blgFileNames.Count; i++)
{
capacity += blgFileNames[i].Length;
}

StringBuilder doubleNullTerminated = new StringBuilder(capacity);

for (int i = 0; i < blgFileNames.Count; i++)
{
doubleNullTerminated += fileName + '\0';
string fileName = blgFileNames[i];
doubleNullTerminated.Append(fileName).Append('\0');
}

doubleNullTerminated += '\0';
doubleNullTerminated.Append('\0');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the way we calculate capacity up front here, it's relatively simple to move to string.Create():

string.Create(capacity, blgFileNames, (buf, fileNames) =>
{
    int offset = 0;
    for (int i = 0; i < fileNames.Count; i++)
    {
        string currFileName = fileNames[i];
        int currFileNameLength = currFileName.Length;
        Span<char> currSpan = buf.Slice(offset, currFileNameLength);
        currFileName.AsSpan().CopyTo(currSpan);

        offset += currFileNameLength;
        buf[offset] = '\0';
        offset++;
    }

    buf[offset] = '\0';
});

There's probably a bug or two in the above, but I think the general idea is there

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xtqqczze let me know if you're interested in trying this out. It's probably not a huge win relatively speaking (compared, for example, to native interop marshalling), but still relevant for an allocation/performance improvement PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjmholt We can also refactor the following code doing the same thing:

// set up the double-null-terminated lpDependencies parameter
IntPtr lpDependencies = IntPtr.Zero;
if (DependsOn != null)
{
int numchars = 1; // final null
foreach (string dependedOn in DependsOn)
{
numchars += dependedOn.Length + 1;
}
char[] doubleNullArray = new char[numchars];
int pos = 0;
foreach (string dependedOn in DependsOn)
{
Array.Copy(
dependedOn.ToCharArray(), 0,
doubleNullArray, pos,
dependedOn.Length
);
pos += dependedOn.Length;
doubleNullArray[pos++] = (char)0; // null terminator
}
doubleNullArray[pos++] = (char)0; // double-null terminator
Diagnostics.Assert(pos == numchars, "lpDependencies build error");
lpDependencies = Marshal.AllocHGlobal(
numchars * Marshal.SystemDefaultCharSize);
Marshal.Copy(doubleNullArray, 0, lpDependencies, numchars);
}

@ghost ghost added the Review - Needed The PR is being reviewed label Aug 25, 2020
@ghost
Copy link

ghost commented Aug 25, 2020

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@rjmholt rjmholt added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Review - Needed The PR is being reviewed labels Aug 25, 2020
@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Aug 28, 2020
@xtqqczze xtqqczze force-pushed the StringBuilder-ConnectToDataSource branch from cd94fda to 0e363ee Compare August 28, 2020 20:27
@xtqqczze
Copy link
Contributor Author

@iSazonov Do you any suggestions for CodeFactor issues here?

@iSazonov
Copy link
Collaborator

@xtqqczze I like the formatting. Please ignore these CodeFactor warnings.

@xtqqczze xtqqczze changed the title Use StringBuilder inside loop instead of concatenation Use string.Create instead of concatenation in PdhHelper Aug 29, 2020
@rjmholt
Copy link
Collaborator

rjmholt commented Aug 29, 2020

Do we have a test for this code path? Want to make sure we would identify a regression

@xtqqczze
Copy link
Contributor Author

There appear to be no references to ConnectToDataSource(StringCollection blgFileNames). No tests either.

@xtqqczze
Copy link
Contributor Author

@rjmholt Maybe we could add tests and refactor double null-terminated string generation code to a NativeUtils class. We could then reuse the code in Microsoft.PowerShell.Commands.Management/commands/management/Service.cs

@ghost ghost added the Review - Needed The PR is being reviewed label Sep 6, 2020
@ghost
Copy link

ghost commented Sep 6, 2020

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@SteveL-MSFT SteveL-MSFT added the CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log label Dec 15, 2020
@iSazonov
Copy link
Collaborator

iSazonov commented Jan 9, 2021

@xtqqczze Please resolve merge conflicts.

@ghost ghost removed the Review - Needed The PR is being reviewed label Jan 9, 2021
@iSazonov
Copy link
Collaborator

@xtqqczze Please resolve merge conflicts.

@ghost ghost added the Review - Needed The PR is being reviewed label Jan 17, 2021
@ghost
Copy link

ghost commented Jan 17, 2021

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@iSazonov
Copy link
Collaborator

The PdhHelper.cs is not in the compile so we should close the PR.

@ghost ghost removed the Review - Needed The PR is being reviewed label Jun 28, 2021
@iSazonov iSazonov self-requested a review June 28, 2021 03:26
@iSazonov iSazonov added the Resolution-Won't Fix The issue won't be fixed, possibly due to compatibility reason. label Jun 28, 2021
@rjmholt
Copy link
Collaborator

rjmholt commented Jun 28, 2021

The PdhHelper.cs is not in the compile so we should close the PR.

Assuming this is true, I'll close the PR. Let me know if there's something else we should be doing here.

@rjmholt rjmholt closed this Jun 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log Resolution-Won't Fix The issue won't be fixed, possibly due to compatibility reason.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants