Skip to content

Conversation

@powercode
Copy link
Collaborator

@powercode powercode commented Aug 29, 2018

PR Summary

A join-object cmdlet that joins pipeline input to text.

#6697

As requested by @BrucePay.

Final syntax of the cmdlet:

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-SingleQuote] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-DoubleQuote] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-FormatString <string>] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]
1..3 | Join-String -OutputPrefix "A " -OutputSuffix " B" -Separator "," -SingleQuote | Should -BeExactly "A '1','2','3' B" 
gci | Join-String Name -Separator "," -DoubleQuote
gci | Join-String {$_.Name * 2} "; "

PR Checklist

@iSazonov
Copy link
Collaborator

iSazonov commented Aug 30, 2018

'Object' in the cmdlet name implies that we can join objects of any type - array + array, array + hash, hash + hash, strings and so on.

This cmdlet is more like ConvertTo-String

@powercode
Copy link
Collaborator Author

It is intended to be the pipeline equivalent of the -join operator. @BrucePay suggested the name Join-Object in the issue. Convert would imply for me that it converted each item to a string. This cmdlet join them together with a delimiter.

@RichardSiddaway
Copy link

If its going to be the pipeline equivalent of -join wouldn't Join-String be a more descriptive name. Shouldn't there also be a Split-? cmdlet to match the -split operator

@powercode
Copy link
Collaborator Author

It depends. It can join objects, or the properties of objects.

The output is a string, but it works with object input. Just like Group-Object and Sort-Object.

/// </summary>
[Cmdlet(VerbsCommon.Join, "Object", RemotingCapability = RemotingCapability.None, DefaultParameterSetName = "default")]
[OutputType(typeof(string))]
public class JoinObjectCommand : PSCmdlet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be public sealed class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

public class JoinObjectCommand : PSCmdlet
{
// ReSharper disable once CollectionNeverQueried.Local
private readonly List<PSObject> _inputObjects = new List<PSObject>(50);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally don't see List size initialization unless the size is known. I am curious why you initialize the size to this value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is cheaper to allocate a list that is a little larger than to reallocate and copy when out-grown.
But the size can always be argued.

When I profile memory allocations, Array allocations on List.Resize is not uncommon, and when I see them, I try to find a sensible default size do get rid of the most common once.

namespace Microsoft.PowerShell.Commands.Utility
{
/// <summary>
/// Group-Object implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Join-Object implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops :)


if (PropertyName == null)
{
if (_inputObjects.Count > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just check count once and early out or just skip the if/else processing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better!

public TypeInferenceContext()
: this(PowerShell.Create(RunspaceMode.CurrentRunspace))
{
_ownsPowerShell = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right to me. This class can own PowerShell/Runspace only if PowerShell is created with Runspace.NewRunspace mode. Otherwise you are disposing a runspace created by someone else (and possibly the thread default runspace).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting the whole file.

@powercode
Copy link
Collaborator Author

@PaulHigin Thx for the review.

CommandAst commandAst,
IDictionary fakeBoundParameters)
{
var res = new List<CompletionResult>(10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be static?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you end up with races if used from multiple runspaces.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list looks as a const for current session. Do I skip something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is captured by the local function AddMatching.
So the result is depending on wordToComplete.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now. :-) It is Friday. I can not catch up with my idea :-)
I am looking at 'new CompletionResult()'. If they isn't modifed later we could make it static and only reference in target List?

/// </summary>
[Parameter(Position = 0)]
[ArgumentCompleter(typeof(PropertyNameCompleter))]
public object PropertyName { get; set; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use PSPropertyExpression type like in #6934?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PropertyExpression has some extra semantics, handling hashtables like @{n="x";ex={2,3,4}}.

I looked at it, but was not convinced it was a fit. But there's definitely an overlap. Input here is most welcome.

case "\r": return "`r";
case "\n": return "`n";
case "\r\n": return "`r`n";
default: return Environment.NewLine.Replace("\r", "`r").Replace("\n", "`n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the switch is superfluous. We could leave only
``'c#
return Environment.NewLine.Replace("\r", "r").Replace("\n", "n");

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it could. I wrote it the way I did to reduce allocated objects. May not be worth it.
The string literals will be loaded from metadata and interned. There will only exist one instance of them. The default line will create two new strings each time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment.NewLine is static. So I think we should use the usual method

public static string NewLineText
{
    get
    {
#if UNIX
        return "`n"
#else
        return "`r`n"
#endif
    }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think MacOS uses `r

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only very old MacOs versions. See my comment below.

Copy link
Collaborator Author

@powercode powercode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general question is how we should handle null and AutomationNull.Value in the input?

CommandAst commandAst,
IDictionary fakeBoundParameters)
{
var res = new List<CompletionResult>(10);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is captured by the local function AddMatching.
So the result is depending on wordToComplete.

@iSazonov
Copy link
Collaborator

AutomationNull.Value is ignored by default in pipes

/// AutomationNull.Value is ignored

Ex.:

$null,$null | % { 1 }
1
1

[System.Management.Automation.Internal.AutomationNull]::Value,[System.Management.Automation.Internal.AutomationNull]::Value | % { 1 }

@powercode
Copy link
Collaborator Author

Should it be named Join-String instead?

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vexx32
Copy link
Collaborator

vexx32 commented Aug 31, 2018

I'd be inclined to call it Join-String instead, since nothing of the original objects are preserved (counter to things like Group-Object or Select-Object)

@anmenaga
Copy link

anmenaga commented Sep 5, 2018

I restarted failed test runs; we'll see how they go...

@anmenaga
Copy link

anmenaga commented Sep 5, 2018

@BrucePay can you please take a look again at this PR? Thanks.


/// <summary>
/// Gets or sets a format string that is applied to each input object.
/// </summary>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expand the comment with what is the format and/or add a link on docs.

[OutputType(typeof(string))]
public sealed class JoinStringCommand : PSCmdlet
{
/// <summary>A bigger default to not get re-allocations in common use cases.</summary>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use file pattern:

/// <Summary>
/// A bigger default to not get re-allocations in common use cases.
/// </Summary> 

{
_outputBuilder.Append(_quoteChar);
_outputBuilder.Append(stringValue);
_outputBuilder.Append(_quoteChar);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use

_outputBuilder.Append(_quoteChar).Append(stringValue).(_quoteChar); 

}
else
{
_outputBuilder.AppendFormat(CultureInfo.CurrentCulture, FormatString, stringValue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd want to get confirmation that this should be CurrentCulture not InvariantCulture.
And maybe discuss -FormatCulture parameter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Now sure what is right here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PowerShell consistently uses the invariant culture in to/from-string conversions, so it should probably be InvariantCulture, at least by default, perhaps with an optional -UseCulture switch (analogous to the *-Csv cmdlets).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a guideline, we use InvariantCulture when manipulating data so that the behaviour of the code is predictable (invariant) across locales and we use current culture when presenting to the user. This cmdlet could be used in both scenarios but I expect it will be used more for data processing so I suggest defaulting to InvariantCulture with an option to override for current culture.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding UseCulture switch as suggested.

Copy link
Collaborator

@iSazonov iSazonov Oct 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 gcm -ParameterName Culture

CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Cmdlet          Compare-Object                                     3.1.0.0    Microsoft.PowerShell.Utility
Cmdlet          Group-Object                                       3.1.0.0    Microsoft.PowerShell.Utility
Cmdlet          New-PSSessionOption                                6.0.0.0    Microsoft.PowerShell.Core
Cmdlet          Sort-Object                                        3.1.0.0    Microsoft.PowerShell.Utility

We already have -Culture parameter. Make sense re-use the name?
Or -UseCurrentCulture? Looks as limited option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov:

I think -UseCulture is the correct name, after all:

  • The -Culture <string> parameters you reference:

    • require an argument, namely a given culture
    • are used with cmdlets that are current-culture-sensitive by default, to allow opt-in to a different culture.
  • The -UseCulture switch, by contrast:

    • is used with cmdlets that are culture-invariant by default, to allow opt-in to the current culture (currently applies only to the *-Csv cmdlets).

The latter is what is being implemented here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 Thanks! We need discuss Select-String too.

get
{
#if UNIX
return Platform.IsMacOS ? "`r" : "`n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment.


It "Should be called using an object as piped without error with no switches" {
{$testObject | Join-String } | Should -Not -Throw
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove the test - any test below do the test.

Copy link
Collaborator

@BrucePay BrucePay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anmenaga
Copy link

@powercode Looks like this PR still needs a couple of minor updates that are mentioned in @iSazonov's review; especially InvariantCulture issue. Otherwise it seems close to being done.

@powercode
Copy link
Collaborator Author

Sorry - been a bit preoccupied.

@powercode
Copy link
Collaborator Author

powercode commented Oct 27, 2018

I just realized that when joining dates, I always get the "US", Invariant formatting, which is literally never the output a Swede would want.

To get the UseCulture switch to have effect, I need to do some minor changes in how LanguagePrimitives makes the conversion to strings, and provide an overload that accepts an IFormatProvider.

I will add a new commit with this change - feel free to ignore it if you don't think it should go with this PR.

@daxian-dbw
Copy link
Member

The last commit ("Make TryConvertTo<string> respect the passed formatting provider.") looks good to me.

@daxian-dbw daxian-dbw added Documentation Needed in this repo Documentation is needed in this repo CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log labels Nov 14, 2018
@daxian-dbw daxian-dbw merged commit 877b9a9 into PowerShell:master Nov 14, 2018
@iSazonov
Copy link
Collaborator

What is current policy: don't merge until doc issue is created?

iSazonov pushed a commit to iSazonov/PowerShell that referenced this pull request Nov 29, 2018
…Shell#7660)

The cmdlet syntax is as follows:
```
Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-SingleQuote] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-DoubleQuote] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]

Join-String [[-Property] <pspropertyexpression>] [[-Separator] <string>] [-OutputPrefix <string>] [-OutputSuffix <string>] [-FormatString <string>] [-UseCulture] [-InputObject <psobject>] [<CommonParameters>]
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Documentation Needed in this repo Documentation is needed in this repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.