Skip to content

[DPL Analysis] [Event mixing] Introducing Binning Policy for block combinations#8299

Merged
ktf merged 12 commits intoAliceO2Group:devfrom
saganatt:binning-policy-base
Mar 15, 2022
Merged

[DPL Analysis] [Event mixing] Introducing Binning Policy for block combinations#8299
ktf merged 12 commits intoAliceO2Group:devfrom
saganatt:binning-policy-base

Conversation

@saganatt
Copy link
Copy Markdown
Collaborator

@saganatt saganatt commented Mar 9, 2022

Ciao @ktf,

The BinningPolicy classes replace the special hash task / table for block combinations.

Sorry for a huge PR, but I needed to modify in tests all lines that call block combinations, otherwise it won't compile.
The core changes are: new BinningPolicy.h and rewritten groupTable() in ASoAHelpers.h.

@saganatt saganatt requested a review from a team as a code owner March 9, 2022 13:41
@saganatt
Copy link
Copy Markdown
Collaborator Author

saganatt commented Mar 9, 2022

Hi @jgrosseo,

in the next PR I will adjust event mixing interface. With the changes introduced here, event mixing will look like this:

std::vector<double> zBins{VARIABLE_WIDTH, -7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0};
std::vector<double> yBins{VARIABLE_WIDTH, 0, 5, 10, 20, 30, 40, 50, 101};
PairBinningPolicy<aod::cent::CentEstV0M, aod::collision::PosZ> pairBinning{yBins, zBins};
SameKindPair<aod::Collisions, aod::Tracks> pair{pairBinning, 5, -1};
void process (soa::Join<aod::Collisions, aod::CentV0Ms>>& collisions, aodTracks const& tracks) {
    for (auto& [c1, tracks1, c2, tracks2] : pair) { ... }
}

"pair binning" means "calculate bins based on 2 columns". Similarly, there is SingleBinning and TripleBinning. We can think about a better naming, if it is confusing.

What do you think about this approach?

auto binningColumns = binningPolicy.getColumns();
auto arrowColumns = o2::framework::binning_helpers::getArrowColumns(arrowTable, binningColumns);
auto chunksCount = arrowColumns[0]->num_chunks();
// TODO: Are such checks needed or can we safely assume chunks are always the same?
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktf?
If I don't need to check the alignments, then I can get rid of getArrowColumns() and getChunks() altogether.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure arrow arrows mixing chunks sizes / number of chunks. I would not assume they are necessarily the same (although it might be in our case so far).

template <typename... Cs>
std::array<arrow::ChunkedArray*, sizeof...(Cs)> getArrowColumns(arrow::Table* table, pack<Cs...>)
{
static_assert(std::conjunction_v<typename Cs::persistent...>, "BinningPolicy: only persistent columns accepted (not dynamic and not index ones");
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important point: one cannot do event mixing / block combinations based on values in dynamic columns. Simply because they do not exist as real columns.

@jgrosseo
Copy link
Copy Markdown
Collaborator

Thanks for this work! The interface looks good to me. I let @ktf comment on the technicalities.

About the naming. So we cannot have just BinningPolicy with a different number of template arguments? Then maybe simply BinningPolicy1 BinningPolicy2 ... ?

@jgrosseo
Copy link
Copy Markdown
Collaborator

It breaks the compilation of O2Physics so we would need to force merge both in parallel. Is there an easy trick to avoid this?

@saganatt
Copy link
Copy Markdown
Collaborator Author

About the naming. So we cannot have just BinningPolicy with a different number of template arguments? Then maybe simply BinningPolicy1 BinningPolicy2 ... ?

BinningPolicyBase has a variable number of template arguments. I created Single, Pair, Triple policies so as to override getBin(). Perhaps this function could be generalized to any number of input columns, but I would need more time to think how.

On the other hand, are there many (any?) use cases of mixing based on 4 or more event properties?

@jgrosseo
Copy link
Copy Markdown
Collaborator

On the other hand, are there many (any?) use cases of mixing based on 4 or more event properties?

No, 1-3 is sufficient.

@saganatt
Copy link
Copy Markdown
Collaborator Author

It breaks the compilation of O2Physics so we would need to force merge both in parallel. Is there an easy trick to avoid this?

Ah, right. I changed the selfCombinations parameters. I can write a temporary binning class for backward compatibility but perhaps this is not worthy the effort? I see it is used in 4 places, in connection with event mixing. With my next PR that will need to change anyways.

@saganatt
Copy link
Copy Markdown
Collaborator Author

I added the backward compability.

About the naming. So we cannot have just BinningPolicy with a different number of template arguments? Then maybe simply BinningPolicy1 BinningPolicy2 ... ?

I implemented a single binning class. But I am curious whether this code looks good (wrt performance) to @ktf ? Or whether old implementation with multiple classes was better?

I also commented lines that caused the problem in macOS, as I am not sure we will need them at the end (see my review comments above, ASoAHelpers.h:88).

Comment on lines +241 to +257
{
template <typename T>
struct is_string {
static const bool value = false;
};

template <class T, class Traits, class Alloc>
struct is_string<std::basic_string<T, Traits, Alloc>> {
static const bool value = true;
};

template <std::size_t N>
struct is_string<char[N]> {
static const bool value = true;
};
} // namespace old_interface

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just use is_specialization, no?


#include "Framework/ASoA.h"
#include "Framework/BinningPolicy.h"
#include "Framework/Logger.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, no Logger.h here.

uint64_t ind = 0;
uint64_t selInd = 0;
gsl::span<int64_t const> selectedRows;
std::vector<std::pair<uint64_t, uint64_t>> groupedIndices;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a struct rather than a pair.

Copy link
Copy Markdown
Member

@ktf ktf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid the changes are too large for me to review them thoroughly, but the idea looks fine, apart from a few nitpicks that can be fixed later (please make sure you drop Logger.h, in particular). Assuming you tested them and in particular you made sure grouping still works, I am fine with them.

@ktf ktf merged commit 9f30bb8 into AliceO2Group:dev Mar 15, 2022
ktf pushed a commit that referenced this pull request Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants