Conversation
Force-pushed: 29a20c2 to c3ccc6c
```cpp
int dim = ndims - hist_dim;

// Starting from the inner dimension, check whether we should reduce this axis;
// if so, continue until we can collapse the next dimension.
auto raxis = reduction_axes.rbegin();
for (; raxis != reduction_axes.rend(); ++raxis) {
  // TODO: Handle multidimensional histograms correctly
  if (dim != *raxis) {
    break;
  }
  --dim;
}

// If there are no reduction axes left, we won't need to transpose any axis.
if (raxis == reduction_axes.rend())
  return true;
return false;
```
This seems a bit overcomplicated - isn't it equivalent to the following?
Suggested change — replace the quoted loop with:

```cpp
if (reduction_axes.size() != ndims - hist_dims)
  return false;  // not all dimensions reduced
for (int i = 0; i < ndims - hist_dims; i++) {
  if (reduction_axes[i] != i)
    return false;
}
return true;
```
```cpp
SmallVector<int, 6> axes_order;
axes_order.reserve(ndims);
for (int axis : non_rediction_axes) {
```
```diff
-for (int axis : non_rediction_axes) {
+for (int axis : non_reduction_axes) {
```
```cpp
const int ndims = input_shapes.sample_dim();

auto shape_span
auto non_rediction_axes = GetNonReductionAxes(ndims);
```
```diff
-auto non_rediction_axes = GetNonReductionAxes(ndims);
+auto non_reduction_axes = GetNonReductionAxes(ndims);
```
However, how about using a bit mask for the reduced axes? That would greatly simplify the code: you'd create the mask once and then just check whether a given axis is marked as reduced. We even have a bitmask class, so there's no need to manually shift/mask anything.
```cpp
  return false;
}

// Collapses all inner dimensions
```
Can't we also collapse all outer dimensions?
After rearranging the axes so that we have
non-reduced reduced [channel]
We can then collapse (or expand!) each of these groups, so that we have exactly one non-reduced dimension, one reduced dimension and one channel dimension.
If it happens that one of these groups is empty, we can always insert a phony axis with extent 1.
Force-pushed: c3ccc6c to efcc9e3
```cpp
    int hist_dim = 1) {
  int dim = ndims - hist_dim;

  DALI_ENFORCE(dim >= 1);
```
I don't think simple queries like that should throw. Passing hist_dim < 1 constitutes a coding error. Since this function is internal, assert should be enough.
Force-pushed: efcc9e3 to f7d86d1
CI MESSAGE: [3840090]: BUILD STARTED
Force-pushed: a399a60 to d9c477f
```cpp
TensorListShape<ret_ndim> result(nshapes, dyn_out_ndim);

if (out_ndim == 0) {
```
```diff
-if (out_ndim == 0) {
+if (dyn_out_ndim == 0) {
```
```cpp
const int dyn_out_ndim = shape.size() - 1;

int nshapes = shape[0];
```
```diff
-int nshapes = shape[0];
+int outer_extent = shape[0];
```
or
```diff
-int nshapes = shape[0];
+int nsamples = shape[0];
```
```cpp
}

void HistogramCPU::PrepareInputShapesForTranspose(const TensorListShape<> &input_shapes,
                                                  const SmallVector<int, 6> &non_reduction_axes) {
```
I think it would be easier to work with a bit mask or a set.
Force-pushed: 8ca05b9 to d221251
Force-pushed: 2ee5046 to 606fafb
```cpp
~HistogramCPU() override = default;

 private:
  int VerifyRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples);
```
Grammar Nazi: Verify -> Validate
```cpp
auto lo_view = view<const float>(ranges_lo);
auto hi_view = view<const float>(ranges_hi);

int hist_dim = ranges_lo.num_samples() / num_samples;
```
Perhaps it would be better to enforce an exact shape (scalar when there is no channel dimension and [num_channels] otherwise).
```cpp
int HistogramCPU::VerifyNonUniformRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples) {
  assert(!uniform_);

  DALI_ENFORCE(ws.NumInput() % 2 == 1, "Should have both ranges");  // FIXME message
```
Signed-off-by: Piotr Rak <[email protected]>
```
@@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
```
```diff
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
```
To be precise:
```diff
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
```
or
```diff
-# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019, 2022, NVIDIA CORPORATION. All rights reserved.
```
```cmake
# Get all the source files and dump test files
collect_headers(DALI_INST_HDRS PARENT_SCOPE)
collect_sources(DALI_OPERATOR_SRCS PARENT_SCOPE)
collect_test_sources(DALI_OPERATOR_TEST_SRCS PARENT_SCOPE)
```
(no newline at end of file)
nitpick: missing empty line at the end
```
@@ -0,0 +1,102 @@
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
```diff
-// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
```
@@ -0,0 +1,60 @@
// Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
```diff
-// Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright (c) 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
```diff
 #define DALI_SCHEMA_REG(OpName) \
-  int DALI_OPERATOR_SCHEMA_REQUIRED_FOR_##OpName() { \
+  int CONCAT_2(DALI_OPERATOR_SCHEMA_REQUIRED_FOR_, OpName)() { \
```
Just a note. In general, I think it's best to keep unrelated changes separate (separate small PR)
```cpp
template <>
struct CVMatType<uint8_t> {
  static int get(int nchannel) noexcept {
    return CV_MAKETYPE(CV_8U, nchannel);
  }
};
```
The struct seems unnecessary. How about:
```cpp
template <typename Ty_>
int CVMatType(int nchannel) {
  return CV_MAKETYPE(CV_8U, nchannel);
}
```
```cpp
std::vector<Type *> tmp_pointers;
tmp_pointers.reserve(transposed_shapes.num_samples());

for (int i = 0; i < transposed_shapes.num_samples(); ++i) {
  auto tmp = scratch.template AllocTensor<mm::memory_kind::host, Type>(transposed_shapes[i]);
  tmp_pointers.push_back(tmp.data);
}

TensorListView<StorageCPU, Type> transpose_out_view(std::move(tmp_pointers),
                                                    std::move(transposed_shapes));
```
Suggested change:
```diff
-std::vector<Type *> tmp_pointers;
-tmp_pointers.reserve(transposed_shapes.num_samples());
-for (int i = 0; i < transposed_shapes.num_samples(); ++i) {
-  auto tmp = scratch.template AllocTensor<mm::memory_kind::host, Type>(transposed_shapes[i]);
-  tmp_pointers.push_back(tmp.data);
-}
-TensorListView<StorageCPU, Type> transpose_out_view(std::move(tmp_pointers),
-                                                    std::move(transposed_shapes));
+auto transpose_out_view = scratch.template AllocTensorList<mm::memory_kind::host, Type>(std::move(transposed_shapes));
```
```cpp
template <typename Type, typename ScratchAlloc, typename Coll>
TensorListView<StorageCPU, const Type> transpose_view(
    dali::ThreadPool &thread_pool, ScratchAlloc &scratch,
    const TensorListView<StorageCPU, const Type> &in_view, const Coll &transpose_axes_order) {
```
```diff
-template <typename Type, typename ScratchAlloc, typename Coll>
-TensorListView<StorageCPU, const Type> transpose_view(
-    dali::ThreadPool &thread_pool, ScratchAlloc &scratch,
-    const TensorListView<StorageCPU, const Type> &in_view, const Coll &transpose_axes_order) {
+template <typename Type>
+TensorListView<StorageCPU, const Type> transpose_view(
+    dali::ThreadPool &thread_pool, Scratchpad &scratch,
+    const TensorListView<StorageCPU, const Type> &in_view, span<const int> transpose_axes_order) {
```
Scratchpad is an interface. Also, I think we could use span for the axes
```cpp
for (int i = 0; i < transpose_out_view.num_samples(); ++i) {
  thread_pool.AddWork([&, i](int thread_id) {
    auto perm = make_span(transpose_axes_order);
    kernels::Transpose(transpose_out_view[i], in_view[i], perm);
```
maybe kernels::TransposeGrouped as it can automatically simplify the shapes when applicable?
```cpp
  });
}
thread_pool.RunAll(true);
return reinterpret<const Type>(transpose_out_view, transpose_out_view.shape);
```
```diff
-return reinterpret<const Type>(transpose_out_view, transpose_out_view.shape);
+return transpose_out_view;
```
would do, I think
Yeah, the conversion to const view is implicit. Reinterpret is a heavy hammer - like with a cast, it will hide any possible errors until run-time.
```cpp
using namespace dali;
using namespace dali::hist_detail;

#define id_(x) x
```
```cpp
auto range_view = view<const float>(dim_ranges);
for (int i = 0; i < range_view[sample].num_elements(); ++i) {
  ranges.push_back(range_view.tensor_data(sample)[i]);
```
I'd recommend hoisting the sample access to the outer loop.
Also, num_elements() is not trivial and the compiler might fail to identify it as a loop invariant.
```diff
-auto range_view = view<const float>(dim_ranges);
-for (int i = 0; i < range_view[sample].num_elements(); ++i) {
-  ranges.push_back(range_view.tensor_data(sample)[i]);
+auto range_view = view<const float>(dim_ranges)[sample];
+for (int i = 0, n = range_view.num_elements(); i < n; ++i) {
+  ranges.push_back(range_view.data[i]);
```
```cpp
template <typename Ty_>
struct CVMatType {
  static int get(int) {
    DALI_ENFORCE(false, "Unreachable - invalid type");
  }
};
```
How about getting rid of the specializations and going with:
```diff
-template <typename Ty_>
-struct CVMatType {
-  static int get(int) {
-    DALI_ENFORCE(false, "Unreachable - invalid type");
-  }
-};
+template <typename ChannelType>
+struct CVMatType {
+  static int get(int nchannels) {
+    return CV_MAKETYPE(cv::DataDepth<ChannelType>::type, nchannels);
+  }
+};
```
?
Also, the trailing underscore in template arguments is not a coding style we use.
```cpp
if (is_identity_) {
  auto out_view_id = view<Type>(output);
  run_identity<Type>(thread_pool, in_view, out_view_id);
  return;
```
indentation is slightly off
```cpp
std::vector<cv::Mat> images = {
    cv::Mat(1, in_sizes.data(), in_type, splited_in_views[i].data)};

cv::InputArray input_mat(images);
```
I don't think you need a cv::InputArray. You can pass a cv::Mat to cv::calcHist directly.
```cpp
TensorView<StorageCPU, float> hist_view(hist_data, splited_out_views[i].shape);
kernels::copy(splited_out_views[i], hist_view);
```
Indentation is off in those two lines
```cpp
ret.resize(num_samples, hist_dim_);

for (int i = 0; i < num_samples; ++i) {
  TensorShape<> bin_shape{make_span(batch_bins_[i].data(), hist_dim_)};
```
```diff
-TensorShape<> bin_shape{make_span(batch_bins_[i].data(), hist_dim_)};
+TensorShape<> bin_shape{make_span(batch_bins_[i])};
```
```cpp
split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
splited_output_shapes_ = std::move(splited_output_shapes);
```
Why not operate directly on those and avoid the unnecessary allocations?
```cpp
    .AddParent("HistogramBase");

DALI_SCHEMA(UniformHistogramOpName)
    .DocStr(R"code(Calculates 1D or ND histogram of the input tensor with uniform histogram bin ranges.
```
```diff
-.DocStr(R"code(Calculates 1D or ND histogram of the input tensor with uniform histogram bin ranges.
+.DocStr(R"code(Calculates 1D or ND histogram with uniform histogram bin ranges.
```
```
Calculates histogram of of uniform bin ranges, second input tensor specifies lower bound of range of values and third argument specfies
upper range of values in each histogram dimension.

For example lower range (2nd tensor argument) ``[0]`` and upper range (3rd tensor argument) ``[255]`` and ``num_bins=[16]`` will calculate histogram
with 16 bins of uniformly subdivided in range ``[0, 255]``
```
```diff
-Calculates histogram of of uniform bin ranges, second input tensor specifies lower bound of range of values and third argument specfies
-upper range of values in each histogram dimension.
-For example lower range (2nd tensor argument) ``[0]`` and upper range (3rd tensor argument) ``[255]`` and ``num_bins=[16]`` will calculate histogram
-with 16 bins of uniformly subdivided in range ``[0, 255]``
+This operator calculates ``num_bins`` uniform ranges covering the range defined by the lower and upper bounds given as positional inputs.
+For example, a lower range (2nd positional input) ``[0]`` and an upper range (3rd positional input) ``[255]``, together with ``num_bins=16``, will calculate a histogram with 16 uniform bins in the range ``[0, 255]``.
```
```
The histogram of the input tensor when ``channel_axis`` or ``channel_axis_name`` is specified is calculated as multidimensional
histogram ie. as if histogram would be calculated for each seperate channel of this axis.
If channel axis is not specified 1D histogram is calculated.
Current implentation supports up to 32 channels for histogram calculation.

Histogram calculation supports specifing arbitrary axes of reduction.

For example for tensor with layout "HWC" one could calculate different single and multidimentional histograms.
```
I posted some comments in the other occurrence of this text. I think those comments apply here as well
```cpp
bool IsIdentityTransform() const {
  return is_identity_;
}
bool IsSimpleReduction1() const;
```
```cpp
std::vector<std::vector<float>> batch_ranges_;
std::vector<SmallVector<int, 3>> batch_bins_;
ArgValue<int, 1> param_num_bins_;
kernels::ScratchpadAllocator transpose_mem_;
```
```diff
-kernels::ScratchpadAllocator transpose_mem_;
+kernels::DynamicScratchpad transpose_mem_;
```
```python
for ret_sz, expected_sz in zip(out.shape(), sz.as_array()):
    assert(ret_sz == tuple(expected_sz))

def test_uniform_hist_args():
```
Instead of running several tests in one function, I'd suggest creating an independent function for each test case. This way, when a test case fails, we can see exactly which one.
```cpp
}

split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
```
"Split" is an irregular verb. split/split/split.
```diff
-splited_input_shapes_ = std::move(splited_input_shapes);
+split_input_shapes_ = std::move(split_input_shapes);
```
Signed-off-by: Piotr Rak [email protected]
REQ IDs: N/A
JIRA TASK: N/A