Commit cfcce5a

dragosmg authored and jonkeane committed
ARROW-14844: [R] Implement decimal256()
@jonkeane & @romainfrancois this is the 2nd attempt at implementing `decimal256()`. First one is apache#11805

Closes apache#11898 from dragosmg/ARROW-14844_decimal256_take2

Lead-authored-by: Dragos Moldovan-Grünfeld <dragos.mold@gmail.com>
Co-authored-by: Dragoș Moldovan-Grünfeld <dragos.mold@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
1 parent 281dee5 commit cfcce5a

14 files changed

Lines changed: 237 additions & 48 deletions

r/NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -215,6 +215,7 @@ export(date32)
 export(date64)
 export(decimal)
 export(decimal128)
+export(decimal256)
 export(default_memory_pool)
 export(dictionary)
 export(duration)

r/NEWS.md

Lines changed: 2 additions & 1 deletion
@@ -19,11 +19,12 @@

 # arrow 6.0.1.9000

+* Added `decimal256()`. Updated `decimal()`, which now calls `decimal256()` or `decimal128()` based on the value of the `precision` argument.
 * updated `write_csv_arrow()` to follow the signature of `readr::write_csv()`. The following arguments are supported:
   * `file` identical to `sink`
   * `col_names` identical to `include_header`
   * other arguments are currently unsupported, but the function errors with a meaningful message.
-* Added `decimal128()` (identical to `decimal()`) as the name is more explicit and updated docs to encourage its use.
+* Added `decimal128()` (~~identical to `decimal()`~~) as the name is more explicit and updated docs to encourage its use.
 * Source builds now by default use `pkg-config` to search for system dependencies (such as `libz`) and link to them
   if present. To retain the previous behaviour of downloading and building all dependencies, set `ARROW_DEPENDENCY_SOURCE=BUNDLED`.

r/R/array.R

Lines changed: 23 additions & 1 deletion
@@ -187,8 +187,30 @@ Array$create <- function(x, type = NULL) {
     }
     return(out)
   }
-  vec_to_Array(x, type)
+
+  if (is.null(type)) {
+    return(vec_to_Array(x, type))
+  }
+
+  # when a type is given, try to create a vector of the desired type. If that
+  # fails, attempt to cast and if casting is successful, suggest to the user
+  # to try casting manually. If the casting fails, return the original error
+  # message.
+  tryCatch(
+    vec_to_Array(x, type),
+    error = function(cnd) {
+      attempt <- try(vec_to_Array(x, NULL)$cast(type), silent = TRUE)
+      abort(
+        c(conditionMessage(cnd),
+          i = if (!inherits(attempt, "try-error")) {
+            "You might want to try casting manually with `Array$create(...)$cast(...)`."
+          }
+        )
+      )
+    }
+  )
 }
+
 #' @include arrowExports.R
 Array$import_from_c <- ImportArray
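The error handling added to `Array$create()` follows a "try directly, then probe a fallback only to improve the error message" pattern. The following is a minimal, library-independent sketch of that control flow in C++. The function name `create_with_hint` and the two callables are hypothetical stand-ins for `vec_to_Array(x, type)` and `vec_to_Array(x, NULL)$cast(type)`; this is not Arrow code.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the control flow; none of these names exist in Arrow.
// `direct` stands in for the typed conversion, `via_cast` for the
// infer-then-cast route. The fallback result is never returned: it is only
// used to decide whether a hint is appended to the original error message.
int create_with_hint(const std::function<int()>& direct,
                     const std::function<int()>& via_cast) {
  try {
    return direct();
  } catch (const std::exception& e) {
    std::string message = e.what();
    bool cast_works = true;
    try {
      (void)via_cast();  // result discarded, as in the R code
    } catch (const std::exception&) {
      cast_works = false;
    }
    if (cast_works) {
      message += "\nYou might want to try casting manually.";
    }
    // Re-raise with the original message, plus the hint when applicable.
    throw std::runtime_error(message);
  }
}
```

When the fallback also fails, the caller sees only the original message, which matches the intent stated in the R comment.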

r/R/arrowExports.R

Lines changed: 4 additions & 0 deletions
Generated file; diff not rendered by default.

r/R/type.R

Lines changed: 55 additions & 18 deletions
@@ -157,8 +157,11 @@ DecimalType <- R6Class("DecimalType",
     scale = function() DecimalType__scale(self)
   )
 )
+
 Decimal128Type <- R6Class("Decimal128Type", inherit = DecimalType)

+Decimal256Type <- R6Class("Decimal256Type", inherit = DecimalType)
+
 NestedType <- R6Class("NestedType", inherit = DataType)

 #' Apache Arrow data types
@@ -188,7 +191,7 @@ NestedType <- R6Class("NestedType", inherit = DataType)
 #' `bit64::integer64` object) by setting `options(arrow.int64_downcast =
 #' FALSE)`.
 #'
-#' `decimal128()` creates a `decimal128` type. Arrow decimals are fixed-point
+#' `decimal128()` creates a `Decimal128Type`. Arrow decimals are fixed-point
 #' decimal numbers encoded as a scalar integer. The `precision` is the number of
 #' significant digits that the decimal type can represent; the `scale` is the
 #' number of digits after the decimal point. For example, the number 1234.567
@@ -204,21 +207,30 @@ NestedType <- R6Class("NestedType", inherit = DataType)
 #' negative, `scale` causes the number to be expressed using scientific notation
 #' and power of 10.
 #'
-#' `decimal()` is identical to `decimal128()`, defined for backward compatibility.
-#' Use `decimal128()` as the name is more informative and `decimal()` might be
-#' deprecated in the future.
+#' `decimal256()` creates a `Decimal256Type`, which allows for higher maximum
+#' precision. For most use cases, the maximum precision offered by `Decimal128Type`
+#' is sufficient, and it will result in a more compact and more efficient encoding.
+#'
+#' `decimal()` creates either a `Decimal128Type` or a `Decimal256Type`
+#' depending on the value for `precision`. If `precision` is greater than 38 a
+#' `Decimal256Type` is returned, otherwise a `Decimal128Type`.
+#'
+#' Use `decimal128()` or `decimal256()` as the names are more informative than
+#' `decimal()`.
 #'
 #' @param unit For time/timestamp types, the time unit. `time32()` can take
 #' either "s" or "ms", while `time64()` can be "us" or "ns". `timestamp()` can
 #' take any of those four values.
 #' @param timezone For `timestamp()`, an optional time zone string.
 #' @param byte_width byte width for `FixedSizeBinary` type.
 #' @param list_size list size for `FixedSizeList` type.
-#' @param precision For `decimal()`, `decimal128()` the number of significant
-#' digits the arrow `decimal` type can represent. The maximum precision for
-#' `decimal()` and `decimal128()` is 38 significant digits.
-#' @param scale For `decimal()` and `decimal128()`, the number of digits after
-#' the decimal point. It can be negative.
+#' @param precision For `decimal()`, `decimal128()`, and `decimal256()` the
+#' number of significant digits the arrow `decimal` type can represent. The
+#' maximum precision for `decimal128()` is 38 significant digits, while for
+#' `decimal256()` it is 76 digits. `decimal()` will use it to choose which
+#' type of decimal to return.
+#' @param scale For `decimal()`, `decimal128()`, and `decimal256()` the number
+#' of digits after the decimal point. It can be negative.
 #' @param type For `list_of()`, a data type to make a list-of-type
 #' @param ... For `struct()`, a named list of types to define the struct columns
 #'
@@ -399,25 +411,49 @@ timestamp <- function(unit = c("s", "ms", "us", "ns"), timezone = "") {
   Timestamp__initialize(unit, timezone)
 }

+#' @rdname data-type
+#' @export
+decimal <- function(precision, scale) {
+  args <- check_decimal_args(precision, scale)
+
+  if (args$precision > 38) {
+    decimal256(args$precision, args$scale)
+  } else {
+    decimal128(args$precision, args$scale)
+  }
+}
+
 #' @rdname data-type
 #' @export
 decimal128 <- function(precision, scale) {
+  args <- check_decimal_args(precision, scale)
+  Decimal128Type__initialize(args$precision, args$scale)
+}
+
+#' @rdname data-type
+#' @export
+decimal256 <- function(precision, scale) {
+  args <- check_decimal_args(precision, scale)
+  Decimal256Type__initialize(args$precision, args$scale)
+}
+
+check_decimal_args <- function(precision, scale) {
   if (is.numeric(precision)) {
-    precision <- as.integer(precision)
+    precision <- vec_cast(precision, to = integer())
+    vctrs::vec_assert(precision, size = 1L)
   } else {
-    stop('"precision" must be an integer', call. = FALSE)
+    stop("`precision` must be an integer", call. = FALSE)
   }
+
   if (is.numeric(scale)) {
-    scale <- as.integer(scale)
+    scale <- vec_cast(scale, to = integer())
+    vctrs::vec_assert(scale, size = 1L)
   } else {
-    stop('"scale" must be an integer', call. = FALSE)
+    stop("`scale` must be an integer", call. = FALSE)
   }
-  Decimal128Type__initialize(precision, scale)
-}

-#' @rdname data-type
-#' @export
-decimal <- decimal128
+  list(precision = precision, scale = scale)
+}

 StructType <- R6Class("StructType",
   inherit = NestedType,
@@ -520,6 +556,7 @@ canonical_type_str <- function(type_str) {
     null = "null",
     timestamp = "timestamp",
     decimal128 = "decimal128",
+    decimal256 = "decimal256",
     struct = "struct",
     list_of = "list",
     list = "list",
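The 38- and 76-digit limits quoted in the documentation above follow from the width of the backing integer: a signed N-bit integer can hold every decimal number of up to floor((N - 1) * log10(2)) digits. The sketch below derives both limits and mirrors the `precision > 38` dispatch performed by `decimal()`. It is a standalone illustration that does not use Arrow; `DecimalKind`, `max_decimal_digits` and `choose_decimal_kind` are hypothetical names, and the range check is an assumption standing in for the validation that the diff delegates to the C++ `Make` builders.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical illustration; none of these names exist in Arrow.
enum class DecimalKind { Decimal128, Decimal256 };

// Largest d such that every d-digit decimal number fits in a signed
// integer of `bits` bits: floor((bits - 1) * log10(2)).
inline int max_decimal_digits(int bits) {
  return static_cast<int>(std::floor((bits - 1) * std::log10(2.0)));
}

// Mirrors the rule in the R `decimal()` wrapper: a precision above the
// 128-bit limit selects the 256-bit type.
inline DecimalKind choose_decimal_kind(int precision) {
  // Assumption: valid precisions run from 1 up to the 256-bit limit.
  if (precision < 1 || precision > max_decimal_digits(256)) {
    throw std::invalid_argument("precision out of range");
  }
  return precision > max_decimal_digits(128) ? DecimalKind::Decimal256
                                             : DecimalKind::Decimal128;
}
```

With these definitions, `max_decimal_digits(128)` evaluates to 38 and `max_decimal_digits(256)` to 76, so a precision of 39 is the first that requires the wider type.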

r/man/data-type.Rd

Lines changed: 23 additions & 11 deletions
Generated file; diff not rendered by default.

r/src/array_to_vector.cpp

Lines changed: 8 additions & 3 deletions
@@ -960,6 +960,7 @@ class Converter_Timestamp : public Converter_Time<value_type, TimestampType> {
   }
 };

+template <typename Type>
 class Converter_Decimal : public Converter {
  public:
   explicit Converter_Decimal(const std::shared_ptr<ChunkedArray>& chunked_array)
@@ -974,8 +975,9 @@ class Converter_Decimal : public Converter {

   Status Ingest_some_nulls(SEXP data, const std::shared_ptr<arrow::Array>& array,
                            R_xlen_t start, R_xlen_t n, size_t chunk_index) const {
+    using DecimalArray = typename TypeTraits<Type>::ArrayType;
     auto p_data = REAL(data) + start;
-    const auto& decimals_arr = checked_cast<const arrow::Decimal128Array&>(*array);
+    const auto& decimals_arr = checked_cast<const DecimalArray&>(*array);

     auto ingest_one = [&](R_xlen_t i) {
       p_data[i] = std::stod(decimals_arr.FormatValue(i).c_str());
@@ -1275,7 +1277,10 @@ std::shared_ptr<Converter> Converter::Make(
     }

     case Type::DECIMAL128:
-      return std::make_shared<arrow::r::Converter_Decimal>(chunked_array);
+      return std::make_shared<arrow::r::Converter_Decimal<Decimal128Type>>(chunked_array);
+
+    case Type::DECIMAL256:
+      return std::make_shared<arrow::r::Converter_Decimal<Decimal256Type>>(chunked_array);

     // nested
     case Type::STRUCT:
@@ -1303,7 +1308,7 @@ std::shared_ptr<Converter> Converter::Make(
       break;
   }

-  cpp11::stop("cannot handle Array of type ", type->name().c_str());
+  cpp11::stop("cannot handle Array of type <%s>", type->name().c_str());
 }

 std::shared_ptr<ChunkedArray> to_chunks(const std::shared_ptr<Array>& array) {
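The change to `Converter_Decimal` replaces a hard-coded `Decimal128Array` with an array class looked up through a type trait, so one template serves both decimal widths. A minimal standalone sketch of that technique follows. `Dec128Type`, `Dec256Type`, their array structs, `Traits` and `to_doubles` are hypothetical stand-ins for Arrow's classes and `TypeTraits<Type>::ArrayType`; only the shape of the pattern is taken from the diff.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for Arrow's decimal type tags and array classes.
struct Dec128Type {};
struct Dec256Type {};

struct Dec128Array {
  std::vector<std::string> values;
  std::string FormatValue(std::size_t i) const { return values[i]; }
};

struct Dec256Array {
  std::vector<std::string> values;
  std::string FormatValue(std::size_t i) const { return values[i]; }
};

// Trait mapping a type tag to its array class, playing the role that
// TypeTraits<Type>::ArrayType plays in the diff.
template <typename Type>
struct Traits;
template <>
struct Traits<Dec128Type> {
  using ArrayType = Dec128Array;
};
template <>
struct Traits<Dec256Type> {
  using ArrayType = Dec256Array;
};

// One conversion routine for both widths: format each value as text,
// then parse it as a double, as the converter in the diff does.
template <typename Type>
std::vector<double> to_doubles(const typename Traits<Type>::ArrayType& arr) {
  std::vector<double> out;
  out.reserve(arr.values.size());
  for (std::size_t i = 0; i < arr.values.size(); ++i) {
    out.push_back(std::stod(arr.FormatValue(i)));
  }
  return out;
}
```

Because the result is a `double`, which carries roughly 15 to 17 significant decimal digits, values that use the full 38- or 76-digit precision are rounded during this conversion.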

r/src/arrowExports.cpp

Lines changed: 17 additions & 0 deletions
Generated file; diff not rendered by default.

r/src/datatype.cpp

Lines changed: 9 additions & 0 deletions
@@ -84,6 +84,8 @@ const char* r6_class_name<arrow::DataType>::get(

     case Type::DECIMAL128:
       return "Decimal128Type";
+    case Type::DECIMAL256:
+      return "Decimal256Type";

     case Type::LIST:
       return "ListType";
@@ -182,6 +184,13 @@ std::shared_ptr<arrow::DataType> Decimal128Type__initialize(int32_t precision,
   return ValueOrStop(arrow::Decimal128Type::Make(precision, scale));
 }

+// [[arrow::export]]
+std::shared_ptr<arrow::DataType> Decimal256Type__initialize(int32_t precision,
+                                                            int32_t scale) {
+  // Use the builder that validates inputs
+  return ValueOrStop(arrow::Decimal256Type::Make(precision, scale));
+}
+
 // [[arrow::export]]
 std::shared_ptr<arrow::DataType> FixedSizeBinary__initialize(R_xlen_t byte_width) {
   if (byte_width == NA_INTEGER) {
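`Decimal256Type__initialize()` calls a validating builder and unwraps its result with `ValueOrStop`, so an invalid precision surfaces as an error rather than a malformed type. The sketch below shows the general shape of that "validated factory plus unwrap-or-raise" pattern without depending on Arrow. `MiniResult`, `DecimalSpec`, `make_decimal` and `value_or_stop` are hypothetical names, and the accepted range of 1 to `max_precision` is an assumption about what the real builder checks.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical miniature of a result type: either a value or an error text.
template <typename T>
struct MiniResult {
  bool ok;
  T value;
  std::string error;
};

// Hypothetical plain description of a decimal type.
struct DecimalSpec {
  int precision;
  int scale;
};

// Validating factory. Assumption: precision must lie in [1, max_precision];
// scale is passed through unchanged and may be negative.
inline MiniResult<DecimalSpec> make_decimal(int precision, int scale,
                                            int max_precision) {
  if (precision < 1 || precision > max_precision) {
    return MiniResult<DecimalSpec>{
        false, DecimalSpec{0, 0},
        "precision must be between 1 and " + std::to_string(max_precision)};
  }
  return MiniResult<DecimalSpec>{true, DecimalSpec{precision, scale}, ""};
}

// Returns the value on success and raises on failure, so callers never
// see a half-constructed object.
template <typename T>
T value_or_stop(const MiniResult<T>& result) {
  if (!result.ok) {
    throw std::runtime_error(result.error);
  }
  return result.value;
}
```

Keeping validation inside the factory means the R wrappers only need to coerce their arguments to scalar integers before delegating.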

r/tests/testthat/test-Array.R

Lines changed: 18 additions & 0 deletions
@@ -801,6 +801,24 @@ test_that("Array$create() should have helpful error", {
   expect_error(Array$create(list()), "Requires at least one element to infer")
   expect_error(Array$create(list(lgl, lgl, int)), "Expecting a logical vector")
   expect_error(Array$create(list(char, num, char)), "Expecting a character vector")
+
+  # hint at casting if direct fails and casting looks like it might work
+  expect_error(
+    Array$create(as.double(1:10), type = decimal(4, 2)),
+    "You might want to try casting manually"
+  )
+
+  expect_error(
+    Array$create(1:10, type = decimal(12, 2)),
+    "You might want to try casting manually"
+  )
+
+  a <- expect_error(Array$create("one", int32()))
+  b <- expect_error(vec_to_Array("one", int32()))
+  # the captured conditions (errors) are not identical, but their messages should be
+  expect_s3_class(a, "rlang_error")
+  expect_s3_class(b, "simpleError")
+  expect_equal(a$message, b$message)
 })

 test_that("Array$View() (ARROW-6542)", {
