
I am trying to make a flexible merge sort function for primitive data types like char, byte, int, long, etc. The function should take an array of any type and sort it, dividing the array based on the element size provided. The function parameters are:

void mergesort(void* arr, size_t n, size_t size)

where n is the number of elements in the array and size is the size of each member in the array.

I am trying to use memcmp to compare the elements:

memcmp((arr1+i*size),(arr2+j*size),size)

where arr1 and arr2 are pointers to the two halves of the divided array. This works with strings and positive numbers, but it fails for negative integers because of the sign bit.
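To reproduce the problem, here is a minimal sketch of the kind of memcmp()-based merge sort described above. It is not the original poster's code: mergesort_memcmp() and merge_rec() are hypothetical names (the name mergesort is avoided because BSD/macOS stdlib.h already declares a different mergesort()), and the scratch buffer is one assumed way of doing the merge.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: recursively sort n elements of `size` bytes each,
// using `tmp` (at least n * size bytes) as scratch space for merging.
static void merge_rec(unsigned char *arr, unsigned char *tmp, size_t n, size_t size) {
    if (n < 2) return;
    size_t mid = n / 2;
    merge_rec(arr, tmp, mid, size);
    merge_rec(arr + mid * size, tmp, n - mid, size);

    unsigned char *arr1 = arr, *arr2 = arr + mid * size;
    size_t i = 0, j = 0, k = 0;
    while (i < mid && j < n - mid) {
        // The comparison from the question: raw bytes, unsigned, in memory order.
        if (memcmp(arr1 + i * size, arr2 + j * size, size) <= 0)
            memcpy(tmp + (k++) * size, arr1 + (i++) * size, size);
        else
            memcpy(tmp + (k++) * size, arr2 + (j++) * size, size);
    }
    while (i < mid) memcpy(tmp + (k++) * size, arr1 + (i++) * size, size);
    while (j < n - mid) memcpy(tmp + (k++) * size, arr2 + (j++) * size, size);
    memcpy(arr, tmp, n * size);  // copy the merged run back in place
}

// Hypothetical type-agnostic entry point with the question's parameters.
void mergesort_memcmp(void *arr, size_t n, size_t size) {
    if (n < 2) return;
    unsigned char *tmp = malloc(n * size);
    if (tmp == NULL) return;  // allocation failed: leave the array unsorted
    merge_rec(arr, tmp, n, size);
    free(tmp);
}
```

Single bytes such as unsigned char sort correctly, but sorting int a[] = {3, -1, 2, -5} with it yields 2, 3, -5, -1 on a two's-complement machine, big- or little-endian, because memcmp() sees the bytes of the negative values as larger.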

Is there any way I can modify the input, or manipulate the bits of the input to memcmp, so that it can compare negative integers too?

I do not wish to pass a function as a parameter, or to use if-else to determine the data type from the size and handle each data type as a separate case.

  • I do not wish to pass a function as a paramter [sic] But that's exactly what you need to do if you want to make a generic sorting function. Commented Aug 27, 2024 at 20:05
  • Passing a pointer to a comparison function is what the standard qsort function does, because there's really no other way to do a generic comparison otherwise. Think for example about sorting an array of structures, where only one member of the structure is relevant for the ordering. If you use memcmp you will use the whole structure, and all of its members plus eventual padding which will contain indeterminate data. Commented Aug 27, 2024 at 20:10
  • ...and in cases where the primary members compare equal you can then decide the order from other members. memcmp() doesn't seem to be the right tool for the job when there is an array of struct. Commented Aug 27, 2024 at 20:23
  • memcmp() also is not the right tool for the job when sorting integers larger than one byte, as the result will depend on byte order, which is likely undesirable. You're probably on a little-endian machine. In that case, try sorting a collection of positive ints having a wide range of values, such that they don't all fit in a single byte. Commented Aug 27, 2024 at 20:32
  • @argo Note your concern is about negative integers, yet char is signed or unsigned, depending on an implementation detail. Commented Aug 27, 2024 at 21:47
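The byte-order point about positive ints can be checked directly on your own machine. This sketch, with hypothetical helper names memcmp_agrees() and is_little_endian(), compares memcmp()'s verdict against ordinary numeric comparison for two unsigned ints.

```c
#include <string.h>

// Hypothetical helper: returns 1 if memcmp() orders the two unsigned ints
// the same way ordinary numeric comparison does, 0 otherwise.
int memcmp_agrees(unsigned int a, unsigned int b) {
    int bytes = memcmp(&a, &b, sizeof a);
    int numeric = (a > b) - (a < b);
    return ((bytes > 0) - (bytes < 0)) == numeric;
}

// Hypothetical helper: returns 1 if the least significant byte is stored first.
int is_little_endian(void) {
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

For 1 and 256, memcmp_agrees() returns 0 on a little-endian machine (the low byte 0x01 of 1 is compared against the low byte 0x00 of 256) and 1 on a big-endian one.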

2 Answers

Is there any way I can modify the input, or manipulate the bits of the input to memcmp, so that it can compare negative integers too?

Corresponding signed and unsigned integer types have the same size and consistent representation. You cannot tell from the representation of an integer alone whether to interpret it as signed or unsigned, and you need to compare differently to sort as signed numbers than you do to sort as unsigned numbers.
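To make this concrete: the same bytes compare differently depending only on which type you read them as. The two comparators below are hypothetical names, assume two's complement, and differ in nothing but the type the bytes are copied into.

```c
#include <string.h>

// Hypothetical comparator: read the bytes as int, return -1, 0 or 1.
int cmp_as_int(const void *a, const void *b) {
    int x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}

// Hypothetical comparator: read the very same bytes as unsigned int.
int cmp_as_unsigned(const void *a, const void *b) {
    unsigned int x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}
```

With int m = -1 and int p = 1, cmp_as_int(&m, &p) returns -1, while cmp_as_unsigned(&m, &p) returns 1, because the bits of -1 read as unsigned are UINT_MAX.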

You could modify inputs of signed types, type-specifically, so that you can sort them as unsigned numbers, then convert back after sorting. For type int, that might look like so:

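#include <limits.h>
#include <stddef.h>

// The question's memcmp()-based sort.
void mergesort(void* arr, size_t n, size_t size);

void mergesort_int(int* arr, size_t n) {
    // int and unsigned int may alias each other, so this cast is allowed.
    unsigned int *uarr = (unsigned int *)arr;

    // Shift every element so INT_MIN maps to 0 and INT_MAX to UINT_MAX.
    for (size_t i = 0; i < n; i++) {
        uarr[i] -= (unsigned int) INT_MIN;
    }

    mergesort(uarr, n, sizeof *uarr);

    // Undo the shift.
    for (size_t i = 0; i < n; i++) {
        uarr[i] += (unsigned int) INT_MIN;
    }
}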

But such conversions are type-specific, so you will need one for each supported signed type. And this is not going to be all in one function, unless that function is a godawful ugly mess.

Additionally, you need to account for byte order. That could be layered on top of the above as a conversion to and from big-endian byte order, but that only makes the mess worse.
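One way to layer the byte-order conversion on top of the sign shift is to build a separate key per element: shift by INT_MIN as in mergesort_int(), then store the bytes most significant first, so memcmp() order matches numeric order on any machine. int_to_key() and key_to_int() are hypothetical names, and this sketch covers int only; each other signed type would need its own pair, which is exactly the mess described above.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>  // memcmp(), for comparing the keys

// Hypothetical helper: encode v as a sizeof(int)-byte key whose memcmp()
// order matches numeric order. Subtracting INT_MIN maps INT_MIN..INT_MAX
// onto 0..UINT_MAX; the bytes are then written most significant first.
void int_to_key(int v, unsigned char key[sizeof(int)]) {
    unsigned int u = (unsigned int)v - (unsigned int)INT_MIN;
    for (size_t i = sizeof(int); i-- > 0; ) {
        key[i] = (unsigned char)(u & UCHAR_MAX);
        u >>= CHAR_BIT;
    }
}

// Hypothetical inverse of int_to_key().
int key_to_int(const unsigned char key[sizeof(int)]) {
    unsigned int u = 0;
    for (size_t i = 0; i < sizeof(int); i++) {
        u = (u << CHAR_BIT) | key[i];
    }
    u += (unsigned int)INT_MIN;
    // Converting an out-of-range unsigned value to int is implementation-
    // defined; common compilers (GCC, Clang, MSVC) wrap modulo 2^N.
    return (int)u;
}
```

Keys for 1 and 256 now compare in numeric order even on a little-endian machine, and -5 sorts before 3.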

I do not wish to pass a function as a parameter, or to use if-else to determine the data type from the size and handle each data type as a separate case.

Mick Jagger has some sage advice for you: "You can't always get what you want."

There is a good reason why the standard library's qsort() takes a comparison function as an argument. Different data types do not have similar enough representation to compare consistently via a single, universal function. Even restricting to integer types is not sufficient for that. On a big-endian machine, you might get away with a universal comparison function for unsigned integer types only, but even that does not work on a little-endian machine if you're looking for numeric order.

Passing a comparison function is the cleanest approach I know of.
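A minimal sketch of the comparison-function approach with the standard qsort(); compare_int() and sort_ints() are hypothetical names. The (a > b) - (a < b) idiom is used instead of a - b, which can overflow for values near INT_MIN or INT_MAX.

```c
#include <stdlib.h>

// Hypothetical qsort() comparator for int: negative, zero or positive
// as *pa is less than, equal to or greater than *pb.
int compare_int(const void *pa, const void *pb) {
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

// Hypothetical wrapper: sort n ints in ascending numeric order.
void sort_ints(int *arr, size_t n) {
    qsort(arr, n, sizeof *arr, compare_int);
}
```

Because the comparator reads real int values, sign and byte order are handled by the compiler rather than by memcmp().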

1 Comment

Or "tap-dancing on land mines...." (Aerosmith). Funny how all that great type information melts away through the const void* (or now const void s[.n]) parameters to memcmp().

Let's say each element is some signed integer type that is size bytes long, where size may be 1, 2, 3, ... (or even 0).

memcmp() compares bytes as if they were unsigned char, so compare the byte holding the sign bit first using signed operations, then compare the remaining bytes as unsigned.

For practicality, we'll assume 2's complement.

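#include <stddef.h>
#include <string.h>

// Compare 2 objects as if they were signed integers size bytes long.
// Byte order is detected with the GCC/Clang predefined macros;
// other compilers need their own test.
int arjo_compare(const void* arr1, const void* arr2, size_t size) {
  if (size > 0) {
    const signed char* int1 = arr1;
    const signed char* int2 = arr2;
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
      // Big endian: the first byte holds the sign bit, compare it as signed.
      if (*int1 != *int2) {
        return *int1 - *int2;
      }
      // The remaining bytes run most to least significant,
      // so memcmp()'s unsigned byte comparison is correct for them.
      if (size > 1) {
        return memcmp(int1 + 1, int2 + 1, size - 1);
      }
    #else
      // Assume little endian: the sign bit is in the last byte, and memcmp()
      // would start at the least significant byte, so loop backwards instead.
      if (int1[size - 1] != int2[size - 1]) {
        return int1[size - 1] - int2[size - 1];
      }
      const unsigned char* u1 = arr1;
      const unsigned char* u2 = arr2;
      for (size_t i = size - 1; i-- > 0; ) {
        if (u1[i] != u2[i]) {
          return u1[i] - u2[i];
        }
      }
    #endif
  }
  return 0;
}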

If the array holds something else, more information is needed from the OP.


I do not wish to pass a function as a parameter or use if-else to determine the data type from the size and use different cases for different data types.

Research _Generic, so that the comparison function passed is derived from the argument's type and the user does not need to supply it explicitly.
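A sketch of the _Generic idea, assuming the set of supported element types is fixed in advance. SORT_ARRAY, DEFINE_CMP and the cmp_* functions are hypothetical names, and qsort() stands in for whatever sort routine you use. A comparison function still exists for each type; _Generic just picks it so the caller never names it.

```c
#include <stdlib.h>

// Hypothetical macro: define a qsort()-style comparator for type T.
#define DEFINE_CMP(name, T)                                \
    static int name(const void *pa, const void *pb) {      \
        T a = *(const T *)pa, b = *(const T *)pb;          \
        return (a > b) - (a < b);                          \
    }

DEFINE_CMP(cmp_char, char)
DEFINE_CMP(cmp_int, int)
DEFINE_CMP(cmp_long, long)
DEFINE_CMP(cmp_uint, unsigned int)

// _Generic selects the comparator from the element type of arr, so the
// call site looks type-agnostic: SORT_ARRAY(a, n).
#define SORT_ARRAY(arr, n)                                 \
    qsort((arr), (n), sizeof *(arr),                       \
          _Generic(*(arr),                                 \
                   char: cmp_char,                         \
                   int: cmp_int,                           \
                   long: cmp_long,                         \
                   unsigned int: cmp_uint))
```

Note that char, signed char and unsigned char are three distinct types to _Generic, so an array of a type not listed (signed char here) fails to compile rather than being sorted incorrectly. Whether plain char sorts negative values first still depends on the implementation, as one of the comments points out.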
