
I need to read a database table (rows sorted) containing 300+ million TIMESTAMP values as LocalDateTime in Java, and compute a single hash over all of them. Then I need to compute the same hash on the migrated database (a different vendor entirely) and compare the two.

I think I can use LocalDateTime.toString() to get a String, then get its bytes and use those to update the hash.

However, that's 300 million values... twice. I'll run this during the database migration, so it needs to be fast.

What's a good efficient way of getting a byte representation of a LocalDateTime?

By efficient I mean that I can compare both databases in a short time frame, to avoid delaying the whole migration.

  • 1
    It'd be a lot better to state your goal Commented Jul 29, 2024 at 17:31
  • 2
    OK. Maybe something like MessageDigest m = MessageDigest.getInstance("SHA-256"); ByteBuffer bb = ByteBuffer.allocate(8); bb.putLong(LocalDateTime.now().toEpochSecond(ZoneOffset.UTC)); bb.flip(); m.update(bb); bb.clear(); //Repeat n million times Commented Jul 29, 2024 at 17:43
  • 1
    A lot depends on the DB engine. You should update this question to include the exact DB engine and version you use; you should always do that when asking about optimization, as most abstractions (such as SQL) no longer hold at that point. Until the exact DB engine is known, a directly workable answer of the form 'do THIS, because it will be faster than all alternatives' is not possible to provide. Commented Jul 29, 2024 at 17:49
  • 3
    LocalDateTime already implements hashCode() Commented Jul 29, 2024 at 17:57
  • 1
    @user85421 I'm not sure how solid that would be, but I'm guessing probably not solid enough Commented Jul 29, 2024 at 18:06

1 Answer


A lot depends on your DB engine. That's generally the case for questions of the form 'I need to perform operation X a few million times with a DB engine'. SQL is a standard for syntax, not for a performance profile.

For the rest of this answer I shall assume PostgreSQL.

PostgreSQL adheres to the SQL standard on this and treats the type timestamp as short for timestamp without time zone, which is indeed the type that java.time.LocalDateTime matches best.

However, a ton of conversion has to happen even if you just invoke rs.getObject(1, java.time.LocalDateTime.class) on your JDBC ResultSet. Yes, the JDBC 4.2 spec guarantees this works, and, yes, the conversion is lossless. However, Java's LDT type is spread over several fields (one for year, one for month, and so on), whereas PostgreSQL stores the value as a single 8-byte integer. Hence, if you so much as ask JDBC to give you a LocalDateTime object, you've already pretty much lost the game then and there - the driver unpacks the value into all those fields, which is work you don't need. In fact, it's actively painful if the goal is to produce a hash, because you then have to pack those fields back into bytes.

So, don't, if you can. Let the DB do the work:

SELECT EXTRACT(epoch FROM TIMESTAMP '1999-01-08 04:05:06')

You can then read that with rs.getLong(1) in JDBC.

This gets you 915768306, which is the number of seconds that have passed since the epoch (midnight, January 1st, 1970), treating the timestamp as UTC. If the millisecond value is relevant, select 1000 * EXTRACT instead:

// Assumes `con` is an open java.sql.Connection to a PostgreSQL database.
try (var stmt = con.createStatement();
     var rs = stmt.executeQuery("SELECT 1000 * EXTRACT(epoch FROM TIMESTAMP '1999-01-08 04:05:06')")) {
  rs.next();
  long a = rs.getLong(1); // epoch milliseconds, computed by the database
  long b = LocalDateTime.of(1999, 1, 8, 4, 5, 6).toInstant(ZoneOffset.UTC).toEpochMilli(); // same value, computed in Java
  assertEquals(a, b); // this will hold.
}
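
Once the database hands back plain long values, they can be folded into one digest without creating any per-row objects. Here is a minimal sketch of that idea, not a tuned implementation: the class EpochColumnDigest and its methods are hypothetical names, SHA-256 and the batch size of 8192 are arbitrary choices, and addAll assumes the query puts the epoch value in column 1 and has an ORDER BY, since the digest depends on row order.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper (not part of any library): folds long values into one SHA-256 digest.
final class EpochColumnDigest {
    private final MessageDigest digest;
    // Batch 8192 values per update() call instead of calling update() once per value.
    private final ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * 8192);

    EpochColumnDigest() {
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Appends one value as 8 big-endian bytes (ByteBuffer's default byte order).
    void add(long value) {
        if (buffer.remaining() < Long.BYTES) {
            flush();
        }
        buffer.putLong(value);
    }

    // Assumption: column 1 holds the epoch value and rows arrive in a deterministic order.
    // Note that getLong returns 0 for SQL NULL; check rs.wasNull() if the column is nullable.
    void addAll(ResultSet rs) throws SQLException {
        while (rs.next()) {
            add(rs.getLong(1));
        }
    }

    // Hashes whatever is still buffered and returns the 32-byte digest.
    byte[] finish() {
        flush();
        return digest.digest();
    }

    private void flush() {
        buffer.flip();          // switch from writing to reading the buffered bytes
        digest.update(buffer);  // consumes everything between position and limit
        buffer.clear();         // ready for the next batch
    }
}
```

Running one instance against each database and comparing the two byte arrays with Arrays.equals would then be the whole comparison, provided both queries produce the same numeric value for the same timestamp.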

Is that faster? Probably. Certainly, converting an LDT to a string and hashing that makes it worse. Just call .hashCode() on your LDT if you must.
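
If you end up with LocalDateTime objects anyway, packing each one into a single long is a cheaper input for a hash than toString(), and it keeps more information than the 32-bit hashCode(). The following is a sketch under assumptions: the class and method names are hypothetical, and it truncates to microseconds, which matches PostgreSQL's timestamp resolution but would hide differences if either database stores finer precision.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper (not part of java.time): encodes a LocalDateTime as one long.
final class LdtEncoding {
    private LdtEncoding() {}

    // Microseconds since 1970-01-01T00:00, treating the value as if it were UTC.
    // Sub-microsecond digits are truncated; overflow throws ArithmeticException.
    static long toEpochMicros(LocalDateTime ldt) {
        long seconds = ldt.toEpochSecond(ZoneOffset.UTC); // floors for pre-1970 values
        long micros = ldt.getNano() / 1_000;              // always in 0..999_999
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), micros);
    }
}
```

For example, LdtEncoding.toEpochMicros(LocalDateTime.of(1999, 1, 8, 4, 5, 6)) yields 915768306000000, which is the epoch second value from the query above multiplied by one million.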


4 Comments

That's not a bad idea, actually. I would need to do the same thing in the source database (DB2), making sure I implement the same logic there so the results are comparable.
That's OK if you can be sure that both RDBMSs are going to respond similarly to such a function call. I would guess that "give me a LocalDateTime" is going to be a lot safer when asked of both.
If you really want to let the DB do the work, remove Java and JDBC from the picture and generate the hash on the DB side.
What would be vastly superior is if you can ask postgres to just give you those 8 bytes verbatim as a long, possibly endian-adjusted but no other conversion applied to it. I scoured the manuals and couldn't find it. But, hey, that's specifically for psql, maybe for other DB engines that is possible. And maybe there's some obscure way, possibly with stored procedures, to get that number.
