Recently I was given a task at work to migrate an old system. The core job was to process Excel files uploaded by users according to certain rules and store the processed data in a SQL Server database. The old system runs on .NET, and a single task takes eight to nine hours. My assignment was to move it to the Java platform and improve performance tenfold. Achieving that required making many operations more efficient; here is a record of the approach I used for inserting the data into the database.
A single Excel file ultimately produces four to five million database rows. Because SQL Server limits a precompiled statement to at most 2100 parameters, inserting directly with INSERT statements is clearly not feasible here. After some research I found that SQL Server provides a bulk-copy API (SQLServerBulkCopy in the JDBC driver) that solves the performance problem of inserting large amounts of data. Below is a brief record of how to use it and the problems I ran into.
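To see why plain INSERTs hit a wall, a quick bit of arithmetic helps: each inserted row contributes one parameter per column, so the 2100-parameter cap bounds how many rows fit in one multi-row INSERT. The helper below is purely illustrative (the column count of 20 is an assumption, not from the actual tables):

```java
public class ParamLimitDemo {
    // SQL Server allows at most 2100 parameters per precompiled statement,
    // so a multi-row INSERT can carry at most 2100 / columnCount rows.
    static int maxRowsPerInsert(int columnCount) {
        return 2100 / columnCount;
    }

    public static void main(String[] args) {
        // A hypothetical 20-column table: at most 105 rows per statement,
        // so 4-5 million rows would need tens of thousands of round-trips.
        System.out.println(maxRowsPerInsert(20)); // 105
    }
}
```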
Let me start with the final key code:
public <T extends BaseBulkCopyEntity> void bulkCopy(String tableName, List<T> records) {
    log.info("start sql bulk copy, table: {}, dataSet size: {}", tableName, records.size());
    Connection conn = SqlConnPool.getConnection();
    // The current maximum id of each table is cached, which cuts down on the number
    // of database round-trips; the same logic applies to the cached ResultSet below.
    AtomicInteger currentId = BulkCopyCache.getId(tableName);
    int rollbackId = currentId.get();
    try {
        ResultSet resultSet = BulkCopyCache.getResultSet(tableName);
        String[] columnNames = BulkCopyCache.getColumnName(tableName);
        CachedRowSetImpl crs = new CachedRowSetImpl();
        crs.populate(resultSet);
        for (T record : records) {
            int id = currentId.incrementAndGet();
            record.setID(id);
            crs.moveToInsertRow();
            populate(crs, record, columnNames);
            crs.insertRow();
            crs.moveToCurrentRow();
        }
        SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn);
        SQLServerBulkCopyOptions sqlServerBulkCopyOptions = new SQLServerBulkCopyOptions();
        sqlServerBulkCopyOptions.setKeepIdentity(true);
        sqlServerBulkCopyOptions.setBatchSize(records.size());
        bulkCopy.setBulkCopyOptions(sqlServerBulkCopyOptions);
        bulkCopy.setDestinationTableName(tableName);
        bulkCopy.writeToServer(crs);
        crs.close();
        bulkCopy.close();
    } catch (Exception e) {
        // roll the cached id back so a failed batch does not leave gaps
        currentId.set(rollbackId);
        throw new RuntimeException("sql bulk copy failed, table: " + tableName, e);
    } finally {
        try { conn.close(); } catch (SQLException ignore) { }
    }
}
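The post does not show BulkCopyCache itself, so here is a hypothetical sketch of its id-caching side, mirroring the AtomicInteger-per-table idea above: the first access seeds the counter (in the real system that seed would come from something like SELECT MAX(id)), and later accesses reuse the cached counter instead of hitting the database again. All names here are assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IdCacheDemo {
    private static final Map<String, AtomicInteger> IDS = new ConcurrentHashMap<>();

    // seed stands in for the SELECT MAX(id) result on first access
    static AtomicInteger getId(String tableName, int seed) {
        return IDS.computeIfAbsent(tableName, t -> new AtomicInteger(seed));
    }

    public static void main(String[] args) {
        AtomicInteger id = getId("orders", 100);
        System.out.println(id.incrementAndGet());           // 101
        // second lookup returns the cached counter; the seed is ignored
        System.out.println(getId("orders", 0).incrementAndGet()); // 102
    }
}
```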
// Obtain each column's value from the record via reflection, deriving the
// JavaBean getter name from the table's column name.
private <T> void populate(CachedRowSetImpl crs, T record, String[] dbFieldNames) throws Exception {
    Class<?> clazz = record.getClass();
    for (String fieldName : dbFieldNames) {
        StringBuilder getMethodName = new StringBuilder("get");
        if (fieldName.contains("_")) {
            // snake_case column names map to camelCase getters, e.g. user_name -> getUserName
            String[] singleWords = fieldName.split("_");
            for (String singleWord : singleWords) {
                getMethodName.append(CommonUtils.upperFirstChar(singleWord));
            }
        } else {
            getMethodName.append(CommonUtils.upperFirstChar(fieldName));
        }
        Method method = clazz.getMethod(getMethodName.toString());
        Object value = method.invoke(record);
        updateCRS(crs, fieldName, value);
    }
}
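The getter-name derivation can be tried in isolation. Below, upperFirstChar is a stand-in for CommonUtils.upperFirstChar (whose implementation the post doesn't show, so this version is an assumption):

```java
public class GetterNameDemo {
    // Stand-in for CommonUtils.upperFirstChar: capitalize the first character.
    static String upperFirstChar(String s) {
        if (s == null || s.isEmpty()) return s;
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Derive a JavaBean getter name from a column name; snake_case
    // segments are capitalized and joined, matching the populate() logic.
    static String getterName(String columnName) {
        StringBuilder sb = new StringBuilder("get");
        for (String word : columnName.split("_")) {
            sb.append(upperFirstChar(word));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getterName("user_name")); // getUserName
        System.out.println(getterName("id"));        // getId
    }
}
```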
// Write the value into the row set according to its runtime type. Note: even when
// the value is null, the corresponding crs.update call must still be made,
// otherwise the inserted row's columns end up wrong.
private void updateCRS(CachedRowSetImpl crs, String dbFieldName, Object value) throws SQLException {
    if (value instanceof String) {
        crs.updateString(dbFieldName, (String) value);
    } else if (value instanceof Integer) {
        crs.updateInt(dbFieldName, (int) value);
    } else if (value instanceof Double) {
        crs.updateDouble(dbFieldName, (double) value);
    } else if (value instanceof Long) {
        crs.updateLong(dbFieldName, (long) value);
    } else if (value instanceof Float) {
        crs.updateFloat(dbFieldName, (float) value);
    } else if (value instanceof Timestamp) {
        crs.updateTimestamp(dbFieldName, (Timestamp) value);
    } else if (value instanceof java.util.Date) {
        crs.updateDate(dbFieldName, new java.sql.Date(((java.util.Date) value).getTime()));
    } else {
        // null values fall through to here, which still performs an update call
        crs.updateObject(dbFieldName, value);
    }
}
Being unfamiliar with the API, I hit a few pitfalls here, one of them being the CachedRowSetImpl class. Initially, the updateCRS method checked the value and returned immediately when it was null, which scrambled the final results. After some investigation I changed it to its current form, where every value, null or not, goes through a crs.update call.
This insertion method is indeed much faster than plain INSERTs; my batch size starts at 10,000 rows at minimum. All of this was pieced together from the limited blog posts available online plus my own trial and error, and I stepped on many pitfalls along the way. I am recording it here in the hope that it helps someone in the future.
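For completeness, here is a minimal sketch of how the full record list could be split into batches of 10,000 before each bulkCopy call. This partition helper is not from the original code, just one straightforward way to do it:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {
    // Split records into consecutive batches of at most batchSize elements;
    // each batch would then be handed to bulkCopy(tableName, batch).
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                records.subList(i, Math.min(i + batchSize, records.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 25; i++) data.add(i);
        // 25 elements with batch size 10 -> batches of 10, 10 and 5
        System.out.println(partition(data, 10).size()); // 3
    }
}
```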