Some Batch Processing Methods Based on Python (1)

Why use Python batch processing

In our work, when we face a large number of images and data files, sorting them by hand takes a lot of time and effort. To improve efficiency, I have collected several batch processing methods here, for CSV files and for images. This post also serves as a record of my own code.

Batch processing for CSV (splitting files)

Scenario:

My supervisor gave me a very large CSV file that takes a long time just to open. The second and third columns of the file hold the time series (minutes and seconds). The task is to split the CSV in time order and store the pieces in a folder.

First, we need to cut the file: each 2 s span is one cutting window, and all data in the corresponding rows is retained.

import csv
import os

# This script splits the data file into time windows and stores each
# window's rows in its own CSV file inside folder_path.
# csvpath is the file to process; folder_path is the new folder to create.
csvpath = './room32_3.csv'
folder_path = './room323'

# Open the CSV file and read all rows
with open(csvpath, 'r') as f:
    reader = csv.reader(f)
    data = list(reader)

# Store minutes, seconds, and the remaining data of each row
minutes = []
seconds = []
values = []
for row in data:
    minutes.append(int(row[1]))
    seconds.append(float(row[2]))
    values.append([str(x) if i < 9 else str(x) for i, x in enumerate(row[3:])])

# Calculate the 2 s window index of each row
time_intervals = [int((minutes[i] * 60 + seconds[i])) // 2 for i in range(len(minutes))]

# Group the rows by window
# (assumes the window indices run 0, 1, 2, ... without gaps)
data_intervals = [[] for _ in range(len(set(time_intervals)))]
for i in range(len(values)):
    data_intervals[time_intervals[i]].append(values[i])

# Create the output folder
if not os.path.exists(folder_path):
    os.makedirs(folder_path)

# Save the data, one file per window
for i, data_interval in enumerate(data_intervals):
    if i < 150:
        file_path = os.path.join(folder_path, str(i + 1) + '.csv')
        print('file {} done'.format(i + 1))
    else:
        file_path = os.path.join(folder_path, '151.csv')
        print('file {} done'.format(151))
    with open(file_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # The first row of the source file goes at the top of every output file
        writer.writerow(data[0])
        for j in range(len(data_interval)):
            # Keep every second row of the window
            if j % 2 == 0:
                writer.writerow([str(x).replace('(', '').replace(')', '') for x in data_interval[j]])
        print('data done')

# Traverse all CSV files in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        # Path of the CSV file
        file_path = os.path.join(folder_path, file_name)
        # Read all rows of the file
        with open(file_path, 'r') as f:
            reader = csv.reader(f)
            data = list(reader)
        # Delete the first three columns (as written, this loop runs over every row)
        for row in data:
            del row[0:3]
        # Move the remaining data in the first row forward by 3 columns
        for i in range(3, len(data[0])):
            data[0][i-3] = data[0][i]
        del data[0][-3:]
        # Save the processed data
        with open(file_path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerows(data)

Code interpretation:

values.append([str(x) if i < 9 else str(x) for i, x in enumerate(row[3:])])

This is where the minutes, the seconds, and the rest of each row are stored. In my CSV file, columns 4-12 are all integers, and every column after column 12 holds complex numbers, which is why the expression distinguishes the first nine values from the rest. I originally considered converting the complex values with complex and the integers with int, but the data storage section later reported an error about float not being accepted by the list, so both branches simply keep the values as strings.
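For comparison, the typed conversion that was considered could look like the sketch below. It assumes the complex values appear in Python's own notation, such as (1+2j) or 4-5j; the helper name parse_values and the sample fields are hypothetical and not part of the original script. Whether this avoids the error mentioned above depends on the real data.

```python
def parse_values(fields, n_int=9):
    """Convert the first n_int fields to int and the remaining ones to complex."""
    return [int(x) if i < n_int else complex(x) for i, x in enumerate(fields)]


# Hypothetical sample: two integer columns followed by two complex columns
parsed = parse_values(['3', '7', '(1+2j)', '4-5j'], n_int=2)
print(parsed)
```

When such values are later written with csv.writer, each one is converted with str(), so complex numbers come back out with parentheses. That is why the script strips '(' and ')' before writing.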

#calculate time period 
time_intervals = [int((minutes[i] * 60 + seconds[i])) // 2 for i in range(len(minutes))]

This section performs the time-based segmentation. We have already stored the minutes and seconds in arrays, so each row is assigned to a window of 2 s: the time is converted to total seconds and integer-divided by 2. The result also has to be converted to int, because it is used as a list index.
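The window calculation can be checked on a few hand-made rows. The sketch below is only an illustration: the function names window_index and group_by_window and the sample rows are hypothetical. It groups rows in a dictionary instead of a pre-sized list, so gaps in the window indices do not cause an index error.

```python
from collections import defaultdict


def window_index(minute, second, window=2):
    """Index of the window (window length in seconds) that a timestamp falls into."""
    return int(minute * 60 + second) // window


def group_by_window(rows, window=2):
    """Group rows by window index; row[1] holds minutes, row[2] holds seconds."""
    groups = defaultdict(list)
    for row in rows:
        index = window_index(int(row[1]), float(row[2]), window)
        groups[index].append(row[3:])
    return dict(groups)


# Hypothetical rows: hours, minutes, seconds, one data column
rows = [
    ['0', '0', '0.5', 'a'],
    ['0', '0', '1.9', 'b'],
    ['0', '0', '2.0', 'c'],
    ['0', '1', '0.0', 'd'],
]
print(group_by_window(rows))
```

Here the first two rows land in window 0, the third in window 1, and the last one (at 60 s) in window 30.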

# Traverse all CSV files in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv'):
        # Path of the CSV file
        file_path = os.path.join(folder_path, file_name)
        # Read all rows of the file
        with open(file_path, 'r') as f:
            reader = csv.reader(f)
            data = list(reader)
        # Delete the first three columns (as written, this loop runs over every row)
        for row in data:
            del row[0:3]
        # Move the remaining data in the first row forward by 3 columns
        for i in range(3, len(data[0])):
            data[0][i-3] = data[0][i]
        del data[0][-3:]
        # Save the processed data
        with open(file_path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerows(data)

What is this part for? Many people don't quite understand it.
Here is what happened. After the previous step, I found that the first line of every split file has three more columns than the other lines, and those elements are: 0 0 0
This is obviously a mistake I made earlier: the array retains the original three elements of hours, minutes, and seconds. I really didn't want to go back and rework it, so I added this part: delete the first three elements of the first line of each file and move the subsequent elements forward by 3 positions.
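If the goal is only to drop the three leading time columns from the first line, a single slice does the deletion and the forward shift in one step. A minimal sketch, assuming rows is the list produced by list(csv.reader(f)); the helper name strip_leading_from_first_row is hypothetical. Note that the loop `for row in data: del row[0:3]` in the listing touches every row, so this sketch is not a drop-in equivalent of it.

```python
def strip_leading_from_first_row(rows, n=3):
    """Return a copy of rows in which only the first row has lost its first n columns."""
    if not rows:
        return []
    return [rows[0][n:]] + [list(row) for row in rows[1:]]


# Hypothetical file content: the first line carries three extra leading zeros
rows = [['0', '0', '0', '5', '6'], ['7', '8']]
print(strip_leading_from_first_row(rows))
```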

Python Batch Processing 2: Batch File Classification

Scenario:

After I finished splitting the files, my supervisor gave me a txt file and asked me to classify the split CSV files according to it, in order, and put them in their respective folders.

What does that mean?
The txt file contains a series of numbers separated by \t (tab characters), 151 numbers in total. Each number is between 0 and 3, which means there are four folders in total. Convert the txt into an array in which each element corresponds to one of the CSV files that I split by time. Then send each file to the folder named by its array element.
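Reading such a file comes down to splitting on whitespace and converting each token to int. A minimal sketch, using a made-up four-number string in place of the real 151-number file; the name parse_labels is hypothetical.

```python
def parse_labels(text):
    """Split whitespace-separated numbers (tabs, spaces, newlines) into a list of ints."""
    return [int(token) for token in text.split()]


# Hypothetical content of the txt file
labels = parse_labels("0\t1\t3\t2\n")
print(labels)
```

Calling split() without an argument also tolerates a trailing tab or newline, which would otherwise produce an empty string that int() rejects.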

This is not difficult: get all the CSV files in the folder, iterate through them in order, look up the corresponding number in the txt array, and save each file to the matching folder.
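One detail matters for "in order": os.listdir returns names in arbitrary order, and a plain string sort would put 10.csv before 2.csv. The sketch below pairs files with labels by numeric file name. It assumes the files are named 1.csv, 2.csv, and so on, as produced by the splitting step; the function name pair_files_with_labels and the sample names are hypothetical.

```python
import os


def pair_files_with_labels(file_names, labels):
    """Pair numerically named CSV files (1.csv, 2.csv, ...) with labels, in numeric order."""
    csv_names = [name for name in file_names if name.endswith('.csv')]
    csv_names.sort(key=lambda name: int(os.path.splitext(name)[0]))
    if len(csv_names) != len(labels):
        raise ValueError('number of CSV files and labels differ')
    return list(zip(csv_names, labels))


# Hypothetical directory listing and label array
pairs = pair_files_with_labels(['10.csv', '2.csv', 'notes.txt', '1.csv'], [0, 3, 1])
print(pairs)
```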

import numpy as np
import os
import shutil

def readtxt(path):
    # Read the tab-separated numbers from the txt file into an integer array
    with open(path, 'r') as f:
        data = f.read().split('\t')
    data = [int(x.strip()) for x in data if x.strip()]
    data = np.array(data)
    return data

path = 'path to your txt file'
folder_path = 'folder where the CSV files are located'
# New folder to copy the CSV files into; the four subfolders 0-3 are created inside it below
new_folder_path = 'new folder for the copied CSV files'
data = readtxt(path)

# List the CSV files in numeric order (1.csv, 2.csv, ...),
# because os.listdir returns names in arbitrary order
csv_files = sorted(
    (name for name in os.listdir(folder_path) if name.endswith('.csv')),
    key=lambda name: int(os.path.splitext(name)[0]),
)
for i, file_name in enumerate(csv_files):
    # Path of the CSV file
    file_path = os.path.join(folder_path, file_name)
    # Copy the file into the subfolder named after its number in the array (0-3)
    if data[i] in (0, 1, 2, 3):
        folder_path_num = os.path.join(new_folder_path, str(data[i]))
        os.makedirs(folder_path_num, exist_ok=True)
        new_file_name = 'room323' + file_name
        new_file_path = os.path.join(folder_path_num, new_file_name)
        shutil.copyfile(file_path, new_file_path)
