
gzip In .tar.gz Format In Kotlin Example

Kotlin is fully interoperable with Java, so it can use Java libraries for file compression directly. This article shows how to create, extract, and update .tar.gz archives in Kotlin.

1. Understanding Kotlin and Gzip

1.1 Introduction to Kotlin

Kotlin is a modern, statically typed programming language developed by JetBrains and officially supported by Google for Android development. It is designed to be concise, expressive, and interoperable with Java, making it a powerful choice for various software development projects. Kotlin is widely used in mobile app development, web applications, server-side programming, and even data science.

1.1.1 Use Cases of Kotlin

  • Android Development: Kotlin is the preferred language for Android app development, offering null safety, extension functions, and coroutines for asynchronous programming.
  • Backend Development: With frameworks like Ktor and Spring Boot, Kotlin is widely used for server-side applications.
  • Web Development: Kotlin/JS allows developers to write frontend applications using Kotlin, compiling to JavaScript.
  • Cross-Platform Development: Kotlin Multiplatform enables code sharing between Android, iOS, and other platforms.
  • Data Science & Machine Learning: Kotlin is emerging in data science, integrating with tools like Apache Spark.
  • Game Development: Some game developers use Kotlin with game engines such as LibGDX.

1.1.2 Benefits of Kotlin

  • Concise Syntax: Reduces boilerplate code compared to Java.
  • Interoperability with Java: Kotlin can seamlessly work with existing Java codebases.
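  • Null Safety: The type system distinguishes nullable from non-nullable types, so most null pointer exceptions are caught at compile time; safe calls (?.) and the Elvis operator (?:) handle nullable values concisely.
  • Coroutines for Asynchronous Programming: Coroutines are lightweight, suspendable computations, so many concurrent tasks can share a small pool of threads.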
  • Smart Type Inference: Reduces the need for explicit type declarations.
  • Modern Functional Features: Supports lambda expressions, higher-order functions, and more.
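Null safety is easiest to see with java.io.File.listFiles(), which the archive code later in this article also calls: it returns null when the path is not a readable directory. The sketch below is illustrative only; countEntries and describe are hypothetical helpers, not part of any library.

```kotlin
import java.io.File

// listFiles() is a Java method, so Kotlin sees its result as a platform type
// (Array<File>!). The compiler does not force a null check, so we add one.
fun countEntries(dir: File): Int {
    // Safe call (?.) skips .size when listFiles() returns null,
    // and the Elvis operator (?:) supplies 0 instead.
    return dir.listFiles()?.size ?: 0
}

fun describe(name: String?): String {
    // name.length alone would not compile because name may be null;
    // after the null check, the compiler smart-casts name to String.
    return if (name == null) "no name" else "name has ${name.length} characters"
}
```

For example, countEntries on a missing directory returns 0 instead of throwing, and describe("abc") returns "name has 3 characters".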

1.2 What is Gzip?

Gzip (GNU zip) is a widely used file compression format and software utility that is primarily used to reduce the size of files for storage and transmission. It works by applying the DEFLATE compression algorithm and is especially effective when used in combination with archiving tools like tar. Since Gzip compresses only a single file stream, it is often paired with tar to create compressed archives of multiple files and directories, resulting in a .tar.gz (or .tgz) file. Gzip is supported across UNIX, Linux, and most modern platforms and is essential in systems programming, backups, and software distribution. For more technical details, refer to the official Gzip documentation.
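To make the single-stream nature concrete, here is a minimal sketch that uses only the JDK's java.util.zip.GZIPOutputStream and GZIPInputStream (the helper names gzip and gunzip are illustrative). The input is one byte array and the output is one compressed byte array; there is no built-in place for a second named file or a directory entry, which is why tar is layered underneath.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Compresses one byte stream: DEFLATE data wrapped in the gzip container format.
fun gzip(data: ByteArray): ByteArray {
    val buffer = ByteArrayOutputStream()
    // Closing the GZIPOutputStream (via use) writes the trailer (CRC-32 and length).
    GZIPOutputStream(buffer).use { it.write(data) }
    return buffer.toByteArray()
}

// Restores the original bytes from a gzip stream.
fun gunzip(compressed: ByteArray): ByteArray =
    GZIPInputStream(ByteArrayInputStream(compressed)).use { it.readBytes() }
```

Every gzip stream starts with the magic bytes 0x1f 0x8b, and repetitive input such as text shrinks considerably.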

1.2.1 Difference Between Gzip and Java ZIP

  • Gzip: Designed to compress a single file stream. It stores only minimal metadata (an optional original file name, a modification time, and a CRC-32 checksum) and has no native support for multiple files or directories. This makes it lightweight but dependent on another tool (like tar) for bundling files.
  • TAR + Gzip: tar is used first to archive multiple files and folder structures into a single uncompressed file, and then gzip compresses that file, resulting in a .tar.gz archive.
  • Java ZIP: The .zip format supports both archiving and compression in one file. In Java, this functionality is available through the java.util.zip package, which allows developers to read, create, and extract ZIP files directly using built-in classes like ZipInputStream and ZipOutputStream.
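For contrast, the ZIP format stores each entry under its own name inside one archive. The following is a minimal in-memory sketch with java.util.zip.ZipOutputStream and ZipInputStream, assuming small UTF-8 text entries; zipEntries and unzipEntries are hypothetical helper names.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Writes several named entries into one ZIP archive held in memory.
fun zipEntries(files: Map<String, String>): ByteArray {
    val buffer = ByteArrayOutputStream()
    ZipOutputStream(buffer).use { zipOut ->
        for ((name, content) in files) {
            zipOut.putNextEntry(ZipEntry(name)) // each entry carries its own path
            zipOut.write(content.toByteArray(Charsets.UTF_8))
            zipOut.closeEntry()
        }
    }
    return buffer.toByteArray()
}

// Reads the archive back into a map of entry name -> text content.
fun unzipEntries(archive: ByteArray): Map<String, String> {
    val result = linkedMapOf<String, String>()
    ZipInputStream(ByteArrayInputStream(archive)).use { zipIn ->
        var entry = zipIn.nextEntry
        while (entry != null) {
            if (!entry.isDirectory) {
                // readBytes() stops at the end of the current entry
                result[entry.name] = zipIn.readBytes().toString(Charsets.UTF_8)
            }
            entry = zipIn.nextEntry
        }
    }
    return result
}
```

Archiving and compression happen in one step here, whereas a .tar.gz needs the tar layer for names and the gzip layer for compression.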

2. Code Example

2.1 Add Dependencies (build.gradle)

To get started, we need to add the necessary dependency to our project. We'll be using Apache Commons Compress, a robust Java library for working with compressed and archived file formats, including .tar.gz. Add the following line to the dependencies block of your build.gradle file, replacing latest__jar__version with the Commons Compress release you want to use.

implementation("org.apache.commons:commons-compress:latest__jar__version")

2.2 Code Example

Once the dependency is added, you can implement the following Kotlin code to handle compression, decompression, and updating of .tar.gz archives. This example demonstrates how to create a .tar.gz archive from a folder, extract it to a target directory, and update an existing archive by adding new files or folders.

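import java.io.BufferedInputStream
import java.io.BufferedOutputStream
import java.io.File
import java.io.FileInputStream
import java.io.FileOutputStream
import java.io.IOException
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream

// Archives the contents of sourceDir (not the folder itself) into a gzip-compressed tar file,
// so that extracting and recompressing (as updateTarGzArchive does) keeps the same layout.
fun compressToTarGz(sourceDir: File, outputFile: File) {
    require(sourceDir.isDirectory) { "Not a directory: $sourceDir" }
    // FileOutputStream -> buffering -> gzip compression -> tar framing
    GzipCompressorOutputStream(BufferedOutputStream(FileOutputStream(outputFile))).use { gzipOut ->
        TarArchiveOutputStream(gzipOut).use { tarOut ->
            // GNU extension for entry names longer than 100 characters
            tarOut.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU)
            // POSIX extension for entries larger than 8 GiB
            tarOut.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX)
            sourceDir.listFiles()?.sortedBy { it.name }?.forEach { child ->
                addFilesToTar(tarOut, child, "")
            }
        }
    }
}

// Recursively writes one file or directory (and its children) as tar entries.
fun addFilesToTar(tarOut: TarArchiveOutputStream, file: File, parent: String) {
    val entryName = if (parent.isEmpty()) file.name else "$parent/${file.name}"
    // Takes name, size and modification time from the file; directory names get a trailing "/"
    val entry = tarOut.createArchiveEntry(file, entryName)

    tarOut.putArchiveEntry(entry)
    if (file.isFile) {
        file.inputStream().use { it.copyTo(tarOut) }
    }
    tarOut.closeArchiveEntry()

    if (file.isDirectory) {
        file.listFiles()?.sortedBy { it.name }?.forEach { child ->
            addFilesToTar(tarOut, child, entryName)
        }
    }
}

// Extracts every entry of a .tar.gz archive below outputDir.
fun decompressTarGz(inputFile: File, outputDir: File) {
    val root = outputDir.canonicalFile.toPath()
    GzipCompressorInputStream(BufferedInputStream(FileInputStream(inputFile))).use { gzipIn ->
        TarArchiveInputStream(gzipIn).use { tarIn ->
            // nextEntry returns null once the end of the archive is reached
            generateSequence { tarIn.nextEntry }.forEach { entry ->
                val outPath = File(outputDir, entry.name).canonicalFile
                // Reject names such as "../x" that would write outside outputDir ("Zip Slip")
                if (!outPath.toPath().startsWith(root)) {
                    throw IOException("Entry is outside the target directory: ${entry.name}")
                }
                if (entry.isDirectory) {
                    outPath.mkdirs()
                } else {
                    outPath.parentFile.mkdirs()
                    // tarIn yields only the bytes of the current entry
                    FileOutputStream(outPath).use { outStream -> tarIn.copyTo(outStream) }
                }
            }
        }
    }
}

// Rebuilds the archive with extra files, because a .tar.gz cannot be appended to in place.
fun updateTarGzArchive(existingArchive: File, filesToAdd: List<File>, tempDir: File) {
    val tempExtractDir = File(tempDir, "extracted")
    val tempArchive = File(tempDir, "updated.tar.gz")
    // Start clean so leftovers from an earlier run are not packed into the archive
    tempExtractDir.deleteRecursively()
    tempExtractDir.mkdirs()
    try {
        // Step 1: Decompress existing archive
        decompressTarGz(existingArchive, tempExtractDir)

        // Step 2: Copy new files/folders in, overwriting entries with the same name
        filesToAdd.forEach { file ->
            file.copyRecursively(File(tempExtractDir, file.name), overwrite = true)
        }

        // Step 3: Recompress into a temporary file first, so a failure while
        // compressing does not destroy the original archive, then replace it
        compressToTarGz(tempExtractDir, tempArchive)
        tempArchive.copyTo(existingArchive, overwrite = true)
    } finally {
        tempExtractDir.deleteRecursively()
        tempArchive.delete()
    }
}

fun main() {
    // Compress folder into tar.gz
    val source = File("test_folder") // Folder you want to compress
    val output = File("archive.tar.gz")
    compressToTarGz(source, output)
    println("Compression complete: ${output.absolutePath}")

    // Decompress the tar.gz file
    val outputDir = File("output_folder")
    decompressTarGz(output, outputDir)
    println("Decompression complete to ${outputDir.absolutePath}")

    // Add new files to the existing tar.gz archive
    // (both must exist, otherwise copyRecursively throws NoSuchFileException)
    val newFile = File("newfile.txt")
    val newFolder = File("newfolder")
    val filesToAdd = listOf(newFile, newFolder)
    updateTarGzArchive(output, filesToAdd, File("temp"))
    println("Archive updated: ${output.absolutePath}")
}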

2.2.1 Code Explanation

The provided Kotlin code defines functions to compress, decompress, and update .tar.gz archives using Apache Commons Compress.

The compressToTarGz function takes a source directory and an output file. It chains a FileOutputStream, a BufferedOutputStream, and a GzipCompressorOutputStream, and wraps the result in a TarArchiveOutputStream, on which it sets the long file mode to GNU so that entry paths longer than 100 characters are stored correctly. It then calls addFilesToTar, a recursive function that creates an archive entry for each file and directory in the source folder: for a file, it streams the content into the tar output; for a directory, it recurses into the directory's children.

The decompressTarGz function reads a .tar.gz archive through a GzipCompressorInputStream and a TarArchiveInputStream, iterates over each tar entry, and either creates the corresponding directory or writes the file out, preserving the relative paths stored in the archive.

The updateTarGzArchive function decompresses the existing archive into a temporary directory, adds or overwrites files from the given list using copyRecursively, and recompresses the result with compressToTarGz, so that the updated archive replaces the original.

Finally, main demonstrates these utilities: it compresses test_folder into archive.tar.gz, decompresses it into output_folder, and updates the archive with the file newfile.txt and the folder newfolder, using a temporary directory named temp.

2.2.2 Code Output

When the program runs, it compresses test_folder into archive.tar.gz, extracts that archive into output_folder, and then rebuilds archive.tar.gz with newfile.txt and the newfolder directory added (replacing any existing entries with the same names), using the temp directory for the intermediate extraction and recompression. The console shows:

Compression complete: /absolute/path/to/archive.tar.gz
Decompression complete to /absolute/path/to/output_folder
Archive updated: /absolute/path/to/archive.tar.gz

Here /absolute/path/to/ stands for the program's working directory on your file system.

3. Conclusion

Working with .tar.gz files in Kotlin is straightforward when using libraries like Apache Commons Compress.
You've seen how to compress folders, decompress archives, and add new files to an existing archive using Kotlin. Because a compressed .tar.gz cannot simply be appended to in place, the update step extracts the archive and rebuilds it; this is easy to implement, but its cost grows with the size of the archive. With Kotlin's concise syntax and Java interoperability, managing compressed files becomes both elegant and effective.

Yatin Batra

An experienced full-stack engineer well versed with Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).