s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.

s3git: git for Cloud Storage

s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git!

s3git is a simple CLI tool for creating distributed, decentralized, versioned repositories. It scales to hundreds of millions of files and petabytes of storage, and stores your data safely in S3. Even so, huge repos can be cloned onto the SSD of your laptop so you can make local changes, commit, and push back.

Like git, s3git does not require any server-side components: just download and run the executable. It is built on the Go package s3git-go, which other applications can import as well.

Download binaries

DISCLAIMER: These are PRE-RELEASE binaries -- use at your own peril for now

OSX

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.0/s3git-darwin-amd64

$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.0/s3git-darwin-amd64
$ chmod +x s3git
$ ./s3git

Linux

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.0/s3git-linux-amd64

$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.0/s3git-linux-amd64
$ chmod +x s3git
$ ./s3git

Building from source

Unfortunately, this is not yet possible because of a missing SDK. Stay tuned!

Example workflow

Here is a simple workflow to create a new repository and populate it with some data:

$ mkdir s3git_repo
$ cd s3git_repo
$ s3git init
Initialized empty s3git repository in .../test/s3git_repo
$ echo "hello s3git" | s3git add
Added: 18e622875a89cede0d7019b2c8afecf8928c21eac18ec51e38a8e6b829b82c3ef306dec34227929fa77b1c7c329b3d4e50ed9e72dc4dc885be0932d3f28d7053
$ s3git add "*.mp4"
$ s3git commit -m "My first commit"
$ s3git log --pretty

Push to cloud storage

$ s3git remote add "primary" -r s3://yourbucket -a "YOUR_ACCESS_KEY" -s "YOUR_SECRET_KEY"
$ s3git push
$ s3git cat 18e6
hello s3git
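The transcript above shows two properties worth illustrating: `s3git add` returns a 128-character hex key derived from the content, and `s3git cat 18e6` retrieves the object from a short prefix of that key. The following is a minimal Go sketch of that idea, not s3git's implementation. The `Store` type and its `Add`, `List`, and `Cat` methods are hypothetical names, and SHA-512 is used only as a stand-in because it also produces 128 hex characters; this README does not say which hash s3git uses, so the keys printed here will not match the ones shown above. Rejecting an ambiguous prefix is likewise a design choice of the sketch.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Store is a hypothetical in-memory content-addressed store.
// It only illustrates the idea; it is not how s3git stores data.
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store { return &Store{blobs: make(map[string][]byte)} }

// Add stores data under the hex digest of its content and returns that key.
// SHA-512 is an assumed stand-in that yields 128 hex characters.
func (s *Store) Add(data []byte) string {
	sum := sha512.Sum512(data)
	key := hex.EncodeToString(sum[:])
	s.blobs[key] = append([]byte(nil), data...) // copy so callers cannot mutate it
	return key
}

// List returns every key that starts with prefix, in sorted order.
func (s *Store) List(prefix string) []string {
	var keys []string
	for k := range s.blobs {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

var (
	ErrNotFound  = errors.New("no object matches prefix")
	ErrAmbiguous = errors.New("prefix matches more than one object")
)

// Cat returns the object whose key starts with prefix.
// It fails unless exactly one key matches.
func (s *Store) Cat(prefix string) ([]byte, error) {
	keys := s.List(prefix)
	switch len(keys) {
	case 0:
		return nil, ErrNotFound
	case 1:
		return s.blobs[keys[0]], nil
	default:
		return nil, ErrAmbiguous
	}
}

func main() {
	s := NewStore()
	key := s.Add([]byte("hello s3git\n"))

	// Look the object up again using only the first four hex characters.
	data, err := s.Cat(key[:4])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s -> %q\n", key[:4], data)
}
```

Because the key is a function of the content alone, adding the same bytes twice yields the same key and a single stored object.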

Clone the YFCC100M dataset

Clone a large repo (Multimedia Commons) with nearly 100 million files totaling 11.5 TB, which requires only 7 GB of local disk space and takes several minutes:

$ s3git clone s3://s3git-100m -a "AKIAI26TSIF6JIMMDSPQ" -s "5NvshAhI0KMz5Gbqkp7WNqXYlnjBjkf9IaJD75x7"
Cloning into ...
Done. Totaling 97,345,456 objects.
$ cd s3git-100m
$ s3git ls 123456
|100 kB| 12345649755b9f489df2470838a76c9df1d4ee85e864b15cf328441bd12fdfc23d5b95f8abffb9406f4cdf05306b082d3773f0f05090766272e2e8c8b8df5997
|100 kB| 123456629a711c83c28dc63f0bc77ca597c695a19e498334a68e4236db18df84a2cdd964180ab2fcf04cbacd0f26eb345e09e6f9c6957a8fb069d558cadf287e
|100 kB| 123456675eaecb4a2984f2849d3b8c53e55dd76102a2093cbca3e61668a3dd4e8f148a32c41235ab01e70003d4262ead484d9158803a1f8d74e6acad37a7a296
|100 kB| 123456e6c21c054744742d482960353f586e16d33384f7c42373b908f7a7bd08b18768d429e01a0070fadc2c037ef83eef27453fc96d1625e704dd62931be2d1
$ s3git cat cafebad > olympic.jpg
$ s3git ls | wc -l
97345456
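A quick back-of-the-envelope check makes the figures above easier to relate to each other. Each hash printed by `s3git ls` is 128 hex characters, i.e. 64 bytes, so the hashes of all 97,345,456 objects take about 6.2 GB in raw form, the same order as the 7 GB quoted. The Go sketch below does that arithmetic and also derives the mean object size from the 11.5 TB total. That the local clone consists mostly of hashes is an assumption of this sketch, not something this README states; the function names are illustrative.

```go
package main

import "fmt"

const (
	objects      = 97_345_456   // object count reported by the clone above
	hexChars     = 128          // length of each hash printed by `s3git ls`
	bytesPerHash = hexChars / 2 // two hex characters encode one byte
	datasetBytes = 11.5e12      // 11.5 TB, using decimal units
)

// IndexBytes returns the raw size of n hashes of bytesPerHash bytes each.
func IndexBytes(n int64) int64 { return n * bytesPerHash }

// AvgObjectBytes returns the mean object size for total bytes spread over n objects.
func AvgObjectBytes(total float64, n int64) float64 { return total / float64(n) }

func main() {
	fmt.Printf("raw hashes:  %.2f GB\n", float64(IndexBytes(objects))/1e9)
	fmt.Printf("mean object: %.0f kB\n", AvgObjectBytes(datasetBytes, objects)/1e3)
}
```

The mean of roughly 118 kB per object is in line with the 100 kB entries in the listing above.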

And collaborate

Continuing as alice from the example above, clone the repo again as bob on a different computer or in a different directory:

alice $
bob $

Contributions

Contributions are welcome! Please see CONTRIBUTING.md.

Key features

  • Easy: Use a workflow and syntax that you already know and love

  • Fast: Lightning fast operation, especially on large files and huge repositories

  • Infinite scalability: Stop worrying about maximum repository sizes and grow indefinitely

  • Work from local SSD: Make a huge cloud disk appear like a local drive

  • Instant sync: Push local changes and pull down instantly on other clones

  • Versioning: Keep previous versions safe and have the ability to undo or go back in time

  • Forking: Ability to make many variants by forking

  • Verifiable: Be sure that you have everything and that nothing has been tampered with (“data has not been messed with”)

  • Deduplication: Do not store the same data twice

  • Simplicity: Simple by design, with one way to accomplish each task
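The deduplication and verifiability features in the list above both follow naturally from naming objects by a hash of their content. The Go sketch below shows the general technique under stated assumptions: `keyOf`, `Put`, and `Verify` are hypothetical helpers, a plain map stands in for the repository, and SHA-512 stands in for whatever hash s3git actually uses. It is an illustration of the principle, not s3git's code.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// keyOf returns the hex SHA-512 digest of data (an assumed stand-in hash).
func keyOf(data []byte) string {
	sum := sha512.Sum512(data)
	return hex.EncodeToString(sum[:])
}

// Put stores data under its content key and reports whether it was new.
func Put(store map[string][]byte, data []byte) (key string, added bool) {
	key = keyOf(data)
	if _, ok := store[key]; ok {
		return key, false // identical content is already present: deduplicated
	}
	store[key] = append([]byte(nil), data...)
	return key, true
}

// Verify returns the keys whose stored bytes no longer hash to their key.
func Verify(store map[string][]byte) []string {
	var bad []string
	for key, data := range store {
		if keyOf(data) != key {
			bad = append(bad, key)
		}
	}
	return bad
}

func main() {
	store := make(map[string][]byte)

	k, _ := Put(store, []byte("holiday picture bytes"))
	_, added := Put(store, []byte("holiday picture bytes"))
	fmt.Println("second copy stored:", added, "objects:", len(store))

	store[k][0] ^= 0xff // simulate corruption or tampering
	fmt.Println("corrupted objects:", len(Verify(store)))
}
```

Any change to the stored bytes changes their digest, so a mismatch between a key and the re-computed hash reveals tampering without needing a separate checksum file.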

Command Line Help

$ s3git help
s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git.

s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository.
It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3.
Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.

Usage:
  s3git [command]

Available Commands:
  add         Add file(s) to the repository
  cat         Read a file from the repository
  clone       Clone a repository into a new directory
  commit      Commit the changes in the repository
  init        Create an empty repository
  ls          List files in the repository
  pull        Update local repository
  push        Update remote repositories
  remote      Manage remote repositories
  status      Show changes to repository

Flags:
  -h, --help[=false]: help for s3git

Use "s3git [command] --help" for more information about a command.

Use cases

s3git commit -m "Holiday pictures"
s3git commit -m "Photos from birthday"
s3git log

License

s3git is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

FAQ

Q: Why don't you provide a FUSE interface?
A: Supporting FUSE would mean introducing a lot of POSIX-related complexity, which we would rather avoid.
