| 1 | +<!--- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | + |
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | + |
| 12 | + Unless required by applicable law or agreed to in writing, |
| 13 | + software distributed under the License is distributed on an |
| 14 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + KIND, either express or implied. See the License for the |
| 16 | + specific language governing permissions and limitations |
| 17 | + under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# An Apache Parquet implementation in Rust |
| 21 | + |
| 22 | +## Usage |
| 23 | +Add this to your Cargo.toml: |
| 24 | +```toml |
| 25 | +[dependencies] |
| 26 | +parquet = "0.4" |
| 27 | +``` |
| 28 | + |
| 29 | +and this to your crate root: |
| 30 | +```rust |
| 31 | +extern crate parquet; |
| 32 | +``` |
| 33 | + |
| 34 | +Example usage of reading data: |
| 35 | +```rust |
| 36 | +use std::fs::File; |
| 37 | +use std::path::Path; |
| 38 | +use parquet::file::reader::{FileReader, SerializedFileReader}; |
| 39 | + |
| 40 | +let file = File::open(&Path::new("/path/to/file")).unwrap(); |
| 41 | +let reader = SerializedFileReader::new(file).unwrap(); |
| 42 | +let mut iter = reader.get_row_iter(None).unwrap(); |
| 43 | +while let Some(record) = iter.next() { |
| 44 | + println!("{}", record); |
| 45 | +} |
| 46 | +``` |
| 47 | +See the [crate documentation](https://docs.rs/crate/parquet/0.4.2) for the available API. |
| 48 | + |
| 49 | +## Supported Parquet Version |
| 50 | +- Parquet-format 2.4.0 |
| 51 | + |
| 52 | +To update the Parquet format to a newer version, check whether a newer [parquet-format](https://github.com/sunchao/parquet-format-rs) |
| 53 | +release is available, then update the version of the `parquet-format` crate in Cargo.toml. |
| 54 | + |
| 55 | +## Features |
| 56 | +- [X] All encodings supported |
| 57 | +- [X] All compression codecs supported |
| 58 | +- [X] Read support |
| 59 | + - [X] Primitive column value readers |
| 60 | + - [X] Row record reader |
| 61 | + - [ ] Arrow record reader |
| 62 | +- [X] Statistics support |
| 63 | +- [X] Write support |
| 64 | + - [X] Primitive column value writers |
| 65 | + - [ ] Row record writer |
| 66 | + - [ ] Arrow record writer |
| 67 | +- [ ] Predicate pushdown |
| 68 | +- [ ] Parquet format 2.5 support |
| 69 | +- [ ] HDFS support |
| 70 | + |
| 71 | +## Requirements |
| 72 | +- Rust nightly |
| 73 | + |
| 74 | +See [Working with nightly Rust](https://github.com/rust-lang-nursery/rustup.rs/blob/master/README.md#working-with-nightly-rust) |
| 75 | +to install the nightly toolchain and set it as the default. |
| 76 | + |
| 77 | +## Build |
| 78 | +Run `cargo build` for a debug build, or `cargo build --release` to build in release mode. |
| 79 | +Some features take advantage of SSE4.2 instructions, which can be |
| 80 | +enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the |
| 81 | +`cargo build` command. |
| 82 | + |
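Before adding the SSE4.2 flag, it can help to confirm that the build machine actually supports those instructions. The following is a minimal sketch using only the Rust standard library; the function names `compiled_with_sse42` and `cpu_supports_sse42` are illustrative and are not part of the `parquet` crate.

```rust
/// Reports whether this binary was compiled with SSE4.2 enabled,
/// e.g. via RUSTFLAGS="-C target-feature=+sse4.2".
/// (Illustrative helper, not part of the parquet crate.)
fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// Reports whether the CPU running this binary supports SSE4.2.
/// Runtime detection is only available on x86 and x86_64 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On other architectures SSE4.2 does not exist, so report `false`.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_sse42() -> bool {
    false
}

fn main() {
    println!("compiled with sse4.2: {}", compiled_with_sse42());
    println!("cpu supports sse4.2:  {}", cpu_supports_sse42());
}
```

If the second line reports `false`, building with `+sse4.2` would likely produce a binary that cannot run on that machine.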
| 83 | +## Test |
| 84 | +Run `cargo test` for unit tests. |
| 85 | + |
| 86 | +## Binaries |
| 87 | +The following binaries are provided (use `cargo install` to install them): |
| 88 | +- **parquet-schema** for printing Parquet file schema and metadata. |
| 89 | +`Usage: parquet-schema <file-path> [verbose]`, where `file-path` is the path to a Parquet file, |
| 90 | +and the optional `verbose` is a boolean flag that controls whether the full metadata or only the schema is printed |
| 91 | +(when not specified, only the schema is printed). |
| 92 | + |
| 93 | +- **parquet-read** for reading records from a Parquet file. |
| 94 | +`Usage: parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file, |
| 95 | +and the optional `num-records` is the number of records to read from the file (when not specified, all records |
| 96 | +are printed). |
| 97 | + |
| 98 | +If you see a `Library not loaded` error, make sure `LD_LIBRARY_PATH` is set properly: |
| 99 | +``` |
| 100 | +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib |
| 101 | +``` |
| 102 | + |
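The `parquet-read <file-path> [num-records]` usage line describes one required and one optional positional argument. As a rough illustration of that argument shape only, the sketch below parses it with the standard library. `ReadArgs` and `parse_read_args` are hypothetical names, and this is not the code of the actual binary.

```rust
/// Parsed arguments for a `parquet-read`-style tool.
/// (Hypothetical type for illustration; not taken from the crate.)
#[derive(Debug, PartialEq)]
struct ReadArgs {
    file_path: String,
    /// `None` means "read all records".
    num_records: Option<usize>,
}

/// Parses `<file-path> [num-records]` from the arguments that follow
/// the program name. Returns a usage message on malformed input.
fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    const USAGE: &str = "Usage: parquet-read <file-path> [num-records]";
    match args {
        [path] => Ok(ReadArgs {
            file_path: path.clone(),
            num_records: None,
        }),
        [path, count] => {
            let parsed = count
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}\n{}", count, e, USAGE))?;
            Ok(ReadArgs {
                file_path: path.clone(),
                num_records: Some(parsed),
            })
        }
        // Zero arguments or more than two: show the usage line.
        _ => Err(USAGE.to_string()),
    }
}

fn main() {
    // Skip the first element, which is the program name.
    let args: Vec<String> = std::env::args().skip(1).collect();
    match parse_read_args(&args) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

Modelling the optional count as `Option<usize>` keeps "not specified" distinct from an explicit `0`.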
| 103 | +## Benchmarks |
| 104 | +Run `cargo bench` for benchmarks. |
| 105 | + |
| 106 | +## Docs |
| 107 | +To build documentation, run `cargo doc --no-deps`. |
| 108 | +To compile and view in the browser, run `cargo doc --no-deps --open`. |
| 109 | + |
| 110 | +## License |
| 111 | +Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0. |