Commit 46b1bc7

sunchao authored and wesm committed
ARROW-4160: [Rust] Add README and executable files to parquet
Author: Chao Sun <sunchao@apache.org>

Closes apache#3314 from sunchao/ARROW-4160 and squashes the following commits:

9d215df <Chao Sun> ARROW-4160: Add README and executable files to parquet
1 parent 857deae commit 46b1bc7

4 files changed

Lines changed: 289 additions & 1 deletion

rust/parquet/Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -17,9 +17,11 @@
 
 [package]
 name = "parquet"
-version = "0.12.0-SNAPSHOT"
+version = "0.5.0-SNAPSHOT"
 license = "Apache-2.0"
 description = "Apache Parquet implementation in Rust"
+homepage = "https://github.com/apache/arrow"
+repository = "https://github.com/apache/arrow"
 authors = ["Apache Arrow <dev@arrow.apache.org>"]
 keywords = [ "arrow", "parquet", "hadoop" ]
 readme = "README.md"

rust/parquet/README.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# An Apache Parquet implementation in Rust

## Usage
Add this to your Cargo.toml:
```toml
[dependencies]
parquet = "0.4"
```

and this to your crate root:
```rust
extern crate parquet;
```

Example of reading data:
```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
let iter = reader.get_row_iter(None).unwrap();
for record in iter {
    println!("{}", record);
}
```
See the [crate documentation](https://docs.rs/crate/parquet/0.4.2) for the available API.

## Supported Parquet Version
- Parquet-format 2.4.0

To update to a newer Parquet format version, check whether a matching version of the
[parquet-format](https://github.com/sunchao/parquet-format-rs) crate is available, then update
the version of the `parquet-format` crate in Cargo.toml.

## Features
- [X] All encodings supported
- [X] All compression codecs supported
- [X] Read support
  - [X] Primitive column value readers
  - [X] Row record reader
  - [ ] Arrow record reader
- [X] Statistics support
- [X] Write support
  - [X] Primitive column value writers
  - [ ] Row record writer
  - [ ] Arrow record writer
- [ ] Predicate pushdown
- [ ] Parquet format 2.5 support
- [ ] HDFS support

## Requirements
- Rust nightly

See [Working with nightly Rust](https://github.com/rust-lang-nursery/rustup.rs/blob/master/README.md#working-with-nightly-rust)
to install the nightly toolchain and set it as the default.

## Build
Run `cargo build` to build in debug mode, or `cargo build --release` to build in release mode.
Some features take advantage of SSE4.2 instructions, which can be
enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the
`cargo build` command.
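
Whether a given build actually uses SSE4.2 can be checked from Rust itself. A minimal std-only sketch, with hypothetical helper names that are not part of the parquet crate: `cfg!(target_feature = "sse4.2")` reports whether the feature was enabled at compile time (for example through the `RUSTFLAGS` setting above), and `is_x86_feature_detected!` reports whether the CPU running the program supports it.

```rust
/// True if this binary was compiled with the `sse4.2` target feature,
/// e.g. via `RUSTFLAGS="-C target-feature=+sse4.2"`.
/// (Hypothetical helper for illustration; not part of the parquet crate.)
fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// True if the CPU running this binary supports SSE4.2 (runtime detection, x86/x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On other architectures SSE4.2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_sse42() -> bool {
    false
}

fn main() {
    println!("compiled with sse4.2: {}", compiled_with_sse42());
    println!("cpu supports sse4.2:  {}", cpu_supports_sse42());
}
```

A binary compiled with `+sse4.2` assumes the instructions are present, so running it on a CPU without SSE4.2 may fail with an illegal-instruction error; that is one reason the flag is opt-in.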

## Test
Run `cargo test` for unit tests.

## Binaries
The following binaries are provided (use `cargo install` to install them):
- **parquet-schema** for printing Parquet file schema and metadata.
`Usage: parquet-schema <file-path> [verbose]`, where `file-path` is the path to a Parquet file,
and the optional `verbose` flag (`true` or `false`) selects between printing the full metadata
and printing only the schema (only the schema is printed when it is not specified).

- **parquet-read** for reading records from a Parquet file.
`Usage: parquet-read <file-path> [num-records]`, where `file-path` is the path to a Parquet file,
and the optional `num-records` is the number of records to read from the file (when it is not
specified, all records are printed).
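
Both binaries follow the same command-line shape: a required `<file-path>` and one optional argument parsed with `str::parse`, as a number for `num-records` and as `true`/`false` for `verbose`. A minimal std-only sketch of that contract, assuming a hypothetical `parse_args` helper (not part of the crate) that returns an error message instead of panicking on a malformed value:

```rust
use std::str::FromStr;

/// Parses `<program> <file-path> [optional]` into the file path and an optional value of type `T`.
/// (Hypothetical helper; the committed binaries parse `env::args()` inline and panic on bad input.)
fn parse_args<T: FromStr>(args: &[String], usage: &str) -> Result<(String, Option<T>), String>
where
    T::Err: std::fmt::Display,
{
    match args {
        // Only the file path was given: the optional argument takes its default.
        [_, path] => Ok((path.clone(), None)),
        // File path plus the optional argument, e.g. `10` or `true`.
        [_, path, raw] => raw
            .parse::<T>()
            .map(|value| (path.clone(), Some(value)))
            .map_err(|e| format!("invalid value '{}': {}", raw, e)),
        // Too few or too many arguments.
        _ => Err(format!("Usage: {}", usage)),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    match parse_args::<usize>(&args, "parquet-read <file-path> [num-records]") {
        Ok((path, num_records)) => println!("path = {}, num_records = {:?}", path, num_records),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

The same helper covers `parquet-schema` by choosing `bool` as the type parameter, since `bool` also implements `FromStr`.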

If you see a `Library not loaded` error, make sure `LD_LIBRARY_PATH` is set correctly:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
```

## Benchmarks
Run `cargo bench` for benchmarks.

## Docs
To build documentation, run `cargo doc --no-deps`.
To build the documentation and open it in a browser, run `cargo doc --no-deps --open`.

## License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Binary to read data from a Parquet file.
//!
//! # Install
//!
//! `parquet-read` can be installed using `cargo`:
//! ```
//! cargo install parquet
//! ```
//! After this, `parquet-read` should be globally available:
//! ```
//! parquet-read XYZ.parquet
//! ```
//!
//! The binary can also be built from the source code and run as follows:
//! ```
//! cargo run --bin parquet-read XYZ.parquet
//! ```
//!
//! # Usage
//!
//! ```
//! parquet-read <file-path> [num-records]
//! ```
//! where `file-path` is the path to a Parquet file and `num-records` is an optional
//! number of records to read from the file. When it is not provided, all records
//! are read.
//!
//! Note that `parquet-read` reads the full file schema; no projection or filtering
//! is applied.

extern crate parquet;

use std::{env, fs::File, path::Path, process};

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 && args.len() != 3 {
        println!("Usage: parquet-read <file-path> [num-records]");
        process::exit(1);
    }

    let mut num_records: Option<usize> = None;
    if args.len() == 3 {
        match args[2].parse() {
            Ok(value) => num_records = Some(value),
            Err(e) => panic!("Error when reading value for [num-records], {}", e),
        }
    }

    let path = Path::new(&args[1]);
    let file = File::open(&path).unwrap();
    let parquet_reader = SerializedFileReader::new(file).unwrap();

    // Use full schema as projected schema
    let mut iter = parquet_reader.get_row_iter(None).unwrap();

    let mut start = 0;
    let end = num_records.unwrap_or(0);
    let all_records = num_records.is_none();

    while all_records || start < end {
        match iter.next() {
            Some(row) => println!("{}", row),
            None => break,
        }
        start += 1;
    }
}
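
The loop at the end of `main` enforces the record limit with a manual counter. The same behaviour can be expressed with `Iterator::take`. A minimal std-only sketch, assuming a hypothetical `limit_records` helper and a plain iterator of strings standing in for the row iterator returned by `get_row_iter(None)`:

```rust
/// Yields at most `num_records` items from `rows`, or all of them when `num_records` is `None`,
/// mirroring the `[num-records]` argument of `parquet-read`. (Hypothetical helper.)
fn limit_records<I: Iterator>(rows: I, num_records: Option<usize>) -> impl Iterator<Item = I::Item> {
    // `usize::MAX` acts as "no limit", so there is a single code path for both cases.
    rows.take(num_records.unwrap_or(usize::MAX))
}

fn main() {
    // Stand-in for the Parquet row iterator.
    let rows = (1..=5).map(|i| format!("row {}", i));
    for row in limit_records(rows, Some(3)) {
        println!("{}", row);
    }
}
```

As in the committed loop, `Some(0)` reads nothing and `None` reads until the iterator is exhausted.
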
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! Binary to print the schema and metadata of a Parquet file.
//!
//! # Install
//!
//! `parquet-schema` can be installed using `cargo`:
//! ```
//! cargo install parquet
//! ```
//! After this, `parquet-schema` should be globally available:
//! ```
//! parquet-schema XYZ.parquet
//! ```
//!
//! The binary can also be built from the source code and run as follows:
//! ```
//! cargo run --bin parquet-schema XYZ.parquet
//! ```
//!
//! # Usage
//!
//! ```
//! parquet-schema <file-path> [verbose]
//! ```
//! where `file-path` is the path to a Parquet file and `verbose` is an optional boolean
//! flag: when set to `false` (the default when it is not provided), only the schema is
//! printed; when set to `true`, the full file metadata is printed.

extern crate parquet;

use std::{env, fs::File, path::Path, process};

use parquet::{
    file::reader::{FileReader, SerializedFileReader},
    schema::printer::{print_file_metadata, print_parquet_metadata},
};

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 && args.len() != 3 {
        println!("Usage: parquet-schema <file-path> [verbose]");
        process::exit(1);
    }
    let path = Path::new(&args[1]);
    let mut verbose = false;
    if args.len() == 3 {
        match args[2].parse() {
            Ok(b) => verbose = b,
            Err(e) => panic!(
                "Error when reading value for [verbose] (expected either 'true' or 'false'): {}",
                e
            ),
        }
    }
    let file = match File::open(&path) {
        Err(e) => panic!("Error when opening file {}: {}", path.display(), e),
        Ok(f) => f,
    };
    match SerializedFileReader::new(file) {
        Err(e) => panic!("Error when parsing Parquet file: {}", e),
        Ok(parquet_reader) => {
            let metadata = parquet_reader.metadata();
            println!("Metadata for file: {}", &args[1]);
            println!("");
            if verbose {
                print_parquet_metadata(&mut std::io::stdout(), &metadata);
            } else {
                print_file_metadata(&mut std::io::stdout(), &metadata.file_metadata());
            }
        }
    }
}
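
Both binaries report failures with `panic!`, which prints panic text rather than a plain error message. A minimal std-only sketch of an alternative, assuming a hypothetical `open_input` helper that returns `Result` so that `main` can print the error to stderr and exit with a non-zero status; the Parquet parsing step is only indicated by a comment, because this sketch does not depend on the parquet crate:

```rust
use std::error::Error;
use std::fs::File;
use std::process;

/// Opens the input file, turning an I/O failure into a descriptive error instead of a panic.
/// (Hypothetical helper; the committed binaries call `panic!` at this point.)
fn open_input(path: &str) -> Result<File, Box<dyn Error>> {
    File::open(path)
        .map_err(|e| Box::<dyn Error>::from(format!("Error when opening file {}: {}", path, e)))
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 2 && args.len() != 3 {
        eprintln!("Usage: parquet-schema <file-path> [verbose]");
        process::exit(1);
    }
    match open_input(&args[1]) {
        // In parquet-schema, the file would be passed to `SerializedFileReader::new` here,
        // and a parsing error would be reported the same way.
        Ok(_file) => println!("Opened {}", args[1]),
        Err(e) => {
            eprintln!("{}", e);
            process::exit(1);
        }
    }
}
```

Writing errors and usage to stderr and exiting with status 1 keeps stdout clean, which can matter when the binaries are used in scripts.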
