Skip to content

Commit b3a3a74

Browse files
trxcllntwesm
authored andcommitted
ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests
This PR adds a workaround for reading the metadata layout for C++ dictionary-encoded vectors. I added tests that validate against the C++/Java integration suite. In order to make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether the DictionaryBatch vector should replace or append to the existing dictionary. I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests updating the format in the future. I saved the generated Arrow files in with the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change! This PR is marked WIP because I added an [integration test](apache@6e98874#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that validates the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. Debugging, I noticed a number of other differences between the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove/ignore the metadata, this test should pass too. cc @TheNeuralBit Author: Paul Taylor <paul.e.taylor@me.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1294 from trxcllnt/generate-js-test-files and squashes the following commits: f907d5a [Paul Taylor] fix aggressive closure-compiler mangling in the ES5 UMD bundle 57c7df4 [Paul Taylor] remove arrow files from perf tests 5972349 [Paul Taylor] update performance tests to use generated test data 14be77f [Paul Taylor] fix Date64Vector TypedArray, enable datetime integration tests 5660eb3 [Wes McKinney] Use openjdk8 for integration tests, jdk7 for main Java CI job 019e8e2 [Paul Taylor] update closure compiler with full support for ESModules, and remove closure-compiler-scripts 4811129 [Paul Taylor] Add support for reading Arrow buffers < MetadataVersion 4 c72134a [Paul Taylor] compile JS source in integration tests c83a700 [Wes McKinney] Hack until ARROW-1837 resolved. Constrain unsigned integers max to signed max for bit width fd3ed47 [Wes McKinney] Uppercase hex values 224e041 [Wes McKinney] Remove hard-coded file name to prevent primitive JSON file from being clobbered 0882d8e [Paul Taylor] separate JS unit tests from integration tests in CI 1f6a81b [Paul Taylor] add missing mkdirp for test json data 19136fb [Paul Taylor] remove test data files in favor of auto-generating them in CI 9f19568 [Paul Taylor] Generate test files when the test run if they don't exist 0cdb74e [Paul Taylor] Add a cli arg to integration_test.py generate test JSON files for JS cc74456 [Paul Taylor] resolve LICENSE.txt conflict 3391623 [Paul Taylor] move js license to top-level license.txt d0b61f4 [Paul Taylor] add validate package script back in, make npm-release.sh suitable for ASF release process 7e3be57 [Paul Taylor] Copy license.txt and notice.txt into target dirs from arrow root. c8125d2 [Paul Taylor] Update readme to reflect new Table.from signature 49ac339 [Paul Taylor] allow unrecognized cli args in gulpfile 3c52587 [Paul Taylor] re-enable node_js job in travis cb142f1 [Paul Taylor] add npm release script, remove unused package scripts d51793d [Paul Taylor] run tests on src folder for accurate jest coverage statistics c087f48 [Paul Taylor] generate test data in build scripts 1d814d0 [Paul Taylor] excise test data csvs 14d4896 [Paul Taylor] stringify Struct Array cells 1f00496 [Paul Taylor] rename FixedWidthListVector to FixedWidthNumericVector be73c91 [Paul Taylor] add BinaryVector, change ListVector to always return an Array 02fb300 [Paul Taylor] compare iterator results in integration tests e67a66a [Paul Taylor] remove/ignore test snapshots (getting too big) de7d96a [Paul Taylor] regenerate test arrows from master a6d3c83 [Paul Taylor] enable integration tests 44889fb [Paul Taylor] report errors generating test arrows fd68d51 [Paul Taylor] always increment validity buffer index while reading 562eba7 [Paul Taylor] update test snapshots d4399a8 [Paul Taylor] update integration tests, add custom jest vector matcher 8d44dcd [Paul Taylor] update tests 6d2c03d [Paul Taylor] clean arrows folders before regenerating test data 4166a9f [Paul Taylor] hard-code reader to Arrow spec and ignore field layout metadata c60305d [Paul Taylor] refactor: flatten vector folder, add more types ba984c6 [Paul Taylor] update dependencies 5eee3ea [Paul Taylor] add integration tests to compare how JS reads cpp vs. java arrows d4ff57a [Paul Taylor] update test snapshots 407b9f5 [Paul Taylor] update reader/table tests for new generated arrows 8549706 [Paul Taylor] update cli args to execute partial test runs for debugging eefc256 [Paul Taylor] remove old test arrows, add new generated test arrows 0cd31ab [Paul Taylor] add generate-arrows script to tests 3ff7138 [Paul Taylor] Add bool, date, time, timestamp, and ARROW-1693 workaround in reader 4a34247 [Paul Taylor] export Row type 141194e [Paul Taylor] use fieldNode.length as vector length c45718e [Paul Taylor] support new DictionaryBatch isDelta flag 9d8fef9 [Paul Taylor] split DateVector into Date32 and Date64 types 8592ff3 [Paul Taylor] update generated format flatbuffers
1 parent d92735e commit b3a3a74

99 files changed

Lines changed: 2468 additions & 6081 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.travis.yml

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@ matrix:
8787
- $TRAVIS_BUILD_DIR/ci/travis_script_manylinux.sh
8888
- language: java
8989
os: linux
90-
jdk: openjdk8
90+
jdk: openjdk7
9191
script:
9292
- $TRAVIS_BUILD_DIR/ci/travis_script_java.sh
9393
- language: java
@@ -103,23 +103,24 @@ matrix:
103103
- language: java
104104
os: linux
105105
env: ARROW_TEST_GROUP=integration
106-
jdk: openjdk7
106+
jdk: openjdk8
107107
before_script:
108108
- source $TRAVIS_BUILD_DIR/ci/travis_install_clang_tools.sh
109109
- export CC="clang-4.0"
110110
- export CXX="clang++-4.0"
111+
- nvm install node
111112
- $TRAVIS_BUILD_DIR/ci/travis_lint.sh
113+
- $TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh
112114
- $TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh
113115
script:
114116
- $TRAVIS_BUILD_DIR/ci/travis_script_integration.sh
115-
# TODO(wesm): Re-enable after issues in ARROW-1409 resolved
116-
# - language: node_js
117-
# os: linux
118-
# node_js: node
119-
# before_script:
120-
# - $TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh
121-
# script:
122-
# - $TRAVIS_BUILD_DIR/ci/travis_script_js.sh
117+
- language: node_js
118+
os: linux
119+
node_js: node
120+
before_script:
121+
- $TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh
122+
script:
123+
- $TRAVIS_BUILD_DIR/ci/travis_script_js.sh
123124
- compiler: gcc
124125
language: cpp
125126
os: linux

LICENSE.txt

Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -457,3 +457,98 @@ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
457457
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
458458
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
459459
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
460+
461+
--------------------------------------------------------------------------------
462+
463+
This project includes code from the Boost project
464+
465+
Boost Software License - Version 1.0 - August 17th, 2003
466+
467+
Permission is hereby granted, free of charge, to any person or organization
468+
obtaining a copy of the software and accompanying documentation covered by
469+
this license (the "Software") to use, reproduce, display, distribute,
470+
execute, and transmit the Software, and to prepare derivative works of the
471+
Software, and to permit third-parties to whom the Software is furnished to
472+
do so, all subject to the following:
473+
474+
The copyright notices in the Software and this entire statement, including
475+
the above license grant, this restriction and the following disclaimer,
476+
must be included in all copies of the Software, in whole or in part, and
477+
all derivative works of the Software, unless such copies or derivative
478+
works are solely in the form of machine-executable object code generated by
479+
a source language processor.
480+
481+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
482+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
483+
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
484+
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
485+
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
486+
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
487+
DEALINGS IN THE SOFTWARE.
488+
489+
--------------------------------------------------------------------------------
490+
491+
This project includes code from the mapbox/variant project, BSD 3-clause
492+
license
493+
494+
Copyright (c) MapBox
495+
All rights reserved.
496+
497+
Redistribution and use in source and binary forms, with or without modification,
498+
are permitted provided that the following conditions are met:
499+
500+
- Redistributions of source code must retain the above copyright notice, this
501+
list of conditions and the following disclaimer.
502+
- Redistributions in binary form must reproduce the above copyright notice, this
503+
list of conditions and the following disclaimer in the documentation and/or
504+
other materials provided with the distribution.
505+
- Neither the name "MapBox" nor the names of its contributors may be
506+
used to endorse or promote products derived from this software without
507+
specific prior written permission.
508+
509+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
510+
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
511+
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
512+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
513+
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
514+
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
515+
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
516+
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
517+
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
518+
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
519+
520+
--------------------------------------------------------------------------------
521+
522+
This project includes code from the FlatBuffers project
523+
524+
Copyright 2014 Google Inc.
525+
526+
Licensed under the Apache License, Version 2.0 (the "License");
527+
you may not use this file except in compliance with the License.
528+
You may obtain a copy of the License at
529+
530+
http://www.apache.org/licenses/LICENSE-2.0
531+
532+
Unless required by applicable law or agreed to in writing, software
533+
distributed under the License is distributed on an "AS IS" BASIS,
534+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
535+
See the License for the specific language governing permissions and
536+
limitations under the License.
537+
538+
--------------------------------------------------------------------------------
539+
540+
This project includes code from the tslib project
541+
542+
Copyright 2015 Microsoft Corporation. All rights reserved.
543+
544+
Licensed under the Apache License, Version 2.0 (the "License");
545+
you may not use this file except in compliance with the License.
546+
You may obtain a copy of the License at
547+
548+
http://www.apache.org/licenses/LICENSE-2.0
549+
550+
Unless required by applicable law or agreed to in writing, software
551+
distributed under the License is distributed on an "AS IS" BASIS,
552+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
553+
See the License for the specific language governing permissions and
554+
limitations under the License.

ci/travis_script_integration.sh

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,3 +44,22 @@ conda install -y pip numpy six
4444
python integration_test.py --debug
4545

4646
popd
47+
48+
pushd $ARROW_JS_DIR
49+
50+
# lint and compile JS source
51+
npm run lint
52+
npm run build
53+
# create initial test data
54+
npm run test:createTestData
55+
# run once to write the snapshots
56+
npm test -- -t ts -u --integration
57+
# run again to test all builds against the snapshots
58+
npm test -- --integration
59+
# run tests against source to generate coverage data
60+
npm run test:coverage -- --integration
61+
# Uncomment to upload to coveralls
62+
# cat ./coverage/lcov.info | ./node_modules/coveralls/bin/coveralls.js;
63+
64+
65+
popd

ci/travis_script_js.sh

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -17,16 +17,15 @@
1717
# specific language governing permissions and limitations
1818
# under the License.
1919

20-
set -e
20+
set -ex
2121

22-
JS_DIR=${TRAVIS_BUILD_DIR}/js
22+
source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh
2323

24-
pushd $JS_DIR
24+
pushd $ARROW_JS_DIR
2525

26-
npm run validate
27-
28-
# Uncomment to use coveralls
29-
# npm run test:coverage
30-
# cat ./coverage/lcov.info | ./node_modules/coveralls/bin/coveralls.js;
26+
npm run lint
27+
npm run build
28+
# run the non-snapshot unit tests
29+
npm test
3130

3231
popd

integration/integration_test.py

Lines changed: 43 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
import subprocess
2929
import tempfile
3030
import uuid
31+
import errno
3132

3233
import numpy as np
3334

@@ -198,9 +199,18 @@ def __init__(self, name, is_signed, bit_width, nullable=True,
198199
self.min_value = min_value
199200
self.max_value = max_value
200201

201-
@property
202-
def numpy_type(self):
203-
return ('int' if self.is_signed else 'uint') + str(self.bit_width)
202+
def _get_generated_data_bounds(self):
203+
signed_iinfo = np.iinfo('int' + str(self.bit_width))
204+
if self.is_signed:
205+
min_value, max_value = signed_iinfo.min, signed_iinfo.max
206+
else:
207+
# ARROW-1837 Remove this hack and restore full unsigned integer
208+
# range
209+
min_value, max_value = 0, signed_iinfo.max
210+
211+
lower_bound = max(min_value, self.min_value)
212+
upper_bound = min(max_value, self.max_value)
213+
return lower_bound, upper_bound
204214

205215
def _get_type(self):
206216
return OrderedDict([
@@ -210,9 +220,7 @@ def _get_type(self):
210220
])
211221

212222
def generate_column(self, size, name=None):
213-
iinfo = np.iinfo(self.numpy_type)
214-
lower_bound = max(iinfo.min, self.min_value)
215-
upper_bound = min(iinfo.max, self.max_value)
223+
lower_bound, upper_bound = self._get_generated_data_bounds()
216224
return self.generate_range(size, lower_bound, upper_bound, name=name)
217225

218226
def generate_range(self, size, lower, upper, name=None):
@@ -521,7 +529,7 @@ def get_json(self):
521529
class BinaryColumn(PrimitiveColumn):
522530

523531
def _encode_value(self, x):
524-
return frombytes(binascii.hexlify(x))
532+
return frombytes(binascii.hexlify(x).upper())
525533

526534
def _get_buffers(self):
527535
offset = 0
@@ -785,7 +793,7 @@ def _generate_file(name, fields, batch_sizes, dictionaries=None):
785793
return JsonFile(name, schema, batches, dictionaries)
786794

787795

788-
def generate_primitive_case(batch_sizes):
796+
def generate_primitive_case(batch_sizes, name='primitive'):
789797
types = ['bool', 'int8', 'int16', 'int32', 'int64',
790798
'uint8', 'uint16', 'uint32', 'uint64',
791799
'float32', 'float64', 'binary', 'utf8']
@@ -796,7 +804,7 @@ def generate_primitive_case(batch_sizes):
796804
fields.append(get_field(type_ + "_nullable", type_, True))
797805
fields.append(get_field(type_ + "_nonnullable", type_, False))
798806

799-
return _generate_file("primitive", fields, batch_sizes)
807+
return _generate_file(name, fields, batch_sizes)
800808

801809

802810
def generate_decimal_case():
@@ -874,8 +882,8 @@ def _temp_path():
874882
return
875883

876884
file_objs = [
877-
generate_primitive_case([7, 10]),
878-
generate_primitive_case([0, 0, 0]),
885+
generate_primitive_case([17, 20], name='primitive'),
886+
generate_primitive_case([0, 0, 0], name='primitive_zerolength'),
879887
generate_decimal_case(),
880888
generate_datetime_case(),
881889
generate_nested_case(),
@@ -1079,11 +1087,33 @@ def run_all_tests(debug=False):
10791087
print('-- All tests passed!')
10801088

10811089

1090+
def write_js_test_json(directory):
1091+
generate_nested_case().write(os.path.join(directory, 'nested.json'))
1092+
generate_decimal_case().write(os.path.join(directory, 'decimal.json'))
1093+
generate_datetime_case().write(os.path.join(directory, 'datetime.json'))
1094+
(generate_dictionary_case()
1095+
.write(os.path.join(directory, 'dictionary.json')))
1096+
(generate_primitive_case([7, 10])
1097+
.write(os.path.join(directory, 'primitive.json')))
1098+
(generate_primitive_case([0, 0, 0])
1099+
.write(os.path.join(directory, 'primitive-empty.json')))
1100+
1101+
10821102
if __name__ == '__main__':
10831103
parser = argparse.ArgumentParser(description='Arrow integration test CLI')
1104+
parser.add_argument('--write_generated_json', dest='generated_json_path',
1105+
action='store', default=False,
1106+
help='Generate test JSON')
10841107
parser.add_argument('--debug', dest='debug', action='store_true',
10851108
default=False,
10861109
help='Run executables in debug mode as relevant')
1087-
10881110
args = parser.parse_args()
1089-
run_all_tests(debug=args.debug)
1111+
if args.generated_json_path:
1112+
try:
1113+
os.makedirs(args.generated_json_path)
1114+
except OSError as e:
1115+
if e.errno != errno.EEXIST:
1116+
raise
1117+
write_js_test_json(args.generated_json_path)
1118+
else:
1119+
run_all_tests(debug=args.debug)

js/.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -83,3 +83,8 @@ package-lock.json
8383
# compilation targets
8484
dist
8585
targets
86+
87+
# test data files
88+
test/data/
89+
# jest snapshots (too big)
90+
test/__snapshots__/

0 commit comments

Comments
 (0)