
Conversation


@sourcery-ai sourcery-ai bot commented Mar 25, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot force-pushed the sourcery/master branch from 72f4575 to 821b9fd Compare March 25, 2022 09:53
@sourcery-ai sourcery-ai bot requested a review from allendred March 25, 2022 09:53
@sourcery-ai sourcery-ai bot left a comment
Due to GitHub API limits, only the first 60 comments can be shown.

sess = tf.InteractiveSession()
sess.run(init)
for i in range(1000):
for _ in range(1000):

Line 28 refactored with the following changes:
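The only change here is the convention of renaming a loop variable that is never read to `_`. A minimal standalone sketch of the pattern (the loop body is illustrative, not the training loop from the repository):

total_steps = 0
for _ in range(1000):   # "_" signals that the index itself is never used
    total_steps += 1
print(total_steps)      # 1000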

Comment on lines -12 to +24
output_file = open(output, "w")
while True:
line = input_file.readline()
if line:
line = line.strip()
seg_list = jieba.cut(line)
segments = ""
for str in seg_list:
segments = segments + " " + str
segments = segments + "\n"
output_file.write(segments)
else:
break
input_file.close()
output_file.close()
with open(output, "w") as output_file:
while True:
if line := input_file.readline():
line = line.strip()
seg_list = jieba.cut(line)
segments = ""
for str in seg_list:
segments = f'{segments} {str}'
segments = segments + "\n"
output_file.write(segments)
else:
break
input_file.close()

Function segment refactored with the following changes:
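The rewritten segment function leans on a with-statement for the output file and on the assignment expression (:=) introduced in Python 3.8. A minimal, self-contained sketch of that read-loop pattern, with placeholder file paths and str.join standing in for the manual string concatenation shown above:

import jieba  # same dependency the script above uses

def segment(input_path, output_path):
    # The walrus operator binds each line inside the loop condition;
    # readline() returns "" at EOF, which ends the loop without an explicit break.
    with open(input_path, encoding="utf-8") as input_file, \
         open(output_path, "w", encoding="utf-8") as output_file:
        while line := input_file.readline():
            words = jieba.cut(line.strip())
            output_file.write(" ".join(words) + "\n")

Iterating with `for line in input_file:` would be simpler still; the walrus form stays closest to the original while/readline structure.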

Comment on lines -66 to -76
file_object = open('zhenhuanzhuan.segment', 'r')
vocab_dict = {}
while True:
line = file_object.readline()
if line:
with open('zhenhuanzhuan.segment', 'r') as file_object:
vocab_dict = {}
while True:
if not (line := file_object.readline()):
break
for word in line.decode('utf-8').split(' '):
if word_vector_dict.has_key(word):
seq.append(word_vector_dict[word])
else:
break
file_object.close()

Function init_seq refactored with the following changes:

Comment on lines -79 to +76
len = 0
for item in vector:
len += item * item
len = sum(item * item for item in vector)

Function vector_sqrtlen refactored with the following changes:
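Here the accumulator loop collapses into sum() over a generator expression. A small sketch, assuming (as the name suggests, though the diff does not show it) that the function ends by taking the square root:

import math

def vector_sqrtlen(vector):
    # One pass, no temporary list: sum() consumes the generator lazily.
    squared = sum(item * item for item in vector)
    return math.sqrt(squared)

print(vector_sqrtlen([3.0, 4.0]))  # 5.0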

Comment on lines -90 to +85
value = 0
for item1, item2 in zip(v1, v2):
value += item1 * item2
value = sum(item1 * item2 for item1, item2 in zip(v1, v2))

Function vector_cosine refactored with the following changes:

"""Download the WMT en-fr training corpus to directory unless it's there."""
train_path = os.path.join(directory, "train")
return train_path
return os.path.join(directory, "train")

Function get_wmt_enfr_train_set refactored with the following changes:

Comment on lines -66 to +65
dev_path = os.path.join(directory, dev_name)
return dev_path
return os.path.join(directory, dev_name)

Function get_wmt_enfr_dev_set refactored with the following changes:

Comment on lines -96 to +115
if not gfile.Exists(vocabulary_path):
print("Creating vocabulary %s from data %s" % (vocabulary_path, data_path))
vocab = {}
with gfile.GFile(data_path, mode="rb") as f:
counter = 0
for line in f:
counter += 1
if counter % 100000 == 0:
print(" processing line %d" % counter)
line = tf.compat.as_bytes(line)
tokens = tokenizer(line) if tokenizer else basic_tokenizer(line)
for w in tokens:
word = _DIGIT_RE.sub(b"0", w) if normalize_digits else w
if word in vocab:
vocab[word] += 1
else:
vocab[word] = 1
vocab_list = _START_VOCAB + sorted(vocab, key=vocab.get, reverse=True)
if len(vocab_list) > max_vocabulary_size:
vocab_list = vocab_list[:max_vocabulary_size]
with gfile.GFile(vocabulary_path, mode="wb") as vocab_file:
for w in vocab_list:
vocab_file.write(w + b"\n")
if gfile.Exists(vocabulary_path):
return
print(f"Creating vocabulary {vocabulary_path} from data {data_path}")
vocab = {}
with gfile.GFile(data_path, mode="rb") as f:
for counter, line in enumerate(f, start=1):
if counter % 100000 == 0:
print(" processing line %d" % counter)
line = tf.compat.as_bytes(line)
tokens = tokenizer(line) if tokenizer else basic_tokenizer(line)
for w in tokens:
word = _DIGIT_RE.sub(b"0", w) if normalize_digits else w
if word in vocab:
vocab[word] += 1
else:
vocab[word] = 1
vocab_list = _START_VOCAB + sorted(vocab, key=vocab.get, reverse=True)
if len(vocab_list) > max_vocabulary_size:
vocab_list = vocab_list[:max_vocabulary_size]
with gfile.GFile(vocabulary_path, mode="wb") as vocab_file:
for w in vocab_list:
vocab_file.write(w + b"\n")

Function create_vocabulary refactored with the following changes:
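Two patterns drive this rewrite: an inverted guard clause (return early when the vocabulary file already exists instead of nesting the whole body in an if) and enumerate(..., start=1) in place of a hand-maintained counter. A plain-file sketch of the same shape, without the gfile/TensorFlow dependencies and with tokenization and digit normalization omitted:

import os

def create_vocabulary(vocabulary_path, data_path):
    # Guard clause: nothing to do if the vocabulary was already written.
    if os.path.exists(vocabulary_path):
        return
    print(f"Creating vocabulary {vocabulary_path} from data {data_path}")
    vocab = {}
    with open(data_path, encoding="utf-8") as f:
        # enumerate(..., start=1) replaces "counter = 0 ... counter += 1".
        for counter, line in enumerate(f, start=1):
            if counter % 100000 == 0:
                print(f"  processing line {counter}")
            for word in line.split():
                vocab[word] = vocab.get(word, 0) + 1
    with open(vocabulary_path, "w", encoding="utf-8") as vocab_file:
        for word in sorted(vocab, key=vocab.get, reverse=True):
            vocab_file.write(word + "\n")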

Comment on lines -140 to +144
if gfile.Exists(vocabulary_path):
rev_vocab = []
with gfile.GFile(vocabulary_path, mode="rb") as f:
rev_vocab.extend(f.readlines())
rev_vocab = [tf.compat.as_bytes(line.strip()) for line in rev_vocab]
vocab = dict([(x, y) for (y, x) in enumerate(rev_vocab)])
return vocab, rev_vocab
else:
if not gfile.Exists(vocabulary_path):
raise ValueError("Vocabulary file %s not found.", vocabulary_path)
rev_vocab = []
with gfile.GFile(vocabulary_path, mode="rb") as f:
rev_vocab.extend(f.readlines())
rev_vocab = [tf.compat.as_bytes(line.strip()) for line in rev_vocab]
vocab = dict([(x, y) for (y, x) in enumerate(rev_vocab)])
return vocab, rev_vocab

Function initialize_vocabulary refactored with the following changes:

Comment on lines -170 to +166
if tokenizer:
words = tokenizer(sentence)
else:
words = basic_tokenizer(sentence)
words = tokenizer(sentence) if tokenizer else basic_tokenizer(sentence)

Function sentence_to_token_ids refactored with the following changes:
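The four-line if/else assignment becomes a single conditional expression. A tiny standalone illustration, with whitespace splitting standing in for basic_tokenizer:

def tokenize(sentence, tokenizer=None):
    # Conditional expression: pick the tokenizer in one assignment.
    return tokenizer(sentence) if tokenizer else sentence.split()

print(tokenize("how are you"))                               # ['how', 'are', 'you']
print(tokenize("a,b,c", tokenizer=lambda s: s.split(",")))   # ['a', 'b', 'c']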

Comment on lines -196 to +200
if not gfile.Exists(target_path):
print("Tokenizing data in %s" % data_path)
vocab, _ = initialize_vocabulary(vocabulary_path)
with gfile.GFile(data_path, mode="rb") as data_file:
with gfile.GFile(target_path, mode="w") as tokens_file:
counter = 0
for line in data_file:
counter += 1
if counter % 100000 == 0:
print(" tokenizing line %d" % counter)
token_ids = sentence_to_token_ids(tf.compat.as_bytes(line), vocab,
tokenizer, normalize_digits)
tokens_file.write(" ".join([str(tok) for tok in token_ids]) + "\n")
if gfile.Exists(target_path):
return
print(f"Tokenizing data in {data_path}")
vocab, _ = initialize_vocabulary(vocabulary_path)
with gfile.GFile(data_path, mode="rb") as data_file:
with gfile.GFile(target_path, mode="w") as tokens_file:
for counter, line in enumerate(data_file, start=1):
if counter % 100000 == 0:
print(" tokenizing line %d" % counter)
token_ids = sentence_to_token_ids(tf.compat.as_bytes(line), vocab,
tokenizer, normalize_digits)
tokens_file.write(" ".join([str(tok) for tok in token_ids]) + "\n")

Function data_to_token_ids refactored with the following changes:

Comment on lines -234 to +229
input_train_path = train_path + ".input"
output_train_path = train_path + ".output"
input_dev_path = dev_path + ".input"
output_dev_path = dev_path + ".output"
input_train_path = f'{train_path}.input'
output_train_path = f'{train_path}.output'
input_dev_path = f'{dev_path}.input'
output_dev_path = f'{dev_path}.output'

Function prepare_wmt_data refactored with the following changes:

Comment on lines -36 to +70
train_set = [[[5, 7, 9], [11, 13, 15, EOS_ID]], [[5, 7, 9], [11, 13, 15, EOS_ID]]]
encoder_input_0 = [PAD_ID] * (input_seq_len - len(train_set[0][0])) + train_set[0][0]
encoder_input_1 = [PAD_ID] * (input_seq_len - len(train_set[1][0])) + train_set[1][0]
decoder_input_0 = [GO_ID] + train_set[0][1] + [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
decoder_input_1 = [GO_ID] + train_set[1][1] + [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)

encoder_inputs = []
decoder_inputs = []
target_weights = []
for length_idx in xrange(input_seq_len):
encoder_inputs.append(np.array([encoder_input_0[length_idx], encoder_input_1[length_idx]], dtype=np.int32))
train_set = [
[[5, 7, 9], [11, 13, 15, EOS_ID]],
[[5, 7, 9], [11, 13, 15, EOS_ID]],
]

encoder_input_0 = [PAD_ID] * (
input_seq_len - len(train_set[0][0])
) + train_set[0][0]

encoder_input_1 = [PAD_ID] * (
input_seq_len - len(train_set[1][0])
) + train_set[1][0]

decoder_input_0 = (
[GO_ID]
+ train_set[0][1]
+ [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
)

decoder_input_1 = (
[GO_ID]
+ train_set[1][1]
+ [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)
)

encoder_inputs = [
np.array(
[encoder_input_0[length_idx], encoder_input_1[length_idx]],
dtype=np.int32,
)
for length_idx in xrange(input_seq_len)
]


Function get_samples refactored with the following changes:
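The encoder_inputs list built by appending in a loop becomes a list comprehension that produces one time-major array per step. A runnable miniature with the same left-padding scheme and toy values as the diff above:

import numpy as np

PAD_ID, input_seq_len = 0, 5
batch = [[5, 7, 9], [5, 7, 9]]              # two toy encoder sequences

# Left-pad each sequence to input_seq_len, as in encoder_input_0 / _1 above.
padded = [[PAD_ID] * (input_seq_len - len(seq)) + seq for seq in batch]

# List comprehension instead of the append loop: one array per time step.
encoder_inputs = [
    np.array([padded[0][t], padded[1][t]], dtype=np.int32)
    for t in range(input_seq_len)
]
print(encoder_inputs[0])   # [0 0] -- both sequences are still padding at t=0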

Comment on lines -59 to +96
encoder_inputs = []
decoder_inputs = []
target_weights = []
for i in xrange(input_seq_len):
encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
for i in xrange(output_seq_len + 1):
decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
for i in xrange(output_seq_len):
target_weights.append(tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i)))
encoder_inputs = [
tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i))
for i in xrange(input_seq_len)
]

decoder_inputs = [
tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i))
for i in xrange(output_seq_len + 1)
]

target_weights = [
tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i))
for i in xrange(output_seq_len)
]

Function get_model refactored with the following changes:

Comment on lines -232 to +235
input_feed = {}
for l in xrange(encoder_size):
input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
input_feed = {
self.encoder_inputs[l].name: encoder_inputs[l]
for l in xrange(encoder_size)
}

Function Seq2SeqModel.step refactored with the following changes:

This removes the following comments (why?):

# No gradient norm, loss, outputs.
# Output logits.
# Gradient norm, loss, no outputs.
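The feed-dictionary loop in Seq2SeqModel.step becomes a dict comprehension. A session-free sketch of the same pattern, with plain strings standing in for the TensorFlow placeholder names:

encoder_names = [f"encoder{i}" for i in range(3)]   # stand-ins for placeholder .name
encoder_inputs = [[1, 2], [3, 4], [5, 6]]

# Dict comprehension replaces "input_feed = {}" plus the assignment loop.
input_feed = {
    encoder_names[l]: encoder_inputs[l]
    for l in range(len(encoder_names))
}
print(input_feed)   # {'encoder0': [1, 2], 'encoder1': [3, 4], 'encoder2': [5, 6]}

Pairing the two lists with zip(encoder_names, encoder_inputs) would avoid the index entirely; the range form mirrors the diff.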

Comment on lines -82 to +83
if s_prev == None: s_prev = np.zeros_like(self.state.s)
if h_prev == None: h_prev = np.zeros_like(self.state.h)
if s_prev is None: s_prev = np.zeros_like(self.state.s)
if h_prev is None: h_prev = np.zeros_like(self.state.h)

Function LstmNode.bottom_data_is refactored with the following changes:
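The `== None` to `is None` change is more than style here: s_prev and h_prev are NumPy arrays, and NumPy overloads `==` to broadcast element-wise, so an array compared with `== None` does not yield a single boolean. A short demonstration:

import numpy as np

arr = np.zeros(3)
print(arr == None)   # [False False False] -- element-wise, not a single bool
print(arr is None)   # False -- a plain identity test

s_prev = None
if s_prev is None:                 # the corrected form from the diff
    s_prev = np.zeros_like(arr)
print(s_prev)                      # [0. 0. 0.]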

Comment on lines -22 to +24
self.primes = list()
self.primes = []
for i in range(2, 100):
is_prime = True
for j in range(2, i-1):
if i % j == 0:
is_prime = False
is_prime = all(i % j != 0 for j in range(2, i-1))

Function Primes.__init__ refactored with the following changes:
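The manual is_prime flag and inner loop become a single all() over a generator expression, which short-circuits on the first divisor found. A free-standing version of the same trial-division test (a helper function here rather than the repository's Primes class):

def primes_below(limit):
    # all() is True for an empty generator, so 2 and 3 pass automatically.
    return [
        i for i in range(2, limit)
        if all(i % j != 0 for j in range(2, i - 1))
    ]

print(primes_below(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]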

Comment on lines -79 to +87

for i in range(100):
for start, end in zip(range(0, len(trX), batch_size), range(batch_size, len(trX)+1, batch_size)):
sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

test_indices = np.arange(len(teX)) # Get A Test Batch
np.random.shuffle(test_indices)
test_indices = test_indices[0:test_size]
test_indices = test_indices[:test_size]


Lines 79-87 refactored with the following changes:

Comment on lines -15 to +17
self.primes = list()
self.primes = []
for i in range(2, 100):
is_prime = True
for j in range(2, i-1):
if i % j == 0:
is_prime = False
is_prime = all(i % j != 0 for j in range(2, i-1))

Function Primes.__init__ refactored with the following changes:

def __init__(self, name=None, in_seq_len=None, out_seq_len=None):
if name is not None:
assert hasattr(self, "%s_sequence" % name)
assert hasattr(self, f"{name}_sequence")

Function SequencePattern.__init__ refactored with the following changes:

This procedure defines the pattern which the seq2seq RNN will be trained to find.
'''
return getattr(self, "%s_sequence" % self.PATTERN_NAME)(x)
return getattr(self, f"{self.PATTERN_NAME}_sequence")(x)

Function SequencePattern.generate_output_sequence refactored with the following changes:
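generate_output_sequence dispatches dynamically: it builds a method name with an f-string and looks it up with getattr. A compact sketch of that pattern, where the two pattern methods are simplified stand-ins for the ones in the repository:

class SequencePattern:
    PATTERN_NAME = "sorted"        # illustrative default

    def sorted_sequence(self, x):
        return sorted(x)

    def reversed_sequence(self, x):
        return list(reversed(x))

    def generate_output_sequence(self, x):
        # f-string + getattr: call "<PATTERN_NAME>_sequence" by name.
        return getattr(self, f"{self.PATTERN_NAME}_sequence")(x)

print(SequencePattern().generate_output_sequence([3, 1, 2]))   # [1, 2, 3]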

'''
ret = np.array( sorted(x) )[:self.OUTPUT_SEQUENCE_LENGTH]
return ret
return np.array( sorted(x) )[:self.OUTPUT_SEQUENCE_LENGTH]

Function SequencePattern.sorted_sequence refactored with the following changes:

sl = seq2seq.sequence_loss(logits, targets, weights)
#print ("my_sequence_loss return = %s" % sl)
return sl
return seq2seq.sequence_loss(logits, targets, weights)

Function TFLearnSeq2Seq.sequence_loss refactored with the following changes:

This removes the following comments (why?):

#print ("my_sequence_loss weights=%s" % (weights,))

Comment on lines -134 to +139
def accuracy(self, y_pred, y_true, x_in): # y_pred is [-1, self.out_seq_len, num_decoder_symbols]; y_true is [-1, self.out_seq_len]
def accuracy(self, y_pred, y_true, x_in): # y_pred is [-1, self.out_seq_len, num_decoder_symbols]; y_true is [-1, self.out_seq_len]
'''
Compute accuracy of the prediction, based on the true labels. Use the average number of equal
values.
'''
pred_idx = tf.to_int32(tf.argmax(y_pred, 2)) # [-1, self.out_seq_len]
#print ("my_accuracy pred_idx = %s" % pred_idx)
accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_idx, y_true), tf.float32), name='acc')
return accuracy
return tf.reduce_mean(
tf.cast(tf.equal(pred_idx, y_true), tf.float32), name='acc'
)

Function TFLearnSeq2Seq.accuracy refactored with the following changes:

This removes the following comments (why?):

#print ("my_accuracy pred_idx = %s" % pred_idx)

Comment on lines -158 to +159
checkpoint_path = checkpoint_path or ("%s%ss2s_checkpoint.tfl" % (self.data_dir or "", "/" if self.data_dir else ""))
checkpoint_path = (
checkpoint_path
or f'{self.data_dir or ""}{"/" if self.data_dir else ""}s2s_checkpoint.tfl'
)


Function TFLearnSeq2Seq.model refactored with the following changes:

This removes the following comments (why?):

# for TFLearn to know what to save and restore

Comment on lines -253 to +245
if options.file_in:
if options.file_in == '-':
file_in = sys.stdin
else:
file_in = open(options.file_in)
else:
if options.file_in and options.file_in == '-' or not options.file_in:
file_in = sys.stdin
if options.file_out:
if options.file_out == '-':
file_out = sys.stdout
else:
file_out = open(options.file_out, 'wb')
else:
file_in = open(options.file_in)
if options.file_out and options.file_out == '-' or not options.file_out:
file_out = sys.stdout

else:
file_out = open(options.file_out, 'wb')

Function run refactored with the following changes:
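The merged condition `options.file_in and options.file_in == '-' or not options.file_in` is correct but hard to scan because of and/or precedence. An equivalent, arguably clearer formulation, written as a hypothetical helper rather than the repository's run function:

import sys

def open_streams(file_in_opt, file_out_opt):
    # "-" or a missing option means: fall back to the standard streams.
    file_in = sys.stdin if file_in_opt in (None, "", "-") else open(file_in_opt)
    file_out = sys.stdout if file_out_opt in (None, "", "-") else open(file_out_opt, "wb")
    return file_in, file_out

fin, fout = open_streams("-", None)
print(fin is sys.stdin, fout is sys.stdout)   # True True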

Comment on lines -13 to +14
fp = open('result/'+file_name, 'w')
fp.write(item['body'])
fp.close()
with open(f'result/{file_name}', 'w') as fp:
fp.write(item['body'])

Function SubtitleCrawlerPipeline.process_item refactored with the following changes:

url = response.urljoin(href)
request = scrapy.Request(url, callback=self.parse_detail)
yield request
yield scrapy.Request(url, callback=self.parse_detail)

Function SubTitleSpider.parse refactored with the following changes:

test_xs, test_ys = samples.test_sets()

for i in range(10000):
for _ in range(10000):

Function train refactored with the following changes:

is_predict = True
if len(sys.argv) > 1 and sys.argv[1] == "train":
is_predict = False
is_predict = len(sys.argv) <= 1 or sys.argv[1] != "train"

Lines 75-77 refactored with the following changes:
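The three-line flag assignment collapses into one boolean expression: by De Morgan's laws, not (len(argv) > 1 and argv[1] == "train") is exactly len(argv) <= 1 or argv[1] != "train". A quick equivalence check:

def is_predict_verbose(argv):
    is_predict = True
    if len(argv) > 1 and argv[1] == "train":
        is_predict = False
    return is_predict

def is_predict_compact(argv):
    # De Morgan's negation of (len(argv) > 1 and argv[1] == "train")
    return len(argv) <= 1 or argv[1] != "train"

for argv in (["prog"], ["prog", "train"], ["prog", "predict"]):
    assert is_predict_verbose(argv) == is_predict_compact(argv)
print("equivalent for all three cases")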
