
Conversation


@sourcery-ai sourcery-ai bot commented Mar 25, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot force-pushed the sourcery/master branch from 72f4575 to 821b9fd Compare March 25, 2022 09:53
@sourcery-ai sourcery-ai bot requested a review from allendred March 25, 2022 09:53
@sourcery-ai sourcery-ai bot left a comment
Due to GitHub API limits, only the first 60 comments can be shown.

sess = tf.InteractiveSession()
sess.run(init)
for i in range(1000):
for _ in range(1000):

Line 28 refactored with the following changes:
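The only change here is the convention of renaming a loop variable that is never read to `_`. A minimal standalone sketch of the pattern (the loop body is illustrative, not the training loop from the repository):

total_steps = 0
for _ in range(1000):   # "_" signals that the index itself is never used
    total_steps += 1
print(total_steps)      # 1000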

Comment on lines -12 to +24
output_file = open(output, "w")
while True:
line = input_file.readline()
if line:
line = line.strip()
seg_list = jieba.cut(line)
segments = ""
for str in seg_list:
segments = segments + " " + str
segments = segments + "\n"
output_file.write(segments)
else:
break
input_file.close()
output_file.close()
with open(output, "w") as output_file:
while True:
if line := input_file.readline():
line = line.strip()
seg_list = jieba.cut(line)
segments = ""
for str in seg_list:
segments = f'{segments} {str}'
segments = segments + "\n"
output_file.write(segments)
else:
break
input_file.close()

Function segment refactored with the following changes:
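The rewritten segment function leans on a with-statement for the output file and on the assignment expression (:=) introduced in Python 3.8. A minimal, self-contained sketch of that read-loop pattern, with placeholder file paths and str.join standing in for the manual string concatenation shown above:

import jieba  # same dependency the script above uses

def segment(input_path, output_path):
    # The walrus operator binds each line inside the loop condition;
    # readline() returns "" at EOF, which ends the loop without an explicit break.
    with open(input_path, encoding="utf-8") as input_file, \
         open(output_path, "w", encoding="utf-8") as output_file:
        while line := input_file.readline():
            words = jieba.cut(line.strip())
            output_file.write(" ".join(words) + "\n")

Iterating with `for line in input_file:` would be simpler still; the walrus form stays closest to the original while/readline structure.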

Comment on lines -66 to -76
file_object = open('zhenhuanzhuan.segment', 'r')
vocab_dict = {}
while True:
line = file_object.readline()
if line:
with open('zhenhuanzhuan.segment', 'r') as file_object:
vocab_dict = {}
while True:
if not (line := file_object.readline()):
break
for word in line.decode('utf-8').split(' '):
if word_vector_dict.has_key(word):
seq.append(word_vector_dict[word])
else:
break
file_object.close()

Function init_seq refactored with the following changes:

Comment on lines -79 to +76
len = 0
for item in vector:
len += item * item
len = sum(item * item for item in vector)

Function vector_sqrtlen refactored with the following changes:
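Here the accumulator loop collapses into sum() over a generator expression. A small sketch, assuming (as the name suggests, though the diff does not show it) that the function ends by taking the square root:

import math

def vector_sqrtlen(vector):
    # One pass, no temporary list: sum() consumes the generator lazily.
    squared = sum(item * item for item in vector)
    return math.sqrt(squared)

print(vector_sqrtlen([3.0, 4.0]))  # 5.0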

Comment on lines -90 to +85
value = 0
for item1, item2 in zip(v1, v2):
value += item1 * item2
value = sum(item1 * item2 for item1, item2 in zip(v1, v2))

Function vector_cosine refactored with the following changes:

"""Download the WMT en-fr training corpus to directory unless it's there."""
train_path = os.path.join(directory, "train")
return train_path
return os.path.join(directory, "train")

Function get_wmt_enfr_train_set refactored with the following changes:

Comment on lines -66 to +65
dev_path = os.path.join(directory, dev_name)
return dev_path
return os.path.join(directory, dev_name)

Function get_wmt_enfr_dev_set refactored with the following changes:

Comment on lines -96 to +115
if not gfile.Exists(vocabulary_path):
print("Creating vocabulary %s from data %s" % (vocabulary_path, data_path))
vocab = {}
with gfile.GFile(data_path, mode="rb") as f:
counter = 0
for line in f:
counter += 1
if counter % 100000 == 0:
print(" processing line %d" % counter)
line = tf.compat.as_bytes(line)
tokens = tokenizer(line) if tokenizer else basic_tokenizer(line)
for w in tokens:
word = _DIGIT_RE.sub(b"0", w) if normalize_digits else w
if word in vocab:
vocab[word] += 1
else:
vocab[word] = 1
vocab_list = _START_VOCAB + sorted(vocab, key=vocab.get, reverse=True)
if len(vocab_list) > max_vocabulary_size:
vocab_list = vocab_list[:max_vocabulary_size]
with gfile.GFile(vocabulary_path, mode="wb") as vocab_file:
for w in vocab_list:
vocab_file.write(w + b"\n")
if gfile.Exists(vocabulary_path):
return
print(f"Creating vocabulary {vocabulary_path} from data {data_path}")
vocab = {}
with gfile.GFile(data_path, mode="rb") as f:
for counter, line in enumerate(f, start=1):
if counter % 100000 == 0:
print(" processing line %d" % counter)
line = tf.compat.as_bytes(line)
tokens = tokenizer(line) if tokenizer else basic_tokenizer(line)
for w in tokens:
word = _DIGIT_RE.sub(b"0", w) if normalize_digits else w
if word in vocab:
vocab[word] += 1
else:
vocab[word] = 1
vocab_list = _START_VOCAB + sorted(vocab, key=vocab.get, reverse=True)
if len(vocab_list) > max_vocabulary_size:
vocab_list = vocab_list[:max_vocabulary_size]
with gfile.GFile(vocabulary_path, mode="wb") as vocab_file:
for w in vocab_list:
vocab_file.write(w + b"\n")

Function create_vocabulary refactored with the following changes:
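Two patterns drive this rewrite: an inverted guard clause (return early when the vocabulary file already exists instead of nesting the whole body in an if) and enumerate(..., start=1) in place of a hand-maintained counter. A plain-file sketch of the same shape, without the gfile/TensorFlow dependencies and with tokenization and digit normalization omitted:

import os

def create_vocabulary(vocabulary_path, data_path):
    # Guard clause: nothing to do if the vocabulary was already written.
    if os.path.exists(vocabulary_path):
        return
    print(f"Creating vocabulary {vocabulary_path} from data {data_path}")
    vocab = {}
    with open(data_path, encoding="utf-8") as f:
        # enumerate(..., start=1) replaces "counter = 0 ... counter += 1".
        for counter, line in enumerate(f, start=1):
            if counter % 100000 == 0:
                print(f"  processing line {counter}")
            for word in line.split():
                vocab[word] = vocab.get(word, 0) + 1
    with open(vocabulary_path, "w", encoding="utf-8") as vocab_file:
        for word in sorted(vocab, key=vocab.get, reverse=True):
            vocab_file.write(word + "\n")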

Comment on lines -140 to +144
if gfile.Exists(vocabulary_path):
rev_vocab = []
with gfile.GFile(vocabulary_path, mode="rb") as f:
rev_vocab.extend(f.readlines())
rev_vocab = [tf.compat.as_bytes(line.strip()) for line in rev_vocab]
vocab = dict([(x, y) for (y, x) in enumerate(rev_vocab)])
return vocab, rev_vocab
else:
if not gfile.Exists(vocabulary_path):
raise ValueError("Vocabulary file %s not found.", vocabulary_path)
rev_vocab = []
with gfile.GFile(vocabulary_path, mode="rb") as f:
rev_vocab.extend(f.readlines())
rev_vocab = [tf.compat.as_bytes(line.strip()) for line in rev_vocab]
vocab = dict([(x, y) for (y, x) in enumerate(rev_vocab)])
return vocab, rev_vocab

Function initialize_vocabulary refactored with the following changes:

Comment on lines -170 to +166
if tokenizer:
words = tokenizer(sentence)
else:
words = basic_tokenizer(sentence)
words = tokenizer(sentence) if tokenizer else basic_tokenizer(sentence)

Function sentence_to_token_ids refactored with the following changes:
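The four-line if/else assignment becomes a single conditional expression. A tiny standalone illustration, with whitespace splitting standing in for basic_tokenizer:

def tokenize(sentence, tokenizer=None):
    # Conditional expression: pick the tokenizer in one assignment.
    return tokenizer(sentence) if tokenizer else sentence.split()

print(tokenize("how are you"))                               # ['how', 'are', 'you']
print(tokenize("a,b,c", tokenizer=lambda s: s.split(",")))   # ['a', 'b', 'c']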

Comment on lines -196 to +200
if not gfile.Exists(target_path):
print("Tokenizing data in %s" % data_path)
vocab, _ = initialize_vocabulary(vocabulary_path)
with gfile.GFile(data_path, mode="rb") as data_file:
with gfile.GFile(target_path, mode="w") as tokens_file:
counter = 0
for line in data_file:
counter += 1
if counter % 100000 == 0:
print(" tokenizing line %d" % counter)
token_ids = sentence_to_token_ids(tf.compat.as_bytes(line), vocab,
tokenizer, normalize_digits)
tokens_file.write(" ".join([str(tok) for tok in token_ids]) + "\n")
if gfile.Exists(target_path):
return
print(f"Tokenizing data in {data_path}")
vocab, _ = initialize_vocabulary(vocabulary_path)
with gfile.GFile(data_path, mode="rb") as data_file:
with gfile.GFile(target_path, mode="w") as tokens_file:
for counter, line in enumerate(data_file, start=1):
if counter % 100000 == 0:
print(" tokenizing line %d" % counter)
token_ids = sentence_to_token_ids(tf.compat.as_bytes(line), vocab,
tokenizer, normalize_digits)
tokens_file.write(" ".join([str(tok) for tok in token_ids]) + "\n")

Function data_to_token_ids refactored with the following changes:

Comment on lines -234 to +229
input_train_path = train_path + ".input"
output_train_path = train_path + ".output"
input_dev_path = dev_path + ".input"
output_dev_path = dev_path + ".output"
input_train_path = f'{train_path}.input'
output_train_path = f'{train_path}.output'
input_dev_path = f'{dev_path}.input'
output_dev_path = f'{dev_path}.output'

Function prepare_wmt_data refactored with the following changes:

Comment on lines -36 to +70
train_set = [[[5, 7, 9], [11, 13, 15, EOS_ID]], [[5, 7, 9], [11, 13, 15, EOS_ID]]]
encoder_input_0 = [PAD_ID] * (input_seq_len - len(train_set[0][0])) + train_set[0][0]
encoder_input_1 = [PAD_ID] * (input_seq_len - len(train_set[1][0])) + train_set[1][0]
decoder_input_0 = [GO_ID] + train_set[0][1] + [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
decoder_input_1 = [GO_ID] + train_set[1][1] + [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)

encoder_inputs = []
decoder_inputs = []
target_weights = []
for length_idx in xrange(input_seq_len):
encoder_inputs.append(np.array([encoder_input_0[length_idx], encoder_input_1[length_idx]], dtype=np.int32))
train_set = [
[[5, 7, 9], [11, 13, 15, EOS_ID]],
[[5, 7, 9], [11, 13, 15, EOS_ID]],
]

encoder_input_0 = [PAD_ID] * (
input_seq_len - len(train_set[0][0])
) + train_set[0][0]

encoder_input_1 = [PAD_ID] * (
input_seq_len - len(train_set[1][0])
) + train_set[1][0]

decoder_input_0 = (
[GO_ID]
+ train_set[0][1]
+ [PAD_ID] * (output_seq_len - len(train_set[0][1]) - 1)
)

decoder_input_1 = (
[GO_ID]
+ train_set[1][1]
+ [PAD_ID] * (output_seq_len - len(train_set[1][1]) - 1)
)

encoder_inputs = [
np.array(
[encoder_input_0[length_idx], encoder_input_1[length_idx]],
dtype=np.int32,
)
for length_idx in xrange(input_seq_len)
]


Function get_samples refactored with the following changes:
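The encoder_inputs list built by appending in a loop becomes a list comprehension that produces one time-major array per step. A runnable miniature with the same left-padding scheme and toy values as the diff above:

import numpy as np

PAD_ID, input_seq_len = 0, 5
batch = [[5, 7, 9], [5, 7, 9]]              # two toy encoder sequences

# Left-pad each sequence to input_seq_len, as in encoder_input_0 / _1 above.
padded = [[PAD_ID] * (input_seq_len - len(seq)) + seq for seq in batch]

# List comprehension instead of the append loop: one array per time step.
encoder_inputs = [
    np.array([padded[0][t], padded[1][t]], dtype=np.int32)
    for t in range(input_seq_len)
]
print(encoder_inputs[0])   # [0 0] -- both sequences are still padding at t=0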

Comment on lines -59 to +96
encoder_inputs = []
decoder_inputs = []
target_weights = []
for i in xrange(input_seq_len):
encoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i)))
for i in xrange(output_seq_len + 1):
decoder_inputs.append(tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i)))
for i in xrange(output_seq_len):
target_weights.append(tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i)))
encoder_inputs = [
tf.placeholder(tf.int32, shape=[None], name="encoder{0}".format(i))
for i in xrange(input_seq_len)
]

decoder_inputs = [
tf.placeholder(tf.int32, shape=[None], name="decoder{0}".format(i))
for i in xrange(output_seq_len + 1)
]

target_weights = [
tf.placeholder(tf.float32, shape=[None], name="weight{0}".format(i))
for i in xrange(output_seq_len)
]

Function get_model refactored with the following changes:

Comment on lines -232 to +235
input_feed = {}
for l in xrange(encoder_size):
input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
input_feed = {
self.encoder_inputs[l].name: encoder_inputs[l]
for l in xrange(encoder_size)
}

Function Seq2SeqModel.step refactored with the following changes:

This removes the following comments (why?):

# No gradient norm, loss, outputs.
# Output logits.
# Gradient norm, loss, no outputs.
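The feed-dictionary loop in Seq2SeqModel.step becomes a dict comprehension. A session-free sketch of the same pattern, with plain strings standing in for the TensorFlow placeholder names:

encoder_names = [f"encoder{i}" for i in range(3)]   # stand-ins for placeholder .name
encoder_inputs = [[1, 2], [3, 4], [5, 6]]

# Dict comprehension replaces "input_feed = {}" plus the assignment loop.
input_feed = {
    encoder_names[l]: encoder_inputs[l]
    for l in range(len(encoder_names))
}
print(input_feed)   # {'encoder0': [1, 2], 'encoder1': [3, 4], 'encoder2': [5, 6]}

Pairing the two lists with zip(encoder_names, encoder_inputs) would avoid the index entirely; the range form mirrors the diff.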

Comment on lines -82 to +83
if s_prev == None: s_prev = np.zeros_like(self.state.s)
if h_prev == None: h_prev = np.zeros_like(self.state.h)
if s_prev is None: s_prev = np.zeros_like(self.state.s)
if h_prev is None: h_prev = np.zeros_like(self.state.h)

Function LstmNode.bottom_data_is refactored with the following changes:
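The `== None` to `is None` change is more than style here: s_prev and h_prev are NumPy arrays, and NumPy overloads `==` to broadcast element-wise, so an array compared with `== None` does not yield a single boolean. A short demonstration:

import numpy as np

arr = np.zeros(3)
print(arr == None)   # [False False False] -- element-wise, not a single bool
print(arr is None)   # False -- a plain identity test

s_prev = None
if s_prev is None:                 # the corrected form from the diff
    s_prev = np.zeros_like(arr)
print(s_prev)                      # [0. 0. 0.]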

Comment on lines -22 to +24
self.primes = list()
self.primes = []
for i in range(2, 100):
is_prime = True
for j in range(2, i-1):
if i % j == 0:
is_prime = False
is_prime = all(i % j != 0 for j in range(2, i-1))

Function Primes.__init__ refactored with the following changes:
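The manual is_prime flag and inner loop become a single all() over a generator expression, which short-circuits on the first divisor found. A free-standing version of the same trial-division test (a helper function here rather than the repository's Primes class):

def primes_below(limit):
    # all() is True for an empty generator, so 2 and 3 pass automatically.
    return [
        i for i in range(2, limit)
        if all(i % j != 0 for j in range(2, i - 1))
    ]

print(primes_below(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]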

Comment on lines -79 to +87

for i in range(100):
for start, end in zip(range(0, len(trX), batch_size), range(batch_size, len(trX)+1, batch_size)):
sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

test_indices = np.arange(len(teX)) # Get A Test Batch
np.random.shuffle(test_indices)
test_indices = test_indices[0:test_size]
test_indices = test_indices[:test_size]


Lines 79-87 refactored with the following changes:

Comment on lines -15 to +17
self.primes = list()
self.primes = []
for i in range(2, 100):
is_prime = True
for j in range(2, i-1):
if i % j == 0:
is_prime = False
is_prime = all(i % j != 0 for j in range(2, i-1))

Function Primes.__init__ refactored with the following changes:

def __init__(self, name=None, in_seq_len=None, out_seq_len=None):
if name is not None:
assert hasattr(self, "%s_sequence" % name)
assert hasattr(self, f"{name}_sequence")

Function SequencePattern.__init__ refactored with the following changes:

This procedure defines the pattern which the seq2seq RNN will be trained to find.
'''
return getattr(self, "%s_sequence" % self.PATTERN_NAME)(x)
return getattr(self, f"{self.PATTERN_NAME}_sequence")(x)

Function SequencePattern.generate_output_sequence refactored with the following changes:
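generate_output_sequence dispatches dynamically: it builds a method name with an f-string and looks it up with getattr. A compact sketch of that pattern, where the two pattern methods are simplified stand-ins for the ones in the repository:

class SequencePattern:
    PATTERN_NAME = "sorted"        # illustrative default

    def sorted_sequence(self, x):
        return sorted(x)

    def reversed_sequence(self, x):
        return list(reversed(x))

    def generate_output_sequence(self, x):
        # f-string + getattr: call "<PATTERN_NAME>_sequence" by name.
        return getattr(self, f"{self.PATTERN_NAME}_sequence")(x)

print(SequencePattern().generate_output_sequence([3, 1, 2]))   # [1, 2, 3]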

'''
ret = np.array( sorted(x) )[:self.OUTPUT_SEQUENCE_LENGTH]
return ret
return np.array( sorted(x) )[:self.OUTPUT_SEQUENCE_LENGTH]

Function SequencePattern.sorted_sequence refactored with the following changes:

sl = seq2seq.sequence_loss(logits, targets, weights)
#print ("my_sequence_loss return = %s" % sl)
return sl
return seq2seq.sequence_loss(logits, targets, weights)

Function TFLearnSeq2Seq.sequence_loss refactored with the following changes:

This removes the following comments (why?):

#print ("my_sequence_loss weights=%s" % (weights,))

Comment on lines -134 to +139
def accuracy(self, y_pred, y_true, x_in): # y_pred is [-1, self.out_seq_len, num_decoder_symbols]; y_true is [-1, self.out_seq_len]
def accuracy(self, y_pred, y_true, x_in): # y_pred is [-1, self.out_seq_len, num_decoder_symbols]; y_true is [-1, self.out_seq_len]
'''
Compute accuracy of the prediction, based on the true labels. Use the average number of equal
values.
'''
pred_idx = tf.to_int32(tf.argmax(y_pred, 2)) # [-1, self.out_seq_len]
#print ("my_accuracy pred_idx = %s" % pred_idx)
accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_idx, y_true), tf.float32), name='acc')
return accuracy
return tf.reduce_mean(
tf.cast(tf.equal(pred_idx, y_true), tf.float32), name='acc'
)

Function TFLearnSeq2Seq.accuracy refactored with the following changes:

This removes the following comments (why?):

#print ("my_accuracy pred_idx = %s" % pred_idx)

Comment on lines -158 to +159
checkpoint_path = checkpoint_path or ("%s%ss2s_checkpoint.tfl" % (self.data_dir or "", "/" if self.data_dir else ""))
checkpoint_path = (
checkpoint_path
or f'{self.data_dir or ""}{"/" if self.data_dir else ""}s2s_checkpoint.tfl'
)


Function TFLearnSeq2Seq.model refactored with the following changes:

This removes the following comments (why?):

# for TFLearn to know what to save and restore

Comment on lines -253 to +245
if options.file_in:
if options.file_in == '-':
file_in = sys.stdin
else:
file_in = open(options.file_in)
else:
if options.file_in and options.file_in == '-' or not options.file_in:
file_in = sys.stdin
if options.file_out:
if options.file_out == '-':
file_out = sys.stdout
else:
file_out = open(options.file_out, 'wb')
else:
file_in = open(options.file_in)
if options.file_out and options.file_out == '-' or not options.file_out:
file_out = sys.stdout

else:
file_out = open(options.file_out, 'wb')

Function run refactored with the following changes:
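The merged condition `options.file_in and options.file_in == '-' or not options.file_in` is correct but hard to scan because of and/or precedence. An equivalent, arguably clearer formulation, written as a hypothetical helper rather than the repository's run function:

import sys

def open_streams(file_in_opt, file_out_opt):
    # "-" or a missing option means: fall back to the standard streams.
    file_in = sys.stdin if file_in_opt in (None, "", "-") else open(file_in_opt)
    file_out = sys.stdout if file_out_opt in (None, "", "-") else open(file_out_opt, "wb")
    return file_in, file_out

fin, fout = open_streams("-", None)
print(fin is sys.stdin, fout is sys.stdout)   # True True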

Comment on lines -13 to +14
fp = open('result/'+file_name, 'w')
fp.write(item['body'])
fp.close()
with open(f'result/{file_name}', 'w') as fp:
fp.write(item['body'])

Function SubtitleCrawlerPipeline.process_item refactored with the following changes:

url = response.urljoin(href)
request = scrapy.Request(url, callback=self.parse_detail)
yield request
yield scrapy.Request(url, callback=self.parse_detail)

Function SubTitleSpider.parse refactored with the following changes:

test_xs, test_ys = samples.test_sets()

for i in range(10000):
for _ in range(10000):

Function train refactored with the following changes:

is_predict = True
if len(sys.argv) > 1 and sys.argv[1] == "train":
is_predict = False
is_predict = len(sys.argv) <= 1 or sys.argv[1] != "train"

Lines 75-77 refactored with the following changes:
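The three-line flag assignment collapses into one boolean expression: by De Morgan's laws, not (len(argv) > 1 and argv[1] == "train") is exactly len(argv) <= 1 or argv[1] != "train". A quick equivalence check:

def is_predict_verbose(argv):
    is_predict = True
    if len(argv) > 1 and argv[1] == "train":
        is_predict = False
    return is_predict

def is_predict_compact(argv):
    # De Morgan's negation of (len(argv) > 1 and argv[1] == "train")
    return len(argv) <= 1 or argv[1] != "train"

for argv in (["prog"], ["prog", "train"], ["prog", "predict"]):
    assert is_predict_verbose(argv) == is_predict_compact(argv)
print("equivalent for all three cases")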
