datacleaning.ipynb: cleans the dataset of OGT-labeled enzyme sequences. The resulting dataset is 'data/cleaned_ogts.fasta'.

prepare_ogt_train_and_test_datasets.ipynb: splits the above dataset into train, validation, and test datasets. It also samples 10k sequences, following either the original or a uniform OGT distribution, for hyperparameter optimization. The output files are under data/:

  • cleaned_ogts_train.fasta

  • cleaned_ogts_val.fasta

  • cleaned_ogts_test.fasta

  • ogt_for_hyperopt_original_distribution.fasta

  • ogt_for_hyperopt_uniform_distribution.fasta
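
The split and the two sampling schemes described above can be sketched roughly as follows. This is an illustration only, not the code in the notebooks: the record layout `(ogt, seq_id, sequence)`, the function names, the 10% validation and test fractions, and the 10-degree bin width are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records and split them into train, validation and test lists.

    The fractions are hypothetical; the notebook may use different ones.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def sample_original(records, n, seed=0):
    """Draw n records uniformly at random, so the sample follows the
    original OGT distribution in expectation."""
    records = list(records)
    return random.Random(seed).sample(records, min(n, len(records)))


def sample_uniform(records, n, bin_width=10.0, seed=0):
    """Draw up to n records spread as evenly as possible across OGT bins.

    Records are grouped into bins of bin_width degrees (an assumed value),
    then picked round-robin, one per non-empty bin per pass.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for rec in records:
        bins[int(rec[0] // bin_width)].append(rec)
    for members in bins.values():
        rng.shuffle(members)

    keys = sorted(bins)
    sampled = []
    while len(sampled) < n and any(bins[k] for k in keys):
        for k in keys:
            if bins[k] and len(sampled) < n:
                sampled.append(bins[k].pop())
    return sampled
```

With records parsed from 'data/cleaned_ogts.fasta' into that tuple layout, `split_records(records)` would give the three splits, and `sample_original(train, 10000)` or `sample_uniform(train, 10000)` would give the two hyperparameter-optimization subsets. Whether the notebook samples from the training split or from the full dataset is not stated here, so that choice is also an assumption.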