
Download datasets

This study used a total of 37 datasets.
Max length: The maximum length of any peptide sequence within the dataset.
Min length: The minimum length of any peptide sequence within the dataset.
Train positive: The number of bioactive peptides used for model training.
Train negative: The number of non-bioactive peptides used for model training.
Test positive: The number of bioactive peptides used for model evaluation.
Test negative: The number of non-bioactive peptides used for model evaluation.
Note: In the provided Excel file, peptide bioactivity is encoded as follows:

· 0 = Bioactive (Positive)

· 1 = Non-bioactive (Negative)
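This encoding is the reverse of the common convention in which 1 marks the positive class, so it is easy to misread. The sketch below shows one way to decode the labels after loading a file. The function names and example rows are hypothetical, and the actual column layout of the Excel files is not specified on this page.

```python
# Label encoding as stated in the note: 0 = bioactive, 1 = non-bioactive.
LABEL_NAMES = {0: "bioactive", 1: "non-bioactive"}


def decode_label(value):
    """Map the numeric label from the spreadsheet to a readable class name."""
    key = int(value)  # spreadsheet readers may return 0.0 or "0" instead of 0
    if key not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {value!r}")
    return LABEL_NAMES[key]


def is_positive(value):
    """Return True for bioactive peptides (label 0), False otherwise."""
    return decode_label(value) == "bioactive"


# Made-up (sequence, label) rows for illustration only; not taken from any dataset.
rows = [("ACDK", 0), ("GGSGG", 1), ("KLWR", 0)]
positives = [seq for seq, label in rows if is_positive(label)]
print(positives)
```

Raising on unexpected values, rather than silently treating them as negative, helps catch files that use a different encoding.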

Download all datasets
| No. | Activity | Max length | Min length | Train positive | Train negative | Test positive | Test negative | Download |
|---|---|---|---|---|---|---|---|---|
| 1 | Anti-hypertensive | 81 | 5 | 931 | 931 | 386 | 386 | Training set / Test set |
| 2 | DPP-IV inhibitory | 90 | 2 | 532 | 532 | 133 | 133 | Training set / Test set |
| 3 | Bitter | 39 | 2 | 256 | 256 | 64 | 64 | Training set / Test set |
| 4 | Umami | 39 | 2 | 112 | 241 | 28 | 61 | Training set / Test set |
| 5 | Anti-microbial | 180 | 11 | 3876 | 9552 | 2584 | 6369 | Training set / Test set |
| 6 | Anti-malarial main | 183 | 2 | 111 | 1708 | 28 | 427 | Training set / Test set |
| 7 | Anti-malarial alternative | 81 | 2 | 111 | 542 | 28 | 135 | Training set / Test set |
| 8 | Quorum sensing | 64 | 5 | 200 | 200 | 20 | 20 | Training set / Test set |
| 9 | Anti-cancer main | 50 | 3 | 689 | 689 | 172 | 172 | Training set / Test set |
| 10 | Anti-cancer alternative | 50 | 3 | 776 | 776 | 194 | 194 | Training set / Test set |
| 11 | Anti-MRSA strains | 70 | 11 | 118 | 678 | 30 | 169 | Training set / Test set |
| 12 | Tumor T cell antigens | 20 | 8 | 442 | 442 | 111 | 111 | Training set / Test set |
| 13 | Blood–brain barrier | 50 | 5 | 326 | 326 | 99 | 99 | Training set / Test set |
| 14 | Anti-parasitic activity | 254 | 5 | 255 | 1863 | 46 | 46 | Training set / Test set |
| 15 | Neuropeptide | 100 | 5 | 1940 | 1940 | 485 | 485 | Training set / Test set |
| 16 | Anti-bacterial | 30 | 3 | 4540 | 5809 | 1112 | 1475 | Training set / Test set |
| 17 | Anti-fungal activity | 100 | 4 | 1168 | 1168 | 291 | 291 | Training set / Test set |
| 18 | Anti-angio activity | 67 | 11 | 107 | 107 | 28 | 28 | Training set / Test set |
| 19 | Anti-viral activity | 51 | 4 | 3272 | 3272 | 818 | 818 | Training set / Test set |
| 20 | Toxicity | 50 | 13 | 1621 | 1663 | 312 | 270 | Training set / Test set |
| 21 | Anti-inflammatory peptides | 30 | 7 | 690 | 1009 | 173 | 253 | Training set / Test set |
| 22 | Anti-oxidant activity | 20 | 2 | 848 | 848 | 212 | 212 | Training set / Test set |
| 24 | IL-5 inducing peptides main | 20 | 9 | 1525 | 6027 | 382 | 1552 | Training set / Test set |
| 25 | IL-5 inducing peptides alternative | 20 | 9 | 1525 | 1526 | 382 | 381 | Training set / Test set |
| 26 | IL-6 inducing peptides | 25 | 8 | 292 | 2393 | 73 | 598 | Training set / Test set |
| 27 | IL-13 inducing peptides | 35 | 8 | 250 | 2326 | 63 | 582 | Training set / Test set |
| 28 | Anti-tubercular Peptides RD | 61 | 5 | 199 | 199 | 47 | 47 | Training set / Test set |
| 29 | Cell-penetrating | 61 | 10 | 364 | 369 | 98 | 85 | Training set / Test set |
| 29 | Cell-penetrating MLCCP | 61 | 6 | 573 | 574 | 157 | 2184 | Training set / Test set |
| 30 | Tumor-homing peptides main | 31 | 4 | 521 | 521 | 130 | 130 | Training set / Test set |
| 31 | Tumor-homing peptides small | 10 | 4 | 375 | 375 | 94 | 94 | Training set / Test set |
| 32 | Anti-coronavirus activity | 100 | 6 | 649 | 651 | 65 | 2231 | Training set / Test set |
| 33 | Viral integrase inhibitory peptides VINIP | 58 | 8 | 110 | 650 | 12 | 71 | Training set / Test set |
| 34 | Viral integrase inhibitory peptides INI | 62 | 8 | 110 | 459 | 12 | 51 | Training set / Test set |
| 35 | Anti-diabetic | 41 | 11 | 188 | 188 | 48 | 48 | Training set / Test set |
| 36 | Biofilm inhibitory | 54 | 5 | 201 | 80 | 10 | 10 | Training set / Test set |
| 37 | Hemolytic | 100 | 5 | 433 | 423 | 663 | 1999 | Training set / Test set |
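The summary columns of the table can be recomputed from a downloaded training and test set, which is a quick way to check that a file was read correctly. The following is a minimal sketch. The `summarize` function and the example records are hypothetical, and it assumes that the length range is taken over the training and test sets combined, which this page does not state explicitly.

```python
def summarize(train, test):
    """Compute the table's summary columns from (sequence, label) records.

    Labels follow the encoding given for the Excel files:
    0 = bioactive (positive), 1 = non-bioactive (negative).
    """
    # Assumption: max/min length cover both splits together.
    lengths = [len(seq) for seq, _ in train + test]
    return {
        "max_length": max(lengths),
        "min_length": min(lengths),
        "train_positive": sum(1 for _, y in train if y == 0),
        "train_negative": sum(1 for _, y in train if y == 1),
        "test_positive": sum(1 for _, y in test if y == 0),
        "test_negative": sum(1 for _, y in test if y == 1),
    }


# Made-up records for illustration only; not taken from any dataset.
train = [("ACDK", 0), ("GGSGGSG", 1), ("KLWRRV", 0)]
test = [("MK", 0), ("PPGPPGPPG", 1)]
stats = summarize(train, test)
for name, value in stats.items():
    print(f"{name}: {value}")
```

If the recomputed counts do not match the table row for a dataset, the label encoding or the split assignment is the first thing to check.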

The dataset size of bioactive peptides

The 37 datasets are divided into four categories: Disease Related, Antimicrobial Related, Immune Related, and Peptide Property. Click the buttons to switch between categories.
Train_pos: positive samples in the training set.
Train_neg: negative samples in the training set.
Test_pos: positive samples in the independent test set.
Test_neg: negative samples in the independent test set.
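Several datasets are strongly imbalanced, which matters when interpreting these counts. As a rough illustration, the sketch below computes the share of positive samples in each split for three rows of the dataset table. The helper name `positive_fraction` is hypothetical, and the counts are copied from the table.

```python
def positive_fraction(n_pos, n_neg):
    """Share of positive samples among all samples in a split."""
    return n_pos / (n_pos + n_neg)


# (Train_pos, Train_neg, Test_pos, Test_neg), copied from the dataset table.
datasets = {
    "Anti-hypertensive": (931, 931, 386, 386),
    "Umami": (112, 241, 28, 61),
    "Anti-malarial main": (111, 1708, 28, 427),
}

for name, (tr_pos, tr_neg, te_pos, te_neg) in datasets.items():
    train_share = positive_fraction(tr_pos, tr_neg)
    test_share = positive_fraction(te_pos, te_neg)
    print(f"{name}: train {train_share:.3f}, test {test_share:.3f}")
```

For a dataset with a low positive share, accuracy alone can be misleading, so a metric that accounts for both classes is usually a safer choice.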

The amino acid composition of bioactive peptides

The figure shows the amino acid composition of each dataset; darker shades of orange indicate that an amino acid makes up a higher proportion of the residues in that dataset.
Train_pos: positive samples in the training set.
Train_neg: negative samples in the training set.
Test_pos: positive samples in the independent test set.
Test_neg: negative samples in the independent test set.
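A composition profile of this kind can be computed for any subset of sequences. The sketch below is one possible way to do it, not necessarily the method used for the figure: it pools all residues in the subset and ignores characters outside the 20 standard one-letter codes. The function name and the example sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequences):
    """Return the fraction of each standard amino acid across all sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only standard residues; anything else (e.g. "X") is skipped.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up sequences for illustration only.
comp = composition(["AAK", "KG"])
for aa, frac in comp.items():
    if frac > 0:
        print(f"{aa}: {frac:.2f}")
```

Computing the profile separately for Train_pos, Train_neg, Test_pos, and Test_neg gives the four groups of values that a heatmap like the one described here would display.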

The length distribution of peptides

In each violin plot, the width at a given length reflects how many peptides have that length: wider regions indicate lengths shared by many peptides, and narrower regions indicate lengths that are rare.
Train_pos: positive samples in the training set.
Train_neg: negative samples in the training set.
Test_pos: positive samples in the independent test set.
Test_neg: negative samples in the independent test set.
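The data behind a length distribution plot is simply the list of sequence lengths in each subset. As a minimal sketch, the code below tallies lengths and prints a text histogram. It uses plain counts rather than the smoothed density estimate a violin plot normally draws, and the function name and example sequences are hypothetical.

```python
from collections import Counter
import statistics


def length_summary(sequences):
    """Summarize peptide lengths: per-length counts plus min, median, and max."""
    lengths = sorted(len(seq) for seq in sequences)
    return {
        "counts": Counter(lengths),
        "min": lengths[0],
        "median": statistics.median(lengths),
        "max": lengths[-1],
    }


# Made-up sequences for illustration only.
summary = length_summary(["AC", "ACD", "ACD", "ACDEFG"])

# Text histogram: one '#' per peptide of that length.
for length, n in sorted(summary["counts"].items()):
    print(f"{length:>3} {'#' * n}")
print("min", summary["min"], "median", summary["median"], "max", summary["max"])
```

Longer bars in the histogram correspond to the wider regions of a violin plot.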