DIAMOND DeepClust Data

Download Instructions

Make sure you have enough disk space available (2 TB for downloading, up to 5 TB while decompressing all files) and a stable high-bandwidth Internet connection. The use of a download manager is recommended on desktop systems. In addition, the following options are available:

An S3 client (e.g. Cyberduck, minio-client, rclone) can be used with the URL s3://objectstore.hpccloud.mpcdf.mpg.de/deepclust/

To download via https on the command line, the following `wget` command can be used:

wget --recursive --level=1 --execute robots=off --no-parent --no-host-directories --cut-dirs=2 https://objectstore.hpccloud.mpcdf.mpg.de/deepclust/index.html

The total compressed size of this download resource is 1,906,886 MB (about 1.9 TB).
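Because the download is so large, it can help to fetch the split archives one by one so that an interrupted transfer can be resumed. The following is a minimal sketch, not an official procedure. The helper name `part_urls` is hypothetical, and it assumes that each part is served directly under the same base URL as `index.html`, which the listing suggests but does not confirm.

```shell
#!/bin/sh
# part_urls BASE NAME LAST
# Print one URL per line for the parts NAME.00 .. NAME.LAST.
# Assumption: every part lives at BASE/NAME.NN (inferred, not confirmed).
part_urls() {
  base=${1%/}   # drop a single trailing slash, if present
  name=$2
  last=$3
  i=0
  while [ "$i" -le "$last" ]; do
    # Part suffixes in the listing are zero-padded to two digits.
    printf '%s/%s.%02d\n' "$base" "$name" "$i"
    i=$((i + 1))
  done
}

# Example (commented out so that sourcing this file starts no download):
# fetch the 75 DeepClustParquet parts, resuming partially downloaded files.
# part_urls https://objectstore.hpccloud.mpcdf.mpg.de/deepclust \
#     DeepClustParquet.tar.zst 74 | wget --continue --input-file=-
```

With `--continue`, `wget` picks up a partially downloaded file where it stopped instead of starting over, and `--input-file=-` makes it read the URL list from standard input.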

File Description

joined_with_index_RowGroupFinal.parquet: Parquet file containing all sequences clustered with DIAMOND DeepClust, as described in the publication.

clust_index_RowGroup.parquet: Index file indicating where in joined_with_index_RowGroupFinal.parquet a cluster can be found.

persistent: DuckDB database created from clust_index_RowGroup.parquet.

SeqIdMapClustId.parquet: Parquet file containing all sequence IDs from the DeepClust database, which can be used to map each sequence to the cluster it has been assigned to.

For more information see: https://github.com/drostlab/deepclust_dataretrieval

clust_bigg_2.mmseqs: MMseqs2-formatted database containing all clusters from the DeepClust database with more than two members, for use in protein structure prediction and ColabFold.

clust_bigg2.fa: FASTA file containing all centroids representing clusters with more than two members.

For more information see: https://github.com/drostlab/deepclust_colabfold

Name Size Modified
DeepClustParquet.md5sum 4 KB
DeepClustParquet.readme 537 bytes
DeepClustParquet.tar.zst.00 20 GB
DeepClustParquet.tar.zst.01 20 GB
DeepClustParquet.tar.zst.02 20 GB
DeepClustParquet.tar.zst.03 20 GB
DeepClustParquet.tar.zst.04 20 GB
DeepClustParquet.tar.zst.05 20 GB
DeepClustParquet.tar.zst.06 20 GB
DeepClustParquet.tar.zst.07 20 GB
DeepClustParquet.tar.zst.08 20 GB
DeepClustParquet.tar.zst.09 20 GB
DeepClustParquet.tar.zst.10 20 GB
DeepClustParquet.tar.zst.11 20 GB
DeepClustParquet.tar.zst.12 20 GB
DeepClustParquet.tar.zst.13 20 GB
DeepClustParquet.tar.zst.14 20 GB
DeepClustParquet.tar.zst.15 20 GB
DeepClustParquet.tar.zst.16 20 GB
DeepClustParquet.tar.zst.17 20 GB
DeepClustParquet.tar.zst.18 20 GB
DeepClustParquet.tar.zst.19 20 GB
DeepClustParquet.tar.zst.20 20 GB
DeepClustParquet.tar.zst.21 20 GB
DeepClustParquet.tar.zst.22 20 GB
DeepClustParquet.tar.zst.23 20 GB
DeepClustParquet.tar.zst.24 20 GB
DeepClustParquet.tar.zst.25 20 GB
DeepClustParquet.tar.zst.26 20 GB
DeepClustParquet.tar.zst.27 20 GB
DeepClustParquet.tar.zst.28 20 GB
DeepClustParquet.tar.zst.29 20 GB
DeepClustParquet.tar.zst.30 20 GB
DeepClustParquet.tar.zst.31 20 GB
DeepClustParquet.tar.zst.32 20 GB
DeepClustParquet.tar.zst.33 20 GB
DeepClustParquet.tar.zst.34 20 GB
DeepClustParquet.tar.zst.35 20 GB
DeepClustParquet.tar.zst.36 20 GB
DeepClustParquet.tar.zst.37 20 GB
DeepClustParquet.tar.zst.38 20 GB
DeepClustParquet.tar.zst.39 20 GB
DeepClustParquet.tar.zst.40 20 GB
DeepClustParquet.tar.zst.41 20 GB
DeepClustParquet.tar.zst.42 20 GB
DeepClustParquet.tar.zst.43 20 GB
DeepClustParquet.tar.zst.44 20 GB
DeepClustParquet.tar.zst.45 20 GB
DeepClustParquet.tar.zst.46 20 GB
DeepClustParquet.tar.zst.47 20 GB
DeepClustParquet.tar.zst.48 20 GB
DeepClustParquet.tar.zst.49 20 GB
DeepClustParquet.tar.zst.50 20 GB
DeepClustParquet.tar.zst.51 20 GB
DeepClustParquet.tar.zst.52 20 GB
DeepClustParquet.tar.zst.53 20 GB
DeepClustParquet.tar.zst.54 20 GB
DeepClustParquet.tar.zst.55 20 GB
DeepClustParquet.tar.zst.56 20 GB
DeepClustParquet.tar.zst.57 20 GB
DeepClustParquet.tar.zst.58 20 GB
DeepClustParquet.tar.zst.59 20 GB
DeepClustParquet.tar.zst.60 20 GB
DeepClustParquet.tar.zst.61 20 GB
DeepClustParquet.tar.zst.62 20 GB
DeepClustParquet.tar.zst.63 20 GB
DeepClustParquet.tar.zst.64 20 GB
DeepClustParquet.tar.zst.65 20 GB
DeepClustParquet.tar.zst.66 20 GB
DeepClustParquet.tar.zst.67 20 GB
DeepClustParquet.tar.zst.68 20 GB
DeepClustParquet.tar.zst.69 20 GB
DeepClustParquet.tar.zst.70 20 GB
DeepClustParquet.tar.zst.71 20 GB
DeepClustParquet.tar.zst.72 20 GB
DeepClustParquet.tar.zst.73 20 GB
DeepClustParquet.tar.zst.74 11 GB
arch80_all.tsv.zst 913 MB
arch80_all.tsv.zst.md5sum 53 bytes
centroids.dedup.faa.md5sum 359 bytes
centroids.dedup.faa.readme 375 bytes
centroids.dedup.faa.zst.00 20 GB
centroids.dedup.faa.zst.01 20 GB
centroids.dedup.faa.zst.02 20 GB
centroids.dedup.faa.zst.03 20 GB
centroids.dedup.faa.zst.04 18 GB
clan2acc.tsv.zst 43 KB
clan2acc.tsv.zst.md5sum 51 bytes
clust.dedup2.tsv.md5sum 573 bytes
clust.dedup2.tsv.readme 360 bytes
clust.dedup2.tsv.zst.00 20 GB
clust.dedup2.tsv.zst.01 20 GB
clust.dedup2.tsv.zst.02 20 GB
clust.dedup2.tsv.zst.03 20 GB
clust.dedup2.tsv.zst.04 20 GB
clust.dedup2.tsv.zst.05 20 GB
clust.dedup2.tsv.zst.06 20 GB
clust.dedup2.tsv.zst.07 20 GB
clust.dedup2.tsv.zst.08 12 GB
clust_bigg2_mmseqs_db.md5sum 335 bytes
clust_bigg2_mmseqs_db.readme 1 KB
clust_bigg2_mmseqs_db.tar.zst.00 20 GB
clust_bigg2_mmseqs_db.tar.zst.01 20 GB
clust_bigg2_mmseqs_db.tar.zst.02 20 GB
clust_bigg2_mmseqs_db.tar.zst.03 20 GB
clust_bigg2_mmseqs_db.tar.zst.04 19 GB
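
The numbered files in the listing above look like pieces of larger zstd-compressed archives. The sketch below shows one way to reassemble them. It rests on two unconfirmed assumptions: that the parts are plain byte-wise splits that can be concatenated in order, and that the `.md5sum` files are in the format that `md5sum -c` accepts. The accompanying `.readme` files should take precedence wherever they differ. The helper name `join_parts` is hypothetical.

```shell
#!/bin/sh
# join_parts PREFIX
# Write the concatenation of PREFIX.00, PREFIX.01, ... to standard output.
# Shell glob expansion is sorted, so the zero-padded two-digit suffixes
# are emitted in numeric order.
join_parts() {
  cat "$1".[0-9][0-9]
}

# Example (commented out; run from the download directory):
# md5sum -c DeepClustParquet.md5sum                           # verify the parts first
# join_parts DeepClustParquet.tar.zst | zstd -dc | tar -xf -  # stream, decompress, unpack
# join_parts centroids.dedup.faa.zst | zstd -dc > centroids.dedup.faa
```

Streaming the parts through a pipe avoids writing the joined archive to disk as an intermediate file, which keeps peak disk usage lower during decompression.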