Emmanuel Levy

QSbio

Dataset of annotated biological assemblies (QSbio) 

The data result from Dey et al., Nature Methods 2018.

Annotations are available for download. The Excel file contains the following columns:

  • Code: PDB code; the suffix _X, where X is a number, indicates the assembly number from the PDB.
  • Code_sub: Four-letter PDB code (without the assembly number)
  • PIQSI_error: PiQSi annotation
  • Qsalign_error: QSalign annotation
  • pdb_sym: Symmetry of the PDB assembly, computed in 3DComplex
  • pdb_nsub: Number of subunits in the PDB assembly
  • h_90_repre: Whether the assembly is a representative in a dataset filtered at 90% sequence identity (from 3DComplex)
  • error_estimated: Probability that the assembly is erroneous, as estimated in a benchmark.
  • QSbio.confidence: One of five confidence ranges derived from "error_estimated"
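
As a small illustration of how the Code and Code_sub columns relate, the sketch below splits a Code value into its PDB code and assembly number. The function name split_assembly_code and the example value "1abc_2" are hypothetical; the sketch assumes only the "code, underscore, number" layout described in the column list above.

```python
def split_assembly_code(code):
    """Split a 'Code' value into (PDB code, assembly number).

    Assumes the layout described above: a PDB code, an underscore,
    then the assembly number (for example '1abc_2').
    """
    pdb_code, separator, assembly = code.rpartition("_")
    if not separator or not pdb_code or not assembly.isdigit():
        raise ValueError(f"Unexpected Code value: {code!r}")
    return pdb_code, int(assembly)


# '1abc_2' is a made-up value used only for illustration.
pdb_code, assembly_number = split_assembly_code("1abc_2")
print(pdb_code, assembly_number)
```

The first element returned should match what the Code_sub column holds for the same row, which gives a quick consistency check when working with the file.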

Dataset of structures corresponding to the benchmark 
These assemblies can be downloaded in bulk here. This dataset corresponds to structures annotated as correct or erroneous in PiQSi (only YES/NO annotations) and filtered by 3DComplex at 90% sequence identity (h_90_repre=1 in the Excel file). Note that this filter allows two structures to share >90% sequence identity if their graph representations differ.
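
The selection described above can be sketched in a few lines. This is a minimal example, assuming the Excel file has been exported to CSV, that h_90_repre is stored as 1 for representatives, and that the PIQSI_error cells hold the literal strings YES or NO; the actual cell encoding may differ. The function name select_benchmark_rows and the sample rows are hypothetical and are not real QSbio entries.

```python
import csv
import io


def select_benchmark_rows(rows):
    """Keep rows that are 90% representatives with a YES/NO PiQSi annotation."""
    selected = []
    for row in rows:
        is_representative = str(row["h_90_repre"]).strip() == "1"
        has_yes_no = str(row["PIQSI_error"]).strip().upper() in {"YES", "NO"}
        if is_representative and has_yes_no:
            selected.append(row)
    return selected


# Made-up rows for illustration only; "OTHER" stands in for any
# annotation that is neither YES nor NO.
SAMPLE_CSV = """Code,PIQSI_error,h_90_repre
1abc_1,YES,1
2xyz_1,NO,0
3def_2,OTHER,1
4ghi_1,NO,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
benchmark = select_benchmark_rows(rows)
print([row["Code"] for row in benchmark])  # ['1abc_1', '4ghi_1']
```

To run this on the real annotations, replace the io.StringIO(SAMPLE_CSV) source with an open handle to the exported CSV file.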

Dataset of structures corresponding to these annotations 
The confidence estimates found in the Excel file correspond to assemblies from the PDB. 
All of these assemblies are available from the PDB or can be downloaded in bulk from 3DComplex.