Emmanuel Levy

Interface evolution and stickiness

Data from "Cellular crowding imposes global constraints on the chemistry and evolution of proteomes"
Levy et al. PNAS 2012

- matrix containing the data from E. coli
- matrix containing the data from S. cerevisiae
- matrix containing the data from H. sapiens

The three data matrices contain the data used in this work. In these matrices, each row corresponds to one amino acid, and columns contain the following information: 

ensID: protein identifier (UniProt for E. coli, SGD for yeast, and Ensembl for human).
pdbID: PDB code of the structure and chain identifier
pos.ens: position in the protein sequence (from the organism's proteome).
pos.pdb: position in the protein sequence (from the PDB SEQRES field).
aa: amino acid letter code
rate: rate of evolution calculated with the Rate4Site software
ndef: number of species for which a residue is defined at this position
aa.prop: stickiness score of the amino acid 
len: length of the protein 
ASA.rel.cplx: relative accessible surface area (ASA) of the amino acid in the biological unit
ASA.rel.alone: relative ASA of the amino acid in the chain (separated from the biological unit)
ab.all: abundance given in the PaxDb database
patch.compo.400abs: stickiness score of the 400 Å^2 patch surrounding the amino acid.
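
As an illustration of how these columns can be combined, the following minimal sketch groups residues by solvent exposure and averages their evolutionary rate. The sample rows are invented, the tab-separated layout is an assumption about the file format, and the 0.25 cutoff on a 0-1 ASA scale is an assumption of this sketch; the function name mean_rate_by_exposure is hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical rows: all values below are invented for illustration only,
# and the tab-separated layout is an assumption about the file format.
SAMPLE = (
    "ensID\tpdbID\tpos.ens\taa\trate\tASA.rel.cplx\tpatch.compo.400abs\n"
    "P00001\t1abc_A\t10\tL\t0.40\t0.05\t0.30\n"
    "P00001\t1abc_A\t11\tK\t1.60\t0.70\t-0.20\n"
    "P00001\t1abc_A\t12\tD\t1.20\t0.55\t-0.35\n"
    "P00001\t1abc_A\t13\tV\t0.60\t0.10\t0.25\n"
)


def mean_rate_by_exposure(rows, cutoff=0.25):
    """Average `rate` separately for buried and exposed residues.

    A residue counts as exposed when ASA.rel.cplx > cutoff. Both the cutoff
    and the 0-1 scale of the ASA values are assumptions of this sketch.
    """
    groups = {"buried": [], "exposed": []}
    for row in rows:
        key = "exposed" if float(row["ASA.rel.cplx"]) > cutoff else "buried"
        groups[key].append(float(row["rate"]))
    # Skip empty groups so that mean() is never called on an empty list.
    return {name: mean(values) for name, values in groups.items() if values}


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
result = mean_rate_by_exposure(rows)
print(result)
```

If the real matrices store relative ASA as percentages, the cutoff would need to be rescaled accordingly (for example cutoff=25).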

Data from "A simple definition of structural regions in proteins and its use in analysing interface evolution"
Levy, J. Mol. Biol. 2010

- matrix containing the data from E. coli
- matrix containing the data from S. cerevisiae
- matrix containing the data from H. sapiens

The three data matrices contain the data used in this study. In these matrices, each row corresponds to one amino acid, and columns contain the following information: 

ensID: protein identifier (UniProt for E. coli, SGD for yeast, and Ensembl for human).
pdbID: PDB code of the structure and chain identifier
pos.ens: position in the protein sequence (from the organism's proteome).
pos.pdb: position in the protein sequence (from the PDB SEQRES field).
aa: amino acid letter code
ASA.rel.cplx: relative ASA measured in the complexed form (if the protein is involved in a complex). 
ASA.rel.alone: relative ASA measured in the monomeric form (chains are split) 
nsub: number of subunits in the structure containing the chain
sym: symmetry of the complex 
len: length of the protein 
homo: 1 if the protein is a homo-oligomer, 0 otherwise. Note: 2 stands for complexes containing paralogues only. 
cat: category an amino acid is assigned to. 1: interior, 2: surface, 3: interface support, 4: interface core, 5: interface rim. 
patch.alone.size: number of amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the monomer)
patch.cplx.size: number of amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the complex)
patch.cplx: interface propensity of the amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the complex)
patch.alone: interface propensity of the amino acids in the 400 Å^2 patch surrounding the amino acid (measured on the monomer)
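
To show how the cat codes relate to the two ASA columns, here is a minimal sketch that assigns a category from ASA.rel.alone and ASA.rel.cplx. It assumes a 0.25 relative ASA cutoff on a 0-1 scale and treats any loss of ASA upon complex formation as interface involvement. These are assumptions of this sketch and may not match the exact procedure used to fill the cat column; the function name classify_residue is hypothetical.

```python
# Category codes as listed for the `cat` column.
CATEGORY_LABELS = {
    1: "interior",
    2: "surface",
    3: "interface support",
    4: "interface core",
    5: "interface rim",
}


def classify_residue(asa_alone, asa_cplx, cutoff=0.25, eps=1e-9):
    """Return a `cat` code from relative ASA in the monomer and in the complex.

    The cutoff (0.25 on an assumed 0-1 scale) and the tolerance `eps` are
    assumptions of this sketch, not values stated in this file.
    """
    loses_asa_in_complex = (asa_alone - asa_cplx) > eps
    if not loses_asa_in_complex:
        # Not at an interface: buried residues are interior, the rest surface.
        return 1 if asa_cplx < cutoff else 2
    if asa_alone < cutoff:
        return 3  # already buried in the monomer: interface support
    if asa_cplx < cutoff:
        return 4  # exposed in the monomer, buried in the complex: core
    return 5  # still exposed in the complex: interface rim


examples = [(0.05, 0.05), (0.60, 0.60), (0.20, 0.05), (0.60, 0.10), (0.80, 0.50)]
for alone, cplx in examples:
    code = classify_residue(alone, cplx)
    print(alone, cplx, code, CATEGORY_LABELS[code])
```

Comparing the output of such a function with the stored cat values is one way to check which cutoff and ASA scale the matrices actually use.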

Note: corresponding structures are available on the 3DComplex website