DISCLAIMER: THIS SOFTWARE IS MADE FREELY AVAILABLE. NO GUARANTEES ARE MADE
REGARDING ITS CORRECTNESS OR UTILITY.

PURPOSE: GENERATE AN ENSEMBLE OF PEPTIDE/PROTEIN BACKBONE CONFORMATIONS.
USE THIS ENSEMBLE TO EVALUATE VARIOUS OBSERVABLES OF THE ENSEMBLE.

Reference:
"Generating Intrinsically Disordered Protein Conformational Ensembles from a
Database of Ramachandran Space Pair Residue Probabilities Using a Markov Chain"
R. I. Cukier, J. Phys. Chem. B 122, 9087-9101 (2018)
DOI: 10.1021/acs.jpcb.8b05797

program analyticEns.cpp:
Generates an ensemble of conformations using a database of pair residue
conditional probabilities and a Markov chain to "string" them together.
The output ensemble variables are the backbone phi and psi dihedrals of all
residues in the input sequence.

program dihedtoCC.cpp:
Inputs this dihedral ensemble and converts it to Cartesian Coordinates (CCs)
for the N, CA, C and O backbone atoms. These CC conformers are screened for
atom-atom overlap, and overlapping conformers are eliminated.
Various observables of the conformers are evaluated (see paper for details):
    Radius of gyration RG and its probability distribution p(RG)
    Invariant shape parameters Delta and S that reflect the ensemble average
      shape of the protein
    RG versus these shape parameters over the conformers of the ensemble
    End-to-end distance probability distribution p(EtoE)
    Distance probability distribution p(R): what is measured (when Fourier
      transformed from wavevector to real space) in, e.g., a SAXS experiment
    NMR 3JHNH three-bond couplings

Both programs will compile using g++11, which should be available on any
linux box.

The example given in these programs is for a 9 residue peptide with
sequence EGAAWAASS.

Instructions for: analyticEns.cpp

For a given sequence, set in analyticEns.cpp:
1) const string resName ("EGAAWAASS"); //fasta sequence
2) DIR where analyticEns.cpp is located holds the file condProbKmeans_ALL.dat.
   (It has the database of pair conditional probabilities)
3) string sdirResults = "/path/..."; (your path to output of analyticEns.cpp)

Two parameters to set; see #define:
NUMVECS int(1e9)
    How many vectors to create. As the sequence gets longer, more vectors
    are needed. If the limit of constructed vectors is exceeded, the program
    will terminate and tell you so.
DENOM 10
    Sets the ratio of the largest sequence probability to the smallest
    accepted probability. If you don't need probabilities an order of
    magnitude smaller, set DENOM smaller than 10; the program can then
    handle longer sequences, other things being equal.

OUTPUTs in above sdirResults:
OUTPUT FILES
1) cout to run dir:
   Reports how many possible states there are as the sequence is
   constructed, how the states are pruned by the set accepted probability
   (see DENOM), and how many states there are as a function of the chain
   construction.
files in the named dir sdirResults you set above:
2) mapPhiPsi_*.dat
   The state phi psi correspondence from the read data
3) StatesAndProbsDescend_*.dat
   Unique states and their probs in descending order, and the number of
   ensemble members for each of these probabilities.
4) Ensemble_*.dat
   The ensemble of phi/psi dihedrals, such that there are e.g. 9 (for
   DENOM 10) for the highest prob down to e.g. 1 for the lowest accepted
   probability.

Instructions for: DihedToCC.cpp
(uses nr3.h and svd.h from "Numerical Recipes in C" W. H. Press et al.)

INPUT DIR: sdirInputs = /path to output of analyticEns.cpp
for file in that DIR: edit string iFileName = "Ensemble_*.dat" to match

Parameters to set: (see #define)
histo ranges/interval for P(R), P(RG), P(EtoE)
    Set these here, rather than relying on auto-detection, to suit your
    purposes. (For histo consistency use 1 A bin size)
WRITEINTERVAL for how often to write pdbs and RG_EXT
RANGE could be used to fuzz out the phi/psi angles, but I don't use this.
DISCUT exclude configurations if >3rd neighbors are too close

OUTPUTs: Put results in named dir - set in string sdirOut
OUTPUT FILES
1) cout
   Counts samps with number of bad distances (these samps are excluded by
   vdw overlap). Compares RG in two ways and outputs shape parameters.
   Indicates how many samps kept out of total samps.
The following are for the kept (non-excluded) samps:
2) *_RG_EXT.txt
   List of Rg, delta, and EtoE values.
3) *_PofR.txt
   The distance distribution function between all pairs of atom sites
   (whose Fourier transform, the scattering intensity, is proportional to
   the SAXS intensity)
4) *_PofRG.txt
   The probability distribution of RG, the radius of gyration
5) *_PofEtoE.txt
   The probability distribution of the end-to-end (CA) distance
6) *_NMR.txt
   The JHNH coupling and its standard deviation over the samps
7) *_HB.txt
   The 14 HBs (alpha helix) and the 13 HBs (PPII)
8) *_KEEP.pdb
   The pdbs of the (kept) samps
9) *_pdb
   The pdbs of ALL the samps used. (For examining bad overlaps if desired)
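
Illustrative sketch (not taken from dihedtoCC.cpp):
The cout output above is described as comparing RG in two ways. This README
does not say which two, so the sketch below only assumes one plausible pair:
RG from the geometric center of the atom sites, and RG from all pair
distances. The two are algebraically identical for equally weighted sites,
which makes the comparison a useful consistency check. The names Vec3,
rgFromCenter, rgFromPairs and endToEnd are hypothetical and chosen for
illustration only; equal weighting of all sites is likewise an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type for one atom site (Angstrom); not from dihedtoCC.cpp.
struct Vec3 { double x, y, z; };

// Squared distance between two sites.
double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// RG from the geometric center (equal weights assumed):
//   RG^2 = (1/N) * sum_i |r_i - r_c|^2
double rgFromCenter(const std::vector<Vec3>& atoms) {
    assert(!atoms.empty());
    const double n = static_cast<double>(atoms.size());
    Vec3 c{0.0, 0.0, 0.0};
    for (const Vec3& a : atoms) { c.x += a.x; c.y += a.y; c.z += a.z; }
    c.x /= n; c.y /= n; c.z /= n;
    double sum = 0.0;
    for (const Vec3& a : atoms) sum += dist2(a, c);
    return std::sqrt(sum / n);
}

// RG from pair distances:
//   RG^2 = (1/(2 N^2)) * sum_{i,j} |r_i - r_j|^2
// The loop visits each unordered pair once, so the factor of 2 cancels
// and the sum is divided by N^2.
double rgFromPairs(const std::vector<Vec3>& atoms) {
    assert(!atoms.empty());
    const double n = static_cast<double>(atoms.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < atoms.size(); ++i)
        for (std::size_t j = i + 1; j < atoms.size(); ++j)
            sum += dist2(atoms[i], atoms[j]);
    return std::sqrt(sum / (n * n));
}

// End-to-end distance between the first and last site of a chain
// (e.g. the first and last CA atoms).
double endToEnd(const std::vector<Vec3>& ca) {
    assert(ca.size() >= 2);
    return std::sqrt(dist2(ca.front(), ca.back()));
}
```

Calling rgFromCenter and rgFromPairs on the same conformer should agree to
within floating-point rounding; a larger discrepancy would point to a
coordinate or indexing error.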