DISCLAIMER: THIS SOFTWARE IS MADE FREELY AVAILABLE. 
NO GUARANTEES ARE MADE REGARDING ITS CORRECTNESS OR UTILITY.

PURPOSE: GENERATE AN ENSEMBLE OF PEPTIDE/PROTEIN BACKBONE CONFORMATIONS.
USE THIS ENSEMBLE TO EVALUATE VARIOUS OBSERVABLES OF THE ENSEMBLE.

Reference: "Generating Intrinsically Disordered Protein Conformational
Ensembles from a Database of Ramachandran Space Pair Residue
Probabilities Using a Markov Chain" 
R. I. Cukier, J. Phys. Chem. B 122, 9087-9101 (2018)
DOI: 10.1021/acs.jpcb.8b05797.

program analyticEns.cpp:
Generates an ensemble of conformations using
a database of pair residue conditional probabilities and a Markov
chain to "string" them together.
The output ensemble variables are the backbone phi and psi dihedrals 
of all residues in the input sequence.
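
A minimal sketch of the chain idea, assuming a first-order Markov chain: the probability of a sequence of per-residue Ramachandran states s[0..N-1] is p(s[0]) multiplied by the conditional probability p(s[i+1] | s[i]) for each adjacent residue pair. The matrix layout and the function name chainProbability are hypothetical and not taken from analyticEns.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (not from analyticEns.cpp): cond[i][a][b] is
// p(state b at residue i+1 | state a at residue i), one matrix per adjacent
// residue pair. In the real program these values come from
// condProbKmeans_ALL.dat for the actual residue types.
using Matrix = std::vector<std::vector<double>>;

// Probability of the state path s[0..N-1] under a first-order Markov chain:
// p(s[0]) * prod_i p(s[i+1] | s[i]).
double chainProbability(const std::vector<double>& p0,
                        const std::vector<Matrix>& cond,
                        const std::vector<int>& s) {
    assert(!s.empty() && cond.size() + 1 == s.size());
    double p = p0[s[0]];
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        p *= cond[i][s[i]][s[i + 1]];
    }
    return p;
}
```

For example, with two states, p0 = {0.6, 0.4} and every pair matrix equal to {{0.9, 0.1}, {0.3, 0.7}}, the path {0, 0, 1} has probability 0.6 * 0.9 * 0.1 = 0.054.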

program dihedtoCC.cpp 
Reads this dihedral ensemble and converts it to
Cartesian coordinates (CCs) for the N, CA, C and O backbone atoms.
These CC conformers are screened for atom-atom overlap, and overlapping
conformers are eliminated.
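
Converting dihedrals to Cartesian coordinates amounts to placing each backbone atom from the three atoms before it, given a bond length, a bond angle and a dihedral. The sketch below uses the standard NeRF (natural extension reference frame) construction as one way to do this; it is an assumption, not necessarily what dihedtoCC.cpp does, and placeAtom and dihedral are hypothetical names. Angles are in radians and dihedral signs follow the IUPAC convention.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 operator+(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
Vec3 operator-(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 operator*(double s, const Vec3& a) { return {s * a[0], s * a[1], s * a[2]}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
Vec3 normalize(const Vec3& a) { return (1.0 / std::sqrt(dot(a, a))) * a; }

// Hypothetical helper (NeRF): place atom D given A, B, C, the bond length |CD|,
// the bond angle B-C-D and the dihedral A-B-C-D.
Vec3 placeAtom(const Vec3& A, const Vec3& B, const Vec3& C,
               double bond, double angle, double torsion) {
    const Vec3 bc = normalize(C - B);
    const Vec3 n = normalize(cross(B - A, bc));  // normal to the A-B-C plane
    const Vec3 m = cross(n, bc);                 // in-plane, toward A's side
    const double x = -bond * std::cos(angle);
    const double y = bond * std::sin(angle) * std::cos(torsion);
    const double z = bond * std::sin(angle) * std::sin(torsion);
    return C + (x * bc + y * m) + z * n;
}

// Dihedral A-B-C-D in (-pi, pi], used here to check placeAtom.
double dihedral(const Vec3& A, const Vec3& B, const Vec3& C, const Vec3& D) {
    const Vec3 b1 = B - A, b2 = C - B, b3 = D - C;
    const Vec3 n1 = cross(b1, b2), n2 = cross(b2, b3);
    const double y = std::sqrt(dot(b2, b2)) * dot(b1, n2);
    const double x = dot(n1, n2);
    return std::atan2(y, x);
}
```

Along a chain, psi(i) places N(i+1) from N(i), CA(i), C(i); omega places CA(i+1); phi(i+1) places C(i+1); the carbonyl O is typically placed in the peptide plane. Bond lengths and angles would be standard backbone values; the ones dihedtoCC.cpp uses are not given here.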
The following observables of the conformers are evaluated (see the paper for details):
Radius of Gyration RG and its probability distribution p(RG)
Invariant shape parameters Delta and S, which reflect the ensemble-average
shape of the protein.
RG versus these shape parameters over the conformers of the ensemble.
End-to-End distance probability distribution p(EtoE)
Distance probability distribution p(R): what is measured (after Fourier
transforming from wavevector to real space) in, e.g., a SAXS experiment.
NMR 3JHNH three-bond couplings.
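
A minimal sketch of RG, Delta and S for one conformer, assuming the commonly used definitions: the unweighted gyration tensor T of the atom positions about their centroid, RG^2 = tr T, Delta = (3/2) tr(H^2) / (tr T)^2 and S = 27 det(H) / (tr T)^3, with H = T - (tr T / 3) I. These match the usual eigenvalue forms, with Delta between 0 (sphere) and 1 (rod) and S between -1/4 (oblate) and 2 (prolate). Equal atom weights and the names shapeParameters and endToEnd are assumptions; the paper's exact definitions may differ.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Shape {
    double rg;     // radius of gyration
    double delta;  // asphericity: 0 for a sphere, 1 for a rod
    double s;      // shape: -1/4 (oblate) .. 2 (prolate)
};

// Hypothetical helper; assumption: unweighted gyration tensor.
Shape shapeParameters(const std::vector<Vec3>& r) {
    assert(!r.empty());
    const double n = static_cast<double>(r.size());
    Vec3 c{0.0, 0.0, 0.0};  // centroid
    for (const Vec3& p : r)
        for (int a = 0; a < 3; ++a) c[a] += p[a] / n;
    Mat3 T{};  // gyration tensor, zero-initialized
    for (const Vec3& p : r)
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b)
                T[a][b] += (p[a] - c[a]) * (p[b] - c[b]) / n;
    const double tr = T[0][0] + T[1][1] + T[2][2];  // = RG^2
    // Traceless part H = T - (tr/3) I: tr(H^2) and det(H) give Delta and S
    // without diagonalizing T.
    Mat3 H = T;
    for (int a = 0; a < 3; ++a) H[a][a] -= tr / 3.0;
    double trH2 = 0.0;
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) trH2 += H[a][b] * H[b][a];
    const double detH = H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
                      - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
                      + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]);
    return {std::sqrt(tr), 1.5 * trH2 / (tr * tr), 27.0 * detH / (tr * tr * tr)};
}

// End-to-end distance, e.g. between the first and last CA atoms.
double endToEnd(const Vec3& first, const Vec3& last) {
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) d2 += (first[k] - last[k]) * (first[k] - last[k]);
    return std::sqrt(d2);
}
```

As a check, a straight three-atom rod gives Delta = 1 and S = 2, and six atoms at the vertices of an octahedron give Delta = S = 0.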

Both programs compile with g++11, which should be available on any Linux box.

The example given in these programs is for a 9-residue peptide with sequence EGAAWAASS.

Instructions for:
analyticEns.cpp
For a given sequence, set in analyticEns.cpp:
1) const string resName ("EGAAWAASS"); // FASTA sequence
2) The DIR where analyticEns.cpp is located must contain the file
condProbKmeans_ALL.dat (the database of pair conditional probabilities).
3) string sdirResults = "/path/..."; (your path for the output of analyticEns.cpp)
Two parameters to set (see the #defines):
 NUMVECS int(1e9): how many vectors to create. Longer sequences need more
 vectors. If the limit on constructed vectors is exceeded, the program
 terminates and tells you so.
 DENOM 10: sets the ratio of the largest sequence probability to the
 smallest accepted probability. If you don't need probabilities an order of
 magnitude smaller than the largest, set DENOM below 10; the program can then
 handle longer sequences, other things being equal.
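
The DENOM cut and the NUMVECS limit can be pictured as follows. This is a minimal sketch, assuming a state is kept when its probability is at least (largest probability) / DENOM, and that exceeding NUMVECS is reported by throwing an exception (analyticEns.cpp terminates instead). The State struct, pruneStates and the k-prefixed constants are hypothetical; the cout summary suggests the real program prunes as the chain is built rather than in one pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the #define values in analyticEns.cpp.
constexpr double kDenom = 10.0;                 // DENOM
constexpr std::size_t kNumVecs = 1000000000;    // NUMVECS, int(1e9)

struct State {
    std::vector<int> path;  // per-residue Ramachandran state indices
    double prob;            // probability of this path
};

// Keep only states whose probability is at least pmax / denom.
std::vector<State> pruneStates(const std::vector<State>& in, double denom,
                               std::size_t maxVecs) {
    if (in.size() > maxVecs)
        throw std::length_error("exceeded NUMVECS limit on constructed vectors");
    double pmax = 0.0;
    for (const State& s : in) pmax = std::max(pmax, s.prob);
    const double pmin = pmax / denom;  // smallest accepted probability
    std::vector<State> out;
    for (const State& s : in)
        if (s.prob >= pmin) out.push_back(s);
    return out;
}
```

With DENOM 10 and probabilities 0.5, 0.06 and 0.04, the cut is 0.05, so only the first two states survive.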
OUTPUTs:
OUTPUT FILES
1) cout (to the run dir): the number of possible states as the
  sequence is constructed, information on how the states are pruned by
  the accepted probability (see DENOM), and the number of states as a
  function of chain construction.
Files in the DIR sdirResults set above:
2) mapPhiPsi_*.dat: the correspondence between states and phi/psi values, from the data read in.
3) StatesAndProbsDescend_*.dat: the unique states and their probabilities in
  descending order, and the number of ensemble members for each of these probabilities.
4) Ensemble_*.dat: the ensemble of phi/psi dihedrals, with e.g. 9 members (for DENOM 10)
for the highest probability down to e.g. 1 member for the lowest accepted probability.
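
A minimal sketch of how accepted states could be turned into ensemble members, assuming each state contributes floor(p / pmin) copies of its phi/psi set, where pmin is the smallest accepted probability. Since pmin is at least pmax / DENOM, the top state gets at most DENOM copies (e.g. 9 when pmax / pmin falls just short of 10) and the lowest gets exactly 1. The rule and the names WeightedState and buildEnsemble are hypothetical; the rounding in analyticEns.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// One (phi, psi) pair per residue, in degrees.
using Dihedrals = std::vector<std::pair<double, double>>;

struct WeightedState {
    Dihedrals phiPsi;
    double prob;
};

// Hypothetical replication rule: floor(p / pmin) copies per state, where pmin
// is the smallest accepted probability, so the lowest state gets 1 copy.
std::vector<Dihedrals> buildEnsemble(const std::vector<WeightedState>& states) {
    assert(!states.empty());
    double pmin = states.front().prob;
    for (const WeightedState& s : states) pmin = std::min(pmin, s.prob);
    std::vector<Dihedrals> ensemble;
    for (const WeightedState& s : states) {
        // Small epsilon guards against ratios like 2.9999999 rounding down.
        const int copies = static_cast<int>(std::floor(s.prob / pmin + 1e-9));
        for (int k = 0; k < copies; ++k) ensemble.push_back(s.phiPsi);
    }
    return ensemble;
}
```

Under this rule, states with probabilities 0.09, 0.03 and 0.01 give 9, 3 and 1 copies, 13 members in total.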

Instructions for:
DihedToCC.cpp (uses nr3.h and svd.h from "Numerical Recipes", 3rd edition,
W. H. Press et al.)
INPUT DIR: set string sdirInputs to /path to the output of analyticEns.cpp
For the input file in that DIR: edit string iFileName = "Ensemble_*.dat" to match its actual name.
Parameters to set (see the #defines):
Histogram ranges/intervals for P(R), P(RG) and P(EtoE).
(These are set here rather than auto-detected; choose them for your purposes.)
(For histogram consistency, use a 1 Å bin size.)
WRITEINTERVAL: how often to write PDBs and RG_EXT entries.
RANGE: could be used to fuzz out the phi/psi angles, but I don't use it.
DISCUT: exclude a configuration if atoms beyond 3rd neighbors are too close.
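
A minimal sketch of the DISCUT screen, assuming "beyond 3rd neighbors" means atoms more than three positions apart in the backbone atom list, and that a pair is "too close" when its distance is below DISCUT (in Å). The neighbor definition in dihedtoCC.cpp may differ (for instance in how the O atoms are counted), and countBadDistances and keepConformer are hypothetical names. The per-conformer count corresponds to the "number of bad distances" reported on cout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical helper: count atom pairs more than three positions apart in
// the atom list whose distance is below discut (the DISCUT cutoff).
int countBadDistances(const std::vector<Vec3>& atoms, double discut) {
    int bad = 0;
    const double cut2 = discut * discut;  // compare squared distances
    for (std::size_t i = 0; i < atoms.size(); ++i)
        for (std::size_t j = i + 4; j < atoms.size(); ++j) {
            double d2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                const double dk = atoms[i][k] - atoms[j][k];
                d2 += dk * dk;
            }
            if (d2 < cut2) ++bad;
        }
    return bad;
}

// A conformer is kept only if it has no bad distances.
bool keepConformer(const std::vector<Vec3>& atoms, double discut) {
    return countBadDistances(atoms, discut) == 0;
}
```

With a hypothetical 3.0 Å cutoff, a straight chain of six atoms spaced 1.5 Å apart has no bad distances; moving the last atom to within 0.5 Å of the first creates two.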
OUTPUTs:
Results are put in the DIR set in string sdirOut.
OUTPUT FILES
1) cout: counts of samples by number of bad distances (these samples are
excluded for van der Waals overlap).
It also compares RG computed in two ways, outputs the shape parameters,
and indicates how many samples are kept out of the total.
The following are for the kept (non-excluded) samples:
2) *_RG_EXT.txt: list of RG, Delta and EtoE values.
3) *_PofR.txt: the distance distribution function over all pairs of
atom sites (its Fourier transform, the scattering intensity, is
proportional to the SAXS intensity).
4) *_PofRG.txt: the probability distribution of RG, the radius of gyration.
5) *_PofEtoE.txt: the probability distribution of the end-to-end (CA)
distance.
6) *_NMR.txt: the 3JHNH coupling and its standard deviation over the
samples.
7) *_HB.txt: the 14 HBs (alpha helix) and the 13 HBs (PPII).
8) *_KEEP.pdb: the PDBs of the kept samples.
9) *_pdb: the PDBs of ALL the samples used (for examining bad overlaps,
if desired).
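
The P(R) in *_PofR.txt is a histogram of distances over all pairs of atom sites, accumulated over the kept samples. A minimal sketch, assuming 1 Å bins (as recommended for the histogram parameters) and normalization to unit area; addPairDistances and normalizeHistogram are hypothetical names, and the normalization in dihedtoCC.cpp may differ. The same binning applies to P(RG) and P(EtoE), with one value per conformer instead of one per atom pair.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical helper: add all i<j pair distances of one conformer to counts,
// with bin k covering [k*width, (k+1)*width); distances past the last bin
// are dropped.
void addPairDistances(const std::vector<Vec3>& atoms, double width,
                      std::vector<double>& counts) {
    for (std::size_t i = 0; i < atoms.size(); ++i)
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            double d2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                const double dk = atoms[i][k] - atoms[j][k];
                d2 += dk * dk;
            }
            const std::size_t bin = static_cast<std::size_t>(std::sqrt(d2) / width);
            if (bin < counts.size()) counts[bin] += 1.0;
        }
}

// Normalize so that sum_k P(R_k) * width = 1 (a probability density in R).
std::vector<double> normalizeHistogram(const std::vector<double>& counts, double width) {
    double total = 0.0;
    for (double c : counts) total += c;
    std::vector<double> p(counts.size(), 0.0);
    if (total > 0.0)
        for (std::size_t k = 0; k < counts.size(); ++k) p[k] = counts[k] / (total * width);
    return p;
}
```

Calling addPairDistances once per kept conformer on the same counts vector, then normalizing once at the end, gives the ensemble-averaged P(R).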