[关闭]
@Billy-The-Crescent 2019-10-18T15:35:08.000000Z 字数 15630 阅读 1177

Protein-Ligand Simulation Using GROMACS

Protein-Ligand complex simulation GROMACS



Software: GROMACS

GROMACS Manual

Platform: Linux

Initial Protein: HisG_MD.pdb

Initial Ligand: ADN.pdb


1 Check GROMACS

GROMACS location:

/usr/local/gromacs/bin/

Set environment variable:

  1. export PATH=$PATH:/usr/local/gromacs/bin/
  2. #test if environment works
  3. gmx -version

2 HisG-Adenosine complex MD Simulation

2.1 Prepare Molecular Topology file

As the long time consumed in this simulation, we only conduct the simulation for HisG. Since we have conducted Molecular Dynamics Simulation of the enzyme cns1, cns2, cns3 and its HisG domain, we'd like to use the result structure as the input protein file. For cns3-HisG domain we call it HisG_MD.pdb.

The first thing to do is to find and download the ligand structure, here, we search "Adenosine" in PDBeChem and dowload the ".pdb" file, we call it ADN.pdb.

According to the manual, the ligand is not a recognized entity in any of the force fields provided with GROMACS, so we can not use pdb2gmx to generate the topology of the ligand.

The force field we will be using in this tutorial is CHARMM36, obtained from the MacKerell lab website. While there, download the latest CHARMM36 force field tarball and the "cgenff_charmm2gmx.py" conversion script, which we will use later. Download the version of the conversion script that corresponds to your installed Python version (Python 2.x or 3.x).

Unarchive the force field tarball in your working directory:

  1. tar -zxvf charmm36-mar2019.ff.tgz

There should now be a "charmm36-mar2019.ff" subdirectory in your working directory.

2.1.1 Generate the protein topology

Then, Write the topology for HisG with pdb2gmx:

  1. gmx pdb2gmx -f HisG_MD.pdb -ignh -o HisG_MD_processed.gro

In this operation, we ignore the water using -ignh or we will get a fatal error due to the inconsistence of the name of the water. Choose the default water model when prompted (TIP3P). The structure will be processed by pdb2gmx, and you will be prompted to choose a force field:
image_1dkacq5du1soc9ng17ftoeq1kp29.png-129.5kB

Choose the first force filed which is just installed by us manually, CHARMM36 all-atom force field (March 2019).

2.1.1 Generate the ligand topology

Let's move to deal with the ligand.

Here, we use CGenFF, an official CHARMM General Force Field server to generate the topology of the ligand.

Since the strict requirement of this online server on the file format of the ligand. We need to:

  • Add Hydrogen Atoms to the ligand
  • Convert the pdb file to mol2 file
  • Several corrections to the generated mol2 file

We use avogadro to add the hydrogen atoms and convert it to mol2 format.

Open Adenosine.pdb in Avogadro, and from the "Build" menu, choose "Add Hydrogens." Avogadro will build all of the H atoms onto the Adenosine ligand. Save a .mol2 file (File -> Save As... and choose Sybyl Mol2 from the drop-down menu) named "ADN.mol2."

In order to correct the mol2 file, we need to look through it.

  • Change the name of the molecule to Adenosine
  • Change the sixth column to 1
  • Change the seventh column to Adenosine
  • Change the name of the atoms uniquely

image_1dkb1qu7b8k43ig3uv1ju1ldp1g.png-154.7kB

Last, notice the strange bond order in the @BOND section. All programs seem to have their own method for generating this list, but not all are created equal. There will be issues in constructing a correct topology with matching coordinates if the bonds are not listed in ascending order. To fix this problem, download the sort_mol2_bonds.pl script and execute it:

Run the script to fix the bonds.

  1. perl sort_mol2_bonds.pl ADN.mol2 ADN_fix.mol2

Use "ADN_fix.mol2" in the next step.

Now, we'd like to generate the topology of the ligand using CGenFF server.

The ADN_fix.mol2 file is now ready for use to produce a topology. Visit the CGenFF server, log into your account, and and click "Upload molecule" at the top of the page. Upload Adenosine_fix.mol2 and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension .str). Save its contents from your web browser into a file called "ADN.str."

Check the penalty of the .str file carefully! If the penalty is high, please double check the original ligand file whether it is a valid structure.

Alter the resi name of the .str file of to ADN, or it will create error in the next step.

Then we use the python script to generate the topology:

  1. python cgenff_charmm2gmx.py ADN ADN_fix.mol2 ADN.str charmm36-mar2019.ff

If we do not alter the name of the RESI (the figure below),
image_1dkb6akr61l313rup351q6vquu1t.png-45.8kB

we will get the error:
image_1dkb6pdbbe1l1dn4gus10mp18f82a.png-219.6kB

It means that the mol2 file and the str file is not consistent.

When we successfully run the transcript, we will get the result below:
image_1dkb6rpge1ss1scfa3f7quec42n.png-48.5kB

We get adn.itp, adn.prm, adn.top, adn_ini.pdb.

We convert the adn_ini.pdb to adn.gro using editconf:

  1. gmx editconf -f adn_ini.pdb -o adn.gro

2.2 Build Complex Topology file

Copy HisG_MD_processed.gro to a new file to be manipulated, for instance, call it "complex.gro," as the addition of ADN to the protein will generate our protein-ligand complex. Next, simply copy the coordinate section of adn.gro and paste it into complex.gro, below the last line of the protein atoms, and before the box vectors, like so:

image_1dkb7vhtd1vc4ln8kut13mut7441.png-122.8kB

Since we add 32 atoms into the .gro file, increment the second line of complex.gro to reflect this change. There should be 12439 atoms in the coordinate file now.

Insert a line that says #include "adn.itp" into topol.top after the position restraint file is included. The inclusion of position restraints indicates the end of the "Protein" moleculetype section.

  1. ; Include Position restraint file
  2. #ifdef POSRES
  3. #include "posre.itp"
  4. #endif
  5. ; Include water topology
  6. #include "./charmm36-mar2019.ff/tip3p.itp"

becomes

  1. ; Include Position restraint file
  2. #ifdef POSRES
  3. #include "posre.itp"
  4. #endif
  5. ; Include ligand topology
  6. #include "adn.itp"
  7. ; Include water topology
  8. #include "./charmm36-mar2019.ff/tip3p.itp"

The ligand introduces new dihedral parameters, which were written to "adn.prm" by the cgenff_charmm2gmx_py3.py script. At the TOP of topol.top, insert an #include statement to add these parameters:

  1. ; Include forcefield parameters
  2. #include "./charmm36-mar2019.ff/forcefield.itp"
  3. ; Include ligand parameters
  4. #include "adn.prm"
  5. [ moleculetype ]
  6. ; Name nrexcl
  7. Protein_chain_A 3

The last adjustment to be made is in the [ molecules ] directive. To account for the fact that there is a new molecule in complex.gro, we have to add it here, like so:

  1. [ molecules ]
  2. ; Compound #mols
  3. Protein_chain_A 1
  4. ADN 1

Remember to substitude the original topol.top with the updated one.

2.3 MD Simulation

At this point, the workflow is just like any other MD simulation. We will define the unit cell and fill it with water.

  1. gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0
  2. gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro

Use em.mdp to build the ion.tpr and conduct energy minimization:

em.mdp:

  1. ; LINES STARTING WITH ';' ARE COMMENTS
  2. title = Minimization ; Title of run
  3. ; Parameters describing what to do, when to stop and what to save
  4. integrator = steep ; Algorithm (steep = steepest descent minimization)
  5. emtol = 1000.0 ; Stop minimization when the maximum force < 10.0 kJ/mol
  6. emstep = 0.01 ; Energy step size
  7. nsteps = 50000 ; Maximum number of (minimization) steps to perform
  8. ; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
  9. nstlist = 1 ; Frequency to update the neighbor list and long range forces
  10. cutoff-scheme = Verlet
  11. ns_type = grid ; Method to determine neighbor list (simple, grid)
  12. rlist = 1.2 ; Cut-off for making neighbor list (short range forces)
  13. coulombtype = PME ; Treatment of long range electrostatic interactions
  14. rcoulomb = 1.2 ; long range electrostatic cut-off
  15. vdwtype = cutoff
  16. vdw-modifier = force-switch
  17. rvdw-switch = 1.0
  18. rvdw = 1.2 ; long range Van der Waals cut-off
  19. pbc = xyz ; Periodic Boundary Conditions
  20. DispCorr = no

Type the code:

  1. gmx grompp -f em.mdp -c solv.gro -p topol.top -o ion.tpr

We now pass our .tpr file to genion:

  1. gmx genion -s ion.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral

In order to run energy minimization by mdrun, we have to use the grompp operation again.

  1. gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o HisGcomplex_em.tpr

Then, we run energy minimization:

  1. gmx mdrun -v -deffnm HisGcomplex_em

Before conducting equilibration, we need to apply restraints to the ligand and do the treatment of temperature coupling groups.

  1. gmx make_ndx -f adn.gro -o index_adn.ndx
  2. > 0 & ! a H*
  3. > q

Then, execute the genrestr module and select this newly created index group (which will be group 3 in the index_jz4.ndx file):

  1. gmx genrestr -f adn.gro -n index_adn.ndx -o posre_adn.itp -fc 1000 1000 1000

Now, we need to include this information in our topology. We can do this in several ways, depending upon the conditions we wish to use. If we simply want to restrain the ligand whenever the protein is also restrained, add the following lines to your topology in the location indicated:

  1. ; Include Position restraint file
  2. #ifdef POSRES
  3. #include "posre.itp"
  4. #endif
  5. ; Include ligand topology
  6. #include "adn.itp"
  7. ; Ligand position restraints
  8. #ifdef POSRES
  9. #include "posre_adn.itp"
  10. #endif
  11. ; Include water topology
  12. #include "./charmm36-mar2019.ff/tip3p.itp"
  1. gmx make_ndx -f HisGcomplex-em.gro -o index.ndx

image_1dkd24l3h41i1b631m5bkbe14qe9.png-101.7kB
Merge the "Protein" and "ADN" groups with the following, where ">" indicates the make_ndx prompt:

  1. > 1 | 13
  2. > q

Proceed with NVT equilibration using this .mdp file.

  1. title = Protein-ligand complex NVT equilibration
  2. define = -DPOSRES ; position restrain the protein and ligand
  3. ; Run parameters
  4. integrator = md ; leap-frog integrator
  5. nsteps = 50000 ; 2 * 50000 = 100 ps
  6. dt = 0.002 ; 2 fs
  7. ; Output control
  8. nstenergy = 500 ; save energies every 1.0 ps
  9. nstlog = 500 ; update log file every 1.0 ps
  10. nstxout-compressed = 500 ; save coordinates every 1.0 ps
  11. ; Bond parameters
  12. continuation = no ; first dynamics run
  13. constraint_algorithm = lincs ; holonomic constraints
  14. constraints = h-bonds ; bonds to H are constrained
  15. lincs_iter = 1 ; accuracy of LINCS
  16. lincs_order = 4 ; also related to accuracy
  17. ; Neighbor searching and vdW
  18. cutoff-scheme = Verlet
  19. ns_type = grid ; search neighboring grid cells
  20. nstlist = 20 ; largely irrelevant with Verlet
  21. rlist = 1.2
  22. vdwtype = cutoff
  23. vdw-modifier = force-switch
  24. rvdw-switch = 1.0
  25. rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
  26. ; Electrostatics
  27. coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
  28. rcoulomb = 1.2 ; short-range electrostatic cutoff (in nm)
  29. pme_order = 4 ; cubic interpolation
  30. fourierspacing = 0.16 ; grid spacing for FFT
  31. ; Temperature coupling
  32. tcoupl = V-rescale ; modified Berendsen thermostat
  33. tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate
  34. tau_t = 0.1 0.1 ; time constant, in ps
  35. ref_t = 300 300 ; reference temperature, one for each group, in K
  36. ; Pressure coupling
  37. pcoupl = no ; no pressure coupling in NVT
  38. ; Periodic boundary conditions
  39. pbc = xyz ; 3-D PBC
  40. ; Dispersion correction is not used for proteins with the C36 additive FF
  41. DispCorr = no
  42. ; Velocity generation
  43. gen_vel = yes ; assign velocities from Maxwell distribution
  44. gen_temp = 300 ; temperature for Maxwell distribution
  45. gen_seed = -1 ; generate a random seed

Remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions to tc-grps = Protein_ADN Water_and_ions

  1. gmx grompp -f nvt.mdp -c HisGcomplex-em.gro -r HisGcomplex-em.gro -p topol.top -n index.ndx -o HisGcomplex-nvt.tpr
  2. nohup gmx mdrun -deffnm HisGcomplex-nvt &
  3. #tail -n 25 HisGcomplex-nvt.log

NPT equilibrium

  1. title = Protein-ligand complex NPT equilibration
  2. define = -DPOSRES ; position restrain the protein and ligand
  3. ; Run parameters
  4. integrator = md ; leap-frog integrator
  5. nsteps = 50000 ; 2 * 50000 = 100 ps
  6. dt = 0.002 ; 2 fs
  7. ; Output control
  8. nstenergy = 500 ; save energies every 1.0 ps
  9. nstlog = 500 ; update log file every 1.0 ps
  10. nstxout-compressed = 500 ; save coordinates every 1.0 ps
  11. ; Bond parameters
  12. continuation = yes ; continuing from NVT
  13. constraint_algorithm = lincs ; holonomic constraints
  14. constraints = h-bonds ; bonds to H are constrained
  15. lincs_iter = 1 ; accuracy of LINCS
  16. lincs_order = 4 ; also related to accuracy
  17. ; Neighbor searching and vdW
  18. cutoff-scheme = Verlet
  19. ns_type = grid ; search neighboring grid cells
  20. nstlist = 20 ; largely irrelevant with Verlet
  21. rlist = 1.2
  22. vdwtype = cutoff
  23. vdw-modifier = force-switch
  24. rvdw-switch = 1.0
  25. rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
  26. ; Electrostatics
  27. coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
  28. rcoulomb = 1.2
  29. pme_order = 4 ; cubic interpolation
  30. fourierspacing = 0.16 ; grid spacing for FFT
  31. ; Temperature coupling
  32. tcoupl = V-rescale ; modified Berendsen thermostat
  33. tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate
  34. tau_t = 0.1 0.1 ; time constant, in ps
  35. ref_t = 300 300 ; reference temperature, one for each group, in K
  36. ; Pressure coupling
  37. pcoupl = Berendsen ; pressure coupling is on for NPT
  38. pcoupltype = isotropic ; uniform scaling of box vectors
  39. tau_p = 2.0 ; time constant, in ps
  40. ref_p = 1.0 ; reference pressure, in bar
  41. compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
  42. refcoord_scaling = com
  43. ; Periodic boundary conditions
  44. pbc = xyz ; 3-D PBC
  45. ; Dispersion correction is not used for proteins with the C36 additive FF
  46. DispCorr = no
  47. ; Velocity generation
  48. gen_vel = no ; velocity generation off after NVT

Likewise, remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions

  1. gmx grompp -f npt.mdp -c HisGcomplex-nvt.gro -t HisGcomplex-nvt.cpt -r HisGcomplex-nvt.gro -p topol.top -n index.ndx -o HisGcomplex-npt.tpr
  2. nohup gmx mdrun -deffnm HisGcomplex-npt &
  3. #tail -n 25 HisGcomplex-npt.log

md.mdp

  1. title = Protein-ligand complex MD simulation
  2. ; Run parameters
  3. integrator = md ; leap-frog integrator
  4. nsteps = 5000000 ; 2 * 5000000 = 10000 ps (10 ns)
  5. dt = 0.002 ; 2 fs
  6. ; Output control
  7. nstenergy = 10000 ; save energies every 20.0 ps
  8. nstlog = 10000 ; update log file every 20.0 ps
  9. nstxout-compressed = 10000 ; save coordinates every 20.0 ps
  10. ; Bond parameters
  11. continuation = yes ; continuing from NPT
  12. constraint_algorithm = lincs ; holonomic constraints
  13. constraints = h-bonds ; bonds to H are constrained
  14. lincs_iter = 1 ; accuracy of LINCS
  15. lincs_order = 4 ; also related to accuracy
  16. ; Neighbor searching and vdW
  17. cutoff-scheme = Verlet
  18. ns_type = grid ; search neighboring grid cells
  19. nstlist = 20 ; largely irrelevant with Verlet
  20. rlist = 1.2
  21. vdwtype = cutoff
  22. vdw-modifier = force-switch
  23. rvdw-switch = 1.0
  24. rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
  25. ; Electrostatics
  26. coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
  27. rcoulomb = 1.2
  28. pme_order = 4 ; cubic interpolation
  29. fourierspacing = 0.16 ; grid spacing for FFT
  30. ; Temperature coupling
  31. tcoupl = V-rescale ; modified Berendsen thermostat
  32. tc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accurate
  33. tau_t = 0.1 0.1 ; time constant, in ps
  34. ref_t = 300 300 ; reference temperature, one for each group, in K
  35. ; Pressure coupling
  36. pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT
  37. pcoupltype = isotropic ; uniform scaling of box vectors
  38. tau_p = 2.0 ; time constant, in ps
  39. ref_p = 1.0 ; reference pressure, in bar
  40. compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
  41. ; Periodic boundary conditions
  42. pbc = xyz ; 3-D PBC
  43. ; Dispersion correction is not used for proteins with the C36 additive FF
  44. DispCorr = no
  45. ; Velocity generation
  46. gen_vel = no ; continuing from NPT equilibration

Likewise, remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions

  1. gmx grompp -f md.mdp -c HisGcomplex-npt.gro -t HisGcomplex-npt.cpt -p topol.top -n index.ndx -o HisGcomplex-md_10ns.tpr
  2. nohup gmx mdrun -deffnm HisGcomplex-md_10ns &
  3. #tail -n 25 HisGcomplex-md_10ns.log

3 Result Analysis

3.1 Generate trace file

To recenter the protein and rewrap the molecules within the unit cell to recover the desired rhombic dodecahedral shape, invoke trjconv:

  1. gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns.xtc -o HisGcomplex-md_10ns_center.xtc -center -pbc mol -ur compact

Choose "Protein" for centering and "System" for output. Note that centering complexes (protein-ligand, protein-protein) may be difficult for longer simulations involving many jumps across periodic boundaries. In those instances (particularly in protein-protein complexes), it may be necessary to create a custom index group to use for centering, corresponding to the active site of one protein or the interfacial residues of one monomer in a complex.

To extract the first frame (t = 0 ns) of the trajectory, use trjconv -dump with the recentered trajectory:

  1. gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns_center.xtc -o HisGcomplex_start.pdb -dump 0

For even smoother visualization, it may be beneficial to perform rotational and translational fitting. Execute trjconv as follows:

  1. gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns_center.xtc -o HisGcomplex-md_10ns_fit.xtc -fit rot+trans

Choose "Backbone" to perform least-squares fitting to the protein backbone, and "System" for output. Note that simultaneous PBC rewrapping and fitting of the coordinates is mathematically incompatible. Should you wish to perform fitting (which is useful for visualization, but not necessary for most analysis routines), carry out these coordinate manipulations separately, as indicated here.

3.2 Visualization

Transfer the xtc trace file into pdb file for visualization.

  1. gmx trjconv -s HisGcomplex-md_10ns.gro -f HisGcomplex-md_10ns_fit.xtc -n index.ndx -o HisGcomplex_ADN-md_10ns_movie_fit.pdb

Using methods described in Molecular Dynamics Using GROMACS, users can generate trace video.

Enter 22 Protein-ADN to select both the structure of protein and ligand.

Change the gro structure to pdb structure.

  1. gmx editconf -f HisGcomplex-md_10ns.gro -o HisGcomplex-md_10ns.pdb

Additional 1 nanosecond simulation, changing simulation time to 1ns in md1.mdp. And run:

  1. gmx grompp -f md1.mdp -c HisGcomplex-md_10ns.gro -t HisGcomplex-md_10ns.cpt -p topol.top -n index.ndx -o HisGcomplex-md_Addi_1ns.tpr
  2. nohup gmx mdrun -deffnm HisGcomplex-md_Addi_1ns &
  3. #tail -n 25 HisGcomplex-md_Addi_1ns.log

Using visualization methods above to see the simulation process.

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注