Algorithms and Learning for Protein Science

MVA course

Frederic.Cazals@inria.fr

Click on a lecture title to download the notes.

The exam will take place on March the 24th, as follows:
  1. 02/24: release of the papers for the projects. Students will have one week to form pairs and select a project (first come, first served, or by mutual agreement).
  2. 03/24: 9am-10am or 2pm-3pm: paper-and-pencil written exam. Class notes are allowed, but no electronic devices.

    Due to a scheduling conflict with another class, the form below includes a question to determine the best time slot for the exam.

  3. 03/24: 10:30am-5pm: project presentations.
    • Students will present, in pairs, one of the research papers below. Bonus points will be awarded for two initiatives: (i) computer experiments aiming at reproducing the results; (ii) a well-founded strategy to improve on the paper and its limitations.
    • For a given presentation, all students will attend, *except* those working on the same project who have not yet presented.
The precise schedule for this second part will be provided later.

  1. K-means++
    Beretta, L., Cohen-Addad, V., Lattanzi, S., & Parotsidis, N. (2023). Multi-swap k-means++. Advances in Neural Information Processing Systems, 36, 26069-26091.
  2. Gaussian mixtures, generative denoising processes
    Shah, K., Chen, S., & Klivans, A. (2023). Learning mixtures of Gaussians using the DDPM objective. Advances in Neural Information Processing Systems, 36, 19636-19649.
  3. Structural alphabets
    Rosenberg, A. A., Yehishalom, N., Marx, A., & Bronstein, A. M. (2023). An amino-domino model described by a cross-peptide-bond Ramachandran plot defines amino acid pairs as local structural units. Proceedings of the National Academy of Sciences, 120(44), e2301064120.
  4. Time lagged ICA, deep learning
    Bonati, L., Piccini, G., & Parrinello, M. (2021). Deep learning the slow modes for rare events sampling. Proceedings of the National Academy of Sciences, 118(44), e2113533118.
  5. Structural alignments
    Van Kempen, M., Kim, S. S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C. L., ... & Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature biotechnology, 42(2), 243-246.
  6. Normal modes, interpolation, SPD matrices
    Batista, P. R., Robert, C. H., Marechal, J. D., Hamida-Rebaï, M. B., Pascutti, P. G., Bisch, P. M., & Perahia, D. (2010). Consensus modes, a robust description of protein collective motions from multiple-minima normal mode analysis—application to the HIV-1 protease. Physical Chemistry Chemical Physics, 12(12), 2850-2859.
  7. Structural decompositions, community detection
    Wells, J., Hawkins-Hooker, A., Bordin, N., Sillitoe, I., Paige, B., & Orengo, C. (2024). Chainsaw: protein domain segmentation with fully convolutional neural networks. Bioinformatics, 40(5), btae296.
  8. Loop sampling, reinforcement learning
    Barozet, A., Molloy, K., Vaisset, M., Siméon, T., & Cortés, J. (2020). A reinforcement-learning-based approach to enhance exhaustive protein loop sampling. Bioinformatics, 36(4), 1099-1106.
  9. Trees, classification, regression
    Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature machine intelligence, 2(1), 56-67.
  10. Protein design, deep learning
    Defresne, M., Barbe, S., & Schiex, T. (2023). Scalable coupling of deep learning with logical reasoning. arXiv preprint arXiv:2305.07617. IJCAI 2023
  11. Correlations, information theory
    Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., ... & Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science, 334(6062), 1518-1524.
  12. Protein design, covid
    Cao, L., Goreshnik, I., Coventry, B., Case, J. B., Miller, L., Kozodoy, L., ... & Baker, D. (2020). De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science, 370(6515), 426-431.
  13. Normal modes, cryo-electron microscopy
    Vuillemot, R., Mirzaei, A., Harastani, M., Hamitouche, I., Fréchin, L., Klaholz, B. P., ... & Jonic, S. (2023). MDSPACE: Extracting continuous conformational landscapes from Cryo-EM single particle datasets using 3D-to-2D flexible fitting based on molecular dynamics simulation. Journal of molecular biology, 435(9), 167951.
  14. Collective coordinates, dimensionality reduction, rare events
    Belkacemi, Z., Gkeka, P., Lelièvre, T., & Stoltz, G. (2021). Chasing collective variables using autoencoders and biased trajectories. Journal of chemical theory and computation, 18(1), 59-78.
  15. Conformer generation, diffusion processes
    Wu, K. E., Yang, K. K., van den Berg, R., Alamdari, S., Zou, J. Y., Lu, A. X., & Amini, A. P. (2024). Protein structure generation via folding diffusion. Nature communications, 15(1), 1059.