Google colab with Notebook
CZS 2024 Summer School - ML4Chem
Hello! I am Mayk Caldas and will be talking about machine learning for property prediction and large language models at the CZS 2024 summer school. Before going any further, it is important to note that:
This is not the official summer school page. Please check this link for official information about the event.
I will be giving three talks at this event. Below is a summary of what I plan to show in each session.
Machine Learning fundamentals
Machine learning (ML) is too broad a subject to be covered in a single class. Indeed, you could teach a whole semester on it and still not cover all its aspects. Therefore, my goal for this talk is not to give you a strong base in mathematics, statistics, and programming that would allow you to apply ML right away. My main goal is to introduce valuable concepts that will make it easier for you to study deeper, more specific topics depending on your area of interest.
In this first talk, I will define what ML is and how a typical ML project works. I will show you the steps you need to go through when working on a real-world ML problem and talk about the problems you may encounter along the way. Lastly, I want to show you a toy example to illustrate how some of these problems can be dealt with.
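To give a flavor of what such a workflow can look like, here is a minimal sketch in Python. It is not the toy example from the talk; the synthetic dataset (a noisy line with slope 2 and intercept 1), the 80/20 split, and all function names are assumptions chosen only for illustration. It walks through four typical steps: getting data, splitting it into training and test sets, fitting a model, and evaluating it on data the model has not seen.

```python
import random


def make_dataset(n, seed=0):
    """Generate synthetic data: y = 2x + 1 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def train_test_split(xs, ys, test_fraction=0.2, seed=0):
    """Shuffle the indices and hold out a fraction of the points for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(xs) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# 1. Get the data. 2. Split it. 3. Fit on the training set only. 4. Evaluate.
xs, ys = make_dataset(200)
x_train, y_train, x_test, y_test = train_test_split(xs, ys)
slope, intercept = fit_line(x_train, y_train)
train_mse = mean_squared_error(x_train, y_train, slope, intercept)
test_mse = mean_squared_error(x_test, y_test, slope, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Comparing the training and test errors is one simple way to spot a common problem: a model that does much better on the training set than on the test set is likely overfitting.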
Machine learning for molecular property predictions
After giving you an initial foundation in ML, I will discuss how it can be applied directly in chemistry. The main question I want to answer during this session is: "What differentiates a general ML project from an ML project in chemistry?"
Because we will already have this foundation in ML, this transfer of knowledge should be fast. Hence, I have also prepared real-world use cases in chemistry that will be shown in this session.
More importantly, following the mantra that "The best way to learn programming is programming", we will have a hands-on session where I will provide a dataset and you will use the concepts learnt in these two talks to develop a model for this dataset. Working in groups and asking questions are more than welcome!
Large Language Models in chemistry
Large language models (LLMs) have attracted rapidly growing attention since around 2017, when the paper "Attention is all you need" was published. They have shown impressive results across a very broad range of applications. Naturally, their applications in chemistry have also been investigated.
In this talk, I will focus on showing the capabilities of LLMs in chemistry: not only property prediction, but also tasks no other models were capable of addressing, such as reasoning over long texts, hypothesis creation, data curation, and lab automation and optimization. We recently published the ideas I will be discussing in this talk in "A Review of Large Language Models and Autonomous Agents in Chemistry", which you are welcome to check out!