Undergraduates in the School of Science at Tianjin University (TJU) have developed an efficient machine learning method that accurately predicts the boiling points of common organic molecules.
Machine learning, a branch of artificial intelligence that has developed rapidly in recent years, is widely used in areas including image recognition, big data mining, and decision making. The well-known computer program AlphaGo is a representative example of advanced machine learning. Machine learning is also being applied in chemical research, where it shows great potential for predicting the structures and properties of compounds. The boiling point of an organic compound at atmospheric pressure is an important physicochemical quantity of great value in chemical production. However, boiling points are difficult to predict because organic molecules are complex and varied in composition and structure.
After nearly two years of research, the TJU researchers (Liu Yuze, Li Kunhua, Huang Jiaxing, Yu Xi, and Hu Wenping) developed an efficient machine learning method that accurately predicts the boiling points of common organic molecules. They first collected boiling point data for organic compounds to build a database, which served as the basis for developing their machine learning method. They then developed a model based on ensemble learning to learn from these data. The model consists of three heterogeneous component models, which describe molecular features from three different perspectives: interpretable descriptors, regression-analysis descriptors, and molecular fingerprints. Artificial neural networks and support vector machines (SVMs) serve as the learning algorithms, and the three heterogeneous models are finally integrated to predict the boiling point accurately. Compared with traditional methods, this multi-component ensemble model greatly improves prediction accuracy and has a wider range of application.
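The article does not describe the team's implementation. As a rough illustration of the general ensemble idea only, the Python sketch below fits one component model per descriptor "view" of a molecule and averages the components' predictions. It is not the TJU model, and several simplifications are assumptions: the view names are hypothetical, each view is reduced to a single number, simple least-squares lines stand in for the neural network and SVM components, and equal-weight averaging is an assumed integration rule.

```python
"""Illustrative heterogeneous-ensemble sketch (not the TJU team's code).

Assumptions: each molecule is a dict of hypothetical descriptor views, each
view is a single float, least-squares lines stand in for the ANN/SVM
components, and the ensemble uses an equal-weight average.
"""
from typing import Callable, Dict, Sequence

Molecule = Dict[str, float]  # hypothetical view name -> descriptor value
Predictor = Callable[[Molecule], float]


def fit_component(data: Sequence[Molecule], targets: Sequence[float], view: str) -> Predictor:
    """Fit y = slope * x + intercept by ordinary least squares on one view."""
    xs = [m[view] for m in data]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda m: slope * m[view] + intercept


def fit_ensemble(data: Sequence[Molecule], targets: Sequence[float],
                 views: Sequence[str]) -> Predictor:
    """Train one component per view and return their equal-weight average."""
    components = [fit_component(data, targets, v) for v in views]
    weight = 1.0 / len(components)

    def predict(m: Molecule) -> float:
        return sum(weight * component(m) for component in components)

    return predict


if __name__ == "__main__":
    # Synthetic toy data (NOT real boiling points): each view is an exact
    # linear function of the target, so every component fits it perfectly.
    views = ["interpretable", "regression", "fingerprint"]
    ys = [10.0, 20.0, 30.0, 40.0]
    data = [{"interpretable": y / 2, "regression": y - 5, "fingerprint": 3 * y} for y in ys]
    model = fit_ensemble(data, ys, views)
    print(model({"interpretable": 25.0, "regression": 45.0, "fingerprint": 150.0}))
```

The point of combining heterogeneous components is that each one sees the molecule through a different description, so their errors tend not to coincide; a real system would use far richer descriptors and stronger learners, and might weight the components unequally.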
More importantly, this heterogeneous-component ensemble strategy is general and can, in principle, be applied to predicting a variety of physicochemical quantities. A paper reporting these results was recently published in Acta Chimica Sinica.
Yu Xi, the team's instructor, said that the team began studying machine learning in 2020, learning it from scratch. Over the past two years, the students gradually acquired skills in data collection and organization, database construction, and machine learning methods, and they ultimately established their own database and developed a unique machine learning model for predicting chemical properties.
The team is now further improving the database and the prediction method, and has applied for related patents. The team plans to expand its data to cover more physicochemical properties and to offer data query and prediction services, bringing the research results into practical application. In addition, this achievement is part of the pilot study for the Organic Photovoltaic Molecular Database and Artificial Intelligence Development Program of the Tianjin Molecular Optoelectronic Laboratory, and it provides technical exploration and technical reserves for the intelligent development of optoelectronic molecules.
By: School of Science
Editor: Qin Mian