A Pattern Matching Approach to Find the IUPAC Names in Chemical Documents

2023-03-11 07:32:53

Chemical substances, or entities, are key terms in chemical publications and patents. A chemical can be represented in several ways, including its IUPAC name, generic name, SMILES string, InChI identifier, or CAS registry number. Chemical names are often long, complicated expressions, and they can change over time, which causes specific problems for information retrieval and can degrade search performance. Because manually annotated data for training a named entity recognition (NER) system is difficult to obtain, researchers have looked for alternative ways of generating annotated data or of making better use of unlabeled data.
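Of the representations listed above, the CAS number is the simplest to recognize by pattern matching, because it has a fixed shape (two to seven digits, two digits, one check digit) and a checksum. The following is a minimal sketch, not the method of any particular system; the names `CAS_PATTERN`, `is_valid_cas`, and `find_cas_numbers` are hypothetical and chosen only for illustration.

```python
import re

# Shape of a CAS number: 2-7 digits, then 2 digits, then 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(first: str, second: str, check: str) -> bool:
    """Return True if the check digit matches the CAS checksum rule."""
    # Weights count up from the right: the digit next to the check digit
    # has weight 1, the one before it weight 2, and so on.
    digits = (first + second)[::-1]
    total = sum(int(d) * weight for weight, d in enumerate(digits, start=1))
    return total % 10 == int(check)


def find_cas_numbers(text: str) -> list[str]:
    """Hypothetical helper: return checksum-valid CAS-like strings in text."""
    return [
        m.group(0)
        for m in CAS_PATTERN.finditer(text)
        if is_valid_cas(*m.groups())
    ]
```

The checksum step filters out strings that merely look like CAS numbers, such as dates or part numbers. IUPAC names have no such fixed shape, which is one reason they are harder to find with patterns alone.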

"Chemical name" means the scientific designation of a chemical substance according to the nomenclature developed by the International Union of Pure and Applied Chemistry (IUPAC) or the Chemical Abstracts Service (CAS), or a name that clearly identifies the substance for the purpose of hazard classification. "Classification" means identifying the relevant data on the hazards of a chemical, reviewing those data to determine the hazards associated with the substance, and deciding whether the substance is hazardous under the definition of hazardous chemical in this section. In addition, classifying health and physical hazards includes determining the degree of hazard by comparing the data with the criteria for health and physical hazards.

For the purposes of international communication and trade, the official names of the chemical elements, both ancient and recently recognized, are decided by the International Union of Pure and Applied Chemistry (IUPAC), which has settled on a kind of international English. It uses traditional English names even where an element's symbol is based on a Latin or other traditional word, for example "gold" rather than "aurum" as the name of the 79th element (Au). IUPAC prefers the British spellings "aluminium" and "caesium" over the US spellings "aluminum" and "cesium", while the US "sulfur" is preferred over the British "sulphur". However, elements that are sold in bulk in many countries are often still known by their local names, and countries whose national languages do not use the Latin alphabet are likely to use the IUPAC element names.

According to IUPAC, the names of chemical elements are not proper nouns in English, so the full name of an element is conventionally not capitalized, even when it is derived from a proper noun, as with californium and einsteinium. When written out, isotope names are likewise not capitalized, for example carbon-12 or uranium-235. Symbols for chemical elements (Cf for californium, Es for einsteinium, and so on) are always capitalized (see below). In the second half of the 20th century, physics laboratories became able to produce nuclei of chemical elements whose half-lives are too short for an appreciable amount of them to exist at any one time. These elements are also named by IUPAC, which normally adopts the name chosen by the discoverer. This practice can lead to disputes over which team actually discovered an element, a question that delayed the naming of elements with atomic number 104 and higher for a considerable time. (See element naming controversy.)
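The capitalization conventions described above lend themselves to simple pattern matching. The sketch below is illustrative only: the element table is a small hypothetical subset, the names `ELEMENT_SYMBOLS`, `find_isotopes`, and `is_well_formed_symbol` are invented for this example, and the symbol check assumes permanent one- or two-letter symbols.

```python
import re

# Illustrative subset only; a real table would cover all named elements.
ELEMENT_SYMBOLS = {
    "carbon": "C",
    "uranium": "U",
    "gold": "Au",
    "californium": "Cf",
    "einsteinium": "Es",
}

# Element name, then a hyphen or a space, then a mass number (1-3 digits).
# Matching ignores case so that sentence-initial "Carbon 12" is also found.
ISOTOPE_PATTERN = re.compile(
    r"\b(" + "|".join(ELEMENT_SYMBOLS) + r")[- ](\d{1,3})\b",
    re.IGNORECASE,
)


def find_isotopes(text: str) -> list[str]:
    """Return isotope mentions normalized to lowercase 'name-mass' form."""
    return [
        f"{m.group(1).lower()}-{m.group(2)}"
        for m in ISOTOPE_PATTERN.finditer(text)
    ]


def is_well_formed_symbol(symbol: str) -> bool:
    """Check the convention: one capital letter, optionally one lowercase."""
    return re.fullmatch(r"[A-Z][a-z]?", symbol) is not None
```

For example, `find_isotopes("Carbon 12 and uranium-235 were measured.")` yields the normalized forms `carbon-12` and `uranium-235`, and `is_well_formed_symbol("cf")` is false because the first letter is not capitalized.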