1. Learns more with less data

2024-02-10 09:48:58

Watson learns from small datasets. What matters is the quality of the data, not the quantity.

When you train Watson, your insights remain yours. Even as the model becomes more valuable, you retain ownership of your data.

Watson makes business processes smarter by building AI into the workflow, so that it is available when needed.

Behind every excellent learning system is data. The more the system is used, the more data it generates. And the more data you generate, and the more you combine systems, import histories, and manipulate records, the less "clean" the resulting data becomes. What is clean data, and how do you clean up dirty data? That is the problem.

Let's begin by defining clean data and dirty data. In my definition, clean data is data generated directly by the system. For a learning system, this is the data generated when a new user is created, when a learner completes a course, when a course becomes available, or when an assignment is graded. It is produced directly by system tools through learner or counselor actions, and it is complete and accurate. Dirty data is not pretty! It contains incorrect or missing key fields, and many other factors can introduce it as well.
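
As a minimal sketch of the distinction above, the Python snippet below flags a record as dirty when a key field is missing or holds an invalid value. The field names (`user_id`, `course_id`, `status`) and the allowed status values are hypothetical, chosen only for illustration; a real learning system would have its own schema and validation rules.

```python
# Hypothetical key fields and allowed values for a learning-system record.
# These names are illustrative assumptions, not taken from any real product.
REQUIRED_FIELDS = ("user_id", "course_id", "status")
VALID_STATUSES = {"enrolled", "in_progress", "completed"}


def find_problems(record: dict) -> list:
    """Return the reasons a record looks dirty; an empty list means clean."""
    problems = []
    # Missing key fields: absent, None, or an empty string.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Incorrect key fields: a status value outside the allowed set.
    status = record.get("status")
    if status not in (None, "") and status not in VALID_STATUSES:
        problems.append(f"invalid status {status!r}")
    return problems


def split_clean_dirty(records):
    """Partition records into (clean, dirty) lists using find_problems."""
    clean, dirty = [], []
    for record in records:
        (dirty if find_problems(record) else clean).append(record)
    return clean, dirty


records = [
    {"user_id": "u1", "course_id": "c1", "status": "completed"},
    {"user_id": "u2", "course_id": "", "status": "completed"},  # missing field
    {"user_id": "u3", "course_id": "c2", "status": "done"},     # invalid value
]
clean, dirty = split_clean_dirty(records)
```

Here only the first record passes; the other two are set aside for cleanup rather than silently fed to the learning system.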

Missing data is a common problem in statistical analysis. A missing rate below 1% is generally considered negligible, and 1 to 5% is manageable. However, 5 to 15% requires sophisticated methods to handle, and more than 15% can seriously affect any kind of interpretation.

Several methods for handling missing data have been proposed in the literature. Many of them were developed for missing data in sample surveys and have some drawbacks when applied to classification tasks. Chan and Dunn (1972) considered the treatment of missing values in supervised classification using the LDA classifier, but only for two-class problems on a simulated dataset drawn from a multivariate normal model. Dixon (1979) introduced the KNN imputation technique for handling missing values in supervised classification. Tresp et al. (1995) also considered the missing-value problem for neural networks in a supervised learning setting.
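
As a rough illustration of both ideas in this paragraph, the Python sketch below computes a column's missing rate, maps it onto the bands quoted above, and fills gaps with a simple k-nearest-neighbour (KNN) imputation. It is a generic KNN imputer written for illustration, not a reproduction of Dixon's (1979) exact procedure; the function names, the handling of the 5% and 15% boundaries, and the use of Euclidean distance over commonly observed features are all assumptions.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_rate(column):
    """Fraction of entries in a column that are missing."""
    return sum(1 for v in column if is_missing(v)) / len(column)


def severity(rate):
    # Bands as quoted in the text; assigning exactly 5% and 15% to the
    # lower band is an assumption, since the text leaves the edges open.
    if rate < 0.01:
        return "negligible"
    if rate <= 0.05:
        return "manageable"
    if rate <= 0.15:
        return "needs sophisticated handling"
    return "severe"


def knn_impute(rows, k=3):
    """Fill each missing value with the mean of that feature over the k
    nearest rows that observe it. Distance is Euclidean over the features
    both rows have observed (unscaled, for simplicity)."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not is_missing(value):
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or is_missing(other[j]):
                    continue  # a donor row must observe feature j
                shared = [(a, b) for a, b in zip(row, other)
                          if not is_missing(a) and not is_missing(b)]
                if not shared:
                    continue  # no common features to compare on
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no donor row exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],
]
print(severity(missing_rate([row[1] for row in data])))  # prints "severe" (1 of 4)
print(knn_impute(data, k=2))
```

With `k=2`, the gap in the second row is filled with the mean of 2.0 and 2.2 from its two closest rows, while the distant fourth row is ignored. In practice a maintained implementation such as scikit-learn's `KNNImputer`, which also rescales distances to account for missing coordinates, would usually be preferable to a hand-rolled version like this.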