There are three types of domestic cats: indoor pet cats, outdoor pet cats, and cats that live both indoors and outdoors. These cats may seem very similar, but if you look at them closely, side by side, they are very different. Indoor cats are usually very pretty because they do not have to work for their food; the extra time lets them clean and take care of themselves every day. Because they are fed on a regular schedule, indoor cats tend to become overweight. They know they can sleep all day and still eat more than once a day.
Standard classification is the setup used by almost all classification models. The input is passed through a series of layers, and the probability of each class is output at the end. If you want to predict whether an image shows a cat or a dog, you train the model on cat and dog pictures that are similar (but not identical) to the ones expected at prediction time. Of course, this requires a dataset similar to what the model will see when it is used for prediction. Face recognition is a good example. For a small number of people, we train a one-shot classification model on a dataset that shows each face from various angles, under different lighting, and so on. Then, to check whether person X appears in an image, we take a single photo of X and ask the model whether that person is in the image (the model never used a photo of X during training).
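The one-shot check described above can be sketched as a comparison between embeddings. This is a minimal sketch under stated assumptions: it presumes a trained network has already mapped each face to a fixed-length embedding vector (that network is not shown), and the names `is_same_person`, `person_in_image`, and the 0.8 threshold are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_person(reference: Sequence[float], query: Sequence[float],
                   threshold: float = 0.8) -> bool:
    """Hypothetical one-shot test: compare X's single reference embedding
    with one face embedding. The threshold is an assumed value that would
    normally be tuned on validation data."""
    return cosine_similarity(reference, query) >= threshold


def person_in_image(reference: Sequence[float],
                    faces_in_image: Sequence[Sequence[float]],
                    threshold: float = 0.8) -> bool:
    """True if any face embedding found in the image matches the reference.
    No photo of X is needed at training time, only this one reference."""
    return any(is_same_person(reference, face, threshold) for face in faces_in_image)


# Usage with made-up embeddings: the second face is close to the reference.
reference_x = [0.9, 0.1, 0.4]
faces = [[0.1, 0.9, 0.0], [0.88, 0.12, 0.41]]
print(person_in_image(reference_x, faces))
```

The design choice is that recognizing a new person only requires storing one more reference embedding, not retraining the classifier.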
When classifying an object, the model needs to learn translation invariance: wherever the cat appears in the image, it should still be classified as a cat. When performing object detection, on the other hand, we want the output to change with the object's position: if the cat is in the upper-left corner, the box should be drawn in the upper-left corner. If we want to share 100% of the convolutional computation across the network, how do we reconcile translation invariance (needed for classification) with translation variance (needed for detection)?
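The tension in that question can be seen in a toy one-dimensional example. A shared convolution produces a feature map that shifts along with the object, so position information is still there for detection; a global max pooling at the end discards it and yields the same class score wherever the object is. This is a hypothetical sketch: the two-element kernel stands in for a learned "cat detector" filter, the function names are illustrative, and, as in most deep-learning libraries, the "convolution" is computed as a cross-correlation.

```python
from typing import List


def feature_map(row: List[int], kernel: List[int]) -> List[int]:
    """Valid 1D convolution (cross-correlation) with a shared kernel.
    The response moves when the object moves: translation variance."""
    k = len(kernel)
    return [sum(row[i + j] * kernel[j] for j in range(k))
            for i in range(len(row) - k + 1)]


def classify_score(row: List[int], kernel: List[int]) -> int:
    """Global max pooling keeps only the strongest response, so the score
    is the same wherever the object is: translation invariance."""
    return max(feature_map(row, kernel))


def locate(row: List[int], kernel: List[int]) -> int:
    """Index of the strongest response, i.e. the left edge of the best
    matching window. This positional information is what detection needs."""
    fmap = feature_map(row, kernel)
    return fmap.index(max(fmap))


# A hypothetical "cat" (two bright pixels) on the left vs. on the right.
cat_left = [0, 1, 1, 0, 0, 0, 0, 0]
cat_right = [0, 0, 0, 0, 0, 1, 1, 0]
detector = [1, 1]
print(classify_score(cat_left, detector), classify_score(cat_right, detector))
print(locate(cat_left, detector), locate(cat_right, detector))
```

Both rows get the same class score, while `locate` reports different positions; the question is how much of the shared computation can be kept before the positional information has to be thrown away.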
Let's work through an example. Suppose we are training a network to distinguish between cats and dogs. Two output neurons are then required, one for each category. We feed an image of a cat into the network. Imagine that each pixel of the image corresponds to one "input" (a way to improve on this representation will be explained later). Suppose the network assigns a probability of 38% to dog and 62% to cat. Ideally, we would like it to say that this image is 100% cat. The best loss function depends on the data and the intended application. There is no time to explain the various options in detail, but in the spirit of building intuition, "mean squared error" is an example of a simple loss function. It is the same idea as fitting a line to data points, which you learned in school. The square of the distance between a point and the line: