Red Robber is a story about the evolution of his relationship with boys and his grandmother. Willy, a little boy and his grandmother stayed home for a week. His parents went on a trip, and he was not interested in making his grandmother a babysitter. He felt that she treated herself like a baby and did not understand the necessity of adventure. When he was in his room he met encyclopedia of red robbery and invited him to risk. Willy did not know his grandmother was a red robber.
On her wedding trip, robbery suspended the trip. He entered the car, almost immediately immediately fascinated by the beauty and strength of nine people. The robber did not frighten her. She actually smiled at him after entering the car. Gang grabbed her red shoes and symbolized passion; he was already walking towards a beautiful girl in red. The gangsters put her down from the car and put it on a high floor ceremony and planned to rape her, but she glanced at Yu and came to save her. Yu and Nine exchanged their eyes, their relationship increased, then Yu grasped Nine shoes. She did not resist his behavior but asked for his attention by giving him a glimpse into the car and an enchanting entrance and letting the red curtain fall down on her . These actions indicate that she is trying to promote her passion between Yu and Yu.
Context gang introduces the concept of the state. The state contains a description of the environment that the agent can use to take wiser behavior. In our question, there is the possibility that there are multiple robbers rather than robbers. The state of the environment shows which robber is dealing. The goal of the agent is to learn about arbitrary number of robbers as well as to learn best actions against a single robbery. Since each robbery has different possibilities to reward each weapon, our agent needs to learn how to adjust their behavior based on environmental conditions. Otherwise, it will not get the maximum reward over time. In order to achieve this goal, we build a single layer neural network that uses states in Tensorflow to generate actions. By using the strategy gradient renewal method you can let the network learn to take actions that maximize that reward. Here is the iPython notebook for reading the tutorial.