Friday, January 21, 2011

Training 101

What is behavior? All behaviors are influenced by what occurs as a consequence of that performed behavior. You touch a hot stove, the consequence is a burn; the following behavior is one of self-preservation—pull away. You bring your mom a snake from outside while she’s working at her desk, she screams. Unless that’s the reaction you were going for, it’s unlikely you’ll ever do that again. Recurring behaviors form by a means of shaping, of selectively reinforcing responses that approximate a desired response to an increasingly greater degree. This type of behavioral modeling is a prime example of operant conditioning, where consequence is used to shape behaviors.

The ABCs of Operant Training:
A for Antecedent: whatever precedes the behavior or elicits a specific response (a whistle blowing for a horse at the starting gate)
B for Behavior: the response (the horse begins to run)
C for Consequence: what happens immediately after the behavior (the horse may win because it responded, which would be reinforcing)

Marine mammal training is often based on these three basic principles of conditioning behaviors. There are 7 principal reasons to train animals, as seen by most trainers:
1) Companionship—who wouldn’t want to say they hang out with dolphins all day?
2) Police and rescue work—sea lions have been trained to dive as far as 1000 feet to retrieve objects even humans can’t safely reach
3) Therapy—dolphins have been shown to have vast effects on increasing personal motivation in children with disabilities
4) Entertainment—because it’s just plain fun
5) Education—because it’s important to do a service to both animals in human care as well as in the wild by educating the public on proper treatment and concern for animals
6) Research—because there’s so much left to discover
7) Health—the first and foremost priority of training animals; being able to have an animal participate in its own health maintenance is as important as it gets for marine mammal trainers

So, how do these trainers wave a hand and get four dolphins to do back flips until they blow a seemingly meaningless whistle? Trainers at Dolphin Cove use a type of reinforcement called positive reinforcement, which focuses on adding something to a situation in order to increase a behavior. To go over some terms:
Positive: adding something
Negative: removing something
Reinforcement: increasing something
Punishment: decreasing something
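
For anyone who likes to see the logic spelled out, here is a minimal sketch of how those four terms combine into the four approaches discussed next. The function name `classify` and its two input labels are made up for illustration; they aren't part of any training vocabulary.

```python
def classify(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant conditioning approach for one scenario.

    stimulus_change: "added" or "removed" (what happens to the situation)
    behavior_change: "increases" or "decreases" (what happens to the behavior)
    These labels are hypothetical, chosen only for this sketch.
    """
    # Positive means adding something; negative means removing something.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement increases a behavior; punishment decreases it.
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behavior_change]
    return f"{sign} {effect}"


# Giving a fish after a back flip adds something and increases the behavior.
print(classify("added", "increases"))
# Taking away playtime removes something and decreases the behavior.
print(classify("removed", "decreases"))
```

The two answers are independent of each other: one says whether something was added or removed, the other says whether the behavior went up or down.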

Negative reinforcement: this approach may be effective, but the animal may begin to associate aversive connotations with the reinforcement. For example, a dolphin getting bullied by another dolphin doesn’t like being at the platform with that dolphin. As a “reward,” the trainer may send the bully away (remove, negative) if the bullied dolphin completes the behavior asked of him. This increases the chance that the bullied dolphin will perform the behavior correctly, because he knows it’ll mean relief from the bully. And while it may be effective, this particular scenario utilizes negative reinforcement in such a way that fear becomes the driving factor in completing a behavior.

Positive punishment: in order to get someone to stop doing something, you may say something to them that conveys your annoyance with their behavior. You are adding (positive) your opinion on their behavior in order to decrease it (punishment). A good example is training dogs not to bark: change their normal collar to a shock collar and it’s almost guaranteed that the addition of a painful shock is going to decrease their barking.

Negative punishment: the most notable examples of this type of behavioral shaping are generally found in the realm of raising children. Like it or not, most parents use this form of shaping to change unwanted behaviors in their children. You don’t like that your kid is crying? Ignore them (take away yourself, essentially) to decrease (punishment) the behavior. Same goes for taking away privileges, time-outs, etc. In time-outs, you may think the action is positive (adding an unwanted scenario such as sitting in the corner alone), but in fact it’s more reasonable to say that you are taking something away (you are taking them away from whatever they were doing, such as playing).

Positive reinforcement: we, as emotive beings, are constantly engaging in behaviors in order to get a positive response from others. Something as small as a smile acts as a means of praise that says, if you do whatever you just did again, that person will smile and you’ll likely feel good about it, or reinforced. We rarely, if ever, perform behaviors with the altruistic notion that it will simply make the other person feel good. We always want something in response; we are, in fact, a species driven by desire. Think about it. Why do you ask someone a question? Because you want a response. Why do you make conversation? Because you want to interact. Why do you yell? Because you want someone to know that you’re upset. There is always a reason behind your behaviors, and it’s hard to find any other reason for doing something besides wanting something in return. Try it.


One intriguing and confusing example of training (in terms of whether it is positive or negative, reinforcement or punishment) can be seen in a specific scenario of dog training. Imagine you have a pooch that barks every single time the mailman comes. You could yell at the dog, or put it in another room, or let it keep barking. OR you could remove it from that scenario altogether. Removing the animal from the scenario before the time at which the behavior begins causes the behavior to fade or disappear. If you remove the dog from the house when you know the mailman is coming and take them for a walk, you aren’t allowing them to practice the bad behavior. You are reinforcing the absence of that behavior by removing the ability to engage in it. This is called differential reinforcement of incompatible behavior. The dog can’t bark at the mailman because the mailman simply is not there. The dog is being walked, a primary reinforcer (inherently enjoyed), and the target response (barking at the mailman) is decreasing in frequency. Pretty soon, even if the mailman comes and you’re not out walking, it’s likely that the dog will be at your feet waiting for you to get your butt off the couch rather than standing at the door barking its head off.

A primary reinforcer is something that an animal finds inherently rewarding; its value does not need to be learned. Secondary reinforcers are conditioned by pairing them with primary reinforcers to signify their value. For instance, the dolphins respond to a whistle that the trainers use. They don’t stop the behavior when the whistle is blown because they like the sound of the whistle; they stop because they know that a primary reinforcer (fish, praise, toys) is coming. A secondary reinforcer bridges the gap between the completion of a behavior and the primary reinforcer earned. This is what trainers call using a “bridge”. In the trainers’ cases here, the whistle is used as a bridge. The 5 main properties of a bridge are as follows:
1) stops the behavior
2) marks the apex (the high point, or point of greatest success) of that behavior
3) gives secondary reinforcement
4) indicates success
5) bridges gap to the primary reinforcer

The point bridge is also used with the dolphins, where a trainer will point at the dolphin that correctly performed the behavior if others were involved and failed. This still indicates success for the animal that earned the bridge, but does not bridge the others. Other examples of secondary reinforcers, like the whistle, are rubs, praise, high-fives and receiving ice. The animals had to learn that these things held value in order to accept them as rewards.
A bridge is a good example of classical conditioning, which gives a novel stimulus a specific, generally unassociated meaning. The most famous example of this type of conditioning is, of course, Pavlov’s dogs, which were conditioned to salivate when a bell rang because of their learned association of that bell with meat. The signals given to the dolphins here to elicit behaviors are called Sd’s, or discriminative stimuli. These conditioned signals are paired with, and then eventually elicit, a specific behavior.

For everyone reading this that already knows the basics of psychology and feels like I’m beating a dead horse by explaining all of these terms, I apologize. ☺
