At the end of this lesson, students should be able to...
- Discuss how objects are stored in Ruby
- Differentiate between references and values
- Compare modifying an object with reassigning a variable
We're going to start today with some Ruby code that does something a little unexpected. It's a method that takes an array of strings as an argument, and truncates (chops off the end of) all the strings with more than three characters. Or at least that's the idea.
```ruby
def short_strings(input)
  result = []
  input.each do |word|
    # Slice characters 0 to 2
    result << word[0..2]
  end
  input = result
end

pets = ['dog', 'parrot', 'cat', 'llama']
short_strings(pets)
puts "#{pets}"
```
Running this code results in `["dog", "parrot", "cat", "llama"]`. The array is unchanged! Let's do some debugging:
```ruby
def short_strings(input)
  result = []
  input.each do |word|
    # Slice characters 0 to 2
    result << word[0..2]
  end
  input = result
  puts "Inside short_strings, input is"
  puts "#{input}"
end

pets = ['dog', 'parrot', 'cat', 'llama']
short_strings(pets)
puts "After calling short_strings"
puts "#{pets}"
```
The results:
```
Inside short_strings, input is
["dog", "par", "cat", "lla"]
After calling short_strings
["dog", "parrot", "cat", "llama"]
```
It seems our method is indeed creating a list of shortened words, but our outer variable isn't being updated. The reason has to do with references and values, and how data is stored in a computer.
As an aside: one way to fix our method is to simply return the new array and, when calling it, write `pets = short_strings(pets)`. However, sometimes this isn't an option. For example, what if our method was supposed to return the number of words that had to be truncated?
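A minimal sketch of that return-value fix, reusing the loop from above: the method hands back `result` instead of reassigning its parameter, and the caller does the reassignment.

```ruby
def short_strings(input)
  result = []
  input.each do |word|
    # Slice characters 0 to 2
    result << word[0..2]
  end
  # Return the new array instead of reassigning the parameter
  result
end

pets = ['dog', 'parrot', 'cat', 'llama']
# The outer variable is reassigned here, where it actually lives
pets = short_strings(pets)
puts "#{pets}" # ["dog", "par", "cat", "lla"]
```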
When we create an array in Ruby (or a string or a hash or any other complex data type), we're actually creating two things.
The first is the value of the array, which involves asking the operating system for a bit of memory and then putting our data in it. You can think of this as the actual object. Each piece of memory we get from the OS has an address that identifies where it lives, which is how we get back to it later.
The second is a reference to the array, which ties together the address of that memory with a name for our program to use. References are sometimes called pointers (especially in C), and we say that a variable points to or references an object.
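Every variable in Ruby consists of these two parts, a reference and a value. Normally when you type the variable's name, Ruby automatically goes and gets the object. If you want a number that identifies the object itself, you can use the `object_id` method. (In older versions of Ruby this number was derived from the object's memory address; since Ruby 2.7 it is simply a unique ID, but it serves the same purpose here.)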
```ruby
# You can see an object's ID using object_id
pets = ["dog", "parrot", "cat", "llama"]
puts "pets.object_id: #{pets.object_id}"

# Different objects have different IDs
veggies = ["turnip", "beet"]
puts "veggies.object_id: #{veggies.object_id}"
```
The `=` operator changes what a variable points at.
If we assign one variable to another variable, they will both reference the same underlying object.
```ruby
# Two variables can point to the same object
repeat = veggies
puts "repeat.object_id: #{repeat.object_id}" # same as veggies.object_id
```
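Comparing `object_id`s by eye gets tedious. As a small sketch, Ruby's built-in `equal?` method answers the same question directly: it is true only when both sides are the very same object, while `==` compares contents. The `dup` call is used here only to make a second array with identical contents.

```ruby
veggies = ["turnip", "beet"]
repeat = veggies

# equal? is true only when both names reference the same object
puts repeat.equal?(veggies)                # true
puts repeat.object_id == veggies.object_id # true

# dup makes a separate array with the same contents
copy = veggies.dup
puts copy == veggies                       # true (same contents)
puts copy.equal?(veggies)                  # false (different object)
```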
If we make changes to the object through one variable, we can see the changes through the other. The variables are just names; the underlying object is the same.
```ruby
veggies[1] = "onion"
repeat.push("potato")
puts "#{veggies}" # ["turnip", "onion", "potato"]
puts "#{repeat}"  # ["turnip", "onion", "potato"]
```
When we use the `=` operator, we are not changing the underlying object but instead changing what our variable points to. This does not affect any other variables.
```ruby
repeat = ["new", "array"]
puts "repeat.object_id: #{repeat.object_id}"
puts "value of repeat:"
puts "#{repeat}" # ["new", "array"]
puts "value of veggies:"
puts "#{veggies}" # ["turnip", "onion", "potato"]
```
So to summarize, if two variables point to the same underlying object:
- Modifications to the object (the value) will be visible from both variables
- Reassigning one variable (the reference) with `=` does not affect the other variable
Note that `+=` and the other shorthand operators all involve reassignment. If we say `veggies += ['rutabaga']`, Ruby creates a new array, copies all the values from `veggies`, adds in `rutabaga`, and reassigns `veggies` to point to this new array. This is true of strings and numbers as well.
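A short sketch that confirms this by checking `object_id` before and after `+=`, once with an array and once with a string. Note that a second variable still pointing at the old array is left unchanged.

```ruby
veggies = ["turnip", "beet"]
repeat = veggies
original_id = veggies.object_id

# += builds a brand-new array and points veggies at it
veggies += ["rutabaga"]

puts veggies.object_id == original_id # false: veggies references a new object
puts "#{veggies}"                     # ["turnip", "beet", "rutabaga"]
puts "#{repeat}"                      # ["turnip", "beet"]

# Strings behave the same way with +=
text = "hello"
text_id = text.object_id
text += " world"
puts text.object_id == text_id        # false
```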
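In general, calling a method that modifies its receiver, like `.concat()` or `.push()`, will change the underlying object, while assigning to a variable with `=` (or `+=` and friends) will result in reassignment. One exception to watch for: element assignment like `veggies[1] = "onion"` looks like `=`, but it is really a call to the array's `[]=` method, so it modifies the object.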
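As a quick sketch, we can check `object_id` after each kind of change: mutating methods, element assignment (a call to `[]=`), and plain reassignment of the variable. Only the last one leaves the variable pointing at a different object.

```ruby
pets = ["dog", "cat"]
original_id = pets.object_id

# Mutating methods change the object in place; the ID stays the same
pets.push("gecko")
pets.concat(["llama"])
puts pets.object_id == original_id # true

# Element assignment calls the []= method, which also modifies in place
pets[0] = "puppy"
puts pets.object_id == original_id # true

# Assigning to the variable itself is a reassignment
pets = pets + ["parrot"]
puts pets.object_id == original_id # false
```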
Question: When we pass a parameter to a method, what do you get?
- Is it the same underlying object?
- Is it the same variable?
- How can we find out?
Let's write some code that will help us investigate this.
```ruby
def reassign_parameter(param)
  puts " Inside reassign_parameter"
  puts " at start, param.object_id is #{param.object_id}"

  # .push modifies the underlying object
  param.push('gecko')
  puts " after modification, param.object_id is #{param.object_id}"

  # = changes the reference
  param = ["new", "array"]
  puts " after reassignment, param.object_id is #{param.object_id}"
  puts " with value #{param}"
  puts " Finish reassign_parameter"
end

pets = ["dog", "parrot", "cat", "llama"]
puts "Before reassign_parameter"
puts "pets.object_id is #{pets.object_id}"
puts
reassign_parameter(pets)
puts
puts "After reassign_parameter"
puts "pets.object_id is #{pets.object_id}"
puts "with value #{pets}"
```
Before running this code, take a couple minutes to read through it. What is it doing? What do you expect the output to be?
Running the code yields the following (your `object_id` values may be different):
```
Before reassign_parameter
pets.object_id is 70144030241620

 Inside reassign_parameter
 at start, param.object_id is 70144030241620
 after modification, param.object_id is 70144030241620
 after reassignment, param.object_id is 70144030228060
 with value ["new", "array"]
 Finish reassign_parameter

After reassign_parameter
pets.object_id is 70144030241620
with value ["dog", "parrot", "cat", "llama", "gecko"]
```
We can make a few interesting observations about this output:
- The parameter inside the method has the same `object_id` as the variable we passed from outside
- Modifications to the underlying object are visible outside the method
- Reassigning the parameter with `=` does not reassign the outer variable
This is exactly the same behavior we saw before, when we had two variables referencing the same object. From this we can conclude: when you pass a variable as a parameter, Ruby creates a new variable that references the same object.
Question: Given what we've learned, how can we modify our `short_strings` method to do what we want?
The answer is to modify the underlying object, rather than reassigning the parameter. Here's what the resulting code might look like:
```ruby
def short_strings(input)
  input.each_with_index do |word, i|
    # Slice characters 0 to 2
    input[i] = word[0..2]
  end
end

pets = ['dog', 'parrot', 'cat', 'llama']
short_strings(pets)
puts "#{pets}"
```
This produces the expected output. Note that we can't just say `word = word[0..2]`, for the same reason as above: that reassigns the block parameter `word` to a new string containing just the first 3 letters, but neither modifies nor reassigns the string stored in the array. Instead we assign to `input[i]`, which does what we want: it replaces the element stored at index `i` in the array itself.
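Because this version changes the array in place, the method's return value is free for something else, such as the truncation count mentioned in the aside earlier. A sketch of that idea follows; `short_strings_count` is a hypothetical name chosen for this example.

```ruby
# Hypothetical variation: truncate in place, return the number of words shortened
def short_strings_count(input)
  count = 0
  input.each_with_index do |word, i|
    next if word.length <= 3
    # Assigning to input[i] modifies the caller's array
    input[i] = word[0..2]
    count += 1
  end
  count
end

pets = ['dog', 'parrot', 'cat', 'llama']
truncated = short_strings_count(pets)
puts "#{pets}" # ["dog", "par", "cat", "lla"]
puts truncated # 2
```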
We could also use Array's `map!` method, since it modifies the original array. `map` (without a `!`) would not work here, because it returns a new array and leaves the original untouched.
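A minimal sketch of the `map!` version, followed by a contrast with plain `map`:

```ruby
def short_strings(input)
  # map! replaces each element with the block's result, in the same array
  input.map! { |word| word[0..2] }
end

pets = ['dog', 'parrot', 'cat', 'llama']
pets_id = pets.object_id
short_strings(pets)
puts "#{pets}"                 # ["dog", "par", "cat", "lla"]
puts pets.object_id == pets_id # true: still the same array

# map (no !) returns a new array and leaves the original alone
veggies = ['turnip', 'beet']
shortened = veggies.map { |word| word[0..2] }
puts "#{veggies}"   # ["turnip", "beet"]
puts "#{shortened}" # ["tur", "bee"]
```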
We've talked a lot about arrays today, but this pattern holds true for all complex objects in Ruby: strings, hashes, instances of classes, etc. For example, consider the following code:
```ruby
# Reassign a string using +=
def reassign_string(str)
  str += ' reassigned'
  puts "inside reassign_string, str is '#{str}'"
end

text = 'original'
reassign_string(text)
puts "outside reassign_string, text is '#{text}'"

# Modify a string using the .concat() method
def modify_string(str)
  # str << ' modified' would do the same thing
  str.concat(' modified')
  puts "inside modify_string, str is '#{str}'"
end

text = 'original'
modify_string(text)
puts "outside modify_string, text is '#{text}'"
```
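The same pattern for the other types mentioned above, sketched with a hash and with an instance of a small class. `Pet`, `update_hash`, and `rename` are made up purely for this illustration.

```ruby
# A hash passed to a method: modifying it is visible outside,
# reassigning the parameter is not
def update_hash(h)
  h[:color] = "green"     # modifies the caller's hash (a []= method call)
  h = { color: "purple" } # reassigns only the local parameter
end

settings = { color: "red" }
update_hash(settings)
puts settings[:color] # green

# The same holds for instances of our own classes
class Pet
  attr_accessor :name

  def initialize(name)
    @name = name
  end
end

def rename(pet)
  pet.name = "Rex"      # modifies the object
  pet = Pet.new("Spot") # reassigns only the local parameter
end

dog = Pet.new("Fido")
rename(dog)
puts dog.name # Rex
```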
Primitive types like numbers, booleans and `nil` follow basically the same rules. The catch is that there's no way to change the underlying value of a primitive without reassignment. In programming lingo, we say that these types are immutable. Whenever you "change" one, for example with `+=`, Ruby produces a new object and reassigns the variable to it; the original object is left untouched.
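A small sketch of immutability with an Integer: two names start out referencing the same object, `+=` moves only one of them to a new object, and `frozen?` reports that these values cannot be modified in place.

```ruby
count = 5
other = count            # both names reference the same Integer object
puts count.equal?(other) # true

other += 1               # produces a new Integer (6) and reassigns other
puts count               # 5: the original object was never changed
puts other               # 6

# Integers, true/false and nil are frozen: no method can change them
puts 5.frozen?           # true
puts true.frozen?        # true
puts nil.frozen?         # true
```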
- A variable in Ruby consists of two things:
  - The variable itself, tying a name to an address in memory
  - The object at that memory address
- We say a variable references or points to an object
- Multiple variables can reference the same object
  - Changes to the underlying object will be reflected through both variables
    - Methods like `.push()` or `.concat()`
  - Changing what one variable points to does not affect any other variables
    - `=`, `+=`, etc.
- Passing an argument to a method creates a new variable referencing the same object
- Primitives (numbers, booleans and `nil`) are immutable, meaning the underlying object can't be modified
- Passing objects in Ruby - Great article! Goes into more depth on immutables.
- Memory allocation in Ruby - Deep dive into some OS-level stuff.
- Pointers: Understanding Memory Addresses - Lower-level view of what's going on. Oriented toward C programming, but very approachable.