Friday, 21 January 2011

What is the difference between pass by value and pass by reference?

This was recently listed as one of 50 Java Interview Questions, implying that those 50 are the most common ones, the ones you want to read up on before going to a job interview. I suspect that some developers will look at this question, answer it to themselves without looking it up, and get it wrong. The answer might not be what you think it is. And precisely because of what the answer is, the question is not relevant to Java.

That's a lot of claims. Let's start with why it's a bad question, or at least an irrelevant one for Java. It is irrelevant because there are more than two parameter passing semantics out there. A few others are pass by result, pass by value-result and pass by name. None of these are in the question, and I wonder why that is. I suspect it is because they're not available in Java: you can't pass parameters by result, value-result or name in Java. And this is where it gets interesting, because pass by reference isn't available in Java either, but by leaving out the other common passing semantics, the question implies that Java has pass by reference. It doesn't. A better question would be "Is Java pass by value or pass by reference?".

Now, this isn't an answer to the question. But an actual answer to the question would, as stated, be irrelevant in Java. Let's try anyway, shall we? First, we need to informally define a few things:
  • A formal parameter is the receiving side's parameter, the parameter defined in your method header.
  • An actual parameter is the calling side's parameter, the one given in the method invocation.
Example:

public void myMethod(int foo) { //foo is the formal parameter
    System.out.println(foo);
}

public void caller() {
    myMethod(42); //42 is the actual parameter
    int a = 42;
    myMethod(a); //a is the actual parameter
}

Now we can, again informally, define the two concepts whose difference the question asks about.
  • Pass by value: The value of the actual parameter (the value of 42 or of a above) is copied to the formal parameter (foo above).
  • Pass by reference: An "access path" to the actual parameter is sent to the formal parameter; often the access path is nothing more than an address. (Don't confuse address with pointer. A pointer is a kind of variable: a named value that takes up space in memory. An address does not take up space in memory and has no name. They are related only in that the value of a pointer is an address. But the pointer itself also has an address, since it takes up space in memory, so you can create a pointer whose value is the address of another pointer.)
So, the difference between them is basically that pass by value copies the actual parameter, while pass by reference sends the address of the actual parameter. With pass by reference, this means that everything you do with the formal parameter, including assigning it a new value, is reflected back in the actual parameter.
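To make the pass-by-value definition concrete on the Java side, here is a minimal sketch with a primitive argument (the class and method names are made up for illustration). Reassigning the formal parameter changes only its own copy; the caller's variable keeps its value.

```java
// Sketch: a primitive argument is copied into the formal parameter,
// so reassigning the formal parameter never touches the caller's variable.
class PrimitivePassByValue {

    // 'foo' is the formal parameter; it holds a copy of the caller's value.
    static int incrementLocally(int foo) {
        foo = foo + 1; // changes only the copy
        return foo;
    }

    public static void main(String[] args) {
        int a = 42;                       // 'a' is the actual parameter below
        int result = incrementLocally(a);
        System.out.println("a = " + a + ", result = " + result);
        // prints "a = 42, result = 43"
    }
}
```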

This sounds like "Primitives are sent by value, objects by reference" is the correct answer, right? I agree. But it's wrong. Here's why.

To be able to send objects by reference, you need to be able to actually send objects as parameters. You can't send objects as parameters in Java. That's all you need to know. The reason you can't send objects as parameters is that you can't create object variables in Java. All variables are either primitive or reference type variables. If you could create object variables in Java, what would the following do?

Point p = new Point(10, 10);
Point p2 = p;

A hint is "what is the value of p?". The answer is that the value of p is an "access path" to an object on the heap, in this case a point. If the value of p were the object itself, what would happen when you assign p to p2? You would need to copy the value of p to a new memory area and set the value of p2 to this new information. Is that what happens? No, not in Java. Apart from a few rare exceptions, heap allocation is never performed unless the new keyword is used. Also, to make the above code work you would need some kind of copy constructor; the JVM can't decide for you how deeply an object should be copied. What actually happens is that the value of p (an access path) is copied into p2, which occupies its own memory area (not the heap). Thus, p2 and p have the same value, and that value is an access path to an object on the heap (in this case a point). Although the values are the same, they reside in different parts of memory.
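A small runnable sketch of the two assignment lines above. It assumes a hypothetical, minimal Point class standing in for java.awt.Point (just public x and y fields and a toString). Note that == compares the values of the two variables, i.e. whether they hold the same access path, not whether two objects are equal.

```java
// Sketch: assigning one reference variable to another copies the access path, not the object.
class AssignmentDemo {

    // Hypothetical stand-in for java.awt.Point: two public fields only.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    public static void main(String[] args) {
        Point p = new Point(10, 10); // 'new' creates the only Point object here
        Point p2 = p;                // copies p's value (an access path) into p2

        System.out.println(p == p2); // true: same value, i.e. same access path
        p2.x = 20;                   // follows the shared path to the one object
        System.out.println(p);       // (20, 10)

        p2 = new Point(0, 0);        // overwrites only p2's own memory
        System.out.println(p);       // still (20, 10)
    }
}
```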

A little sidetracked, but this is the key point. You cannot ever create an object variable. All variables are either reference variables (access paths) or primitive variables. So, when invoking a method in Java, the only thing we can send to the method is an access path variable or a primitive variable. Now the question is, do we send our access paths by reference or by value?

Remember that "by reference" means "let the formal parameter be an access path to the actual parameter". Also remember that this means that the formal parameter can be used to change the value of the actual parameter. But, the value of our actual parameter is an access path! This would mean that we can use the formal parameter to change that access path! Is this what happens?

public void myMethod(Point foo) { //foo is the formal parameter
    foo = new Point(20, 20); 
}

public void caller() {
    Point p = new Point(10, 10);
    myMethod(p); //p is the actual parameter.
    System.out.println(p);
}

The output will print p as being (10, 10). So, the change that we make to the formal parameter 'foo' is NOT reflected in the actual parameter 'p'. If it were, the value of 'p' would be an access path to a new point with coordinates (20, 20)! What really happens is that the value of 'p' is copied to the value of 'foo'. This is called... pass by value. The fact that the values of 'p' and 'foo' are access paths does not change this. In other words, 'foo' and 'p' are variables that hold the same value, but those values reside in different parts of memory. So, when we change the value of 'foo', 'p' is not affected. Pass by value. They are but copies of each other. If it were pass by reference, then 'foo' would reference the memory area of 'p', and it would be used to write to that memory area. If we wrote to the memory area of 'p', we would change the value of 'p'. And if we changed the value of 'p', we would change an access path, since the value of 'p' is an access path. This is not what happens. After the method call, 'p' still accesses the same memory area as it did before the call. That memory area holds a point, and the value of that point is (10, 10). Somewhere on the heap is a point with the value (20, 20). Nothing accesses that point anymore; it was once accessed by 'foo'.
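For contrast, here is a sketch of what a caller has to do in Java to get something that behaves like pass by reference: wrap the variable in a one-element array (a common workaround, not a language feature) and copy the result back afterwards. The array's access path is itself still passed by value; the method simply writes into a slot that both sides can reach. Point is again a hypothetical stand-in for java.awt.Point, and reassignViaHolder is a made-up name.

```java
// Sketch: reassigning a formal parameter vs. writing through a one-element "holder" array.
class ReassignDemo {

    // Hypothetical stand-in for java.awt.Point.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Same as myMethod in the text: only the local copy 'foo' is reassigned.
    static void reassign(Point foo) {
        foo = new Point(20, 20);
    }

    // Hypothetical workaround: the holder's access path is copied,
    // but both copies lead to the same array, so the write to slot 0 is shared.
    static void reassignViaHolder(Point[] holder) {
        holder[0] = new Point(20, 20);
    }

    public static void main(String[] args) {
        Point p = new Point(10, 10);
        reassign(p);
        System.out.println(p);   // (10, 10): p was not affected

        Point[] holder = { p };
        reassignViaHolder(holder);
        p = holder[0];           // the caller must copy the new value back explicitly
        System.out.println(p);   // (20, 20)
    }
}
```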

Wait a minute. When I'm in myMethod, I can change 'foo' with a statement such as foo.x = 20, and that will be reflected in the caller. That is, 'p' would also change! Yes, that is because the access path in 'p' is the same as the one in 'foo'. They access the same memory area; their values are equal; the path leads to the same address in memory. But they are distinct variables! Remember the distinction between an address and a pointer? OK, so 'p' takes up space. That space is filled with information. The information is an access path, and you may call the access path an address if you like. So, 'p' has space, and this space is not the same space as that of 'foo'. The information in the two spaces, however, is the same. You could say (and be correct) that 'p' and 'foo' have the same value, at least until 'foo' gets reassigned. Then the space of 'foo' is filled with a new value. This is not reflected in 'p', whose value remains the same.
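A sketch contrasting the two things a method can do with its copy of an access path: follow it to mutate the shared object (visible to the caller), or overwrite it (invisible to the caller). The method names are made up, and Point is a hypothetical stand-in for java.awt.Point.

```java
// Sketch: mutating through a copied access path vs. reassigning the copy.
class MutateVsReassign {

    // Hypothetical stand-in for java.awt.Point.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Follows the copied access path and changes the one shared object.
    static void mutate(Point foo) {
        foo.x = 20;
    }

    static void mutateThenReassign(Point foo) {
        foo.x = 30;              // visible to the caller: same object as 'p'
        foo = new Point(99, 99); // invisible to the caller: only 'foo' changes
        foo.x = 40;              // modifies the new, unshared object
    }

    public static void main(String[] args) {
        Point p = new Point(10, 10);
        mutate(p);
        System.out.println(p);   // (20, 10)
        mutateThenReassign(p);
        System.out.println(p);   // (30, 10)
    }
}
```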

In summary
  • Java is always and only pass by value, no matter what you pass.
  • In Java, there is no way to pass an object, only an access path to the object. For some reason, they chose to call this access path a reference.
  • In Java, there is no way to even create an object variable. Everything is access paths except for primitives. (Some may argue about whether arrays are primitives or objects; the Java Language Specification counts them as objects. Either way, array variables are still access paths.)
  • A formal parameter is declared in your method header and (presumably) used in your method body.
  • An actual parameter is what you pass in to a method.
  • If all variables in Java are either primitive variables or access path variables (they are), and pass by reference sends access paths (it does), then pass by reference in Java would mean sending an access path to an access path. This would mean that reassignment of the formal parameter would be reflected as a reassignment of the actual parameter. This. Is. Not. The. Case.
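A common litmus test for pass by reference is whether you can write a method that swaps two of the caller's variables. The sketch below (names hypothetical, Point again a minimal stand-in for java.awt.Point) shows that such a swap has no effect in Java, while swapping two elements of an array does work, because the method and the caller follow the same copied access path to one array.

```java
// Sketch: the "swap test". Swapping formal parameters never swaps the caller's variables.
class SwapTest {

    // Hypothetical stand-in for java.awt.Point.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Would swap the caller's variables under pass by reference; here only the copies swap.
    static void swap(Point a, Point b) {
        Point tmp = a;
        a = b;
        b = tmp;
    }

    // Works: the copied access path still leads to the caller's array.
    static void swapElements(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    public static void main(String[] args) {
        Point p = new Point(1, 1);
        Point q = new Point(2, 2);
        swap(p, q);
        System.out.println(p + " " + q);                   // (1, 1) (2, 2): unchanged

        int[] numbers = {1, 2};
        swapElements(numbers, 0, 1);
        System.out.println(numbers[0] + " " + numbers[1]); // 2 1
    }
}
```

In a language with reference parameters, such as C++ with `Point&` arguments, the first swap would exchange the caller's variables.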
Don't just take my word for it:
Parameter passing in Java - by reference or by value?
Simple explanation
James Gosling (inventor of Java) says: Always pass by value
Java is Pass-by-Value, Dammit!

> Edited to add tags

    2 comments:

    1. Thanks for putting up an answer to this common question, buddy. By the way, in Java everything is pass by value and nothing is pass by reference, because we don't have any pointer stuff in Java; but when we pass an object as an argument, we pass a handle to it, which can then be used to modify that object.

      Thanks
      Javin
    2. Javin,

      "[...]in Java everything is pass by value nothing is pass by reference[...]"

      Exactly! :) But...

      "[...]when we pass an object[...]"

      We never do! Instead

      "[...]we pass handle[...]".

      And that handle/access path/'reference'/'pointer'/address variable is passed by value.

      We do have pointers in Java. They are the only thing we have aside from primitives. Remember that when people say that Java doesn't have pointers, what they mean is that you can't access a pointer variable's value. In languages such as C/C++, you can access the pointer value and do pointer arithmetic. This we can't do in Java. However, a Java "reference" is in no way similar to a C++ reference. In C++, a reference is an alias for an object variable. We can't create object variables in Java, hence no aliases for them, and thus no references in the C++ sense. Take a look at references in C++:

      Point p(10, 10);
      Point& p2 = p;

      This is very different from the Java way:

      Point p = new Point(10, 10);
      Point p2 = p;

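      In the C++ code above, p2 has a different type from p (Point& rather than Point), and it is an alias for p rather than a variable holding its own copy of a value. In the Java code, p and p2 have the same type and the same value. In fact, the Java code translates better to the following C++: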

      Point* p = new Point(10, 10);
      Point* p2 = p;

      Both syntactically AND semantically.

      So, a Java "reference" variable is semantically closer to a C++ pointer variable. In fact, the Java Language Specification uses the word pointer to describe what a reference is: "The reference values (often just references) are pointers to these objects".
