String interning is the process of storing a single shared copy of each distinct string value in the String pool.
You may have read that objects in the string pool are shared. A reference to a String object in the pool is used in place of every string literal that has the same value as that object. This prevents the creation of new String objects.
Thus the main purpose of string interning is to save memory and improve application performance by reducing the creation of new String objects.
Note: As described in this [article], when we say that the String pool contains String objects, it actually means that the String pool contains references to those String objects. Before Java 7 the String objects themselves were stored in the permanent generation memory area; from Java 7 onwards they are stored in the heap. For the sake of convenience we usually say that the String pool contains String objects.
Strings in Java can be created in two ways:
1) Using string literals and
2) Using "new" operator.
Strings created using String literals
Strings created using string literals are automatically interned and stored in the string pool.
For example,
String s = "John";
In the above statement we have created a String object, referenced by s, using the literal "John". This String object is automatically interned, i.e. automatically stored in the string pool.
If we create two more strings with the same value:
String s1 = "John";
String s2 = "John";
All three reference variables (s, s1 and s2) would point to the same String object in the string pool.
Example,
public class StringInterning {
    public static void main(String[] args) {
        String s = "John";
        String s1 = "John";
        String s2 = "John";
        System.out.println(s == s1);  // will print true
        System.out.println(s1 == s2); // will print true
        System.out.println(s == s2);  // will print true
    }
}
If these strings were not automatically interned, s, s1 and s2 would point to three different String objects with the same value.
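Automatic interning applies to compile-time constants in general, not only to a single quoted literal. The following is a minimal sketch of that point; the class name LiteralPoolDemo and the helper buildAtRuntime are hypothetical names chosen for illustration.

```java
class LiteralPoolDemo {
    // Compile-time constant: the compiler folds "Jo" + "hn" into the literal "John".
    static final String FOLDED = "Jo" + "hn";

    // Built at run time from a non-constant part, so the result is a newly created object.
    static String buildAtRuntime(String firstHalf) {
        return firstHalf + "hn";
    }

    public static void main(String[] args) {
        String literal = "John";
        String runtime = buildAtRuntime("Jo");
        System.out.println(literal == FOLDED);       // will print true
        System.out.println(literal == runtime);      // will print false
        System.out.println(literal.equals(runtime)); // will print true
    }
}
```

Only the string assembled at run time ends up outside the pool, even though all three values have the same contents.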
Strings created using the new operator
Note: Creating strings using the new operator is not a good practice.
Strings created using the new operator are not interned automatically. They need to be interned explicitly. For example,
String s = new String("John");
Here the literal "John" itself is interned as usual, but the new object that s refers to is a separate copy and is NOT stored in the string pool.
If you create two more strings with the same value like:
String s1 = new String("John");
String s2 = new String("John");
Then s, s1 and s2 would point to three different String objects instead of a single shared String object.
Example,
public class StringInterning {
    public static void main(String[] args) {
        String s = new String("John");
        String s1 = new String("John");
        String s2 = new String("John");
        System.out.println(s == s1);  // will print false
        System.out.println(s1 == s2); // will print false
        System.out.println(s == s2);  // will print false
    }
}
To make s, s1 and s2 point to the same String object, they need to be interned explicitly.
How do we intern strings created by new? This is shown in the following section.
The intern() method - Explicitly interning strings
We can intern strings created using the new operator with the intern() method of the String class:
String s = new String("John");
String s1 = s.intern();
s.intern() first checks whether the string pool already contains a String object with the same value as s. If the pool already contains a String object with the value "John", intern() returns a reference to that pooled object, which is stored in s1. Otherwise the string is added to the pool and a reference to the pooled string is returned.
Now s and s1 refer to two different String objects. s does not belong to the string pool, while s1 refers to the object in the string pool (which is shared).
s == s1 will give false.
If you write it like this,
String s = new String("John");
s = s.intern();
Here you have stored the returned reference in the same variable s. The reference to the original String object created by new is now lost, so that object becomes eligible for garbage collection. Variable s now points to the interned string.
Example
public class StringInterning {
    public static void main(String[] args) {
        String s = "John";
        String s1 = new String("John");
        String s2 = s1.intern();
        System.out.println(s == s1);  // will print false
        System.out.println(s1 == s2); // will print false
        System.out.println(s == s2);  // will print true
    }
}
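The example above covers the case where the pool already holds "John". The other case, where no equal string is in the pool yet, can be sketched roughly as follows. The class name InternBranchesDemo, the helper freshString and the "intern-demo-" prefix are hypothetical. The sketch assumes Java 7 or later, where intern() adds the calling object itself to the pool; older JVMs may return a separate pooled copy instead.

```java
class InternBranchesDemo {
    // Builds a string at run time that is very unlikely to be in the pool already.
    // The prefix and the nanoTime suffix are arbitrary, illustrative values.
    static String freshString() {
        return "intern-demo-" + System.nanoTime();
    }

    public static void main(String[] args) {
        // Case 1: "John" is already pooled because the literal appears in this class.
        String copy = new String("John");
        System.out.println(copy.intern() == copy);   // false: the pooled object is returned

        // Case 2: no equal string is pooled yet, so intern() adds this very object.
        String fresh = freshString();
        System.out.println(fresh.intern() == fresh); // true on Java 7+ (assumption stated above)
    }
}
```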
But why do we need to intern strings explicitly?
Suppose you have a database program that retrieves the names of employees who live in New York from an Employee table.
List<String> namesList = new ArrayList<String>();
ResultSet rs = stmt.executeQuery("select name from employee where city='New York'");
while (rs.next()) {
    namesList.add(rs.getString("name"));
}
Assume it retrieves 500 rows from the table, in which 150 employees have the same name, say Sandy.
Now, the rs.getString() method of the java.sql.ResultSet interface typically returns a new String object for every name it retrieves from a table row. This means there would now be 500 different String objects in memory, of which 150 have the same value, Sandy. Isn't this a waste of memory?
Now suppose we use intern():
List<String> namesList = new ArrayList<String>();
ResultSet rs = stmt.executeQuery("select name from employee where city='New York'");
while (rs.next()) {
    namesList.add(rs.getString("name").intern());
}
Now for our 150 identical Sandy names, the list holds references to only one String object, and that String object lives in the String pool.
The other 149 duplicate objects are no longer referenced and can be garbage collected, which saves the memory they would have occupied.
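The effect can be checked without a database. The following is a minimal sketch in which the hypothetical helper fetchName stands in for rs.getString("name") by returning a new String object on every call; the class and method names are assumptions made for illustration, not part of JDBC.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class InternDedupDemo {
    // Stand-in for rs.getString("name"): returns a distinct String object on every call.
    static String fetchName(String value) {
        return new String(value);
    }

    // Counts distinct String objects (by identity, not by contents) in the list.
    static int countDistinctObjects(List<String> names) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String name : names) {
            seen.put(name, Boolean.TRUE);
        }
        return seen.size();
    }

    // Loads 150 copies of the same name, optionally interning each one.
    static List<String> loadNames(boolean intern) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            String name = fetchName("Sandy");
            names.add(intern ? name.intern() : name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(countDistinctObjects(loadNames(false))); // will print 150
        System.out.println(countDistinctObjects(loadNames(true)));  // will print 1
    }
}
```

An IdentityHashMap is used here because it compares keys with == rather than equals(), so it counts objects instead of values.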
Thus you need explicit String interning:
1) To prevent duplicate strings in your program.
2) To avoid keeping multiple String objects that have the same value as existing ones.
3) To save memory.
4) To improve performance: fewer objects are kept alive and existing strings are shared. (However, calling intern() many times has its own performance cost, so you have to trade off the memory saved against the overhead of calling intern() in your program.)
Some facts about interned strings
1) Interned strings can be compared using ==.
You may have read in the article String equality Check: equals() or == ? that when we check two strings for equality of contents we should always use the equals() method of the String class. You should not use == for comparing the contents of two strings.
But interned strings can also be compared for contents using ==. This is because, for interned strings, the same contents means the same String object.
This is described in detail in the article Interned String and use of ==.
2) == is faster than equals().
When you compare strings using ==, only the references are compared, whereas equals() may perform a character-by-character comparison. Therefore using == is faster than using equals().
Since you can use == for comparing the contents of interned strings, interning makes your application faster if you use == in place of equals().
But you should avoid doing so, because using == to compare string data is a bad practice. If you get into the habit of using == for comparing string data and forget to call intern() on even one string, this can lead to unexpected results.
Moreover, if you look at the code of the equals() method of the String class, you will see that equals() first uses == to check whether the two references are the same. If they are, it returns true immediately; only if they are not does it go on to compare the contents character by character.
So equals() called on two equal interned strings is about as fast as ==, and faster than equals() called on equal non-interned strings.
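The risk of relying on == can be illustrated with a small sketch. The class EqualityPitfallDemo and its two helper methods are hypothetical names used only for this illustration.

```java
class EqualityPitfallDemo {
    // Fragile: only correct when the caller passes an interned string.
    static boolean isJohnByReference(String name) {
        return name == "John";
    }

    // Robust: compares contents, so it works for any String object.
    static boolean isJohnByContents(String name) {
        return "John".equals(name);
    }

    public static void main(String[] args) {
        String fromInput = new String("John"); // stands in for a value read at run time
        System.out.println(isJohnByReference(fromInput));          // will print false
        System.out.println(isJohnByReference(fromInput.intern())); // will print true
        System.out.println(isJohnByContents(fromInput));           // will print true
    }
}
```

The reference-based check silently gives the wrong answer as soon as one caller forgets intern(), while the equals()-based check does not depend on how the string was created.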
3) Interned strings are stored in the Permanent Generation memory area (before Java 7).
Before Java 7, interned strings are stored in Permanent Generation memory instead of heap memory (assuming the permanent generation is separate from the heap). Permanent generation memory has a limited size. So when you call intern() on too many strings with different values, all those interned strings are added to the Permanent Generation area. This may end in java.lang.OutOfMemoryError: PermGen space.
Note that the above applies to pre-Java 7 versions. From Java 7 onwards, interned strings are no longer stored in the Permanent Generation area but in heap memory. This removes the PermGen size restriction on interned strings, although they are still limited by the size of the heap.