Let's look at two strings: one unicode (u"unicode string") and one not ("not unicode string").
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
>>> type("foo")
<type 'str'>
>>> type(u"foo")
<type 'unicode'>
>>> u"foo" == "foo"
True
>>> u"foo" is "foo"
False
So using == shows the two strings as equal, and 'is' doesn't. What's going on here?
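Python interns some of its strings, which means only one copy of each such string is stored. This is an implementation detail of CPython rather than a language guarantee: short, identifier-like string literals are typically interned, while strings built at runtime generally are not. You can see this by using the built-in function id() to look at the identity of our strings.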
>>> a = "foo"
>>> b = "foo"
>>> c = u"foo"
>>> print id(a)
3074129864
>>> print id(b)
3074129864
>>> print id(c)
3074128400

You can see our normal strings have the same id because they are the same object. Our unicode string has a different id from our two 'normal' strings. Using the == operator asks Python to compare the values of our two strings. Using 'is' compares their identity. As our unicode and normal strings are different objects, comparing with 'is' returns False.
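The same trap exists even without unicode. Here is a minimal sketch, assuming CPython, where a string built at runtime is equal to a literal but is not the same object until it is explicitly interned. The names literal, built and intern_string are made up for this example; intern() is a builtin in Python 2 and lives in the sys module in Python 3, so the sketch picks whichever is available.

```python
import sys

# intern() is a builtin in Python 2 and sys.intern() in Python 3.
try:
    intern_string = intern
except NameError:
    intern_string = sys.intern

literal = "foo"
# Joining two pieces at runtime creates a new string object in CPython,
# even though its value is the same as the literal above.
built = "".join(["f", "oo"])

equal = (literal == built)       # compares values
same_object = (literal is built) # compares identity (implementation-dependent)

# Interning both strings maps them to one shared object.
interned_same = (intern_string(literal) is intern_string(built))

print(equal)
print(same_object)
print(interned_same)
```

The takeaway is that whether 'is' happens to work on two equal strings depends on how the interpreter created them, so == is the comparison to reach for when you care about the value.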
I wonder how many of us are guilty of misusing 'is' on strings?