+ [Contents](../Contents) \| [Previous (1.3 Numbers)](03_Numbers) \| [Next (1.5 Lists)](05_Lists)
+
# 1.4 Strings

- ### Representing Text
+ This section introduces ways to work with text.

- String are text literals written in programs with quotes.
+ ### Representing Literal Text
+
+ String literals are written in programs with quotes.

```python
# Single quote
@@ -20,12 +24,17 @@ look into my eyes, you're under.
'''
```

- Triple quotes capture all text enclosed in multiple lines.
+ Normally strings may only span a single line. Triple quotes capture all text enclosed across multiple lines
+ including all formatting.
+
+ There is no difference between using single (') versus double (")
+ quotes. The same type of quote used to start a string must be used to
+ terminate it.
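
As a quick check of the two points above, here is a minimal sketch. The variable names are illustrative and not part of the course material. It compares a single-quoted and a double-quoted literal, then shows that a triple-quoted string keeps the line break typed inside it.

```python
# Single and double quotes produce the same string value.
single = 'Hello world'
double = "Hello world"
same = (single == double)

# A triple-quoted literal keeps the line break typed inside it.
multi = '''line one
line two'''
lines = multi.split('\n')

print(same)    # True
print(lines)   # ['line one', 'line two']
```

Mixing the two styles is mostly useful when the text itself contains a quote character, for example `"you're under"`.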

### String escape codes

Escape codes are used to represent control characters and characters that can't be easily typed
- at the keyboard. Here are some common escape codes:
+ directly at the keyboard. Here are some common escape codes:

```
'\n'      Line feed
@@ -38,8 +47,8 @@ at the keyboard. Here are some common escape codes:

### String Representation

- The characters in a string are Unicode and represent a so-called "code-point". You can
- specify an exact code-point using the following escape sequences:
+ Each character in a string is stored internally as a so-called Unicode "code-point" which is
+ an integer. You can specify an exact code-point value using the following escape sequences:

```python
a = '\xf1'          # a = 'ñ'
@@ -54,6 +63,7 @@ available character codes.
### String Indexing

Strings work like an array for accessing individual characters. You use an integer index, starting at 0.
+ Negative indices specify a position relative to the end of the string.

```python
a = 'Hello world'
@@ -62,7 +72,7 @@ c = a[4] # 'o'
d = a[-1]         # 'd' (end of string)
```

- You can also slice or select substrings with `:`.
+ You can also slice or select substrings by specifying a range of indices with `:`.

```python
d = a[:5]     # 'Hello'
@@ -71,6 +81,8 @@ f = a[3:8]    # 'lo wo'
g = a[-5:]    # 'world'
```

+ The character at the ending index is not included. Missing indices assume the beginning or ending of the string.
+
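
The slicing rules can be sketched as follows. This is only an illustration that reuses the `a = 'Hello world'` string from the examples above. The names `first`, `head`, `tail`, and `last` are made up for this sketch.

```python
a = 'Hello world'

first = a[0:5]   # indices 0 through 4; the character at index 5 is excluded
head = a[:5]     # a missing start index means "from the beginning"
tail = a[6:]     # a missing end index means "through the end"
last = a[-1]     # a negative index counts back from the end

print(first)     # Hello
print(head)      # Hello
print(tail)      # world
print(last)      # d
```

Because the ending index is excluded, `a[:n] + a[n:]` always rebuilds the original string for any `n`.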
### String operations

Concatenation, length, membership and replication.
@@ -161,7 +173,8 @@ TypeError: 'str' object does not support item assignment

### String Conversions

- Use `str()` to convert any value to a string suitable for printing.
+ Use `str()` to convert any value to a string. The result is a string holding the
+ same text that would have been produced by the `print()` function.

```python
>>> x = 42
@@ -172,7 +185,7 @@ Use `str()` to convert any value to a string suitable for printing.

### Byte Strings

- A string of 8-bit bytes, commonly encountered with low-level I/O.
+ A string of 8-bit bytes, commonly encountered with low-level I/O, is written as follows:

```python
data = b'Hello World\r\n'
@@ -201,9 +214,13 @@ text = data.decode('utf-8') # bytes -> text
data = text.encode('utf-8') # text -> bytes
```

+ The `'utf-8'` argument specifies a character encoding. Other common
+ values include `'ascii'` and `'latin1'`.
+
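
To see how the choice of encoding changes the resulting bytes, consider the following sketch. The sample text and variable names are assumptions made for illustration. It encodes one string three ways and decodes it again.

```python
text = 'Jalape\xf1o'              # 'Jalapeño'; \xf1 is the code-point for ñ

utf8 = text.encode('utf-8')       # ñ is stored as two bytes
latin1 = text.encode('latin1')    # ñ is stored as one byte

# 'ascii' has no representation for ñ, so encoding raises an error.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding needs the same encoding that produced the bytes.
roundtrip = utf8.decode('utf-8')

print(len(utf8), len(latin1))     # 9 8
print(ascii_ok)                   # False
print(roundtrip == text)          # True
```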
### Raw Strings

- Raw strings are string literals with an uninterpreted backslash. They specified by prefixing the initial quote with a lowercase "r".
+ Raw strings are string literals with an uninterpreted backslash. They
+ are specified by prefixing the initial quote with a lowercase "r".

```python
>>> rs = r'c:\newdata\test' # Raw (uninterpreted backslash)
@@ -237,9 +254,9 @@ is covered later.

## Exercises

- In these exercises, you experiment with operations on Python's string type.
- You should do this at the Python interactive prompt where you can easily see the results.
- Important note:
+ In these exercises, you'll experiment with operations on Python's
+ string type. You should do this at the Python interactive prompt
+ where you can easily see the results. Important note:

> In exercises where you are supposed to interact with the interpreter,
> `>>>` is the interpreter prompt that you get when Python wants
@@ -250,7 +267,7 @@ Important note:

Start by defining a string containing a series of stock ticker symbols like this:

- ```pycon
+ ```python
>>> symbols = 'AAPL,IBM,MSFT,YHOO,SCO'
>>>
```
@@ -259,7 +276,7 @@ Start by defining a string containing a series of stock ticker symbols like this

Strings are arrays of characters. Try extracting a few characters:

- ```pycon
+ ```python
>>> symbols[0]
?
>>> symbols[1]
@@ -273,8 +290,6 @@ Strings are arrays of characters. Try extracting a few characters:
>>>
```

- ### Exercise 1.14: Strings as read-only objects
-
In Python, strings are read-only.

Verify this by trying to change the first character of `symbols` to a lower-case 'a'.
@@ -287,22 +302,29 @@ TypeError: 'str' object does not support item assignment
>>>
```

- ### Exercise 1.15: String concatenation
+ ### Exercise 1.14: String concatenation

Although string data is read-only, you can always reassign a variable
to a newly created string.

Try the following statement which concatenates a new symbol "GOOG" to
the end of `symbols`:

- ```pycon
+ ```python
>>> symbols = symbols + 'GOOG'
>>> symbols
'AAPL,IBM,MSFT,YHOO,SCOGOOG'
>>>
```

- Oops! That's not what you wanted. Fix it so that the `symbols` variable holds the value `'HPQ,AAPL,IBM,MSFT,YHOO,SCO,GOOG'`.
+ Oops! That's not what you wanted. Fix it so that the `symbols` variable holds the value `'AAPL,IBM,MSFT,YHOO,SCO,GOOG'`.
+
+ ```python
+ >>> symbols = ?
+ >>> symbols
+ 'AAPL,IBM,MSFT,YHOO,SCO,GOOG'
+ >>>
+ ```

In these examples, it might look like the original string is being
modified, in an apparent violation of strings being read only. Not
@@ -311,12 +333,12 @@ time. When the variable name `symbols` is reassigned, it points to the
newly created string. Afterwards, the old string is destroyed since
it's not being used anymore.
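
One way to convince yourself that reassignment creates a new string rather than changing the old one is to keep a second name for the old string. The sketch below uses made-up names (`tickers`, `original`) so that it does not interfere with the `symbols` variable used in the exercises.

```python
tickers = 'AAPL,IBM'
original = tickers              # a second name for the same string object

tickers = tickers + ',MSFT'     # builds a new string and rebinds the name

print(original)                 # AAPL,IBM
print(tickers)                  # AAPL,IBM,MSFT
print(original is tickers)      # False
```

In this sketch the old string is not destroyed, because `original` still refers to it. Without that second name, nothing would refer to the old string after the reassignment.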

- ### Exercise 1.16: Membership testing (substring testing)
+ ### Exercise 1.15: Membership testing (substring testing)

Experiment with the `in` operator to check for substrings. At the
interactive prompt, try these operations:

- ```pycon
+ ```python
>>> 'IBM' in symbols
?
>>> 'AA' in symbols
@@ -326,13 +348,13 @@ True
>>>
```

- *Why did the check for "AA" return `True`?*
+ *Why did the check for `'AA'` return `True`?*

- ### Exercise 1.17: String Methods
+ ### Exercise 1.16: String Methods

At the Python interactive prompt, try experimenting with some of the string methods.

- ```pycon
+ ```python
>>> symbols.lower()
?
>>> symbols
@@ -342,14 +364,14 @@ At the Python interactive prompt, try experimenting with some of the string meth

Remember, strings are always read-only. If you want to save the result of an operation, you need to place it in a variable:

- ```pycon
+ ```python
>>> lowersyms = symbols.lower()
>>>
```

Try some more operations:

- ```pycon
+ ```python
>>> symbols.find('MSFT')
?
>>> symbols[13:17]
@@ -364,14 +386,14 @@ Try some more operations:
>>>
```

- ### Exercise 1.18: f-strings
+ ### Exercise 1.17: f-strings

Sometimes you want to create a string and embed the values of
variables into it.

To do that, use an f-string. For example:

- ```pycon
+ ```python
>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
@@ -383,6 +405,31 @@ To do that, use an f-string. For example:
Modify the `mortgage.py` program from [Exercise 1.10](03_Numbers) to create its output using f-strings.
Try to make it so that output is nicely aligned.
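
As a hint for the alignment part, f-strings accept a format specifier after a colon. The sketch below is not the `mortgage.py` solution. It only shows one possible way to line up columns, and the column width of 10 is an arbitrary choice.

```python
name = 'IBM'
shares = 100
price = 91.1

# '<10' left-aligns in 10 columns, '>10' right-aligns in 10 columns,
# 'd' formats an integer, and '.2f' fixes two digits after the decimal point.
row = f'{name:<10}{shares:>10d}{price:>10.2f}'

print(row)
print(len(row))   # 30
```

Using the same widths on every output line keeps the columns aligned even when the values have different lengths.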

+
+ ### Exercise 1.18: Regular Expressions
+
+ One limitation of the basic string operations is that they don't
+ support any kind of advanced pattern matching. For that, you
+ need to turn to Python's `re` module and regular expressions.
+ Regular expression handling is a big topic, but here is a short
+ example:
+
+ ```python
+ >>> text = 'Today is 3/27/2018. Tomorrow is 3/28/2018.'
+ >>> # Find all occurrences of a date
+ >>> import re
+ >>> re.findall(r'\d+/\d+/\d+', text)
+ ['3/27/2018', '3/28/2018']
+ >>> # Replace all occurrences of a date with replacement text
+ >>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
+ 'Today is 2018-3-27. Tomorrow is 2018-3-28.'
+ >>>
+ ```
+
+ For more information about the `re` module, see the official documentation at
+ [https://docs.python.org/library/re.html](https://docs.python.org/3/library/re.html).
+
+
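
If you need the individual pieces of each match rather than the matched text, one option is `re.finditer()`, which yields match objects. The following sketch reuses the `text` value from the exercise. Storing the results as `(year, month, day)` tuples is an assumption made for illustration.

```python
import re

text = 'Today is 3/27/2018. Tomorrow is 3/28/2018.'

# Each pair of parentheses is a group; groups() returns the captured strings.
dates = []
for m in re.finditer(r'(\d+)/(\d+)/(\d+)', text):
    month, day, year = m.groups()
    dates.append((int(year), int(month), int(day)))

print(dates)   # [(2018, 3, 27), (2018, 3, 28)]
```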
### Commentary

As you start to experiment with the interpreter, you often want to