forked from mikeckennedy/talk-python-transcripts
022_cpython.vtt
WEBVTT
00:00:00.001 --> 00:00:06.040
It's time to look deep within the machine and understand what really happens when your Python code executes.
00:00:06.040 --> 00:00:11.720
We're code walking through the CPython code base and visualizing it at pythontutor.com.
00:00:11.720 --> 00:00:17.660
This is episode number 22 with Philip Guo, recorded Monday, August 3rd, 2015.
00:00:17.660 --> 00:00:47.540
Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the
00:00:47.540 --> 00:00:49.160
ecosystem, and the personalities.
00:00:49.160 --> 00:00:51.180
This is your host, Michael Kennedy.
00:00:51.180 --> 00:00:56.200
Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past
00:00:56.200 --> 00:00:58.720
episodes at talkpython.fm.
00:00:58.720 --> 00:01:02.160
Be sure to follow the show on Twitter where it's @talkpython.
00:01:02.160 --> 00:01:06.480
This episode is brought to you by Hired and Codeship.
00:01:06.480 --> 00:01:11.740
Thank them both for supporting the show on Twitter via @HiredHQ and @Codeship.
00:01:11.740 --> 00:01:14.160
Now let me introduce Philip.
00:01:14.540 --> 00:01:19.360
Philip Guo is an assistant professor of computer science at the University of Rochester in New York.
00:01:19.360 --> 00:01:24.880
He researches human-computer interaction with a focus on user interfaces for online learning.
00:01:24.880 --> 00:01:29.780
He's especially interested in studying how to better train software engineers and data scientists.
00:01:30.000 --> 00:01:35.120
He created a free web-based visualization tool for learning programming called Online Python Tutor
00:01:35.120 --> 00:01:45.020
at pythontutor.com, which has been used by over 1.2 million people in over 165 countries to visualize over 11 million pieces of code.
00:01:45.020 --> 00:01:46.760
Philip, welcome to the show.
00:01:46.760 --> 00:01:47.620
My pleasure.
00:01:48.300 --> 00:01:50.160
Yeah, it's really exciting to have you here.
00:01:50.160 --> 00:01:53.020
We're going to talk a lot about many things.
00:01:53.020 --> 00:02:02.820
We're going to talk about CPython and a really cool project that you put on your website and on YouTube called CPython, a 10-hour code walk.
00:02:02.820 --> 00:02:04.860
And so we'll be digging into CPython.
00:02:04.860 --> 00:02:11.900
And we're also going to talk about this thing called Python Tutor at pythontutor.com that you are working to help people understand the internals of Python better.
00:02:12.400 --> 00:02:14.000
So that's going to be great stuff.
00:02:14.000 --> 00:02:14.680
Cool.
00:02:14.680 --> 00:02:16.140
I'm looking forward to it.
00:02:16.140 --> 00:02:16.520
Yeah.
00:02:16.520 --> 00:02:22.660
Before we get into the details, though, you know, everyone likes to know how people got into programming and how they got started in Python.
00:02:22.660 --> 00:02:23.460
What's your story?
00:02:23.460 --> 00:02:29.860
So my story was I was always interested in computers as a kid, like many people who got into computer science.
00:02:29.860 --> 00:02:34.720
But I never really had a strong programming background until I went to college.
00:02:34.720 --> 00:02:38.520
So I tried to learn QBasic by myself when I was 10.
00:02:39.080 --> 00:02:43.220
And that, you know, I had a book and then I failed after a few weeks because I had no one teaching me.
00:02:43.220 --> 00:02:46.080
I took an AP computer science course in high school.
00:02:46.080 --> 00:02:46.980
That was in C++.
00:02:46.980 --> 00:02:48.660
And that was really fun.
00:02:48.660 --> 00:02:52.440
And that was kind of my first introduction to really doing programming.
00:02:52.440 --> 00:02:57.580
And in college, I decided to major in electrical engineering and computer science.
00:02:57.760 --> 00:03:01.320
And that's when I started just learning programming formally.
00:03:01.320 --> 00:03:09.640
But really, the Python relevance is I didn't actually start hacking for fun until about my senior year of college.
00:03:09.640 --> 00:03:15.980
And the first language that I learned for programming for fun and not just because I had to do it for class was actually Python.
00:03:15.980 --> 00:03:28.380
So the first kinds of programs I wrote were scripts to manage my photos and, you know, kind of manipulate and manage my own personal photo gallery and, you know, put it up on a simple website.
00:03:28.380 --> 00:03:30.660
So that was where I got started getting hooked on Python.
00:03:30.660 --> 00:03:32.460
That was, you know, it was about 10 years ago.
00:03:32.460 --> 00:03:33.680
That was around 2005.
00:03:33.680 --> 00:03:36.040
That was like Python 2.4 or something like that.
00:03:36.440 --> 00:03:38.040
Yeah, that's a great way to get started.
00:03:38.040 --> 00:03:44.680
I think a lot of people have interesting stories like that, you know, just they have some small problem they're trying to solve.
00:03:44.680 --> 00:03:47.700
And, you know, it leads you down this path.
00:03:47.700 --> 00:03:52.240
And all of a sudden, you discover this world where, hey, there's this great thing, you know, programming or Python or whatever.
00:03:52.240 --> 00:03:54.160
Yep, that's exactly right.
00:03:54.160 --> 00:03:57.520
So I see you're calling in from Seattle, right?
00:03:57.520 --> 00:03:58.340
What are you doing up there?
00:03:58.340 --> 00:04:05.340
So I am currently an assistant professor of computer science at the University of Rochester in upstate New York.
00:04:05.340 --> 00:04:06.560
So that's nowhere near Seattle.
00:04:06.560 --> 00:04:07.200
That's what I was going to say.
00:04:07.200 --> 00:04:08.880
You're not – it's not at all in Seattle.
00:04:08.880 --> 00:04:19.860
So I get to – one of the real benefits of being a professor is that your summers are free to do research or to travel or to do other sorts of scholarly work.
00:04:19.860 --> 00:04:27.080
So most professors in most terms, they stay on campus in the summers and they do research full time for three months.
00:04:27.300 --> 00:04:41.460
What I decided to do this summer, since I had some colleagues at Microsoft, was to spend most of my summer at Microsoft Research doing research in both software engineering and online education at the lab in Seattle.
00:04:41.460 --> 00:04:47.020
And I came here because I actually was an intern here a long time ago when I was back in grad school.
00:04:47.020 --> 00:04:49.320
So I'm actually back interning in the same group.
00:04:49.320 --> 00:04:51.200
So it's sort of a homecoming of sorts.
00:04:51.680 --> 00:04:52.620
Back to the future.
00:04:52.620 --> 00:04:53.200
That's excellent.
00:04:53.200 --> 00:04:55.560
Yeah, I've done some work with some of the guys up at Microsoft.
00:04:55.560 --> 00:04:57.020
It's a cool place up there.
00:04:57.020 --> 00:04:57.720
So excellent.
00:04:57.720 --> 00:05:00.080
Is this related to PythonTutor.com?
00:05:00.080 --> 00:05:01.820
No, not really.
00:05:01.820 --> 00:05:04.620
I mean, this is just a completely separate sort of research project.
00:05:04.620 --> 00:05:08.520
So there's nothing Python related in the work here, unfortunately.
00:05:08.520 --> 00:05:09.880
All right.
00:05:09.880 --> 00:05:10.160
Cool.
00:05:10.160 --> 00:05:11.620
All right.
00:05:11.640 --> 00:05:16.020
So let's talk about your CPython internals class.
00:05:16.020 --> 00:05:19.580
This was a class you did at University of Rochester, right?
00:05:19.580 --> 00:05:21.160
2014, I think.
00:05:21.160 --> 00:05:23.220
At least the recorded version was 2014.
00:05:23.220 --> 00:05:24.600
Yep.
00:05:24.600 --> 00:05:27.480
So this was a class I taught in fall 2014.
00:05:27.480 --> 00:05:32.120
And the name of the course was Dynamic Languages and Software Development.
00:05:32.120 --> 00:05:40.500
So I actually inherited this course from another professor who was taking a leave and teaching another class that term.
00:05:41.000 --> 00:05:43.080
And that class was originally in Ruby.
00:05:43.080 --> 00:05:48.980
So it was sort of a graduate-level programming languages class about these sorts of dynamically typed languages.
00:05:48.980 --> 00:05:50.840
And originally he did it in Ruby.
00:05:50.840 --> 00:05:59.600
But since I knew Python a lot better, I revamped the class to be in Python and basically turned it into what the videos are online.
00:05:59.600 --> 00:06:01.500
So I'd be happy to talk about that in detail.
00:06:02.280 --> 00:06:05.340
Just for everyone listening, the videos are online.
00:06:05.340 --> 00:06:08.260
And I actually spent like the last week going through your class.
00:06:08.260 --> 00:06:13.660
So I feel like I've had like some super intense summer course or something, you know, doing like 10 lectures.
00:06:14.440 --> 00:06:21.600
And people can find those on your website at pgbovine.net/cpython-internals.htm.
00:06:21.600 --> 00:06:30.280
And I actually went through, unrelated to this conversation or maybe preceding this whole having you on the show, I just saw your videos and thought they were awesome.
00:06:30.280 --> 00:06:34.120
And I put them into a YouTube playlist at bit.ly slash cpython walk.
00:06:34.260 --> 00:06:35.580
So both of those work well.
00:06:35.580 --> 00:06:38.080
What was the main goal of the class?
00:06:38.080 --> 00:06:43.960
Sort of get people to understand what happens when you actually run dynamic code like Python?
00:06:43.960 --> 00:06:48.800
Yeah, I think that was basically the philosophy.
00:06:48.800 --> 00:06:54.200
So a lot of programming languages classes are taught from more of a theoretical perspective.
00:06:54.200 --> 00:06:54.640
Right.
00:06:54.640 --> 00:07:01.100
So it's usually kind of some formal syntax and semantics and maybe doing some proofs.
00:07:01.100 --> 00:07:04.300
And it's very, you know, kind of a formalism heavy.
00:07:04.300 --> 00:07:15.980
And I thought it would be interesting to do a very different sort of class for graduate students from the opposite side, which is something extremely applied, saying, you know, here is a piece of Python code.
00:07:15.980 --> 00:07:20.460
Let's start with hello world or a simple for loop or a simple function call.
00:07:20.460 --> 00:07:29.200
And what actually happens throughout all the steps between that code being parsed and then the output appearing on your screen, let's say.
00:07:29.480 --> 00:07:39.160
So I wanted to dive into the interpreter and show students how everything worked under the hood and how there's really, you know, by deconstructing, you can show that there's really no magic here.
00:07:39.160 --> 00:07:43.160
There's just a lot of C code behind the scenes that keeps track of a lot of stuff.
00:07:43.160 --> 00:07:45.720
And eventually your program runs.
00:07:45.820 --> 00:07:50.960
So we don't do the parsing stage because I think parsing is fairly standard.
00:07:50.960 --> 00:07:54.860
And that's covered by most kind of introductory compilers classes.
00:07:54.860 --> 00:08:00.720
You write a grammar, and a parser generator and some code give you something like an AST.
00:08:00.720 --> 00:08:03.940
And then that gets walked to turn into some kind of bytecode.
00:08:04.020 --> 00:08:07.800
So the class actually starts with assuming you have a bunch of Python bytecode.
00:08:07.800 --> 00:08:16.240
How does the bytecode actually get interpreted step by step by the interpreter runtime system to do your program's operations?
00:08:16.240 --> 00:08:17.640
Yeah, that's really cool.
00:08:17.640 --> 00:08:32.920
And I think, you know, if I think about how like C code runs and then my intuition about how that C code actually executes, if you understand a little bit about registers and memory addresses and pointers, your intuition more or less will carry the day.
00:08:32.920 --> 00:08:36.940
I think with interpreted languages, all bets are off.
00:08:36.940 --> 00:08:37.180
Right.
00:08:37.240 --> 00:08:44.820
I mean, you have some concept of the programming language doing things, but then the way that happens, you really have to look inside.
00:08:44.820 --> 00:08:45.100
Right.
00:08:45.100 --> 00:08:47.060
Yeah, exactly.
00:08:47.060 --> 00:08:53.380
Because these interpreted languages are often not implemented like you would conceptually think of it.
00:08:53.380 --> 00:08:53.520
Right.
00:08:53.520 --> 00:08:57.760
You think of something as you have frames and variables and pointers to each other.
00:08:57.760 --> 00:09:04.680
But really, the bytecode, at least the Python one, targets this sort of stack-based virtual machine.
00:09:04.820 --> 00:09:08.480
I think Java, the Java virtual machine is that way, too, but I forgot the exact semantics.
00:09:08.480 --> 00:09:11.200
But it's not something you would think about normally, but they do it that way.
00:09:11.200 --> 00:09:18.940
One, because it's really compact, and it leads to really compact code and sort of easy-to-understand code for the implementer.
00:09:18.940 --> 00:09:24.480
But yeah, but that's very different than the conceptual model in your head at the very high level, how a program ought to work.
00:09:24.480 --> 00:09:30.060
And we can talk about that later when we talk about the Python tutor as well, because that kind of leads into that other tool.
00:09:30.060 --> 00:09:32.420
So we can keep talking about the CPython stuff first.
00:09:32.420 --> 00:09:43.380
Sure. So one of the things I thought was interesting was in your very first session, you have kind of a cool whiteboarding thing you're doing with a Microsoft Surface and like a pen where you can kind of draw on the code.
00:09:43.380 --> 00:09:44.080
And that's cool.
00:09:44.420 --> 00:09:50.900
You do a cool little sketch about what actually happens when you type python space somefile.py.
00:09:50.900 --> 00:09:54.700
And I mean, on one level, I knew it.
00:09:54.700 --> 00:09:56.480
On the other, it was a little surprising to me to see
00:09:56.480 --> 00:09:57.740
that the first step is compilation.
00:09:58.500 --> 00:10:05.740
Can you maybe like talk just briefly about like what happens when I run my Python code before we get into the interpreter itself?
00:10:05.740 --> 00:10:06.960
Yeah.
00:10:06.960 --> 00:10:20.600
So many people are surprised when there's a compilation step in Python or in these sorts of dynamic or what people call scripting languages, because usually you think of running Python space, whatever, or Perl space, whatever, Ruby space, whatever.
00:10:20.760 --> 00:10:21.840
And it just runs, right?
00:10:21.840 --> 00:10:22.100
Right.
00:10:22.100 --> 00:10:24.660
I just thought, here we go with the interpreter.
00:10:24.660 --> 00:10:26.100
And now it's interpreting, right?
00:10:26.100 --> 00:10:27.420
Right.
00:10:27.420 --> 00:10:33.140
So with Java or C or C#, you have a compilation step and then you run a compiled binary.
00:10:33.140 --> 00:10:34.580
And there's two separate steps.
00:10:34.580 --> 00:10:40.680
But with Python, as with many other languages, the compilation happens right before the execution, as part of the same run.
00:10:40.680 --> 00:10:46.960
So what happens is as a standard kind of a front end to a compiler, it takes the source code.
00:10:46.960 --> 00:10:48.560
It does the lexical analysis.
00:10:48.560 --> 00:10:49.640
It does the parsing.
00:10:49.640 --> 00:10:52.700
It creates an AST, or abstract syntax tree, from that.
00:10:52.700 --> 00:10:56.400
And then it walks that tree and creates a bunch of bytecode.
00:10:56.400 --> 00:11:16.160
So the Python bytecode language, you can read in the documentation, it has, I don't know, a few dozen operations like add, load, store, and also some operations that are a little bit more Python specific, like build a list, build a dictionary, build a tuple, function call, those sorts of things.
00:11:16.160 --> 00:11:26.260
So the compilation step really takes your source code, which is in human readable, somewhat human readable form, and turns it into a linear stream of instructions.
00:11:26.260 --> 00:11:34.660
Very much like assembly language, except you can think of bytecode as an assembly language for a Python virtual computer.
00:11:35.440 --> 00:11:46.480
Right. That was kind of the impression I got as well, like a much, much richer assembly language where you have operations like build class and call method, push and pop stuff off stacks and so on.
00:11:46.480 --> 00:11:47.780
Yep, exactly.
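The compile-then-interpret pipeline described here can be reproduced from inside Python itself. The following is a minimal sketch, assuming Python 3; the sample source string and the `<example>` filename are made up for illustration, while `ast.parse`, `compile`, `dis.dis`, and `exec` are standard built-in or standard-library calls.

```python
import ast
import dis

# Hypothetical two-line program used only for illustration.
source = "x = 1 + 2\nprint(x)\n"

# Front end: lexing and parsing turn source text into an abstract syntax tree.
tree = ast.parse(source, filename="<example>", mode="exec")
print(ast.dump(tree))

# Code generation: the tree is walked and a code object holding bytecode is emitted.
code = compile(tree, filename="<example>", mode="exec")

# The raw bytecode is a flat bytes object; dis renders it in readable form.
print(type(code.co_code), len(code.co_code))
dis.dis(code)

# Execution is a separate step: only now does the interpreter loop run the bytecode.
namespace = {}
exec(code, namespace)
```

The exact opcodes printed by `dis.dis` differ between Python versions, so the listing is best read as an illustration of the stages rather than as fixed output.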
00:11:47.780 --> 00:11:56.020
If we want to go work with this, right, we can go to python.org and download the code and decompress it or untar it or whatever.
00:11:56.020 --> 00:11:59.540
And it's just, it literally is a bunch of C code, right?
00:11:59.540 --> 00:12:03.640
The C in CPython is, here's your C implementation of this interpreter, right?
00:12:03.640 --> 00:12:04.540
That's right.
00:12:04.640 --> 00:12:07.280
So if you go to, this is what I do on the first day of class.
00:12:07.280 --> 00:12:19.640
We have everybody download the C interpreter source code from, sorry, the CPython source code from python.org and unzip it and do configure and make.
00:12:19.640 --> 00:12:26.440
Now, part of the class, I didn't require students to actually run the interpreter if they didn't want,
00:12:26.440 --> 00:12:29.800
because most of the class was actually reading through the code and walking through it.
00:12:29.800 --> 00:12:34.580
Now, the students who were a bit more adventurous, they could try to compile the interpreter themselves.
00:12:34.580 --> 00:12:40.500
And then try to, you know, put in debug statements or print statements to see how it works behind the scenes.
00:12:40.500 --> 00:12:49.400
But actually, compiling the interpreter itself might not be easy if you're on, say, a Windows machine, which doesn't come with a lot of development tools or compilers.
00:12:49.400 --> 00:12:51.520
So I'm usually on Linux and Mac machines.
00:12:51.520 --> 00:12:58.860
If you install the standard developer tool chain with GCC and make and configure and all that stuff.
00:12:58.860 --> 00:13:01.760
In theory, right, building is always hard.
00:13:01.760 --> 00:13:10.120
But in theory, if you do ./configure and then you type make, you'll actually call the C compiler on your machine.
00:13:10.120 --> 00:13:18.040
And it will compile all the .c and .h files in the CPython directory.
00:13:18.040 --> 00:13:22.500
And in the end, it will produce a binary executable file called Python.
00:13:22.500 --> 00:13:24.720
And that Python you can just run.
00:13:24.720 --> 00:13:29.220
And that is the Python interpreter that you just compiled from C source code.
00:13:29.340 --> 00:13:34.980
So most of the class, what we do is we go over what a lot of those C files actually do.
00:13:34.980 --> 00:13:44.140
Maybe you could give us like a 10,000 foot view of what are the interesting parts of that source code and what is just noise and details.
00:13:44.140 --> 00:13:49.360
So there's like Objects, and then there's Include, there's ceval.c.
00:13:49.360 --> 00:13:52.980
There's like a few really common parts that you come back to over and over and over.
00:13:53.080 --> 00:13:54.320
And then there's a bunch of details.
00:13:54.320 --> 00:13:55.580
Yeah.
00:13:55.580 --> 00:14:00.500
So on the Web site with all the videos, I actually show the files that they reference.
00:14:00.500 --> 00:14:08.520
But really, the core file that I keep on going back to, as you're saying, is Python/ceval.c.
00:14:08.520 --> 00:14:12.840
And that file, at its core, is the main interpreter loop.
00:14:12.840 --> 00:14:19.720
So conceptually, how Python executes code is that the bytecode is just a bunch of instructions.
00:14:20.700 --> 00:14:22.580
It's just a list of instructions.
00:14:22.580 --> 00:14:28.120
Each one is add or subtract or build list or function call or so forth.
00:14:28.120 --> 00:14:40.960
And all the interpreter does is just go through one instruction at a time, take it off the list of instructions, do something and then move to the next instruction, do something, move to the next instruction and then do something else.
00:14:41.580 --> 00:14:46.440
And it might jump around the stream of instructions if you have, say, a function call or a loop.
00:14:46.440 --> 00:14:54.560
But really, the main interpreter loop in ceval.c, all it is is a big while-true infinite loop that just...
00:14:54.560 --> 00:14:56.700
Yeah, there's like a huge switch statement.
00:14:56.700 --> 00:14:57.700
And it is huge, right?
00:14:57.700 --> 00:14:58.200
That's right.
00:14:58.200 --> 00:15:02.200
Yeah, there's like a 3000 or whatever line switch statement.
00:15:02.320 --> 00:15:03.860
There's a fun fact in there.
00:15:03.860 --> 00:15:15.840
If you actually, I don't know if it's in all the versions, but at least in some of the versions I saw, there's some kind of comment in there saying that they needed to like break up the switch statement in some weird way.
00:15:15.840 --> 00:15:21.040
Because some C compilers just can't take switch statements that are that big.
00:15:21.040 --> 00:15:28.760
So they had to actually break up the code into pieces because, you know, it wouldn't compile on some kind of computers because that code was just too giant.
00:15:28.760 --> 00:15:30.100
Yeah, that's pretty funny.
00:15:30.100 --> 00:15:31.680
It's like a 3000 line switch statement.
00:15:31.680 --> 00:15:32.380
It's pretty cool.
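The shape of that loop, one big infinite loop that fetches an instruction, dispatches on its opcode, and occasionally jumps, can be sketched in a few lines. This is a toy stack machine written for illustration only, not CPython's implementation: the `run` function, the tuple-based instruction format, and the `JUMP_IF_LESS` opcode are all hypothetical, and the other opcode names are merely borrowed from CPython's vocabulary.

```python
def run(instructions):
    """Toy stack machine: a hypothetical, heavily simplified stand-in for ceval.c."""
    stack = []        # the value stack that instructions push to and pop from
    variables = {}    # a single flat namespace
    pc = 0            # index of the next instruction to execute

    while True:       # the "big while-true loop"
        opname, arg = instructions[pc]
        pc += 1
        # The if/elif chain plays the role of the giant C switch statement.
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            stack.append(variables[arg])
        elif opname == "STORE_NAME":
            variables[arg] = stack.pop()
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "JUMP_IF_LESS":   # invented opcode: jump to arg if left < right
            right = stack.pop()
            left = stack.pop()
            if left < right:
                pc = arg
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: " + opname)


# Roughly: i = 0; repeat { i = i + 1 } while i < 3; return i
program = [
    ("LOAD_CONST", 0),      # 0
    ("STORE_NAME", "i"),    # 1
    ("LOAD_NAME", "i"),     # 2  loop body starts here
    ("LOAD_CONST", 1),      # 3
    ("BINARY_ADD", None),   # 4
    ("STORE_NAME", "i"),    # 5
    ("LOAD_NAME", "i"),     # 6
    ("LOAD_CONST", 3),      # 7
    ("JUMP_IF_LESS", 2),    # 8  jump back to instruction 2 while i < 3
    ("LOAD_NAME", "i"),     # 9
    ("RETURN_VALUE", None), # 10
]

result = run(program)
print(result)
```

The real loop handles many more opcodes plus frames, exceptions, and reference counting, but the fetch, dispatch, and jump structure is the part the conversation is describing.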
00:15:32.380 --> 00:15:37.280
But those are more or less the steps that have all the opcodes.
00:15:37.800 --> 00:15:50.340
And so if I look at Python, it's not necessarily mapping one to one the Python code I write to these opcodes, which is a good thing for Python programmers, right?
00:15:50.340 --> 00:15:52.000
That means you're working in a high level language.
00:15:52.000 --> 00:15:54.220
You're not working like down in the detail, right?
00:15:54.220 --> 00:16:01.700
But it also means it's hard for me to understand if I write, you know, create a class and I say, you know, T equals new test class.
00:16:01.700 --> 00:16:03.420
What does that actually mean?
00:16:03.420 --> 00:16:04.620
Like, how do I line that up?
00:16:04.620 --> 00:16:07.480
And so you had a cool way to disassemble that, right?
00:16:07.720 --> 00:16:08.280
And look at it.
00:16:37.640 --> 00:16:42.960
Currently, candidates receive five or more offers in just the first week and there are no obligations ever.
00:16:42.960 --> 00:16:45.060
Sounds pretty awesome, doesn't it?
00:16:45.060 --> 00:16:47.100
Well, did I mention there's a signing bonus?
00:16:47.100 --> 00:16:51.200
Everyone who accepts a job from Hired gets a $2,000 signing bonus.
00:16:51.200 --> 00:16:55.540
And as Talk Python listeners, it gets way sweeter.
00:16:55.540 --> 00:17:03.100
Use the link Hired.com slash Talk Python To Me and Hired will double the signing bonus to $4,000.
00:17:04.100 --> 00:17:04.820
Opportunity's knocking.
00:17:04.820 --> 00:17:08.440
Visit Hired.com slash Talk Python To Me and answer the call.
00:17:18.560 --> 00:17:19.620
Right, right.
00:17:19.620 --> 00:17:25.740
So the disassembler actually comes in the standard Python library.
00:17:25.740 --> 00:17:40.640
So if you do, right, so if you do python space dash m space dis, which runs the dis module, space, the name of the Python file, it'll actually run the main function in the dis module.
00:17:41.360 --> 00:17:46.900
And what that will do is it'll actually print out a somewhat human readable representation of the bytecode.
00:17:46.900 --> 00:17:55.640
And the cool thing about that is that it shows the line number of which line of your Python source code compiles into which bytecode.
00:17:55.740 --> 00:17:57.880
And as you mentioned, it's not a one-to-one mapping.
00:17:57.880 --> 00:18:03.220
So one line usually compiles to several bytecodes because the bytecode is at a lower level.
00:18:03.220 --> 00:18:05.420
So you can run that dis command.
00:18:05.420 --> 00:18:16.060
And the dis module, if you search on your favorite search engine for Python space dis, you should see the documentation for this disassembler module.
00:18:16.060 --> 00:18:20.720
And that is in the standard library, and that gives you all of the stuff.
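On the command line this is `python -m dis somefile.py`, where `somefile.py` stands for any script. The line-to-bytecode mapping can also be inspected programmatically. The following is a small sketch, assuming Python 3; the three-line sample program and the `by_line` grouping are illustrative, while `dis.get_instructions` and `dis.findlinestarts` are standard-library functions.

```python
import dis

# Hypothetical three-line program used only for illustration.
source = "a = 1\nb = a + 2\nprint(b)\n"
code = compile(source, "<example>", "exec")

# Map each bytecode offset that begins a source line to that line number.
line_starts = dict(dis.findlinestarts(code))

# Group opcode names by the source line they were compiled from.
by_line = {}
current_line = None
for ins in dis.get_instructions(code):
    if ins.offset in line_starts:
        current_line = line_starts[ins.offset]
    by_line.setdefault(current_line, []).append(ins.opname)

for lineno, opnames in by_line.items():
    print(lineno, opnames)
```

Each source line maps to several opcodes, which is the one-to-many relationship mentioned above. The specific opcode names vary across Python versions, and newer versions may add bookkeeping instructions that are not attributed to lines 1 through 3.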
00:18:20.720 --> 00:18:24.860
So now that said, though, that only prints out the instructions.
00:18:24.860 --> 00:18:31.480
There was somebody who made a library called byteplay, which is B-Y-T-E-P-L-A-Y.
00:18:31.480 --> 00:18:39.700
And that library actually is an enhanced version of the disassembler that lets you get the disassembled bytecode into objects.
00:18:39.700 --> 00:18:41.440
You can actually play with it yourself.
00:18:41.440 --> 00:18:42.980
You can manipulate it.
00:18:42.980 --> 00:18:44.840
You can, you know, take it apart.
00:18:44.840 --> 00:18:45.800
You can analyze it.
00:18:45.800 --> 00:18:51.520
So this byteplay library, I haven't used it myself personally, but I know people who really like playing with it.
00:18:51.520 --> 00:18:52.840
Yeah, that's cool.
00:18:52.840 --> 00:18:53.660
A little more powerful.
00:18:53.660 --> 00:19:02.120
One thing about the dis module is it's super easy to look at just sort of flat code in Python files.
00:19:02.120 --> 00:19:07.640
But if I want to look at the functions or I've got nested functions and classes, it's a little more work to do that, right?
00:19:07.640 --> 00:19:09.000
Yeah.
00:19:09.000 --> 00:19:15.540
So the default with the dis module is it just disassembles the top level of your program.
00:19:15.540 --> 00:19:20.580
So all the top level says is that if you define a function, it'll just say function definition.
00:19:20.800 --> 00:19:27.960
And then what you have to do is you actually have to go inside that function and disassemble that function itself.
00:19:27.960 --> 00:19:29.680
So it is a little bit more hairy.
00:19:29.680 --> 00:19:34.060
And I don't know if byteplay handles all that out of the box, but it might.
00:19:34.060 --> 00:19:39.760
But the idea is that the dis module, if you just run it by default, it will just disassemble the top level program.
00:19:39.760 --> 00:19:42.580
And any functions will not be disassembled automatically.
00:19:42.580 --> 00:19:47.460
You have to actually grab the code of those functions and go in there and call dis on that.
00:19:47.460 --> 00:19:50.400
So it is a little bit more tricky to do that.
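Going inside each function can be automated, because a nested function's code object is stored among the constants of the code object that defines it. The following is a sketch under that assumption, for Python 3.7 or later; `walk_code` and the sample source are hypothetical names made up for illustration. Since Python 3.7, `dis.dis` descends into nested code objects by default, so `depth=0` is passed here to reproduce the top-level-only behavior discussed in the conversation.

```python
import dis
import types

# Hypothetical module source with a function nested inside another function.
source = '''
def outer():
    def inner():
        return 42
    return inner()
'''


def walk_code(code, depth=0):
    """Yield (depth, code_object) for a code object and everything nested in it."""
    yield depth, code
    for const in code.co_consts:
        # Nested functions and classes appear as code objects among the constants.
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)


top_level = compile(source, "<example>", "exec")
for depth, code in walk_code(top_level):
    print("  " * depth + code.co_name)
    dis.dis(code, depth=0)  # depth=0: disassemble this code object only
```

The walk visits the module first, then `outer`, then `inner`, mirroring the manual "grab the code of the function and call dis on it" procedure.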
00:19:50.400 --> 00:19:51.260
Sure.
00:19:51.260 --> 00:19:59.640
The other thing I thought was interesting is if I've got a function, let's say foo, in Python, I could say, what is it?
00:19:59.720 --> 00:20:03.400
Foo.func underscore bytecode.
00:20:03.400 --> 00:20:06.480
How do I – the bytecode is actually there on the function.
00:20:06.480 --> 00:20:12.220
And you can look at it in its encoded form, which is kind of some binary string type thing.
00:20:12.980 --> 00:20:15.000
And then you can also disassemble that as well, right?
00:20:15.000 --> 00:20:16.320
That's right.
00:20:16.320 --> 00:20:18.640
And that's what I think we're just leading into that.
00:20:18.640 --> 00:20:26.280
So the idea is that dis itself, if you just run it, it disassembles the bytecode of the, I guess, of the top level file.
00:20:26.280 --> 00:20:29.180
But each function itself has its own code.
00:20:29.180 --> 00:20:34.200
And like you said, I think it's – it's actually different in Python 2 and 3, the name of it.
00:20:34.200 --> 00:20:39.500
But I think in one version it's like the function object dot func_code.
00:20:39.500 --> 00:20:43.100
The other one is just like just dot code or something like that.
00:20:43.100 --> 00:20:50.340
But the idea is that the code of the function just appears inside of it as a binary string of data.
00:20:50.340 --> 00:20:53.960
So if you actually print it out, it just looks like some garbled string.
00:20:53.960 --> 00:20:58.740
But you can run it through some pretty-printing function or through dis.
00:20:58.740 --> 00:21:01.740
And it actually shows you the bytecode of the function.
00:21:01.740 --> 00:21:13.620
Because all a function object is, is some context plus an actual string of bytecode that represents the instructions the function is supposed to execute when you run it.
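As a rough illustration of what was just described, here is a small sketch, assuming Python 3, where the code object lives on `__code__` (Python 2 exposed it as `func_code`). The function `foo` is a made-up example.

```python
import dis


def foo(a, b):
    return a + b


code = foo.__code__   # Python 3 attribute; Python 2 used foo.func_code
raw = code.co_code    # the raw bytecode, stored as a bytes object

print(type(raw), len(raw))
print(raw)            # printed directly, it looks like a garbled string

# Part of the "context" that travels with the bytecode: the local names.
print(code.co_varnames)

# Running the function through dis shows readable instructions instead.
dis.dis(foo)
```

The exact instructions printed by `dis.dis` differ between Python versions, so the listing should be read as version-specific.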
00:21:14.280 --> 00:21:22.740
Yeah, the other thing I thought was pretty cool, or interesting to understand, is that sort of compile step that you talk about, right?
00:21:22.740 --> 00:21:30.200
When I run python on my Python file, I first get a compile step to bytecode and then the dynamic interpreted execution.
00:21:30.200 --> 00:21:34.220
But all those functions and stuff, that bytecode is there and ready to roll.
00:21:34.220 --> 00:21:38.700
It's just not kind of wired together until it gets to the interpreter, right?
00:21:39.700 --> 00:21:40.240
That's right.
00:21:40.240 --> 00:21:48.200
So I think the Python interpreter just does the compiling and running all at the same time.
00:21:48.200 --> 00:21:57.320
But I think there's actually a mode in Python where you can just compile the bytecode and not actually run it yet.
00:21:57.320 --> 00:21:59.980
I'm not sure exactly which flags do that.
00:21:59.980 --> 00:22:06.400
But sometimes people actually ship pre-compiled Python bytecode instead of the source code.
00:22:06.820 --> 00:22:12.500
I don't know for what reason people do this, because you can just run the source code.
00:22:12.500 --> 00:22:20.140
And some people like to obfuscate their bytecode maybe, but I don't know how well that actually works because you can kind of reverse engineer it.
00:22:20.140 --> 00:22:24.780
But yeah, so the compile step is completely separate from the running step.
00:22:25.120 --> 00:22:33.260
And like you said, once you compile, instead of a text file, a .py file, you get what's called, I think, a .pyo file or something.
00:22:33.260 --> 00:22:35.720
It's just a bunch of garbled stuff.
00:22:35.720 --> 00:22:40.360
And then that garbled stuff, you can just run through the interpreter and it'll run your program.
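The separation of compiling from running can be sketched with the standard library, assuming Python 3. The built-in `compile` and `exec` show the two steps in memory, and `py_compile` writes a bytecode file to disk. The file names and the `source` string are hypothetical. In current CPython the compiled file uses the `.pyc` extension; `.pyo` was the name for optimized bytecode before Python 3.5.

```python
import importlib.util
import os
import py_compile
import tempfile

source = "result = 6 * 7\n"

# Step 1: compile only. Nothing in the source runs here.
code = compile(source, "<example>", "exec")

# Step 2: run the already-compiled code later, in a namespace we choose.
namespace = {}
exec(code, namespace)
print(namespace["result"])

# The same compile step, but writing the bytecode to a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.py")
    pyc_path = os.path.join(tmp, "example.pyc")
    with open(src_path, "w") as f:
        f.write(source)

    py_compile.compile(src_path, cfile=pyc_path, doraise=True)

    # The file is binary; it begins with the interpreter's magic number.
    with open(pyc_path, "rb") as f:
        header = f.read(4)
    starts_with_magic = header == importlib.util.MAGIC_NUMBER
    print(starts_with_magic)
```

From the command line, `python -m py_compile example.py` performs the same compile-only step.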
00:22:40.360 --> 00:22:44.000
Yeah, it's really interesting to see how it's all coming together.
00:22:45.080 --> 00:22:49.280
What do you think some of the main reasons for studying Python at this level are?
00:22:49.280 --> 00:22:51.900
Like how does it make you a better programmer, do you think?
00:22:51.900 --> 00:22:53.780
That's a great question.
00:22:54.580 --> 00:23:09.600
I think that studying Python at the implementation level makes you a better programmer in that you, one, build a really good mental model of what goes on behind the scenes.
00:23:09.600 --> 00:23:14.340
And you see that these languages are just tools made by people.
00:23:14.340 --> 00:23:15.980
I think there's something really powerful in that.
00:23:15.980 --> 00:23:20.020
I feel this is a very kind of systems perspective of programming.
00:23:20.020 --> 00:23:26.920
So one analogy is, why do people study, say, operating systems or compilers?
00:23:26.920 --> 00:23:27.860
That's a good example.
00:23:27.860 --> 00:23:39.080
Like the kind of classic thing in college is that a lot of people have to take an operating systems course where they build a very simple sort of OS kernel in C and maybe some assembly.
00:23:39.080 --> 00:23:42.340
And their kernel kind of runs and it does a simple hello world.
00:23:42.340 --> 00:23:49.140
Or you do a compilers course where you build a compiler using some basic building blocks.
00:23:49.140 --> 00:23:56.200
And the idea there is that it's not that you're going to ever build an operating system or a compiler in real life or a new programming language.
00:23:56.200 --> 00:23:59.080
Most people are not going to implement a new kind of programming language.
00:23:59.080 --> 00:24:12.160
But by studying the principles behind how it works, I think it makes you a better programmer in that you understand how large, complex code bases are organized and logically broken down.
00:24:12.160 --> 00:24:18.820
So I view this class, like you've seen with these videos, as more of a code reading or literature exercise in a way.
00:24:18.820 --> 00:24:23.300
Because we're actually reading through dozens of files, or actually not that many.
00:24:23.300 --> 00:24:30.420
Maybe a dozen really core, complex files, and seeing how the pieces fit together.
00:24:30.420 --> 00:24:34.920
So it's sort of like dissecting, you know, kind of a large piece of code.
00:24:35.000 --> 00:24:38.280
I think that's really interesting in its own right.
00:24:38.280 --> 00:24:39.200
Yeah.
00:24:39.200 --> 00:24:48.920
A lot of people, when they're in school at least studying this stuff, find it's all very, I don't know, like you said, abstract, or maybe that's not quite the word I'm looking for.
00:24:48.920 --> 00:24:54.420
But like it doesn't have the nitty-gritty details of the real world applied to it.
00:24:54.420 --> 00:24:58.980
So all the error conditions that are so bizarre and all the optimizations, you don't necessarily have to deal with that.
00:24:58.980 --> 00:25:05.240
And so when you do finally get to a real world complex code base, it's super hard to feel comfortable.
00:25:05.240 --> 00:25:08.200
And I think, you know, you kind of helped your students do that a lot in there.
00:25:08.200 --> 00:25:08.820
So that was cool.
00:25:08.820 --> 00:25:10.040
Yeah.
00:25:10.040 --> 00:25:13.020
And like you mentioned, there's always a tradeoff, right?
00:25:13.020 --> 00:25:19.220
So even in my choice of what to cover in this class, if you notice, I only cover maybe a dozen or so files.
00:25:19.220 --> 00:25:24.360
I mean, the Python code base has hundreds or thousands of source code files.
00:25:24.360 --> 00:25:27.040
And obviously, one, I don't have time to cover all that.
00:25:27.040 --> 00:25:31.080
And two, I feel like this dozen is really the conceptual core of the interpreter.
00:25:31.080 --> 00:25:33.660
A lot of the files are just modules, right?
00:25:33.660 --> 00:25:37.400
A lot of the files are just like here's how strings are implemented.
00:25:37.400 --> 00:25:41.300
Here's how, you know, the socket class is implemented.
00:25:41.480 --> 00:25:43.740
Here's how, you know, memory-mapped I/O is implemented.
00:25:43.740 --> 00:25:45.980
Those are all, I feel, auxiliary things.
00:25:45.980 --> 00:25:48.480
Whereas the core thing is, you know, what is an object?
00:25:48.480 --> 00:25:50.620
What is, you know, a class?
00:25:50.620 --> 00:25:51.400
What is a function?
00:25:51.400 --> 00:25:52.380
What is the interpreter?
00:25:52.380 --> 00:25:58.020
And even, as you notice from watching the videos, I don't go over every single line in excruciating detail.
00:25:58.020 --> 00:26:02.780
I basically gloss over things and say, look, this block happens if there's some kind of error.
00:26:02.780 --> 00:26:03.600
You run out of memory.
00:26:03.600 --> 00:26:05.340
So, you know, look at that in your spare time.