WEBVTT 00:00:00.001 --> 00:00:03.900 Do you struggle to make sure your code is always correct before checking it in? 00:00:03.900 --> 00:00:08.320 What about your teammates' code? That one person who never wants to run the linter? 00:00:08.320 --> 00:00:13.240 Tired of dealing with tons of conflicts and spurious Git changes? You need Git pre-commit 00:00:13.240 --> 00:00:18.900 hooks. Well, we're lucky to have Stefanie Molin on the show today, who has done a bunch of writing 00:00:18.900 --> 00:00:26.280 and teaching about Git hooks. This is Talk Python To Me, episode 482, recorded October 24th, 2024. 00:00:27.120 --> 00:00:32.800 Are you ready for your host? You're listening to Michael Kennedy on Talk Python To Me. 00:00:32.800 --> 00:00:36.560 Live from Portland, Oregon, and this segment was made with Python. 00:00:36.560 --> 00:00:44.780 Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. 00:00:44.780 --> 00:00:50.020 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, 00:00:50.020 --> 00:00:56.080 both accounts over at fosstodon.org, and keep up with the show and listen to over nine years of 00:00:56.080 --> 00:01:01.820 episodes at talkpython.fm. If you want to be part of our live episodes, you can find the live streams 00:01:01.820 --> 00:01:07.600 over on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified 00:01:07.600 --> 00:01:13.000 about upcoming shows. This episode is brought to you by Sentry. Don't let those errors go unnoticed. 00:01:13.000 --> 00:01:19.880 Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry. And this episode is 00:01:19.880 --> 00:01:25.480 brought to you by Bluehost. Do you need a website fast? Get Bluehost. Their AI builds your WordPress site 00:01:25.480 --> 00:01:31.360 in minutes, and their built-in tools optimize your growth. Don't wait.
Visit talkpython.fm 00:01:31.360 --> 00:01:36.700 /bluehost to get started. Hey, everyone. Before we jump into the interview with Stefanie, 00:01:36.700 --> 00:01:43.620 I want to tell you real quickly that I just released a blog for Talk Python. Now, we have had tons of RSS 00:01:43.620 --> 00:01:49.640 over there because that's what powers podcasts. You can subscribe to the episodes. You can subscribe to 00:01:49.640 --> 00:01:54.900 an RSS feed for new course announcements over at Talk Python Training. And I've had a personal blog 00:01:55.520 --> 00:02:01.740 for a long time over at mkennedy.codes, but no official Talk Python blog. And so I'm going to be posting 00:02:01.740 --> 00:02:06.260 really cool things on there. I've already got a couple of articles posted, but I have plans for 00:02:06.260 --> 00:02:11.860 some interesting series. And anytime there are more interesting announcements or exciting news I 00:02:11.860 --> 00:02:16.540 want to share with Talk Python, it's going to be over on the Talk Python blog. So if you're interested, 00:02:16.540 --> 00:02:21.660 I would really, really appreciate it if you go to talkpython.fm, click on Blog, right in the 00:02:21.660 --> 00:02:26.040 navigation or at the bottom, and just subscribe to the RSS feed. That way we can stay in touch. 00:02:26.040 --> 00:02:32.520 And with that, let's talk pre-commit hooks. Stefanie, welcome to Talk Python. It's awesome 00:02:32.520 --> 00:02:38.080 to have you. Thanks for having me. Yeah, really looking forward to talking about pre-commit hooks. 00:02:38.080 --> 00:02:42.160 You know, these are things that I'm sure a lot of people have heard of. I've certainly heard of them, 00:02:42.160 --> 00:02:47.420 but to be honest, it's not something I've done very much with. And I bet a lot of people out there 00:02:47.420 --> 00:02:51.340 listening are like, yeah, that'd be a good idea, just like continuous integration and writing tests. 00:02:51.540 --> 00:02:54.160 Now let's get back to it.
You know, something like that. So I think 00:02:54.160 --> 00:03:00.640 there's a lot for people to take away here. And we'll talk about what these pre-commit 00:03:00.640 --> 00:03:05.180 hooks are, when to use them, how to build them, and a whole bunch of other things that you're up to. 00:03:05.180 --> 00:03:08.160 So it should be a lot of fun. I'm looking forward to it. Me too. 00:03:08.160 --> 00:03:14.060 Yeah. Now, before we get to that, how about your story? How did you get into programming, Python, and 00:03:14.060 --> 00:03:18.900 pre-commit hooks and all these things? Hello, everyone. I'm Stefanie Molin. I am a software engineer at 00:03:18.900 --> 00:03:25.220 Bloomberg. And I would say, I guess I got into programming in Python this way: I initially was programming 00:03:25.220 --> 00:03:33.580 in R and I was doing more data analysis while still building some things. And I needed to build a web 00:03:33.580 --> 00:03:38.980 app. And one of my teammates had suggested that rather than battling with Shiny in R, I just 00:03:38.980 --> 00:03:43.740 learn Python. So I took a few weeks and just forced myself to do that. And I built something 00:03:43.740 --> 00:03:50.660 with Flask. And that was how I got into it. Oh, that's really awesome. Yeah. You were doing work, 00:03:50.660 --> 00:03:56.360 not in finance, but in ads or something like that, with R. What kind of work was that? Just generally, 00:03:56.360 --> 00:04:03.100 you don't have to go into details. Yeah. So it was mainly reporting and doing analysis on how 00:04:03.100 --> 00:04:08.240 client campaigns were going. But what really got me started with programming was that I had gotten 00:04:08.240 --> 00:04:13.500 involved with a hackathon team and we had built an alerting system, just monitoring when something 00:04:13.500 --> 00:04:19.340 weird went on with the campaigns. And I really enjoyed the building, more so than the analysis.
00:04:19.340 --> 00:04:25.820 And so I had to find a way to... I enjoy a little bit of data, but more the coding side. 00:04:25.820 --> 00:04:28.560 So I had to find something that would let me combine those two. 00:04:28.560 --> 00:04:33.860 Yeah. Well, that sounds really fun. I'm definitely on the same wavelength as you: data analysis 00:04:33.860 --> 00:04:39.720 is fun, but the building is really where things get interesting, and, you know, you look back and see, 00:04:39.720 --> 00:04:42.060 like, oh, we built this thing. That's a pretty awesome feeling. 00:04:42.060 --> 00:04:47.480 Yeah. It was a ton of fun and we ended up getting, I think, third place in the hackathon, 00:04:47.480 --> 00:04:52.640 but yeah, that was really the moment where I got a taste of something else. 00:04:52.640 --> 00:04:54.700 And I was like, this is what I want to be doing. 00:04:55.020 --> 00:04:58.800 Yeah. Oh, that's fantastic. Was that at your company or was that somewhere else? 00:04:58.800 --> 00:05:04.220 That was at the previous, previous role. It was the ad tech company. And so that was actually 00:05:04.220 --> 00:05:10.180 all built in R, the alerting system. And then... Oh, no. Yeah. Okay. Yeah. And then as we 00:05:10.180 --> 00:05:16.400 worked more on it, certain things ended up moving into Python. It was a lot easier to work with and to 00:05:16.400 --> 00:05:19.540 automate things and not have, like, some laptop running R somewhere. 00:05:19.540 --> 00:05:28.860 Yeah, exactly. That's sort of the promise of Python over a lot of these things that at first 00:05:28.860 --> 00:05:35.080 blush seem somewhat equivalent, right? It's a real programming language that can go on to do 00:05:35.080 --> 00:05:40.440 all the stuff. You don't have to try to automate some weird thing that's not really meant to be used that 00:05:40.440 --> 00:05:40.840 way. Right. 00:05:40.840 --> 00:05:46.840 I know.
And now, I mean, I could not write R if I had to. I don't think I would. 00:05:47.320 --> 00:05:53.580 Yeah. Well, I was going to ask you now, which side of the fence do you spend more time on, R or Python? 00:05:53.580 --> 00:05:54.140 It sounds like... 00:05:54.140 --> 00:05:59.920 I haven't touched R in maybe six-plus years at this point. So, yeah. Other than the arrows; 00:05:59.920 --> 00:06:01.740 that's probably the only thing I could manage. 00:06:01.740 --> 00:06:10.340 Yeah. No more equals signs, just arrows. Okay. Got it. Awesome. Well, that's super fun. Let's talk 00:06:10.340 --> 00:06:16.420 about pre-commit hooks, right? I've had Anthony Sottile on the show to talk about his pre-commit project. 00:06:16.420 --> 00:06:21.720 It was a long time ago, and I'm sure that project will get a bit of a shout-out from your work as 00:06:21.720 --> 00:06:28.240 well. But, you know, congrats, you put together a really nice series of articles and resources 00:06:28.240 --> 00:06:35.440 teaching people what commit hooks are, how to debug them, how to build them, how to choose them. So I 00:06:35.440 --> 00:06:38.420 think, you know, the stuff we're going to talk about, I'll link, of course, in the show notes. 00:06:38.600 --> 00:06:41.700 It's a really nice resource for folks. So thank you. I appreciate that. 00:06:41.700 --> 00:06:51.020 Yeah. Yeah. You bet. So let's talk about numpydoc docstring validation. This was your entry 00:06:51.020 --> 00:06:54.180 point into this whole world of pre-commit hooks, right? 00:06:54.180 --> 00:07:02.580 Yeah. So, in July 2022, I was at my first EuroPython and I decided to do the sprints 00:07:02.580 --> 00:07:08.260 for the first time. I ended up working with the scikit-learn team, and they wanted to make sure that 00:07:08.260 --> 00:07:14.620 all of their docstrings were conforming to the numpydoc standard.
They had a file in place, or a test 00:07:14.620 --> 00:07:19.480 file, that you could run to validate that whatever changes you made were valid 00:07:19.480 --> 00:07:27.300 as far as docstrings. And I remember at one point, I had, I think, done 12 or so PRs in that sprint. 00:07:27.300 --> 00:07:32.260 So I was very productive. And there was one early on, I think the second or so, where it just wasn't 00:07:32.260 --> 00:07:37.340 working and I couldn't figure out why it was telling me it wasn't valid. It was saying that it wasn't 00:07:37.340 --> 00:07:43.460 ending in a period. And I had called over one of the maintainers and we both stared at it. To us, 00:07:43.460 --> 00:07:48.880 it looked like a period. And I ended up just deleting the docstring and starting over. And it turned out 00:07:48.880 --> 00:07:54.460 that it was a trailing space at the end. And so I had asked the maintainer, like, how do you not have 00:07:54.460 --> 00:07:59.580 this happen to you? And the response was, you should install pre-commit. And by then I was already, 00:07:59.580 --> 00:08:04.960 I had to leave. So I made a note to myself: I need to research this when I get home. 00:08:04.960 --> 00:08:09.740 And when I did, I was like, well, how did I not know about this before? And I set it up on things. 00:08:09.740 --> 00:08:15.180 And then I went to look: does numpydoc have that? This seems like exactly what you would want. 00:08:15.300 --> 00:08:19.220 As you're writing code, you want to make sure that it's going to check the docstring there. You don't 00:08:19.220 --> 00:08:24.420 want to have to run some other thing later on and remember to run it. So I looked and there was no 00:08:24.420 --> 00:08:29.720 pre-commit hook for numpydoc. And I had made something that initially we had just 00:08:29.720 --> 00:08:35.440 used internally within my team.
And then later on, I kind of wanted to use it for a personal project. 00:08:35.440 --> 00:08:42.580 And so I set about seeing how we could actually open source it. And I contacted the numpydoc team 00:08:42.580 --> 00:08:46.280 and they were very, very interested in it, because there was a reason there was no hook: it's because 00:08:46.280 --> 00:08:52.060 no one knew how to do it. Right. And at that point I had the horrible realization that what I had written 00:08:52.060 --> 00:08:58.700 would never work outside my team, because it was relying on things being installed. And then I felt pretty 00:08:58.700 --> 00:09:03.700 bad about promising that to them. So I managed to come up with an entirely new solution in a weekend 00:09:03.700 --> 00:09:10.360 and figured out how to use the abstract syntax tree to work through the code. And so I built an entirely 00:09:10.360 --> 00:09:15.900 new version of it. And that is what is currently available in numpydoc. And that actually led to 00:09:15.900 --> 00:09:22.240 them inviting me to be a core developer for numpydoc. Congratulations. How cool is that? 00:09:22.240 --> 00:09:27.380 Yeah, I know. It's like the full spectrum, right? From just having heard about it to 00:09:27.380 --> 00:09:30.960 seeing the connection between two things that weren't previously connected. 00:09:31.260 --> 00:09:37.120 Yeah. Yeah. Well, I think your comment about the pre-commit hook not previously existing, 00:09:37.120 --> 00:09:41.260 you know, for this project is also pretty interesting, right? It's kind of like I hinted at: 00:09:41.260 --> 00:09:45.120 a lot of people hear about this kind of stuff, but that doesn't mean they're putting it 00:09:45.120 --> 00:09:45.920 into practice, right? 00:09:45.920 --> 00:09:46.840 Yeah, for sure. 00:09:47.140 --> 00:09:53.080 And so, you know, let's find our way over to pre-commit hooks in general.
So how do we 00:09:53.080 --> 00:09:59.660 encourage people, or ensure that people follow coding rules, right? We've got tools like Black, 00:09:59.660 --> 00:10:06.300 we've got tools like Ruff. Now, those work awesome if you give them a consistent config file 00:10:06.300 --> 00:10:13.100 or config settings (not so much with Black, but Ruff). Anyway, they'll make those changes and do a lot of the 00:10:13.100 --> 00:10:16.640 kind of stuff that we're talking about here, but that requires, like you said, people to have them 00:10:16.640 --> 00:10:23.000 installed, people to run them, and people to buy into the whole concept of the project in the first place, 00:10:23.000 --> 00:10:23.320 right? 00:10:23.320 --> 00:10:24.400 Yeah, that last bit. 00:10:24.440 --> 00:10:28.360 We're all using these tools and we're all going to run them and we're going to remember to run them, 00:10:28.360 --> 00:10:33.140 until one person goes, I don't like these tools. I'm not doing it. And then their settings fight with 00:10:33.140 --> 00:10:36.380 your settings, or their spacing fights with your spacing, or whatever, right? 00:10:36.380 --> 00:10:41.800 Yeah. I think what really helped in my experience, when you incorporate these things, 00:10:41.800 --> 00:10:46.060 even going and approaching open source projects that didn't have a pre-commit setup and just asking 00:10:46.060 --> 00:10:51.100 if they were interested in it: you really see the value if you think about whether you've ever 00:10:51.100 --> 00:10:56.220 reviewed something or gotten review comments like, "you should start a new line here," or "I don't like this 00:10:56.220 --> 00:11:01.800 space here." And then you think about how much time you waste at that stage. And then you still have 00:11:01.800 --> 00:11:08.180 zero consistency, because you did it one way and someone else does it another way. And even further than that, 00:11:08.180 --> 00:11:13.420 there's just the time you waste in your code.
"Oh, I should put this on a new line," and reformatting files 00:11:13.420 --> 00:11:18.140 when you could actually be writing things and thinking about how you should design this algorithm. 00:11:18.140 --> 00:11:23.240 Right. And so I think a big part of making sure that, once you find these tools that you're 00:11:23.240 --> 00:11:28.000 going to use, people actually use them is making them easy to use. Like you said, yeah, 00:11:28.000 --> 00:11:32.720 you can just run Black or Ruff, but you have to remember to run Black or Ruff. And that is the 00:11:32.720 --> 00:11:38.500 key problem. And what's so great about pre-commit, or even extensions in your IDE, is that these things 00:11:38.500 --> 00:11:43.100 become automatic, and that's what you need to get towards for these things to actually stick. 00:11:43.100 --> 00:11:49.580 Yeah. To make them automatic and not part of it. And to some degree, continuous integration can do 00:11:49.580 --> 00:11:54.400 those kinds of things. But a lot of times it's too late at that point. It's already checked in, 00:11:54.400 --> 00:11:59.620 it's already committed. And then you've got the back and forth of, now it's a diff, but it's only a diff 00:11:59.620 --> 00:12:04.500 because they spaced it differently when they hit save in their IDE than when you hit save in yours, and 00:12:04.500 --> 00:12:11.040 all that. So pre-commit hooks run before your code actually leaves your computer, right? 00:12:11.040 --> 00:12:16.840 Yeah. So it's actually prior to even the commit. So when you do git commit and, let's say, you pass 00:12:16.840 --> 00:12:21.700 your message, if it's successful, you normally see the hash that gets generated. If you have 00:12:21.700 --> 00:12:28.160 pre-commit hooks enabled and those checks don't pass, then that commit never gets created in 00:12:28.160 --> 00:12:33.280 the first place. So you still have the files staged, but nothing has made it to the commit.
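The trailing-space story above is a good picture of what a docstring hook catches. What follows is a minimal sketch, not numpydoc's actual implementation: the helper names (`summary_line`, `check_source`, `main`) are hypothetical, and the only rule checked is that a docstring summary ends with a period. Like the rewrite Stefanie describes, it reads code through Python's `ast` module instead of importing it, so the project's dependencies don't need to be installed, and it returns a non-zero status when it finds a problem, which is the signal a hook uses to stop the commit.

```python
import ast
from collections.abc import Sequence
from pathlib import Path


def summary_line(docstring: str) -> str:
    """Return the first non-blank line of a docstring, keeping trailing characters."""
    for line in docstring.splitlines():
        if line.strip():
            return line.lstrip()
    return ""


def check_source(source: str, filename: str = "<string>") -> list[str]:
    """Flag functions and classes whose docstring summary doesn't end with a period."""
    # ast.parse only builds a syntax tree: the module is never imported or run,
    # so whatever it depends on does not need to be installed.
    tree = ast.parse(source, filename=filename)
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # clean=False keeps the raw text, so a stray trailing space stays visible.
            doc = ast.get_docstring(node, clean=False)
            if doc is not None and not summary_line(doc).endswith("."):
                problems.append(
                    f"{filename}:{node.lineno}: {node.name}: summary should end with a period"
                )
    return problems


def main(filenames: Sequence[str]) -> int:
    """Check each file; a non-zero return value is what lets a hook block the commit."""
    problems = []
    for name in filenames:
        problems.extend(check_source(Path(name).read_text(encoding="utf-8"), name))
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

Wired up as a hook, the file would end with an `if __name__ == "__main__":` block calling `raise SystemExit(main(sys.argv[1:]))` (plus `import sys`); the pre-commit framework, for instance, passes the matching staged filenames to a hook as arguments.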
00:12:33.280 --> 00:12:34.020 Yeah. That's great. 00:12:34.020 --> 00:12:38.220 Yeah. I was just going to explain a little bit about how they work, if you're curious. 00:12:38.220 --> 00:12:42.120 Yeah. Yeah. Well, let's start with just, what even are these pre-commit hooks? 00:12:42.120 --> 00:12:49.620 Yeah. So pre-commit hooks: I think the naming is quite overloaded, and that leads to a lot of confusion. 00:12:49.920 --> 00:12:57.540 So at the lowest level, a Git repository in general supports a hooks system. So there's a variety of 00:12:57.540 --> 00:13:03.440 different types of actions for which Git will trigger a script on your behalf. And one of those 00:13:03.440 --> 00:13:08.500 is pre-commit. So as I described before, as you run git commit, this gets triggered. Another one 00:13:08.500 --> 00:13:15.060 might be pushing: you can have Git wired to run some script when you push. Now, that is Git's version 00:13:15.060 --> 00:13:21.380 of a pre-commit hook, singular hook, because you can only have a single file run; a single executable 00:13:21.380 --> 00:13:21.860 can run. 00:13:21.860 --> 00:13:27.860 Right. If you go to your .git folder, there's a hooks subfolder and it's got little 00:13:27.860 --> 00:13:30.500 samples for all the different lifecycle things, right? 00:13:30.500 --> 00:13:35.640 Yeah. They provide some. Like I said, they have to be executable, but they can 00:13:35.640 --> 00:13:41.100 be in any language that you have available on the machine. And so Git provides some examples. I do think 00:13:41.100 --> 00:13:46.720 there are a few stages that don't have examples, but basically you take the name of the stage 00:13:46.720 --> 00:13:51.600 that you're going to use, and that's the name of the file. That file has to be an executable, and Git 00:13:51.600 --> 00:13:53.700 will run it at the designated moment. 00:13:53.700 --> 00:14:00.040 Okay.
So it could be a Python executable or it could be a Go executable or whatever, but it's just 00:14:00.040 --> 00:14:00.580 one, right? 00:14:00.580 --> 00:14:05.040 It's just one. Yeah. Because it has to be named that. Like in the case of pre-commit, it has to be called 00:14:05.040 --> 00:14:06.320 pre-commit, nothing else. 00:14:07.280 --> 00:14:12.740 This portion of Talk Python To Me is brought to you by Sentry. Code breaks. It's a fact of life. 00:14:12.740 --> 00:14:18.680 With Sentry, you can fix it faster. As I've told you all before, we use Sentry on many of our apps 00:14:18.680 --> 00:14:24.680 and APIs here at Talk Python. I recently used Sentry to help me track down one of the weirdest bugs I've 00:14:24.680 --> 00:14:30.360 run into in a long time. Here's what happened. When signing up for our mailing list, it would crash 00:14:30.360 --> 00:14:36.700 under uncommon execution paths, like situations where someone was already subscribed or entered an 00:14:36.700 --> 00:14:42.240 invalid email address or something like this. The bizarre part was that our logging of that 00:14:42.240 --> 00:14:49.720 unusual condition itself was crashing. How is it possible for our log to crash? It's basically a 00:14:49.720 --> 00:14:54.580 glorified print statement. Well, Sentry to the rescue. I'm looking at the crash report right now, 00:14:54.580 --> 00:14:59.680 and I see way more information than you'd expect to find in any log statement. And because it's 00:14:59.680 --> 00:15:06.120 production, debuggers are out of the question. I see the traceback, of course, but also the browser version, 00:15:06.120 --> 00:15:13.000 client OS, server OS, server OS version, whether it's production or QA, the email and name of the person 00:15:13.000 --> 00:15:18.520 signing up. That's the person who actually experienced the crash. Dictionaries of data on the call stack and so much 00:15:18.520 --> 00:15:25.180 more. What was the problem?
I initialized the logger with the string "info" for the level rather than the 00:15:25.180 --> 00:15:33.340 enumeration value INFO, which was an integer-based enum. So the logging statement would crash, saying that I could not use 00:15:33.340 --> 00:15:40.140 less than or equal to between strings and ints. Crazy town. But with Sentry, I captured it, 00:15:40.140 --> 00:15:46.400 fixed it, and I even helped the user who experienced that crash. Don't fly blind. Fix code faster with 00:15:46.400 --> 00:15:52.520 Sentry. Create your Sentry account now at talkpython.fm/sentry. And if you sign up with the code 00:15:52.520 --> 00:16:00.040 TALKPYTHON, all caps, no spaces, it's good for two free months of Sentry's business plan, which will give you up to 00:16:00.040 --> 00:16:03.040 20 times as many monthly events as well as other features. 00:16:04.640 --> 00:16:10.800 So if you want to run more, you basically have to, potentially, write a program which then itself 00:16:10.800 --> 00:16:16.440 figures out all the things to do and then delegates to running them. Like if you want to run Ruff to 00:16:16.440 --> 00:16:23.860 fix formatting issues, and you want to run the checker/fixer for numpydoc docstrings, and all those 00:16:23.860 --> 00:16:27.100 things, you'd have to write a sort of orchestrating program for that, right? 00:16:27.100 --> 00:16:32.520 Yeah, it's almost like you're writing, in the case of a bash script, a giant bash script where you have 00:16:32.520 --> 00:16:38.160 to decide, you know, do you fail early? How do you check: do I run this one and then this one? And then, 00:16:38.160 --> 00:16:43.700 even worse, in that case you're probably running everything, you know, you're running everything 00:16:43.700 --> 00:16:49.240 sequentially. And if you don't do it carefully, then, you know, maybe you want to fail early, maybe you don't.
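Here is a minimal sketch of that orchestrating program, written in Python rather than bash and assuming it is the single executable saved as `.git/hooks/pre-commit`. The `CHECKS` list is hypothetical and assumes Ruff is installed; `fail_fast` encodes the fail-early-or-not decision, and any failing command makes the whole hook exit non-zero, which is what makes Git abort the commit.

```python
import subprocess
import sys
from collections.abc import Sequence

# Hypothetical list of checks; swap in whatever your project actually runs.
CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
]


def run_checks(
    commands: Sequence[Sequence[str]], fail_fast: bool = True
) -> list[tuple[str, int]]:
    """Run each command in order and collect (command, exit status) pairs.

    With fail_fast=True we stop at the first failing command; otherwise every
    command runs, so you see all the problems at once.
    """
    results: list[tuple[str, int]] = []
    for command in commands:
        status = subprocess.run(list(command)).returncode
        results.append((" ".join(command), status))
        if status != 0 and fail_fast:
            break
    return results


def main(fail_fast: bool = True) -> int:
    results = run_checks(CHECKS, fail_fast=fail_fast)
    failed = [name for name, status in results if status != 0]
    for name in failed:
        print(f"pre-commit check failed: {name}", file=sys.stderr)
    # Any non-zero exit status from the pre-commit hook makes Git abort the commit.
    return 1 if failed else 0
```

To actually run as a hook, the file would also need a shebang line, the executable bit, and a `raise SystemExit(main())` entry point. And because nothing under `.git/hooks` is version-controlled, each teammate would have to copy or symlink it in by hand, which is exactly the sharing problem with this approach.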
00:16:49.300 --> 00:16:55.780 So that becomes very, very challenging to configure and also to share, because the thing about that file is that it's not 00:16:55.780 --> 00:17:01.480 included in version control. So that would be something that you would maybe have to store somewhere else and then do a 00:17:01.480 --> 00:17:05.920 symbolic link. And then that already becomes a lot trickier for everyone to manage. 00:17:05.920 --> 00:17:09.920 Yeah, I was just doing that last night, and that's an AI question. I don't remember how to do that. 00:17:09.920 --> 00:17:16.740 I know you can do it. It's not that hard. It involves ln, but, you know, ChatGPT, what do I do exactly? 00:17:16.740 --> 00:17:19.120 ln -s. I've had to do that quite a bit. 00:17:19.120 --> 00:17:22.780 It's burned into the brain, huh? 00:17:22.780 --> 00:17:31.240 So one of the things that you recommend, so we don't have to build this orchestration piece, is actually pre-commit, 00:17:31.240 --> 00:17:32.880 which is a Python project, right? 00:17:32.880 --> 00:17:37.700 Yes. And it's not the only one. So again, that's where the naming becomes challenging. 00:17:37.700 --> 00:17:42.360 pre-commit is built in Python, but it can run hooks in a variety of languages. 00:17:42.580 --> 00:17:48.860 And it interfaces with Git's hook system for you. So it creates that executable and plants it there. 00:17:48.860 --> 00:17:56.120 But that executable then points back to pre-commit, so you can just define a simple YAML file, like the one you can see part of on the screen right now. 00:17:56.120 --> 00:18:01.300 And it becomes very easy because essentially you're just configuring what you want to run. 00:18:01.300 --> 00:18:05.820 You're not actually coding the logic of the checks and how they relate to each other. 00:18:05.820 --> 00:18:12.720 Right. So let's assume that all the pre-commit hooks that you want to run somehow exist out there in the world, right?
00:18:12.720 --> 00:18:14.440 You don't have to create them for the moment. 00:18:14.740 --> 00:18:20.000 So what you can do with pre-commit is you can set up a YAML file. 00:18:20.000 --> 00:18:22.060 I always get those crisscrossed. 00:18:22.060 --> 00:18:30.600 A YAML file, a .pre-commit-config.yaml file, which then has a bunch of listings of, here's a Git repository. 00:18:30.600 --> 00:18:38.200 And if you install it as a Python package, here's a bunch of things that you can run on it, like check-toml, check-yaml, and so on, right? 00:18:38.200 --> 00:18:40.740 Well, it doesn't actually have to be a Python package, right? 00:18:40.740 --> 00:18:48.780 So in that repo, and we're maybe jumping ahead, there's a special file which tells pre-commit how it actually needs to install it. 00:18:48.780 --> 00:18:49.580 So it could be anything. 00:18:49.580 --> 00:18:50.620 Oh, that's interesting. 00:18:50.620 --> 00:19:01.200 So the thing that integrates with the pre-commit project has to opt in, in a sense, in that it has to have a configuration file or a launch file or a setup file, something like that. 00:19:01.200 --> 00:19:04.340 Yeah. So right now we're looking at .pre-commit-config.yaml. 00:19:04.340 --> 00:19:09.600 There's also .pre-commit-hooks.yaml, and that one is kind of registering it with the pre-commit system. 00:19:09.600 --> 00:19:13.300 So it tells pre-commit how to install it once it gets hold of it. 00:19:13.300 --> 00:19:21.660 And it also lists out these hooks that we see here under id, which are defined over there so that pre-commit knows, well, what is check-toml? 00:19:21.660 --> 00:19:22.760 What is check-yaml? 00:19:22.760 --> 00:19:24.400 Okay. Yeah. 00:19:24.400 --> 00:19:25.380 That's really cool. 00:19:25.380 --> 00:19:30.580 And you can have more than one of these repositories in there, right? 00:19:30.580 --> 00:19:31.080 Correct. 00:19:31.080 --> 00:19:42.420 Yeah.
So the repos section is a list of repos, and each repo then has other config, like the individual hooks that you want to run from that repo. 00:19:42.420 --> 00:19:43.160 Right, right. 00:19:43.160 --> 00:19:53.180 So for the first example that you have in this, and this is your article, I don't know if I gave it the proper introduction, on how to set up pre-commit hooks. 00:19:53.280 --> 00:19:57.480 I perceive this as kind of your getting-started article for this whole series. 00:19:57.480 --> 00:19:58.860 I don't know if you see it that way. 00:19:58.860 --> 00:19:59.280 Yeah. 00:19:59.280 --> 00:20:01.180 Yeah, this was the first one. 00:20:01.180 --> 00:20:04.540 I had gotten a lot of questions on how to do this. 00:20:04.920 --> 00:20:15.380 And I think it's always interesting, especially when you think about speaking at conferences, which I do a lot of: I feel like a lot of what gets more hits in that sense is the advanced stuff, like creating hooks. 00:20:15.380 --> 00:20:20.740 But there's so much value in people just getting started and figuring out, how do I even use this in the first place? 00:20:20.740 --> 00:20:23.060 Because this saves you so much time. 00:20:23.060 --> 00:20:25.760 So this was where I started, for that reason. 00:20:25.760 --> 00:20:28.320 I think a lot of people were able to benefit from this article. 00:20:28.320 --> 00:20:30.000 Yeah, it seems like it. 00:20:30.000 --> 00:20:36.260 I know it's fun to talk about the super advanced, deep-dive things, but most people just need to get started. 00:20:36.260 --> 00:20:37.760 They just need some foundation, right? 00:20:37.760 --> 00:20:47.000 And I think that's actually where most of the benefit comes from, even though it is really fun to see some of the cool deep-dive talks that people give, right?
00:20:47.000 --> 00:20:57.040 So this next one that we're adding here in this example is pretty interesting, and that's ruff-pre-commit, straight from Astral, right? 00:20:57.040 --> 00:21:02.540 So this is just github.com/astral-sh, which is the company behind Ruff and uv. 00:21:02.540 --> 00:21:04.340 And this is ruff-pre-commit. 00:21:04.340 --> 00:21:09.240 But what's interesting about this is, one, that it has nothing to do with the pre-commit project. 00:21:09.240 --> 00:21:14.080 But two, that this one also takes special arguments that you can pass to it. 00:21:14.080 --> 00:21:20.860 Yeah, so I think the ruff-pre-commit one is just a smaller version so that it works faster with pre-commit, 00:21:20.860 --> 00:21:23.480 because pre-commit will have to install these at some point. 00:21:23.480 --> 00:21:25.020 It will have a cache. 00:21:25.020 --> 00:21:28.660 So if you don't change the version in this case, it will be able to reuse that. 00:21:28.660 --> 00:21:31.400 But that first time, you do have a bit of a delay. 00:21:31.400 --> 00:21:33.340 And that's not something you want. 00:21:33.340 --> 00:21:36.280 It's something you have to be very careful of when you want to be using these. 00:21:36.280 --> 00:21:43.360 And then the args thing is nice because you have a few options when you configure these tools, depending on what the tool supports. 00:21:43.360 --> 00:21:47.420 In this case, Ruff supports, as I think we mentioned a little earlier, a configuration file. 00:21:47.420 --> 00:21:50.320 So, for example, you could have stuff in your pyproject.toml. 00:21:50.560 --> 00:21:55.200 But the key here is that maybe you're using Ruff in your IDE. 00:21:55.200 --> 00:22:00.080 And maybe you don't want to make the same kind of changes there that you want to make in pre-commit. 00:22:00.080 --> 00:22:03.720 Maybe you want it to ask you whether it's going to change something.
00:22:03.720 --> 00:22:06.460 Whereas in the pre-commit stage, you definitely want it to be fixed. 00:22:06.460 --> 00:22:13.940 So you can use the args here to provide stuff that you only want to happen when it's running in the context of pre-commit. 00:22:14.180 --> 00:22:21.260 Yeah, and Ruff has an --exit-non-zero-on-fix option, which means if it goes through and you tell it to fix things, it will fix them. 00:22:21.260 --> 00:22:30.320 But then it'll error out and say that wasn't a smooth transition or whatever, which is cool because that will then fail the commit itself. 00:22:30.320 --> 00:22:30.880 Correct. 00:22:30.880 --> 00:22:34.060 It gives you the modified files and basically says, have a look. 00:22:34.060 --> 00:22:36.280 See if you like it now, right? 00:22:36.280 --> 00:22:37.740 Before it actually just ships it off. 00:22:37.740 --> 00:22:43.200 That's so important, because sometimes you realize there was some rule that you hadn't reviewed before. 00:22:43.300 --> 00:22:45.860 That's not quite doing what I want, so let me tweak my setup. 00:22:45.860 --> 00:22:50.180 So it's nice to have that bit where you can verify that what was actually changed is what you want. 00:22:50.180 --> 00:22:54.400 Yeah, I guess it's a little bit dangerous to just say, change it and then commit it. 00:22:54.400 --> 00:22:55.500 I've had people... 00:22:55.500 --> 00:23:03.460 So I did a workshop on pre-commit, both on setting it up and on making your own hooks, at EuroPython this year. 00:23:03.460 --> 00:23:05.800 And I did have a few people, actually. 00:23:05.800 --> 00:23:12.220 One was very insistent, asking me why there wasn't a hook, or why they don't support just fixing it 00:23:12.220 --> 00:23:14.700 and then automatically adding it and committing it on your behalf. 00:23:14.700 --> 00:23:18.020 And to me, as a person who works in security, that just sounds very scary. 00:23:18.020 --> 00:23:20.520 I don't want things doing that.
00:23:20.520 --> 00:23:23.940 I want to see what is being changed and whether or not I agree with it. 00:23:23.940 --> 00:23:24.780 Yeah. 00:23:24.780 --> 00:23:28.100 Why doesn't it just go ahead and push it as well? 00:23:28.100 --> 00:23:28.560 Come on. 00:23:28.560 --> 00:23:28.820 Yeah. 00:23:28.820 --> 00:23:30.840 Well, I think that was part of the suggestion. 00:23:30.840 --> 00:23:32.980 I was like, I certainly don't want that running on my machine. 00:23:34.100 --> 00:23:39.000 Yeah, it does skip out on some of the benefits of the multi-stage aspects of Git, I suppose. 00:23:39.000 --> 00:23:40.080 But it is efficient. 00:23:40.080 --> 00:23:41.360 You just get it done all at once. 00:23:41.360 --> 00:23:42.180 That's pretty cool. 00:23:42.180 --> 00:23:44.540 Yeah, but you don't know what else it's grabbing, which is the scary part. 00:23:44.540 --> 00:23:45.360 No, of course not. 00:23:45.360 --> 00:23:45.800 I know. 00:23:45.800 --> 00:23:46.900 Super bad. 00:23:47.900 --> 00:23:53.140 So this example that we're talking about here, where we've got a pre-commit hook that we're grabbing 00:23:53.140 --> 00:23:57.820 and then it takes these arguments, I think this is an interesting point of discussion. 00:23:57.820 --> 00:24:01.980 So the example you have in your article just says what we're going to tell Ruff is --fix, 00:24:01.980 --> 00:24:06.940 --exit-non-zero-on-fix, and --show-fixes, which is all good. 00:24:07.180 --> 00:24:11.920 But Ruff can be pretty complex in its configuration, right? 00:24:11.920 --> 00:24:14.720 You can say, disable this flake8 rule, turn this one on. 00:24:14.720 --> 00:24:15.320 These are warnings. 00:24:15.320 --> 00:24:16.100 These are errors. 00:24:16.100 --> 00:24:21.340 And there's a whole, you know, here's how many columns I want per line and all of this stuff, right?
00:24:21.340 --> 00:24:27.240 So you can either do this argument thing, or if it's supported, you could also potentially have, 00:24:27.240 --> 00:24:28.660 say, a ruff.toml, right? 00:24:28.660 --> 00:24:29.080 Yeah. 00:24:29.080 --> 00:24:34.000 So I tend to want to minimize the number of configuration files I have. 00:24:34.000 --> 00:24:38.340 So in my case, I think below I talk about having it in the pyproject.toml. 00:24:38.340 --> 00:24:38.780 Yeah, exactly. 00:24:38.780 --> 00:24:42.140 So you just add a Ruff section in there and then you configure things. 00:24:42.140 --> 00:24:46.820 And this is stuff that you'd want to use both in your editor as well as in the pre-commit stage, 00:24:46.820 --> 00:24:47.820 because you want them to agree. 00:24:47.820 --> 00:24:51.300 There's nothing worse than one telling you the line's too long and the other one going, 00:24:51.300 --> 00:24:51.960 nope, that's good. 00:24:51.960 --> 00:24:52.500 Go ahead. 00:24:52.500 --> 00:24:59.220 Or one puts a space after the comma in parameters, the other takes the space away, and then it goes back and forth. 00:24:59.220 --> 00:24:59.240 Exactly. 00:24:59.240 --> 00:25:00.460 You don't want them fighting. 00:25:00.460 --> 00:25:01.320 You want them in agreement. 00:25:01.320 --> 00:25:02.600 No, no, you don't. 00:25:02.700 --> 00:25:09.800 So I suppose that's a massive bonus of having either the tool.ruff settings in your pyproject or just a ruff.toml, 00:25:09.800 --> 00:25:11.520 however you go about that, it doesn't really matter. 00:25:11.520 --> 00:25:17.120 Because then no matter whether you're using Ruff via pre-commit or directly in your project, it'll be the same thing, right? 00:25:17.120 --> 00:25:17.680 Exactly. 00:25:17.680 --> 00:25:18.240 Yeah. 00:25:18.240 --> 00:25:18.640 Okay. 00:25:18.640 --> 00:25:19.140 Yeah. 00:25:19.140 --> 00:25:20.520 That's pretty awesome. 00:25:20.520 --> 00:25:24.500 Now, I guess maybe we got a bit ahead of ourselves.
00:25:24.980 --> 00:25:32.300 If I want to somehow install a pre-commit hook, or pre-commit, so that when I then give it one of these config files, 00:25:32.300 --> 00:25:34.800 it'll subsequently go grab them and do the things. 00:25:34.800 --> 00:25:36.420 How do you get started with that? 00:25:36.420 --> 00:25:39.780 I think I need a rephrasing of that question. 00:25:39.780 --> 00:25:40.360 Yeah. 00:25:40.360 --> 00:25:40.700 Sorry. 00:25:40.960 --> 00:25:49.460 So if I have just a plain GitHub repository and I want to have pre-commit manage the hooks for that repository, 00:25:49.460 --> 00:25:50.640 like what do I do? 00:25:50.920 --> 00:25:51.140 Okay. 00:25:51.140 --> 00:25:54.140 So the first thing is you have to actually install pre-commit. 00:25:54.140 --> 00:25:56.920 And that's not the command that's on the screen. 00:25:56.920 --> 00:25:58.260 This is more of a pip install. 00:25:58.260 --> 00:26:02.020 So make sure you have the Python library in place. 00:26:02.020 --> 00:26:05.680 And then you need to have this configuration file, 00:26:05.680 --> 00:26:09.800 with at least one hook in there so that you have a valid file. 00:26:09.800 --> 00:26:12.240 And then you can run pre-commit install. 00:26:12.240 --> 00:26:15.620 And I omitted it here, but as I talk about in a different article, 00:26:15.760 --> 00:26:21.940 when you run this command, pre-commit actually tells you that it created the .git/hooks/pre-commit file. 00:26:21.940 --> 00:26:25.060 And if you open that up, and I have an example in that other article, 00:26:25.060 --> 00:26:28.860 it's very simple and it's just calling pre-commit, the tool itself. 00:26:28.860 --> 00:26:33.440 So in all cases, you need to have it installed in your environment. 00:26:33.440 --> 00:26:39.960 And a single time, you run pre-commit install, which then does the wiring on the Git side. 00:26:39.960 --> 00:26:45.340 And this is something that everyone in your project has to run on any machine that they are using.
00:26:45.740 --> 00:26:50.240 Because it's part of the repository itself, that file needs to exist there. 00:26:50.240 --> 00:26:52.220 And that can only happen if you run this command. 00:26:52.220 --> 00:26:53.100 Yeah. 00:26:53.100 --> 00:26:56.980 So there's a .pre-commit-config.yaml file. 00:26:56.980 --> 00:27:01.880 That's what you put into GitHub at the root of your project or something like this. 00:27:01.880 --> 00:27:07.720 But then to actually configure Git itself, you've got to run this pre-commit space install. 00:27:07.720 --> 00:27:11.120 And it basically wires up the hooks to make that happen, right? 00:27:11.120 --> 00:27:11.660 Correct. 00:27:11.660 --> 00:27:14.840 So yeah, when you run this, that file gets created on your behalf. 00:27:14.920 --> 00:27:17.100 And then you don't have to worry about wiring that up. 00:27:17.100 --> 00:27:18.720 And then it's transparent. 00:27:18.720 --> 00:27:22.740 All you have to do is tweak your config and then the changes happen. 00:27:22.740 --> 00:27:23.200 Nice. 00:27:23.200 --> 00:27:26.640 I don't know how much to believe the naming. 00:27:26.640 --> 00:27:28.420 Can it do things other than pre-commit? 00:27:28.420 --> 00:27:28.940 Yes. 00:27:28.940 --> 00:27:31.880 Can it do pre-push and those kinds of things? 00:27:32.260 --> 00:27:35.660 They don't support every single one. 00:27:35.660 --> 00:27:38.280 But there are quite a few that they do support. 00:27:38.280 --> 00:27:44.980 For example, I once configured an open source project with a pre-push hook because it was a slower 00:27:44.980 --> 00:27:45.400 check. 00:27:45.400 --> 00:27:48.520 And that's something you definitely don't want running on each commit. 00:27:48.520 --> 00:27:51.780 But it might be something where you want to make sure when you push the files that you've 00:27:51.780 --> 00:27:53.600 addressed something that maybe takes a little bit longer.
00:27:54.080 --> 00:28:00.000 And that is really no different from configuring anything else in the pre-commit config YAML. 00:28:00.000 --> 00:28:03.680 There's just a separate item that goes in there that says which stage to run. 00:28:03.680 --> 00:28:04.940 By default, it's pre-commit. 00:28:04.940 --> 00:28:06.060 So you don't see it. 00:28:06.060 --> 00:28:07.400 But if you needed to change it, you can. 00:28:07.400 --> 00:28:07.860 Yeah. 00:28:07.860 --> 00:28:08.860 I figured that was the case. 00:28:08.860 --> 00:28:09.880 But I'd never tried. 00:28:09.880 --> 00:28:13.980 And given that it's named pre-commit, you know, it's kind of named after one of the hooks, 00:28:13.980 --> 00:28:14.180 right? 00:28:14.180 --> 00:28:14.780 But of course. 00:28:15.100 --> 00:28:17.420 I think it's named after probably the most useful one. 00:28:17.420 --> 00:28:18.880 I would. 00:28:18.880 --> 00:28:19.900 Yeah, I would think so. 00:28:19.900 --> 00:28:25.980 I think a very popular example would perhaps be the commit-msg hook. 00:28:25.980 --> 00:28:30.220 So there are a lot of tools that work on, you know, making sure your commits are following 00:28:30.220 --> 00:28:30.920 a certain standard. 00:28:30.920 --> 00:28:32.480 I think one of them is called Commitizen. 00:28:32.480 --> 00:28:36.600 And so that runs on, my guess is, the commit-msg hook. 00:28:36.600 --> 00:28:37.340 Commitizen? 00:28:37.340 --> 00:28:38.000 Yes. 00:28:38.000 --> 00:28:38.400 Okay. 00:28:38.400 --> 00:28:39.900 What is this Commitizen about? 00:28:39.900 --> 00:28:40.900 I haven't heard of this. 00:28:40.900 --> 00:28:42.920 I don't think their example uses that. 00:28:42.920 --> 00:28:44.740 But I think they do have a pre-commit hook. 00:28:45.020 --> 00:28:46.840 And I believe it works that way. 00:28:46.840 --> 00:28:47.240 Yeah. 00:28:47.240 --> 00:28:47.540 Yeah. 00:28:47.540 --> 00:28:48.240 Interesting. 00:28:48.240 --> 00:28:48.760 Okay.
00:28:48.760 --> 00:28:49.600 What's this thing? 00:28:49.600 --> 00:28:51.820 A release management tool for teams. 00:28:51.820 --> 00:28:52.180 Yeah, sure. 00:28:52.180 --> 00:28:56.320 It makes sense that you'd want to be a little bit careful about what your commit 00:28:56.320 --> 00:28:57.140 messages are. 00:28:57.140 --> 00:29:01.320 Maybe you want to grab certain commit messages and add them to your changelog or something 00:29:01.320 --> 00:29:01.860 like that, right? 00:29:01.860 --> 00:29:02.300 Yeah. 00:29:02.300 --> 00:29:07.740 I think there's been quite a bit of talk about this one at conferences I've been to lately. 00:29:07.740 --> 00:29:09.440 I think it's gotten a lot of traction. 00:29:09.440 --> 00:29:09.960 Yeah. 00:29:09.960 --> 00:29:11.840 2,500 GitHub stars. 00:29:11.840 --> 00:29:12.380 That's pretty good. 00:29:12.380 --> 00:29:13.000 I'll check it out. 00:29:13.000 --> 00:29:14.080 This is news to me. 00:29:14.080 --> 00:29:18.860 This portion of Talk Python To Me is brought to you by Bluehost. 00:29:18.860 --> 00:29:22.100 Got ideas, but no idea how to build a website? 00:29:22.100 --> 00:29:23.220 Get Bluehost. 00:29:23.220 --> 00:29:28.640 With their AI design tool, you can quickly generate a high-quality, fast-loading WordPress 00:29:28.640 --> 00:29:29.740 site instantly. 00:29:29.740 --> 00:29:33.400 Once you've nailed the look, just hit enter and your site goes live. 00:29:33.400 --> 00:29:34.420 It's really that simple. 00:29:34.520 --> 00:29:39.000 And it doesn't matter whether you're a hobbyist, entrepreneur, or just starting your side hustle. 00:29:39.000 --> 00:29:44.620 Bluehost has you covered with built-in marketing and e-commerce tools to help you grow and scale 00:29:44.620 --> 00:29:46.060 your website for the long haul.
00:29:46.060 --> 00:29:50.300 Since you're listening to my show, you probably know Python, but sometimes it's better to focus 00:29:50.300 --> 00:29:55.520 on what you're creating rather than building a custom website and adding another month before you launch 00:29:55.520 --> 00:29:56.000 your idea. 00:29:56.380 --> 00:30:02.160 When you upgrade to Bluehost Cloud, you get 100% uptime and 24/7 support to ensure your 00:30:02.160 --> 00:30:04.560 site stays online through heavy traffic. 00:30:04.560 --> 00:30:08.420 Bluehost really makes building your dream website easier than ever. 00:30:08.420 --> 00:30:09.700 So what's stopping you? 00:30:09.700 --> 00:30:10.980 You've already got the vision. 00:30:10.980 --> 00:30:11.760 Make it real. 00:30:11.760 --> 00:30:16.700 Visit talkpython.fm/bluehost right now and get started today. 00:30:16.900 --> 00:30:19.220 And thank you to Bluehost for supporting the show. 00:30:19.220 --> 00:30:21.720 All right. 00:30:21.720 --> 00:30:23.760 What other takeaways should we talk about in this first one? 00:30:23.760 --> 00:30:26.380 I think we maybe have pretty much covered it. 00:30:26.380 --> 00:30:26.880 Let's see. 00:30:26.880 --> 00:30:32.340 I guess, you know, we mentioned it before, but if people want to see examples of pre-commit 00:30:32.340 --> 00:30:37.140 hooks failing or succeeding, or failing because they changed something, which is not exactly 00:30:37.140 --> 00:30:42.160 a failure but more stopping and starting over, you have a nice example of what that's like 00:30:42.160 --> 00:30:42.400 there. 00:30:42.400 --> 00:30:49.860 So one thing that I guess might be useful is sometimes maybe you don't want to run the 00:30:49.860 --> 00:30:50.560 pre-commit hooks. 00:30:50.560 --> 00:30:57.100 Maybe you need to check something in a certain way to fix a server that's down, right? 00:30:57.100 --> 00:30:58.100 We have to check this in.
00:30:58.100 --> 00:31:01.560 I can't fix whatever this hook is upset about right now. 00:31:01.560 --> 00:31:03.280 It needs to go in right away. 00:31:03.280 --> 00:31:05.200 Just let me commit it, right? 00:31:05.200 --> 00:31:05.900 You can do that. 00:31:05.900 --> 00:31:10.180 I mean, I think there are probably several use cases for something like this. 00:31:10.180 --> 00:31:13.540 Maybe you're going to be squashing things later, and 00:31:13.540 --> 00:31:17.040 maybe you don't even know what the API for what you're doing is going to 00:31:17.040 --> 00:31:17.440 look like. 00:31:17.440 --> 00:31:22.260 It could be, and this kind of ties back to what we talked about earlier, that perhaps Ruff's 00:31:22.260 --> 00:31:25.780 doing something you don't agree with, but you need to check with the rest of 00:31:25.780 --> 00:31:29.440 your team to make sure that everyone's in agreement with, let's remove this rule. 00:31:29.440 --> 00:31:29.900 Right. 00:31:29.960 --> 00:31:34.060 So I definitely don't encourage always doing this. 00:31:34.060 --> 00:31:35.280 That defeats the purpose, right? 00:31:35.280 --> 00:31:40.100 But there is kind of a break-glass solution here where, let's say you first run git 00:31:40.100 --> 00:31:45.200 commit and something fails, and it's not something that you either want to fix at the moment or 00:31:45.200 --> 00:31:45.980 really can fix. 00:31:45.980 --> 00:31:48.960 Then you can just pass in --no-verify. 00:31:48.960 --> 00:31:51.440 And none of the checks run at that point. 00:31:51.440 --> 00:31:54.400 So it's as if the checks were never there in the first place. 00:31:54.400 --> 00:31:54.880 Right. 00:31:55.080 --> 00:31:55.260 Right. 00:31:55.260 --> 00:31:55.380 Right. 00:31:55.380 --> 00:31:55.820 Okay. 00:31:55.820 --> 00:31:56.840 That's pretty interesting.
00:31:56.840 --> 00:32:00.160 Like you say, hopefully people don't run that all the time. 00:32:00.160 --> 00:32:03.380 At that point, just remove the pre-commit setup, save yourself. 00:32:03.380 --> 00:32:03.720 Yeah. 00:32:03.720 --> 00:32:05.680 Like what are you, what are you even doing? 00:32:05.680 --> 00:32:05.900 Right. 00:32:05.900 --> 00:32:11.460 I suppose there's an interesting interplay between pre-commit hooks and continuous integration, 00:32:11.460 --> 00:32:12.120 right? 00:32:12.120 --> 00:32:16.220 Like in a sense, they are often checking some of the same things. 00:32:16.220 --> 00:32:17.040 What do you think? 00:32:17.040 --> 00:32:22.680 So I think it's probably an example of, well, not quite a Venn diagram. 00:32:22.680 --> 00:32:29.060 The circle for pre-commit is probably entirely contained within the circle for CI/CD. 00:32:29.060 --> 00:32:33.080 The difference is there are certain things where you can get immediate feedback, quick 00:32:33.080 --> 00:32:37.620 feedback locally, and those are the things you should put in pre-commit: linting, 00:32:37.620 --> 00:32:38.520 formatting, et cetera. 00:32:38.520 --> 00:32:42.200 And then CI/CD may be running your test suite. 00:32:42.200 --> 00:32:44.620 That's definitely not something you want to be doing in a commit. 00:32:44.620 --> 00:32:48.700 Imagine you have a test suite that takes three minutes to run, and maybe three minutes isn't 00:32:48.700 --> 00:32:52.760 that bad, but waiting three minutes on every commit is definitely not something you want to do. 00:32:52.760 --> 00:32:53.140 No. 00:32:53.140 --> 00:32:55.240 But it's still a check that you should definitely be running. 00:32:55.240 --> 00:32:57.620 So in CI/CD, I would run everything. 00:32:57.620 --> 00:32:58.920 Do the linting, do the formatting. 00:32:58.920 --> 00:33:03.680 That's your final, that's your last layer of defense, and you need to be checking everything.
00:33:03.680 --> 00:33:06.480 And this just allows developers to get that feedback sooner. 00:33:06.480 --> 00:33:07.020 Right. 00:33:07.020 --> 00:33:12.480 So what you're actually checking in and finally approving is much closer to what CI/CD 00:33:12.480 --> 00:33:14.140 would kind of want in the first place, right? 00:33:14.140 --> 00:33:14.540 Yeah. 00:33:14.540 --> 00:33:14.920 Yeah. 00:33:14.920 --> 00:33:15.380 Okay. 00:33:15.380 --> 00:33:17.180 And it's also much faster feedback, right? 00:33:17.180 --> 00:33:20.940 So if the thing has to run all the way through the linting, the formatting, the testing, 00:33:20.940 --> 00:33:25.380 the type checking, whatever, you might be waiting 10, 15 minutes for all the things to run when 00:33:25.380 --> 00:33:30.280 you could have had feedback in, you know, under a minute, hopefully way under a minute, telling you 00:33:30.280 --> 00:33:31.740 your file wasn't formatted correctly. 00:33:31.740 --> 00:33:34.400 It should be near instantaneous, right? 00:33:34.500 --> 00:33:40.680 I mean, instant maybe is asking too much, but some of that Astral stuff is kind of ridiculous. 00:33:40.680 --> 00:33:41.240 Yeah. 00:33:41.240 --> 00:33:43.480 I think you have to be very careful, right? 00:33:43.480 --> 00:33:47.680 Because there are all these checks, and I think you had up on the screen maybe earlier, like 00:33:47.680 --> 00:33:53.240 the pre-commit hooks, the general ones provided by the pre-commit organization. 00:33:53.240 --> 00:33:53.940 Yeah. 00:33:53.940 --> 00:33:56.940 There are tons of things in there, but you do have to be careful, right? 00:33:56.940 --> 00:33:59.560 Because if you're like, oh, this could be good and this could be good and this could 00:33:59.560 --> 00:33:59.820 be good. 00:33:59.820 --> 00:34:01.820 Each check is adding time. 00:34:02.000 --> 00:34:05.920 Assuming, like I say, assuming they're all running on Python files, you're adding to how long 00:34:05.920 --> 00:34:06.500 it takes.
00:34:06.500 --> 00:34:09.500 So you do have to be mindful of what you actually need. 00:34:09.500 --> 00:34:15.020 And if you get to the point where you end up making the whole process take too long, people 00:34:15.020 --> 00:34:16.040 are going to stop using it. 00:34:16.040 --> 00:34:17.380 And then that defeats the... 00:34:17.380 --> 00:34:17.540 Yeah. 00:34:17.540 --> 00:34:18.180 Yeah, exactly. 00:34:18.180 --> 00:34:22.520 As soon as it gets to the point where people go, I'm not using this thing, then you've kind 00:34:22.520 --> 00:34:25.660 of lost, unless you can just say, no, you have to use it. 00:34:25.660 --> 00:34:27.640 But then you just have unhappy teammates. 00:34:27.640 --> 00:34:28.300 Exactly. 00:34:28.300 --> 00:34:29.960 Either way, it's not a real great outcome, is it? 00:34:29.960 --> 00:34:35.380 I mean, if there's something that maybe only runs on a few files every once in a while, then 00:34:35.380 --> 00:34:39.900 if you are having problems with speed, you can also consider moving that to CI/CD. 00:34:39.900 --> 00:34:45.080 And I am definitely a big fan of Ruff. As you said, just switching from Black, flake8, 00:34:45.260 --> 00:34:50.000 all that onto Ruff, you do save a significant amount of time on these checks, and it's a huge 00:34:50.000 --> 00:34:50.320 benefit. 00:34:50.320 --> 00:34:51.580 Yeah, it's pretty ridiculous. 00:34:51.580 --> 00:34:54.620 Now, this is not a Git pre-commit thing. 00:34:54.620 --> 00:34:57.160 This is a pre-commit-the-project thing. 00:34:57.160 --> 00:35:01.700 But if you're using this pre-commit project we've been talking about, you can say 00:35:01.700 --> 00:35:07.080 pre-commit space run and do kind of a test without actually doing a commit, right? 00:35:07.080 --> 00:35:07.580 Correct. 00:35:07.580 --> 00:35:07.960 Yeah. 00:35:07.960 --> 00:35:09.840 So there are a few nuances.
00:35:09.840 --> 00:35:14.200 So if you just do pre-commit run, it's going to run all of your hooks, but on the staged 00:35:14.200 --> 00:35:17.280 changes, because it's thinking essentially you're doing like a dry run. 00:35:17.280 --> 00:35:22.180 If you, let's say, are adding a new hook and you want to make sure all of your files are 00:35:22.180 --> 00:35:26.260 compatible with that new hook, then you might want to do something like pre-commit run 00:35:26.260 --> 00:35:27.220 --all-files. 00:35:27.220 --> 00:35:31.500 That looks through your entire repository, regardless of whether you have changes in place. 00:35:31.500 --> 00:35:37.400 So if you say pre-commit run, it only works on basically your changed files, not the 00:35:37.400 --> 00:35:38.760 stuff that's already there and accepted. 00:35:38.760 --> 00:35:39.460 Correct. 00:35:39.460 --> 00:35:44.100 And another neat thing is, in the case I mentioned where you add a new hook, you might just want 00:35:44.100 --> 00:35:44.900 to run that hook. 00:35:44.900 --> 00:35:49.080 So you can say pre-commit run and then the hook ID, and then you would just run that hook, 00:35:49.080 --> 00:35:52.740 and then you can define either a certain set of files or the staged files, whatever. 00:35:52.740 --> 00:35:53.280 Yeah. 00:35:53.280 --> 00:35:56.840 That sounds pretty useful when you're building your own pre-commit hook, right? 00:35:56.840 --> 00:36:01.880 So yeah, depending on how you build it, you can either use that, or they also have a try-repo 00:36:01.880 --> 00:36:03.000 command. 00:36:03.000 --> 00:36:03.840 Right. 00:36:03.840 --> 00:36:04.300 Got it. 00:36:04.300 --> 00:36:04.580 Got it. 00:36:04.580 --> 00:36:06.340 Well, let's see. 00:36:06.340 --> 00:36:12.720 Maybe we could jump over and talk a bit through your hook creation guide, a step-by-step guide
00:36:14.280 --> 00:36:17.380 I thought this was really, like I said, a good article. 00:36:17.380 --> 00:36:23.860 And maybe one of the first things we talk about is just what makes a good hook in the first 00:36:23.860 --> 00:36:24.380 place, right? 00:36:24.720 --> 00:36:29.760 You said that they can't be too long or people will go crazy and turn them off or skip them 00:36:29.760 --> 00:36:30.080 or whatever. 00:36:30.080 --> 00:36:31.120 But what else? 00:36:31.120 --> 00:36:36.840 So I think another big thing is if you're able to fix something, then you should fix it. 00:36:36.840 --> 00:36:41.060 In the case of formatting and you're saying, oh, this should have a trailing comma, then 00:36:41.060 --> 00:36:42.140 that's easy enough. 00:36:42.140 --> 00:36:43.240 You can add the trailing comma. 00:36:43.240 --> 00:36:44.900 You don't make more work for the user. 00:36:44.900 --> 00:36:48.800 If you can't do that, then you should be very specific saying this file. 00:36:48.960 --> 00:36:53.760 And if you have a line number saying exactly where it is, because just saying there's something 00:36:53.760 --> 00:36:57.320 wrong in this file and someone has to hunt it is also not a good user experience. 00:36:57.320 --> 00:37:00.700 No, that's going to be frustrating and super, super quick. 00:37:00.700 --> 00:37:01.140 Yeah. 00:37:01.140 --> 00:37:03.220 So be really descriptive about it. 00:37:03.220 --> 00:37:06.820 And then also, maybe choose not to make it a pre-commit hook, right? 00:37:06.820 --> 00:37:09.400 Not necessarily everything needs to run on every commit. 00:37:09.400 --> 00:37:12.860 Yeah, I think that the speed thing is a huge factor. 00:37:12.860 --> 00:37:18.880 And in general, I think one big thing that is key to note here is that it's even, 00:37:18.880 --> 00:37:23.900 though, let's say you change files that, let's say you change a Python file, a Markdown file 00:37:23.900 --> 00:37:24.980 and an image file. 
00:37:24.980 --> 00:37:30.400 If you're making a hook that only runs on a certain type of file, and you're careful to 00:37:30.400 --> 00:37:34.080 specify that, then it's not necessarily a bad thing to include it in there, because it will 00:37:34.080 --> 00:37:36.480 only get triggered on those certain types of files. 00:37:36.480 --> 00:37:40.400 And so an example I have is the exif-stripper 00:37:40.400 --> 00:37:44.500 that I created when I was building my website. 00:37:44.500 --> 00:37:46.820 Your exif-stripper is super interesting. 00:37:47.000 --> 00:37:48.960 I'm starting to think maybe I want this as well. 00:37:48.960 --> 00:37:53.580 Yeah, I was just very paranoid at one point about working with images. 00:37:53.580 --> 00:37:57.480 And so they come with, what's up here? 00:37:57.480 --> 00:38:02.100 So exchangeable image file format data, or EXIF as it's commonly called. 00:38:02.100 --> 00:38:06.340 It's metadata that is in the image that you might not realize is there. 00:38:06.340 --> 00:38:12.500 And so in this article, I talk about a picture of me presenting that I was given from a conference. 00:38:12.500 --> 00:38:15.500 And this was something that was stored, I think, in a Google Drive. 00:38:15.500 --> 00:38:18.320 So you have access to all the metadata that was available. 00:38:18.320 --> 00:38:20.620 So I never met the photographer. 00:38:20.620 --> 00:38:24.680 And yet I know the photographer's name, the camera they used, what type of computer they have, 00:38:24.680 --> 00:38:26.880 how they edited it, all kinds of information.
00:38:38.880 --> 00:38:47.180 And now you have a photo up on your website where anyone can potentially see it that has the GPS coordinates for where you live. 00:38:47.440 --> 00:38:48.460 Yeah, that wouldn't be great, no. 00:38:48.460 --> 00:38:50.980 So I was very paranoid about this. 00:38:50.980 --> 00:38:54.560 And I don't want the idea of like, oh, I'm going to add a new image. 00:38:54.560 --> 00:39:00.400 Let me go through my checklist of what I need to do because I know at some point I'm going to mess something up or forget it. 00:39:00.400 --> 00:39:03.960 And so this is a perfect use case for the pre-commit, right? 00:39:03.960 --> 00:39:08.480 Because you want something that is going to stop you and tell you, nope, you can't do this, right? 00:39:08.480 --> 00:39:14.700 And in this case, it can also remove the metadata because I am being super conservative and saying no metadata, 00:39:14.700 --> 00:39:19.800 which has the nice side benefit of shrinking files, which is good for serving them. 00:39:19.800 --> 00:39:20.240 Yeah. 00:39:20.240 --> 00:39:26.340 Well, what value is it to have all that metadata in there for a blog? 00:39:26.340 --> 00:39:29.540 Most of the time, most people are not, they just want to see, they want to read the blog. 00:39:29.540 --> 00:39:31.160 They're not going to dissect your image, right? 00:39:31.160 --> 00:39:36.480 I think it depends what you, I mean, maybe you have a travel blog and you want to know like, here's that location. 00:39:36.660 --> 00:39:40.800 And then you have one off post where you introduce yourself and oops, you know? 00:39:40.800 --> 00:39:41.580 Yeah. 00:39:41.580 --> 00:39:42.760 There's so many ways. 00:39:42.760 --> 00:39:46.400 And I think even just thinking, oh, I'm only going to be doing this. 00:39:46.400 --> 00:39:48.740 There's always going to be something that later on happens. 
00:39:48.740 --> 00:39:53.420 So you have to be very careful up front that everything is going to go through this check. 00:39:53.420 --> 00:39:54.200 Sure. 00:39:54.200 --> 00:39:58.140 Can your EXIF thing be selective about the metadata? 00:39:58.140 --> 00:40:00.860 That's something I do want to do in the future. 00:40:00.860 --> 00:40:03.240 Just remove the location if you want. 00:40:03.240 --> 00:40:11.140 But the thing is, looking through all of that, it's hard to tell if there might be something in one subset of images you take that might be sensitive. 00:40:11.140 --> 00:40:16.580 You can even think of certain situations where you might not want someone to know what kind of device you were using. 00:40:16.580 --> 00:40:16.900 Right. 00:40:16.900 --> 00:40:20.780 Because maybe they're like, oh, that device is vulnerable to something and I know they have it. 00:40:20.780 --> 00:40:20.980 Right. 00:40:21.900 --> 00:40:34.300 The worst of these is, and I think it happened multiple times, pretty sure it was Samsung, but one of the Android companies posted a picture promoting the new phone. 00:40:34.300 --> 00:40:38.620 And, you know, the EXIF information had the picture as being from an iPhone or something like that. 00:40:38.620 --> 00:40:40.140 Oh, no, it was the other way around, I think. 00:40:40.140 --> 00:40:41.060 Oh, the other way around. 00:40:41.060 --> 00:40:41.620 I think I remember hearing that, yeah. 00:40:41.620 --> 00:40:49.400 Well, it was like one phone company was posting it, and even though the picture was about their phone, you know, implying it came from that phone or something. 00:40:49.400 --> 00:40:50.160 It was like, nope. 00:40:50.160 --> 00:40:54.780 Whoever is on the marketing team just happens to have the other kind of phone, and there it goes. 00:40:54.780 --> 00:40:54.960 Right. 00:40:54.960 --> 00:40:55.760 And it's a huge scandal.
00:40:55.760 --> 00:41:00.340 I mean, for those companies that talk about how awesome they are, how much better their cameras are or whatever. 00:41:00.340 --> 00:41:01.880 Well, see, that's also the thing, right? 00:41:01.880 --> 00:41:03.980 Because you never know who's going to look at the metadata either. 00:41:03.980 --> 00:41:09.540 And it's interesting because certain platforms will remove it. 00:41:09.540 --> 00:41:12.320 So I mentioned Google Drive, where everything is preserved. 00:41:12.320 --> 00:41:15.520 But the thing is, you have to know ahead of time. 00:41:15.520 --> 00:41:18.480 So you'd have to say, I'm planning to put this image here. 00:41:18.480 --> 00:41:20.240 Let me upload a dummy image 00:41:20.240 --> 00:41:23.400 I don't care about and check if the metadata is still there. 00:41:23.400 --> 00:41:24.500 Yeah, exactly. 00:41:24.500 --> 00:41:25.340 Yeah. 00:41:25.340 --> 00:41:27.780 I think Mastodon might remove it. 00:41:27.780 --> 00:41:30.760 There are certain platforms that will take away that metadata. 00:41:30.760 --> 00:41:31.900 I think Facebook might. 00:41:31.900 --> 00:41:33.100 It's been a long time. 00:41:33.100 --> 00:41:35.780 I mean, it's a huge security concern. 00:41:35.780 --> 00:41:41.720 So I imagine more and more places are, but I just wanted to have an abundance of caution and not risk anything happening. 00:41:41.720 --> 00:41:42.580 Well, yeah. 00:41:42.580 --> 00:41:49.220 And you're putting it on the internet as well, where it goes straight from your computer through some sort of static website process. 00:41:49.220 --> 00:41:50.560 And then it's downloaded, right? 00:41:50.560 --> 00:41:53.060 There's nothing in between those two steps. 00:41:53.060 --> 00:41:53.860 Exactly. 00:41:53.860 --> 00:41:55.600 At least not in terms of image processing. 00:41:55.600 --> 00:41:55.960 Yeah. 00:41:55.960 --> 00:41:56.200 Yeah.
00:41:56.200 --> 00:41:56.560 Cool. 00:41:57.240 --> 00:41:58.060 Yeah, this is nice. 00:41:58.060 --> 00:42:01.160 I'm thinking about grabbing it and trying it out. 00:42:01.160 --> 00:42:03.420 What file types does it work on? 00:42:03.420 --> 00:42:07.440 Does it work on just JPEGs or does it do like WebP and all that? 00:42:07.440 --> 00:42:12.120 Any image, anything that's classified as an image by pre-commit, the way pre-commit runs. 00:42:12.120 --> 00:42:15.760 And it has to work with, I'm using Pillow. 00:42:15.760 --> 00:42:17.920 So if Pillow can't read it, then it's not going to work. 00:42:17.920 --> 00:42:18.500 Right. 00:42:18.500 --> 00:42:20.540 Then I'll just skip over it or whatever. 00:42:20.540 --> 00:42:21.020 Yeah. 00:42:21.200 --> 00:42:21.380 Yeah. 00:42:21.380 --> 00:42:26.480 So really quick, while we're talking about stuff on your website, your website's super nice. 00:42:26.480 --> 00:42:28.600 Did you build this yourself? 00:42:28.600 --> 00:42:29.620 Like, how is this thing built? 00:42:29.620 --> 00:42:29.960 I did. 00:42:29.960 --> 00:42:31.320 I did build it myself. 00:42:32.520 --> 00:42:38.100 I took a couple months in the beginning of the year and I had before a single page where 00:42:38.100 --> 00:42:39.600 it was just like some boxes. 00:42:39.600 --> 00:42:41.720 And then I was like, this needs to be revisited. 00:42:42.220 --> 00:42:47.940 So it's built with Next.js and so React and TypeScript. 00:42:47.940 --> 00:42:50.280 And then I use Tailwind CSS. 00:42:50.280 --> 00:42:54.700 And yeah, it was kind of just like, I mean, a lot of these things are for me because sometimes, 00:42:54.700 --> 00:43:00.740 you know, I like seeing all in one place where I'm speaking next or like stats about where 00:43:00.740 --> 00:43:02.480 I've spoken, like a map and stuff.
00:43:02.480 --> 00:43:08.060 And I went through, so kind of my process would be, you know, on my iPad, I would sketch out 00:43:08.060 --> 00:43:13.180 what I kind of envisioned a page looking like and then I would prototype it in React and 00:43:13.180 --> 00:43:17.780 see, okay, maybe this doesn't fully work, or like tweak things and iterate on it a few times, 00:43:17.780 --> 00:43:20.600 and bit by bit the pages formed. 00:43:20.600 --> 00:43:24.280 The latest thing I added was this timeline functionality. 00:43:24.280 --> 00:43:31.240 At EuroPython this year, I had this idea for a timeline and I kind of got really, really into 00:43:31.240 --> 00:43:31.400 it. 00:43:31.400 --> 00:43:31.860 So it was funny. 00:43:31.860 --> 00:43:32.780 I was at a Python conference. 00:43:32.780 --> 00:43:34.040 I was doing tons of React. 00:43:34.640 --> 00:43:38.160 But if you scroll down a tiny bit, there's actually two. 00:43:38.160 --> 00:43:39.040 This one, right? 00:43:39.040 --> 00:43:39.420 Yeah, yeah. 00:43:39.420 --> 00:43:41.120 Versus the little text. 00:43:41.120 --> 00:43:42.620 Oh, the complete upcoming. 00:43:42.620 --> 00:43:43.360 Yeah, I got you. 00:43:43.360 --> 00:43:44.180 So I built this. 00:43:44.180 --> 00:43:45.060 Oh, that's beautiful. 00:43:45.060 --> 00:43:45.620 I love it. 00:43:45.620 --> 00:43:48.780 It's like a little infographic of your upcoming events. 00:43:48.780 --> 00:43:49.360 Yeah. 00:43:49.360 --> 00:43:53.980 So I was like very inspired and I did this in a few days. 00:43:53.980 --> 00:43:59.480 But it's nice because, you know, going from the sketch to the React components, it's become 00:43:59.480 --> 00:44:03.380 very natural, though it takes a bit to get there. 00:44:03.380 --> 00:44:08.720 But it was nice because I did have to learn TypeScript for some changes in my team. 00:44:08.720 --> 00:44:10.540 We were going to start moving to TypeScript.
00:44:10.540 --> 00:44:15.700 So this was great to work on something that, you know, fit in my head as far as what needed 00:44:15.700 --> 00:44:16.200 to be done. 00:44:16.200 --> 00:44:17.960 And it was very, very helpful. 00:44:17.960 --> 00:44:20.300 But yeah, so I'm very proud of this. 00:44:20.300 --> 00:44:22.380 There's still more, tons more to do. 00:44:22.380 --> 00:44:23.720 I have massive lists. 00:44:24.100 --> 00:44:25.380 But yeah, I remember looking at Google. 00:44:25.380 --> 00:44:26.820 This is a nice static site. 00:44:26.820 --> 00:44:27.240 Very cool. 00:44:27.240 --> 00:44:28.820 And I didn't even see this feature. 00:44:28.820 --> 00:44:29.220 This is great. 00:44:29.220 --> 00:44:31.900 Broadvon out in the audience says fire emoji for it. 00:44:31.900 --> 00:44:32.300 Very good. 00:44:32.300 --> 00:44:33.580 Thank you. 00:44:33.580 --> 00:44:36.020 And also, thanks. 00:44:36.020 --> 00:44:38.200 I see you put the podcast appearance on here as well. 00:44:38.200 --> 00:44:38.860 That's cool. 00:44:38.860 --> 00:44:40.040 So that's happening today. 00:44:40.040 --> 00:44:41.360 Watch the live stream now. 00:44:41.360 --> 00:44:43.360 If you're not watching now, then you've probably missed it. 00:44:43.360 --> 00:44:45.000 But the recording will be there, of course. 00:44:45.000 --> 00:44:49.320 But the reason I say that is you maybe want to give a shout out to some of your upcoming 00:44:49.320 --> 00:44:50.120 events. 00:44:50.120 --> 00:44:51.000 Yeah, why not? 00:44:51.120 --> 00:44:57.360 So I'm going to be in San Francisco next week talking about my Data Morph project. 00:44:57.360 --> 00:45:02.680 And I'll also be doing a book signing there for my Hands-On Data Analysis with Pandas book, 00:45:02.680 --> 00:45:03.320 second edition. 00:45:03.320 --> 00:45:09.840 And then after that, I'm off to France to give a workshop on pandas and then also talk about 00:45:09.840 --> 00:45:12.140 getting started in open source contributions.
00:45:12.140 --> 00:45:18.220 And then a couple of weeks after that, I will be at the final conference of the year in Australia. 00:45:18.220 --> 00:45:21.100 And I will be talking about Data Morph once again. 00:45:21.100 --> 00:45:26.280 And I'm hoping to run my third development sprint on Data Morph while I'm there. 00:45:26.280 --> 00:45:27.120 Oh, that's cool. 00:45:27.120 --> 00:45:28.980 Yeah, we'll talk about Data Morph in a second. 00:45:28.980 --> 00:45:30.380 That's some interesting stuff. 00:45:30.380 --> 00:45:33.000 But this is quite the agenda. 00:45:33.000 --> 00:45:34.400 You got a full trip coming up. 00:45:34.400 --> 00:45:35.360 No, I'm excited. 00:45:35.360 --> 00:45:39.600 It's nice to see different cultures. 00:45:39.960 --> 00:45:43.960 It definitely does land differently, you know, the topics and just reactions. 00:45:43.960 --> 00:45:46.300 Some people are over-the-top excited. 00:45:46.300 --> 00:45:48.340 Some of them are just straight-faced. 00:45:48.340 --> 00:45:49.500 You're like, I enjoy it. 00:45:49.500 --> 00:45:54.660 I think it really comes into play as far as giving workshops. 00:45:54.660 --> 00:46:00.100 I was in Portugal last week and I did the data analysis workshop. 00:46:00.100 --> 00:46:03.500 And I think that was one of the best ones I've ever had. 00:46:03.560 --> 00:46:07.500 It was very, very highly interactive and it was a really fun time for me. 00:46:07.500 --> 00:46:09.240 And hopefully everyone else thought so as well. 00:46:09.240 --> 00:46:11.180 Yeah, that's fantastic. 00:46:11.180 --> 00:46:12.940 How did you get into public speaking? 00:46:12.940 --> 00:46:19.960 Yeah, so I wrote the Hands-On Data Analysis with Pandas book in 2019. 00:46:20.640 --> 00:46:25.240 And at that time, if you had told me, go do some public speaking, I'm like, please no. 00:46:25.240 --> 00:46:29.180 You're going to France and Australia and Portugal recently. 00:46:29.180 --> 00:46:30.160 So I'm like, no, no, no.
00:46:30.160 --> 00:46:30.640 Yeah. 00:46:30.640 --> 00:46:38.940 And then, well, during pandemic times, a conference reached out to me about doing a workshop on pandas 00:46:38.940 --> 00:46:41.560 because I had written the book and doing it virtually. 00:46:41.560 --> 00:46:47.160 And to me, that felt like a good stepping stone to get over that fear of public speaking and 00:46:47.160 --> 00:46:48.380 the fact that it would be virtual. 00:46:48.840 --> 00:46:50.260 I wouldn't really have to look at anyone. 00:46:50.260 --> 00:46:55.620 And I was still absolutely terrified when it came to actually delivering that talk. 00:46:55.620 --> 00:46:57.560 And when you think about it, it wasn't a talk, right? 00:46:57.560 --> 00:47:01.440 So my first thing was a four-hour workshop. 00:47:01.440 --> 00:47:08.180 And now I'm at the point where a virtual thing is much less desirable because it's so hard when 00:47:08.180 --> 00:47:12.520 you can't see people, you can't see: are things landing, are they confused, are they with me? 00:47:12.520 --> 00:47:13.740 Are they even still there? 00:47:14.740 --> 00:47:19.180 So, and then after I did, you know, I made it to the end and I was like, okay, that's 00:47:19.180 --> 00:47:22.200 definitely something I want to work on and do again. 00:47:22.200 --> 00:47:26.240 So I did, I came up with a second workshop on data visualization. 00:47:26.640 --> 00:47:31.260 And then I think I did two or three more virtual sessions. 00:47:31.260 --> 00:47:35.820 And then it became that some conferences were now in person. 00:47:35.820 --> 00:47:37.940 And I was like, okay, I think I should try this. 00:47:37.940 --> 00:47:40.080 And again, it was still a long one. 00:47:40.080 --> 00:47:42.320 It may have even been a six-hour session that time. 00:47:42.320 --> 00:47:43.800 So it's like crazy, right? 00:47:43.800 --> 00:47:45.700 And then I did that in person. 00:47:45.700 --> 00:47:47.520 And I was like, okay, I survived.
00:47:47.780 --> 00:47:52.680 And then it kind of just felt like something, if I kept doing it, I would get over it or 00:47:52.680 --> 00:47:56.620 at least get to the point where, you know, I could do it without being terrified for a 00:47:56.620 --> 00:47:57.400 month ahead of time. 00:47:57.400 --> 00:47:57.800 Right. 00:47:57.800 --> 00:47:59.000 And I am at that point now. 00:47:59.000 --> 00:48:04.160 It is like, I enjoy doing it because I enjoy, I'm very passionate about knowledge sharing and 00:48:04.160 --> 00:48:09.480 just teaching people and getting that interaction that, oh, people are really like getting value 00:48:09.480 --> 00:48:10.080 out of this. 00:48:10.080 --> 00:48:11.760 And that to me is very nice. 00:48:11.760 --> 00:48:12.180 Yeah. 00:48:12.180 --> 00:48:13.060 It's super rewarding. 00:48:13.520 --> 00:48:15.340 So, but yeah, this is quite impressive. 00:48:15.340 --> 00:48:18.600 So just, I got the sense you kind of got started pretty recently. 00:48:18.600 --> 00:48:19.320 You said 2019. 00:48:19.320 --> 00:48:21.400 So you haven't been doing it for that long. 00:48:21.400 --> 00:48:22.140 And this is great. 00:48:22.140 --> 00:48:27.020 So maybe, you know, you brought it up, maybe we could talk a bit about your book as well. 00:48:27.020 --> 00:48:29.520 I don't know what to say about this one. 00:48:29.520 --> 00:48:33.220 Just that it exists and people should check it out. 00:48:33.220 --> 00:48:33.980 It's giant. 00:48:33.980 --> 00:48:34.600 It's giant. 00:48:34.600 --> 00:48:36.500 As you can see, 788 pages. 00:48:36.500 --> 00:48:37.680 Holy moly. 00:48:37.680 --> 00:48:38.300 That is giant. 00:48:38.300 --> 00:48:40.180 Yeah. 00:48:40.180 --> 00:48:42.120 So this is the second edition. 00:48:42.480 --> 00:48:46.040 If you scroll down, there are also the covers for the Korean and Chinese editions. 00:48:46.040 --> 00:48:47.660 Oh, awesome. 00:48:47.660 --> 00:48:52.160 And I do not read either of those, but I do have copies.
00:48:52.160 --> 00:48:53.520 It's an act of faith to put your name on them. 00:48:53.520 --> 00:48:54.420 You know what? 00:48:54.420 --> 00:48:59.380 I've been told by people that read both of those languages that the name is not quite translated 00:48:59.380 --> 00:49:02.020 correctly, but you know, I'll forget about that. 00:49:02.020 --> 00:49:03.480 It's cool to have the copies. 00:49:03.480 --> 00:49:04.680 Yeah. 00:49:04.680 --> 00:49:10.400 So this book obviously covers pandas, working through the basics of data analysis. 00:49:10.400 --> 00:49:13.460 We also talk about data visualization. 00:49:13.460 --> 00:49:19.120 And then there is a little bit towards the end about like actually applying this stuff 00:49:19.120 --> 00:49:21.840 to use cases and also a little bit of machine learning. 00:49:21.840 --> 00:49:22.200 Cool. 00:49:22.200 --> 00:49:22.960 Yeah. 00:49:23.020 --> 00:49:24.540 So I'll put a link in the show notes. 00:49:24.540 --> 00:49:26.840 People can check it out if they would like to. 00:49:26.840 --> 00:49:27.240 All right. 00:49:27.240 --> 00:49:28.300 I feel like there's a few things. 00:49:28.300 --> 00:49:31.880 We didn't make it very far in our creation guide. 00:49:31.880 --> 00:49:33.820 So let's talk about the recipe. 00:49:33.820 --> 00:49:34.440 All right. 00:49:34.440 --> 00:49:35.720 What are the four steps? 00:49:35.720 --> 00:49:39.000 At least Stephanie's recipe for pre-commit hooks. 00:49:39.200 --> 00:49:39.520 Yeah. 00:49:39.520 --> 00:49:41.100 This is definitely my recipe. 00:49:41.100 --> 00:49:46.200 I mean, I've, I think I've made two that are published ones and then obviously a few others 00:49:46.200 --> 00:49:48.200 for trainings and explanation purposes. 00:49:48.200 --> 00:49:50.980 And this, this is something that works well for me. 00:49:50.980 --> 00:49:53.880 And I think it makes sense as far as thinking about the pieces.
00:49:53.880 --> 00:49:59.040 So the first thing, the hardest thing is actually to figure out what you are checking and how do 00:49:59.040 --> 00:50:00.100 you actually code that up? 00:50:00.100 --> 00:50:03.960 And if you want to do this in Python, this is just, okay, code your logic. 00:50:03.960 --> 00:50:04.360 Yeah. 00:50:04.360 --> 00:50:04.540 Right. 00:50:04.540 --> 00:50:04.820 Yeah. 00:50:04.820 --> 00:50:09.140 Well, and if it has a --fix, maybe that's even harder than just trying to 00:50:09.140 --> 00:50:10.020 understand, right? 00:50:10.020 --> 00:50:13.600 Because now you've got to not break somebody's code or all sorts of things like that. 00:50:13.600 --> 00:50:13.780 Yeah. 00:50:13.780 --> 00:50:18.260 But this would be where you start at the basic level, probably first, you know, find, 00:50:18.260 --> 00:50:22.140 figure out, can you find the issue and show people where it is? 00:50:22.140 --> 00:50:23.320 And then you can look into fixing it. 00:50:23.320 --> 00:50:27.160 But yeah, you have to be very careful, especially if you're going to be touching things. 00:50:27.160 --> 00:50:32.540 So I guess it's pretty straightforward, but the magic of Python is not just the language 00:50:32.540 --> 00:50:37.400 and the standard library, but the 500,000 external packages, right? 00:50:37.420 --> 00:50:41.080 There's probably a ton of external packages that understand code, check different things. 00:50:41.080 --> 00:50:44.520 And you could, you can use those in your hook implementation, right? 00:50:44.520 --> 00:50:47.860 Just like a standard Python package, it can have dependencies and stuff. 00:50:47.860 --> 00:50:48.380 Yes. 00:50:48.380 --> 00:50:54.020 And so I talk about this in the third step, but I do like to make it a package just 00:50:54.020 --> 00:50:57.820 because you know that that's going to work and grab the dependencies as long as you follow 00:50:57.820 --> 00:50:58.900 what you already know.
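Step one is plain Python logic. As a minimal sketch of the "find the issue, show people where it is, then look into fixing it" progression, here is a hypothetical trailing-whitespace check. The function names are made up for illustration (in practice the `pre-commit-hooks` repository already ships a `trailing-whitespace` hook). It works on text rather than files so the logic stays easy to test, and it assumes LF line endings.

```python
from __future__ import annotations


def find_trailing_whitespace(text: str) -> list[int]:
    """Report first: return the 1-based numbers of lines ending in spaces or tabs."""
    # Assumes LF line endings; a line ending in "\r" would hide its trailing spaces.
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line != line.rstrip(" \t")
    ]


def fix_trailing_whitespace(text: str) -> str:
    """Optional --fix behavior: strip trailing spaces/tabs and change nothing else."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def check(text: str, fix: bool = False) -> tuple[bool, str]:
    """Return (passed, text); a failed check is what should stop the commit."""
    problems = find_trailing_whitespace(text)
    for number in problems:
        print(f"line {number}: trailing whitespace")
    if not problems:
        return True, text
    # Still report failure after fixing, so the author reviews the change.
    return False, (fix_trailing_whitespace(text) if fix else text)
```

Returning a failure even after fixing follows a common hook convention: the commit stops so the author can look at the modified file and stage it again.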
00:50:59.300 --> 00:51:04.580 And pre-commit will, you will tell pre-commit in the fourth step in that pre-commit hooks 00:51:04.580 --> 00:51:06.120 file how it should be installed. 00:51:06.120 --> 00:51:11.020 So when you say this is, this is Python, then it will know, okay, so I should be using, for 00:51:11.020 --> 00:51:12.400 example, pip to install this. 00:51:12.400 --> 00:51:16.960 And if you have, for example, pyproject.toml and you specify how it should be built, then 00:51:16.960 --> 00:51:18.800 all of that just happens as it normally would. 00:51:18.800 --> 00:51:20.500 It's just that pre-commit is doing it instead of you. 00:51:20.500 --> 00:51:20.900 Yeah. 00:51:20.900 --> 00:51:21.280 Yeah. 00:51:21.340 --> 00:51:25.880 That's kind of, instead of you doing a pip install -e . or whatever, that it's 00:51:25.880 --> 00:51:26.680 kind of figuring that out. 00:51:26.680 --> 00:51:31.420 And I guess we haven't really talked too much about it, but when you pre-commit install, it 00:51:31.420 --> 00:51:36.420 looks at the, this hooks YAML file and then it, it creates the environment and it downloads 00:51:36.420 --> 00:51:39.120 all the packages the first time to kind of set it up. 00:51:39.120 --> 00:51:41.060 Then it just runs over and over after that. 00:51:41.060 --> 00:51:41.240 Right. 00:51:41.240 --> 00:51:41.640 Yeah. 00:51:41.640 --> 00:51:47.440 Unless you change something in your pre-commit config file, it won't need to rebuild the 00:51:47.440 --> 00:51:48.560 environment for this. 00:51:48.560 --> 00:51:51.060 So if you keep the same version, then it's kind of like you said. 00:51:51.160 --> 00:51:52.840 I installed this version of the package. 00:51:52.840 --> 00:51:56.400 And as long as you don't say you need to update the package and it's kind of like a virtual 00:51:56.400 --> 00:51:56.760 environment. 00:51:56.760 --> 00:51:57.100 Okay. 00:51:57.100 --> 00:51:57.800 You already have that.
00:51:57.800 --> 00:51:58.520 There's no need to. 00:51:58.520 --> 00:51:59.100 Yeah. 00:51:59.100 --> 00:51:59.700 Yeah. 00:51:59.700 --> 00:51:59.980 Excellent. 00:51:59.980 --> 00:52:07.060 So your recipe is one, design the check function; two, turn it into a CLI, which there's some interesting 00:52:07.060 --> 00:52:08.260 stuff in that one as well. 00:52:08.260 --> 00:52:08.600 That's. 00:52:08.600 --> 00:52:13.040 And I think that's kind of where the --fix comment comes into play. 00:52:13.040 --> 00:52:13.200 Right. 00:52:13.200 --> 00:52:18.980 So your logic, that check function, you should be able to say this was successful. 00:52:18.980 --> 00:52:21.140 This was not successful as in stop the commit. 00:52:21.140 --> 00:52:26.660 And then the CLI provides a very easy way to plug into that. 00:52:26.660 --> 00:52:31.460 Maybe you want to say --fix or dash dash, you know, leave this type of file alone, 00:52:31.460 --> 00:52:33.720 whatever kind of modification you want to do. 00:52:33.720 --> 00:52:36.100 You can expose that in a CLI. 00:52:36.460 --> 00:52:42.840 And that's also a quicker way to get started versus trying to, let's say, read the, 00:52:42.840 --> 00:52:46.160 find the pyproject.toml, read it in, parse out things. 00:52:46.160 --> 00:52:51.300 That's all stuff that can come later once you figure out exactly how you want your tool to 00:52:51.300 --> 00:52:52.060 be configured. 00:52:52.060 --> 00:52:52.560 Yeah. 00:52:52.700 --> 00:52:57.240 Especially if it just has one or two arguments, it might not be necessary to be too, too over 00:52:57.240 --> 00:52:58.460 the top with all the configuration. 00:52:58.460 --> 00:53:00.680 And then you make it installable. 00:53:00.680 --> 00:53:05.160 Basically, like you said, make it a package and then create the pre-commit hooks. 00:53:05.160 --> 00:53:05.380 Yeah. 00:53:05.380 --> 00:53:06.400 Well, those are the steps.
00:53:06.780 --> 00:53:09.280 So I think write the function, that's pretty straightforward. 00:53:09.280 --> 00:53:12.320 You just, whatever you want it to do, you just write a function that does it. 00:53:12.320 --> 00:53:19.880 You do have an example in here about checking for valid file names and snake-cased file names. 00:53:19.880 --> 00:53:25.540 So things like it can't be just one letter and it has to be snake-cased and so on. 00:53:25.740 --> 00:53:25.900 Right. 00:53:25.900 --> 00:53:31.740 But then to turn that into a CLI, there are a lot of options in Python these days, right? 00:53:31.740 --> 00:53:37.180 You can use Click, you can use Typer, but if you want something built in, yeah, if you want something 00:53:37.180 --> 00:53:39.620 built in, argparse is pretty straightforward, right? 00:53:39.620 --> 00:53:40.140 Yeah. 00:53:40.140 --> 00:53:45.940 And I think also, I mean, if you look at the pre-commit hooks repo provided by the pre-commit org, 00:53:45.940 --> 00:53:49.100 a lot of them, or maybe all of them are just using argparse. 00:53:49.100 --> 00:53:54.300 Because for most hooks, all you'll need to say is, I have an argument parser and it accepts 00:53:54.300 --> 00:53:54.880 file names. 00:53:55.220 --> 00:53:58.180 And at that point you have this boilerplate that you can just copy and you don't even 00:53:58.180 --> 00:54:02.240 need to worry about configuring multiple, you know, different arguments. 00:54:02.240 --> 00:54:06.640 It doesn't have to be too advanced with like subcommands and all that kind of stuff necessarily. 00:54:06.640 --> 00:54:07.480 Yeah. 00:54:07.480 --> 00:54:07.660 Yeah. 00:54:07.660 --> 00:54:09.380 And then make it installable. 00:54:09.380 --> 00:54:15.280 This is, you recommend a pyproject.toml, which yeah, for packages these days, that seems 00:54:15.280 --> 00:54:17.560 pretty much the de facto standard, right? 00:54:17.560 --> 00:54:18.140 Yeah.
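The file-name check and the argparse boilerplate might fit together roughly like this. The rules (a lowercase snake_case stem longer than one character) paraphrase the example discussed here; the regex, function names, and messages are assumptions, not the article's code. Since pre-commit passes the staged file names as command-line arguments, a single positional `filenames` argument is usually all the parser needs.

```python
from __future__ import annotations

import argparse
import re
from collections.abc import Sequence
from pathlib import Path

# Hypothetical rule set: lowercase snake_case stems, more than one character long.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_filename(filename: str) -> bool:
    """Check only the stem, e.g. 'data_loader' for 'src/data_loader.py'."""
    stem = Path(filename).stem
    return len(stem) > 1 and SNAKE_CASE.fullmatch(stem) is not None


def main(argv: Sequence[str] | None = None) -> int:
    # pre-commit appends the staged file names to the command line, so one
    # positional argument that accepts many values covers most hooks.
    parser = argparse.ArgumentParser(description="Validate snake_case file names.")
    parser.add_argument("filenames", nargs="*", help="files to check")
    args = parser.parse_args(argv)

    bad = [name for name in args.filenames if not is_valid_filename(name)]
    for name in bad:
        print(f"{name} is not a valid snake_case file name")
    return 1 if bad else 0  # a nonzero exit code makes the hook fail
```

A real version would probably need exceptions: a stem such as `__init__` fails this pattern.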
00:54:18.140 --> 00:54:21.520 And then what's nice is, yeah, you're using current things. 00:54:21.520 --> 00:54:23.200 You're not relying on setup.py. 00:54:23.620 --> 00:54:27.080 And also in there, there's a way to expose an entry point. 00:54:27.080 --> 00:54:28.840 And that's line 24. 00:54:28.840 --> 00:54:29.460 Yeah. 00:54:29.460 --> 00:54:29.740 Yeah. 00:54:29.740 --> 00:54:29.960 Yeah. 00:54:29.960 --> 00:54:30.020 Yeah. 00:54:30.020 --> 00:54:30.800 That's really nice. 00:54:30.800 --> 00:54:31.880 I love entry points. 00:54:31.880 --> 00:54:35.480 I think it's, I think they're massively underused in Python. 00:54:35.480 --> 00:54:40.440 You know, people talk about how do I create a script that I can give to somebody so they 00:54:40.440 --> 00:54:41.220 can run something. 00:54:41.220 --> 00:54:44.760 And that so often involves like, where is it? 00:54:44.760 --> 00:54:46.040 Where are its associated files? 00:54:46.040 --> 00:54:47.580 Where is its Python? 00:54:47.580 --> 00:54:48.720 And where are its dependencies? 00:54:48.720 --> 00:54:52.820 All of that stuff you, if you just create a package and it has an entry point, you can 00:54:52.820 --> 00:54:55.060 pipx install it or uv tool install it. 00:54:55.060 --> 00:54:58.820 Or, and now you just have all these commands and people don't have to mess with all the Python 00:54:58.820 --> 00:54:59.240 stuff. 00:54:59.240 --> 00:55:01.860 Even if you know how to do it, you don't necessarily want to do that all the time. 00:55:01.860 --> 00:55:02.080 Right? 00:55:02.080 --> 00:55:02.480 Yeah. 00:55:02.480 --> 00:55:05.520 And then it's just easy to, you can kind of call it from anywhere at that point. 00:55:05.520 --> 00:55:06.340 Yeah, exactly. 00:55:06.680 --> 00:55:12.040 So in this example, you give, you put a validate-filename command and you 00:55:12.040 --> 00:55:16.060 just point to, you know, what module and then what function to call. 00:55:16.060 --> 00:55:17.500 And that's the CLI.
00:55:17.500 --> 00:55:17.660 Yeah. 00:55:17.660 --> 00:55:18.140 That's really nice. 00:55:18.140 --> 00:55:22.440 And then of course that, that function in there is built and backed by argparse. 00:55:22.440 --> 00:55:24.640 So it all, it kind of all comes full circle right there. 00:55:24.640 --> 00:55:24.800 Yeah. 00:55:24.800 --> 00:55:25.240 Yeah. 00:55:25.240 --> 00:55:29.840 So it's like you, it's almost like you had created, you know, some command line utility, 00:55:29.840 --> 00:55:31.380 like bash wise or something. 00:55:31.380 --> 00:55:35.520 And you just have that available and it hooks into your CLI. 00:55:35.520 --> 00:55:39.920 I also want to call out line 21, because we talked about dependencies, right? 00:55:39.920 --> 00:55:44.640 So anything you put in there will automatically get grabbed when pre-commit installs. 00:55:44.640 --> 00:55:46.040 So in this case, there's nothing. 00:55:46.040 --> 00:55:50.340 And then in the case of the exif-stripper I mentioned, like we need to install Pillow, right? 00:55:50.340 --> 00:55:55.020 So this is how you can configure how pre-commit will grab everything. 00:55:55.020 --> 00:55:58.560 And I also see it has, yeah, I see there's a requires Python version. 00:55:58.560 --> 00:56:01.380 Does pre-commit help you get Python in any way? 00:56:01.380 --> 00:56:03.560 Or does it just assume that there's a... 00:56:03.560 --> 00:56:06.840 You need to have whatever languages you're relying on, you do need to have them installed 00:56:06.840 --> 00:56:07.140 already. 00:56:07.140 --> 00:56:07.700 Okay. 00:56:07.700 --> 00:56:12.560 So in order for you to use this pre-commit hook on your machine, you'd have to have, for 00:56:12.560 --> 00:56:16.960 example, Python 3.10, 3.11, 3.12, something like that installed, given that it says 3.10 00:56:16.960 --> 00:56:17.300 or greater.
00:56:17.300 --> 00:56:21.860 So for example, like if you saw some hook that sounded interesting, but it's written in Go 00:56:21.860 --> 00:56:24.760 and you don't have Go on your computer, you have to figure that out first. 00:56:24.760 --> 00:56:25.580 That's a no-go. 00:56:25.580 --> 00:56:28.020 It's a no-go. 00:56:28.120 --> 00:56:28.320 All right. 00:56:28.320 --> 00:56:28.660 Let's see. 00:56:28.660 --> 00:56:30.540 Yeah. 00:56:30.540 --> 00:56:36.000 And then the last thing to do, you say, is create the .pre-commit-hooks.yaml file. 00:56:36.000 --> 00:56:39.820 And is this the thing that goes into your repo? 00:56:39.820 --> 00:56:42.400 So when pre-commit sees it, it knows what to do? 00:56:42.400 --> 00:56:42.800 Yeah. 00:56:42.800 --> 00:56:47.400 So for example, in the exif-stripper repo, this file exists. 00:56:47.400 --> 00:56:51.160 So if someone uses exif-stripper, they point to that repository. 00:56:51.160 --> 00:56:54.380 And then when pre-commit goes and grabs it, it looks for this file, right? 00:56:54.380 --> 00:56:58.960 And then the key things here, for one being language. 00:56:58.960 --> 00:57:02.880 So language tells pre-commit, how does it try to install that? 00:57:02.880 --> 00:57:04.580 So in this case, it says, oh, this is Python. 00:57:04.580 --> 00:57:06.020 So then it knows, okay, pip. 00:57:06.020 --> 00:57:12.160 The ID at the top, that's the name that you reference in the pre-commit config. 00:57:12.160 --> 00:57:17.340 Like when you want to, like we saw check-toml, check-yaml in the beginning, those correspond 00:57:17.340 --> 00:57:23.140 to entries in the .pre-commit-hooks.yaml of that repository that they were being referenced 00:57:23.140 --> 00:57:23.440 from. 00:57:23.440 --> 00:57:28.540 So pre-commit first finds this file, it can install, then it can see, oh, which 00:57:28.540 --> 00:57:29.420 hook do you want? 00:57:29.420 --> 00:57:30.880 validate-filename in this case.
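The lookup described here, where a hook `id` listed in the user's `.pre-commit-config.yaml` selects an entry from the hook repository's `.pre-commit-hooks.yaml`, can be sketched in a few lines. This is not pre-commit's code: the manifest is written as the Python data a YAML parser would return, and its values are hypothetical.

```python
from __future__ import annotations

# The hook repo's .pre-commit-hooks.yaml, written as the list of dicts a YAML
# parser would return. The values here are hypothetical.
MANIFEST = [
    {
        "id": "validate-filename",
        "name": "Validate file names",
        "entry": "validate-filename",
        "language": "python",
        "types": ["python"],
    },
]


def resolve_hooks(manifest: list[dict], requested_ids: list[str]) -> list[dict]:
    """Match the hook ids listed in a user's config to the repo's manifest."""
    by_id = {hook["id"]: hook for hook in manifest}
    unknown = [hook_id for hook_id in requested_ids if hook_id not in by_id]
    if unknown:
        raise KeyError(f"not defined in .pre-commit-hooks.yaml: {unknown}")
    return [by_id[hook_id] for hook_id in requested_ids]


for hook in resolve_hooks(MANIFEST, ["validate-filename"]):
    # `language` decides how the hook is installed (pip for "python");
    # `entry` is the command that gets run.
    print(hook["id"], hook["language"], hook["entry"])
```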
00:57:30.880 --> 00:57:32.820 And then how do I call this? 00:57:32.820 --> 00:57:33.640 And that's entry. 00:57:33.640 --> 00:57:38.480 And this is pointing to the entry point that we made, but it can be anything, right? 00:57:38.520 --> 00:57:43.180 You could call Ruff and then add, you know, 20 different command line flags if you want. 00:57:43.180 --> 00:57:44.440 And that can be your hook. 00:57:44.440 --> 00:57:46.140 And that would be fine as well. 00:57:46.140 --> 00:57:51.500 And what's very interesting here is it's optional, but it's the types one at the bottom. 00:57:51.500 --> 00:57:55.240 So I talked before about exif-stripper only running on images, right? 00:57:55.240 --> 00:57:58.620 It'd be wasteful to have it look at TOML and Markdown, right? 00:57:58.620 --> 00:57:59.840 If it's not going to do anything with it. 00:57:59.840 --> 00:58:01.720 Can't find any EXIF information in the TOML. 00:58:01.720 --> 00:58:02.120 Yeah. 00:58:02.120 --> 00:58:04.380 So this controls that. 00:58:04.500 --> 00:58:08.680 So for example, this hook will only run on Python files. 00:58:08.680 --> 00:58:14.140 And this logic, I'm blanking on the name of the tool that pre-commit uses to figure this 00:58:14.140 --> 00:58:14.320 out. 00:58:14.320 --> 00:58:15.320 But this is handled elsewhere. 00:58:15.320 --> 00:58:17.260 So there are certain names that you can use. 00:58:17.260 --> 00:58:18.180 Right. 00:58:18.180 --> 00:58:23.880 Some sort of category mapping over to these file extensions or these bombs at the beginning 00:58:23.880 --> 00:58:26.020 of the file or whatever mean that it's this thing. 00:58:26.020 --> 00:58:26.640 Exactly. 00:58:26.640 --> 00:58:31.100 There is a very dangerous thing with this and that types is an AND.
00:58:31.540 --> 00:58:35.840 So if you say, if you wanted to do like this should run on Python and Markdown, you can't 00:58:35.840 --> 00:58:41.360 use this because it will look for files that are both Python and Markdown and will not end 00:58:41.360 --> 00:58:41.600 well. 00:58:41.600 --> 00:58:43.580 Not too many of those exist. 00:58:43.580 --> 00:58:43.800 Yeah. 00:58:43.800 --> 00:58:46.680 There's a separate types_or that you have to use. 00:58:46.680 --> 00:58:47.940 That's like a little gotcha. 00:58:47.940 --> 00:58:51.900 It's sort of like an ORM instead of a SQL statement. 00:58:51.900 --> 00:58:52.820 Kind of you got to. 00:58:52.820 --> 00:58:53.640 Yeah. 00:58:53.640 --> 00:58:54.480 Those things always get weird. 00:58:54.480 --> 00:58:56.160 Like import the or operator. 00:58:56.160 --> 00:58:57.020 Like, okay. 00:58:57.020 --> 00:58:58.780 Yeah. 00:58:58.920 --> 00:58:59.100 Cool. 00:58:59.100 --> 00:58:59.340 Okay. 00:58:59.340 --> 00:59:03.940 That is actually very good to know because it looks like a list of options. 00:59:03.940 --> 00:59:05.060 It is. 00:59:05.060 --> 00:59:05.240 Yeah. 00:59:05.240 --> 00:59:05.900 But they combine. 00:59:05.900 --> 00:59:08.680 So you might have something like it is a file and it's Python. 00:59:08.680 --> 00:59:10.320 That might be one thing I've seen. 00:59:10.320 --> 00:59:10.740 Right. 00:59:10.740 --> 00:59:11.800 Okay. 00:59:11.800 --> 00:59:12.720 Yeah. 00:59:12.720 --> 00:59:12.920 Cool. 00:59:12.920 --> 00:59:16.940 So if I wanted to have more than one hook, I could put it into one. 00:59:16.940 --> 00:59:18.300 I could have more than one here. 00:59:18.300 --> 00:59:18.860 Is that possible? 00:59:18.860 --> 00:59:19.280 Yeah. 00:59:19.280 --> 00:59:20.480 So this looks like a list. 00:59:20.480 --> 00:59:21.100 Yeah, exactly. 00:59:21.100 --> 00:59:22.920 It's structured as a YAML list.
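The `types` gotcha is easier to see as set logic. pre-commit tags each file (it relies on the `identify` library for this) and, as described above, `types` requires every listed tag while `types_or` requires at least one. The function and the tag sets below are simplified assumptions for illustration, not pre-commit's implementation.

```python
from __future__ import annotations


def hook_runs_on(file_tags: set[str], types: list[str] | None = None,
                 types_or: list[str] | None = None) -> bool:
    """Simplified filter: `types` is an AND over tags, `types_or` is an OR."""
    if types and not set(types) <= file_tags:  # every listed tag must be present
        return False
    if types_or and not set(types_or) & file_tags:  # at least one must be present
        return False
    return True


# Tag sets roughly like the ones identify might assign (assumed for illustration).
PY_FILE = {"file", "text", "python"}
MD_FILE = {"file", "text", "markdown"}

print(hook_runs_on(PY_FILE, types=["python", "markdown"]))     # False: not both
print(hook_runs_on(MD_FILE, types_or=["python", "markdown"]))  # True: one is enough
```

Combining the two, for example `types=["file"]` with `types_or=["python", "markdown"]`, keeps the "it is a file and it's Python" style of AND while still allowing either language.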
00:59:22.920 --> 00:59:27.460 So you could just copy that block, paste the new one, and then just change whatever 00:59:27.460 --> 00:59:28.420 fields you want. 00:59:28.420 --> 00:59:31.000 And then that's now the second hook that you expose. 00:59:31.000 --> 00:59:31.860 Right. 00:59:31.860 --> 00:59:36.280 And working backwards, I suppose you just expose a different entry point potentially and then 00:59:36.280 --> 00:59:38.000 just call it out or whatever you want. 00:59:38.000 --> 00:59:41.440 Well, I mean, you could like maybe you have a validate file name and maybe you have another 00:59:41.440 --> 00:59:44.820 one that's like validate long file names or something where you're like, now they have 00:59:44.820 --> 00:59:45.580 to be this long. 00:59:45.580 --> 00:59:47.720 And then it's just a shortcut for something else. 00:59:47.720 --> 00:59:49.100 So it doesn't have to be a different thing. 00:59:49.380 --> 00:59:49.860 Oh, yeah. 00:59:49.860 --> 00:59:52.860 You just put an argument in there as a default kind of for people. 00:59:52.860 --> 00:59:57.000 So we talked about args earlier and that was something the user could tweak. 00:59:57.000 --> 01:00:01.220 Anything you put in here is essentially like it will always run with these. 01:00:01.220 --> 01:00:04.220 So you could bake in certain things that have to happen. 01:00:04.220 --> 01:00:04.700 Yeah. 01:00:04.700 --> 01:00:05.160 Awesome. 01:00:05.160 --> 01:00:06.140 I love it. 01:00:06.140 --> 01:00:06.380 Okay. 01:00:06.380 --> 01:00:12.220 We're pretty much out of time, but let's talk about one final thing. 01:00:12.220 --> 01:00:13.300 Not this one. 01:00:13.300 --> 01:00:15.300 Your Data Morph project. 01:00:15.300 --> 01:00:19.060 Give a quick shout out to that before we wrap things up. 01:00:19.100 --> 01:00:19.420 What do you think? 01:00:19.420 --> 01:00:19.960 Sure. 01:00:19.960 --> 01:00:26.460 So this project started related to the pandas workshop I had mentioned.
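Here is a rough sketch of how two hooks in one manifest can share a single console script, with the second baking a flag into `entry`. It assumes, in line with the discussion above, that flags written into `entry` always run, that `args` from the user's config stand in for the manifest's defaults, and that pre-commit appends the file names last (it may split a long file list across several invocations). The hook ids and the `--min-length` flag are hypothetical.

```python
from __future__ import annotations

import shlex

# Two hypothetical hook definitions that share one console script; the second
# bakes a flag into `entry`, so that flag is always passed.
HOOKS = {
    "validate-filename": {"entry": "validate-filename", "args": []},
    "validate-long-filename": {
        "entry": "validate-filename --min-length 10",  # hypothetical flag
        "args": [],
    },
}


def build_command(hook_id: str, filenames: list[str],
                  user_args: list[str] | None = None) -> list[str]:
    """Approximate the argv for one run: entry, then args, then file names."""
    hook = HOOKS[hook_id]
    # In this sketch, user args replace the manifest default args, while
    # whatever is written into `entry` cannot be removed by the user.
    args = hook["args"] if user_args is None else user_args
    return [*shlex.split(hook["entry"]), *args, *filenames]


print(build_command("validate-long-filename", ["src/data_loader.py"]))
# ['validate-filename', '--min-length', '10', 'src/data_loader.py']
```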
01:00:26.460 --> 01:00:31.980 I wanted to have a visual to really drive home the point that we needed to visualize our 01:00:31.980 --> 01:00:35.700 data, because pandas is very much about data wrangling. 01:00:35.700 --> 01:00:40.320 And after talking to people for two hours about data wrangling and the statistics you can calculate 01:00:40.320 --> 01:00:41.200 on tabular data, 01:00:41.200 --> 01:00:44.060 some people just feel like, oh, okay, we're done. 01:00:44.060 --> 01:00:45.620 I mean, you know, we're done. 01:00:45.620 --> 01:00:47.600 And that's definitely not the case. 01:00:47.600 --> 01:00:52.640 And I was thinking about, and you had it on the screen before, the Datasaurus Dozen. 01:00:52.640 --> 01:00:53.320 So yeah. 01:00:53.320 --> 01:01:01.320 So there was research in 2017 by Autodesk where they took the idea of Anscombe's Quartet, which 01:01:01.320 --> 01:01:07.960 is, sorry, just a little bit above that, which is just a set of four, yeah, four data sets. 01:01:08.360 --> 01:01:10.560 They share the same summary statistics. 01:01:10.560 --> 01:01:16.000 So the mean in X and Y, the standard deviation in X and Y, and the Pearson correlation coefficient. 01:01:16.000 --> 01:01:17.720 And they look very different. 01:01:17.720 --> 01:01:24.680 And naively, you think, well, I know the average and maybe how spread out 01:01:24.680 --> 01:01:25.140 things are. 01:01:25.300 --> 01:01:28.760 So I can kind of get a sense of what this data probably means. 01:01:28.760 --> 01:01:33.820 But in reality, outliers and other weird things could just completely blow up those ideas, 01:01:33.820 --> 01:01:34.100 right? 01:01:34.100 --> 01:01:34.520 Yeah. 01:01:34.520 --> 01:01:40.820 And so in 2017, they had developed this algorithm using simulated annealing. 01:01:40.820 --> 01:01:47.100 So if you scroll down once more, that's where they take the dinosaur at the top and they use 01:01:47.100 --> 01:01:48.480 simulated annealing to push the points.
01:01:48.480 --> 01:01:50.600 Let me describe this really quick for people just listening. 01:01:50.600 --> 01:01:56.920 So there's a matplotlib-looking graph of some data points, and it has a certain standard 01:01:56.920 --> 01:01:58.680 deviation, certain mean, et cetera. 01:01:58.680 --> 01:02:02.680 But if you actually look at it, it looks like a T-Rex, right? 01:02:02.680 --> 01:02:03.320 Something like this? 01:02:03.320 --> 01:02:03.760 Yes. 01:02:03.760 --> 01:02:05.820 Is that a decent enough description? 01:02:05.820 --> 01:02:07.000 That's a perfect description. 01:02:07.000 --> 01:02:07.540 Yeah. 01:02:07.540 --> 01:02:12.860 So what the researchers have done is they use this simulated annealing algorithm to push 01:02:12.860 --> 01:02:13.960 the points around. 01:02:13.960 --> 01:02:18.420 So starting from that dinosaur and just moving the points ever so slightly in such a way that 01:02:18.420 --> 01:02:23.440 the summary statistics are unchanged, at least to the two decimal places where they're currently 01:02:23.440 --> 01:02:26.080 shown, they tried to make other shapes. 01:02:26.080 --> 01:02:32.240 So some of the other shapes they have are a bullseye, a circle, slanted lines, vertical lines, 01:02:32.240 --> 01:02:33.480 or a star. 01:02:33.480 --> 01:02:38.940 And all of these can be formed from that dinosaur, some to varying degrees of success. 01:02:38.940 --> 01:02:45.100 But they're visually recognizable, which is the point that is pretty important here, right? 01:02:45.100 --> 01:02:48.960 So you cannot, as we said, rely on those summary statistics, because you don't know. 01:02:48.960 --> 01:02:49.600 Is it the star? 01:02:49.600 --> 01:02:50.320 Is it the dinosaur? 01:02:50.320 --> 01:02:51.300 Is it a line? 01:02:51.300 --> 01:02:52.240 It could be anything. 01:02:52.240 --> 01:02:56.760 And they also had an animation that they included. 01:02:56.760 --> 01:03:00.660 So basically, you could start from the dinosaur and then turn it into a circle.
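The annealing idea described here can be sketched in a few dozen lines of Python. This is a simplified illustration in the spirit of the Autodesk approach, not the researchers' code and not Data Morph's implementation: the circle target, the Gaussian step size, the linear temperature schedule, and every function name are assumptions chosen for brevity. Each step nudges one point, allows moves away from the target only with a probability that shrinks over time, and keeps a move only if the summary statistics are unchanged at two decimal places.

```python
import math
import random


def rounded_stats(points, ndigits=2):
    """Mean and sample std. dev. of x and y plus Pearson's r, rounded to `ndigits`."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    values = (
        mean_x,
        mean_y,
        math.sqrt(sxx / (n - 1)),
        math.sqrt(syy / (n - 1)),
        sxy / math.sqrt(sxx * syy),
    )
    return tuple(round(v, ndigits) for v in values)


def distance_to_circle(point, center, radius):
    """How far a point lies from the (assumed) target shape, here a circle."""
    return abs(math.dist(point, center) - radius)


def morph(points, center, radius, iterations=5_000, step=0.1, seed=0):
    """Nudge points toward a circle while the rounded summary statistics stay fixed."""
    rng = random.Random(seed)
    points = [tuple(p) for p in points]
    target_stats = rounded_stats(points)
    for i in range(iterations):
        # Linearly cooling temperature: early on, some moves away from the shape are allowed.
        temperature = 0.4 * (1 - i / iterations)
        idx = rng.randrange(len(points))
        old = points[idx]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = distance_to_circle(new, center, radius) < distance_to_circle(old, center, radius)
        if not closer and rng.random() >= temperature:
            continue  # reject: a worse move that the current temperature did not allow
        candidate = points[:idx] + [new] + points[idx + 1 :]
        # Keep the move only if the statistics still match at two decimal places.
        if rounded_stats(candidate) == target_stats:
            points = candidate
    return points


# Usage: morph 40 random points toward a circle; the rounded statistics never change.
start_rng = random.Random(42)
start = [(start_rng.uniform(0, 10), start_rng.uniform(0, 10)) for _ in range(40)]
result = morph(start, center=(5, 5), radius=3)
print(rounded_stats(start) == rounded_stats(result))  # True
```

The statistics check is applied after the annealing acceptance test, so every accepted configuration is one of the many point arrangements that share the original rounded statistics.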
01:03:00.660 --> 01:03:06.300 And that's even more impactful, because you realize at that point that it's not just the 01:03:06.300 --> 01:03:10.280 dinosaur and the circle that have something in common, but it's the infinite number of 01:03:10.280 --> 01:03:14.020 point arrangements that you can make between them that actually share that. 01:03:14.020 --> 01:03:20.900 And so I wanted to explore if I could extend that to working for arbitrary data sets and also 01:03:20.900 --> 01:03:21.680 different shapes. 01:03:21.680 --> 01:03:27.140 So I found the research code and spent quite a bit of time hacking at it, even just trying to 01:03:27.140 --> 01:03:29.800 get it to work for their example. 01:03:29.800 --> 01:03:30.980 And that took quite a bit of time. 01:03:30.980 --> 01:03:35.860 And then I had this idea, since it was for a pandas workshop, to take a panda and 01:03:35.860 --> 01:03:36.300 turn it. 01:03:36.300 --> 01:03:38.160 Initially, I wanted to turn it into the dinosaur. 01:03:38.160 --> 01:03:44.100 I still have not found a good way to do that yet, but I also haven't been trying at all this 01:03:44.100 --> 01:03:45.520 year on that, to be honest. 01:03:45.520 --> 01:03:51.700 But I figured out how to: by adding a lot of other things that didn't exist in the initial 01:03:51.700 --> 01:03:56.540 algorithm, things like calculating bounds of the data and different metrics, I figured 01:03:56.540 --> 01:03:59.180 out a way to get it to work regardless. 01:03:59.180 --> 01:04:04.700 So I can give it a panda data set or a soccer ball and it can perform these transformations 01:04:04.700 --> 01:04:06.300 and move the points around. 01:04:06.300 --> 01:04:11.600 So on the screen, we have the first time I shared publicly what I had been working on; 01:04:11.600 --> 01:04:12.780 it happened to be Easter. 01:04:12.780 --> 01:04:17.080 So I made a bunny holding an Easter egg with the words "Happy Easter" off to the side.
01:04:17.080 --> 01:04:22.460 And it turns into two vertical lines, all while preserving the summary statistics. 01:04:22.880 --> 01:04:28.980 This is something I think makes for a very good teaching tool in, say, an introductory 01:04:28.980 --> 01:04:32.820 statistics course, to convince people that they need to visualize. 01:04:32.820 --> 01:04:38.600 There's an interesting study, I think called "A hypothesis is a liability." 01:04:38.600 --> 01:04:44.960 And they talked about taking students in a statistical analysis course and splitting them into two groups. 01:04:44.960 --> 01:04:49.800 And one set of students were just given the data set and told, here, explore, see what you find. 01:04:49.800 --> 01:04:53.360 And then the other set were given a set of hypotheses to test. 01:04:53.360 --> 01:04:56.320 And it turns out that the data is shaped like a gorilla. 01:04:56.320 --> 01:05:02.820 And the students who were told, here, test these hypotheses, were five times less likely to even 01:05:02.820 --> 01:05:05.700 realize that it was shaped like a gorilla, because they never plotted it. 01:05:05.700 --> 01:05:06.040 Yeah. 01:05:06.040 --> 01:05:10.760 This is such a huge thing, to get people learning this early. 01:05:10.760 --> 01:05:14.120 And the more shocking these visuals are, the better. 01:05:14.800 --> 01:05:14.920 Yeah. 01:05:14.920 --> 01:05:17.320 And I think these are super shocking, right? 01:05:17.320 --> 01:05:22.300 Having T-Rexes and bunnies and going, you know, that bunny is, you know, equivalent. 01:05:22.300 --> 01:05:28.040 And there's a continuous transformation from bunny to blob of dots with one outside dot, right? 01:05:28.040 --> 01:05:30.440 That kind of stuff surprises you, I think.
01:05:30.440 --> 01:05:37.000 And one thing I see, especially when the dinosaur came out, but even when I posted some of my first 01:05:37.000 --> 01:05:41.680 examples, is that people comment right away: wow, it's so cool that 01:05:41.680 --> 01:05:44.120 it's possible to do that with that dinosaur. 01:05:44.120 --> 01:05:44.780 Like, no, no, no. 01:05:44.780 --> 01:05:47.260 It's not just the dinosaur or just the panda. 01:05:47.260 --> 01:05:48.240 It's really anything. 01:05:48.240 --> 01:05:53.260 And so the way this also works is that people can use their own data sets or they can add 01:05:53.260 --> 01:05:53.820 something new. 01:05:53.920 --> 01:05:59.420 And that's what I've done this year in the two previous development 01:05:59.420 --> 01:06:07.060 sprints: I did one at EuroPython and one at PyCon Taiwan earlier 01:06:07.060 --> 01:06:07.540 this year. 01:06:07.540 --> 01:06:11.100 And hopefully in Australia, we'll do some more. 01:06:11.100 --> 01:06:14.880 But I had people add, for example, a target shape. 01:06:15.000 --> 01:06:21.040 So, for example, for what the panda would turn into, we have a club, like the card suit, 01:06:21.040 --> 01:06:24.080 which was quite a challenge, and the spade. 01:06:24.080 --> 01:06:25.640 And I already had the heart. 01:06:25.640 --> 01:06:30.420 The heart is actually a trigonometric equation, which, you know, blew my mind at first. 01:06:30.420 --> 01:06:35.900 There's actually a page I found on, I think, Wolfram Alpha, which had, I want to say, 01:06:35.900 --> 01:06:40.520 10 or 15 different trigonometric equations for different types of hearts. 01:06:40.520 --> 01:06:42.980 And you could pick the exact type of heart you wanted. 01:06:43.760 --> 01:06:45.840 The social media heart, the emoji heart, what are we talking about?
01:06:45.840 --> 01:06:48.360 No, no, it was just like, this is longer, this is more curved. 01:06:48.360 --> 01:06:49.780 Yeah, yeah, yeah, that's awesome. 01:06:49.780 --> 01:06:53.200 But these are all math problems now, when you think about that side of it. 01:06:53.200 --> 01:06:57.980 So this could then be used maybe in a course where they want to focus on math, but also 01:06:57.980 --> 01:06:58.700 some more coding. 01:06:58.700 --> 01:07:02.340 So there's lots of different use cases, like just giving it the data. 01:07:02.340 --> 01:07:04.800 And that's very much more just pure statistics. 01:07:04.800 --> 01:07:09.360 But, you know, I've heard from a few teachers, from what I presented, 01:07:09.360 --> 01:07:13.180 that it sounds like this would be something that they would like to use. 01:07:13.260 --> 01:07:14.480 So hopefully that does happen. 01:07:14.480 --> 01:07:16.840 If not, it's a fun thing to put in my slides. 01:07:16.840 --> 01:07:18.700 And I did enjoy getting it to work. 01:07:18.700 --> 01:07:23.780 Yeah, I didn't pull up any good videos for the YouTube video, but there are some really nice 01:07:23.780 --> 01:07:27.140 animations that you've got of actually seeing it go from one to the other. 01:07:27.140 --> 01:07:32.480 And you're doing a talk at PyCon Australia, and then you're doing a sprint on 01:07:32.480 --> 01:07:33.620 this as well, right? 01:07:33.620 --> 01:07:36.440 Coming up on November 22nd, about a month from now. 01:07:36.440 --> 01:07:36.820 Correct. 01:07:37.040 --> 01:07:37.440 So cool. 01:07:37.440 --> 01:07:41.900 People can check that out if they happen to be at PyCon Australia and want to... 01:07:41.900 --> 01:07:45.580 Well, I'll also be talking about it in San Francisco next week. 01:07:45.580 --> 01:07:48.240 There won't be a sprint, but I will be talking about that. 01:07:48.240 --> 01:07:48.900 So people can... 01:07:48.900 --> 01:07:49.080 Okay.
01:07:49.080 --> 01:07:49.920 It's not a PyCon. 01:07:49.920 --> 01:07:50.720 Sure. 01:07:50.720 --> 01:07:51.520 It's still cool. 01:07:51.520 --> 01:07:52.440 All right. 01:07:52.520 --> 01:07:54.320 Well, Stefanie, thank you so much for being here. 01:07:54.320 --> 01:07:56.260 Let's wrap things up. 01:07:56.260 --> 01:08:00.920 But I guess, you know, give us a final call to action for people maybe interested in pre-commit 01:08:00.920 --> 01:08:02.180 hooks or other stuff that you're doing. 01:08:02.180 --> 01:08:06.520 Yeah, you can find everything that we mentioned here and the projects on my website. 01:08:06.520 --> 01:08:12.440 I'm putting much more effort into putting stuff on there this year, now that I've rebuilt it. 01:08:12.440 --> 01:08:15.680 So definitely check there and sign up for my newsletter. 01:08:15.680 --> 01:08:16.980 Follow me on socials. 01:08:17.000 --> 01:08:19.180 There are no links down here, but you can find them. 01:08:19.180 --> 01:08:21.540 There'll be links on the episode page. 01:08:21.540 --> 01:08:22.780 So we'll put them there. 01:08:22.780 --> 01:08:23.280 All right. 01:08:23.280 --> 01:08:24.020 Well, thanks. 01:08:24.020 --> 01:08:24.900 Thanks for being here. 01:08:24.900 --> 01:08:25.760 It's great to talk to you. 01:08:25.760 --> 01:08:26.580 Thanks for coming on and sharing. 01:08:26.580 --> 01:08:27.540 Thanks for having me. 01:08:27.540 --> 01:08:27.920 Yeah. 01:08:27.920 --> 01:08:28.220 Bye-bye. 01:08:28.220 --> 01:08:32.400 This has been another episode of Talk Python To Me. 01:08:32.400 --> 01:08:34.220 Thank you to our sponsors. 01:08:34.220 --> 01:08:35.820 Be sure to check out what they're offering. 01:08:35.820 --> 01:08:37.240 It really helps support the show. 01:08:37.240 --> 01:08:39.380 Take some stress out of your life. 01:08:39.380 --> 01:08:45.180 Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.
01:08:45.660 --> 01:08:50.160 Just visit talkpython.fm/sentry and get started for free. 01:08:50.160 --> 01:08:53.740 And be sure to use the promo code talkpython, all one word. 01:08:53.740 --> 01:08:56.360 This episode is brought to you by Bluehost. 01:08:56.360 --> 01:08:58.080 Do you need a website fast? 01:08:58.080 --> 01:08:58.980 Get Bluehost. 01:08:58.980 --> 01:09:04.340 Their AI builds your WordPress site in minutes and their built-in tools optimize your growth. 01:09:04.340 --> 01:09:05.300 Don't wait. 01:09:05.300 --> 01:09:08.900 Visit talkpython.fm/bluehost to get started. 01:09:08.900 --> 01:09:10.400 Want to level up your Python? 01:09:10.400 --> 01:09:14.440 We have one of the largest catalogs of Python video courses over at Talk Python. 01:09:14.440 --> 01:09:19.620 Our content ranges from true beginners to deeply advanced topics like memory and async. 01:09:19.620 --> 01:09:22.300 And best of all, there's not a subscription in sight. 01:09:22.300 --> 01:09:25.200 Check it out for yourself at training.talkpython.fm. 01:09:25.200 --> 01:09:27.300 Be sure to subscribe to the show. 01:09:27.300 --> 01:09:30.080 Open your favorite podcast app and search for Python. 01:09:30.080 --> 01:09:31.400 We should be right at the top. 01:09:31.400 --> 01:09:36.560 You can also find the iTunes feed at /itunes, the Google Play feed at /play, 01:09:36.720 --> 01:09:40.740 and the direct RSS feed at /rss on talkpython.fm. 01:09:40.740 --> 01:09:43.720 We're live streaming most of our recordings these days. 01:09:43.720 --> 01:09:47.120 If you want to be part of the show and have your comments featured on the air, 01:09:47.120 --> 01:09:51.560 be sure to subscribe to our YouTube channel at talkpython.fm/youtube. 01:09:51.560 --> 01:09:53.600 This is your host, Michael Kennedy. 01:09:53.600 --> 01:09:54.900 Thanks so much for listening. 01:09:55.040 --> 01:09:56.060 I really appreciate it. 01:09:56.060 --> 01:09:57.980 Now get out there and write some Python code. 
01:09:57.980 --> 01:10:19.160 I'll see you next time.