I've been writing Python code (among other languages) at Google for many years, and I *definitely* write better code since I started working there. Here are some ways that the way I write code changed:
* No code is temporary. Assume anything you check in will be used in production, because someone might decide to do so. They won't ask you.
* Code isn't useful if you don't have proper unit tests. You literally can't check it in without tests.
* Clear is better than fast. Clear is better than brief. Clever code and implicit assumptions are the first place you look to find the bugs.
* Your tests are your spec. Anyone is free to change your code; the only way to ensure they know what it is *supposed* to do is to test it properly.
* A given commit can EITHER refactor (in which case your tests do not change) OR change functionality (in which case your tests demonstrate the change in expectations). Never do both at the same time. You eliminate most mistakes this way.
It's not in the public style guide (because it conflicts with PEP 8), but 2 spaces instead of 4 are used across all languages. It sounds dumb, but once you go 2-space you will never be able to go back. 4-space now feels like the old 8-space standard. Cannot unsee, cannot go back. My eyes are ruined forever; 2-space it must be.
These are great tips.
I do the opposite regarding indentation: 4 spaces everywhere.
2 spaces is not enough to subconsciously distinguish the different blocks.
Thank you. All those bullet points are amazing.
BTW, as soon as I left Google, I went back to bigger indentation (like 4 spaces). I prefer to reduce the nesting rather than let 2-space indentation tempt me into more complicated nested code... Unless I write in Lisp.
Every week I just wait for your new video to learn something new. Thank you so much.
Happy to hear you like the videos!
The only thing I disagree with is the advice to put assertions in your unit tests rather than in your code. Let me point out that there is a different kind of assertion that does not belong in a unit test, e.g. assertions about intermediate results and loop invariants. Moreover, assertions in your code also function as executable documentation of what property is expected to hold at this location in your program. This can give you further insights into how the code works. Since this "documentation" is executed, it typically does not rot the way pure text in comments does. Personally, I use assertions a lot and combine them with advanced testing techniques that use randomization. I have often been surprised by assertions failing in my code in ways I never thought possible, thus finding wrong assumptions and design flaws.
I was about to write the same but you’ve explained it very well already.
Those are good points, thanks for the in depth explanation. I’ll definitely experiment more with using asserts like this and it might actually be a good topic to talk about in a future video.
Very nice comment! Thx. Glad Arjan took you up on these aspects.
Assertions are best kept separate from the essential code, because if someone runs the code with the -O flag, assertion execution is disabled, which can produce unexpected results.
@@shera2667 Generally, I disagree that assertions should be separated from the other code. You can see assertions as an executable kind of documentation embedded in your code. If you only care about functional correctness, and your assertions have no side effects, you should be fine. However, if the run time makes a difference, assertions may be problematic; here I agree.
Just keep in mind that the order of "for" loops and their variables stays the same when going from loops in a comprehension to 'normal' nested loops and vice versa. So "[bla for x in range(5) for y in range(5)]" translates to "for x in range(5): for y in range(5): bla". Since learning this easy rule of thumb from someone on Stack Overflow, I don't find these list comprehensions hard to read at all. At least I doubt the traditional loop code would look much better.
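To make that rule of thumb concrete, a minimal runnable sketch (pairs/pairs_loop are invented names for the demo):

# The for clauses read left to right in the comprehension...
pairs = [(x, y) for x in range(5) for y in range(5)]

# ...in the same order as the equivalent nested loops read top to bottom.
pairs_loop = []
for x in range(5):
    for y in range(5):
        pairs_loop.append((x, y))

assert pairs == pairs_loop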
true, this is amazing
Yes and no. Working in teams means not everyone is used to it, but everyone will be used to classic for loops.
And when using code formatters, the classic fors will be indented and look very ugly, so there is a solid chance they get refactored.
Just to keep the example, you could write it separately:
import itertools as it
...
x_y = it.product(range(5), range(5))
for x, y in x_y:
...
# or put the for back, but now more easily to grasp into the list comprehension
Some might not be used to this slightly more functional approach and would instead put part of the for-looping into its own function (like def x_y_pairs(): return ...), but this helps the reader more (and might even be the better way long-term anyway).
I have been trying to avoid "for" loops unless there are side effects within the iterations. Not only do I avoid "for" loops, but I also avoid mutation. The nested list example in the style guide does both.
For nesting, I sometimes use this pattern:
items = [(x,y,z) for x in range(5) for y in range(6) for z in range(7)]
results = [do_something_with(x,y,z) for x,y,z in items]
Although in the above case, where the nesting is between independent iterables, I would agree that itertools.product would be better.
And I use this pattern all the time, because I don't know a better way to "flatten" lists in Python:
flat_list = [x for sublist in list_of_lists for x in sublist]
When I need nesting between iterables that have some dependency, I might combine the two approaches:
nested_items = [comma_separated_string.split(",") for comma_separated_string in my_strings]
flat_items = [x for sublist in nested_items for x in sublist]
results = [do_something_with(x) for x in flat_items]
The more I think about this, the more I think I should use itertools more. Maybe chain is the better way to flatten lists - what do you think?
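For comparison, here are both flattening approaches side by side (nothing beyond the standard library):

from itertools import chain

list_of_lists = [[1, 2], [3], [4, 5, 6]]

# Nested comprehension: the two for clauses read in the same order as nested loops.
flat_a = [x for sublist in list_of_lists for x in sublist]

# chain.from_iterable: arguably clearer, and lazy until you materialize it.
flat_b = list(chain.from_iterable(list_of_lists))

assert flat_a == flat_b == [1, 2, 3, 4, 5, 6]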
Very helpful!
Just half an hour ago, I was code-reviewing a function in a long-running service where we default to current time. However, the current time is being loaded when the module is loaded, so it falls foul of the default argument issues like the time.time() example.
Just changed it to a Callable, so that the correct time is created by the Callable on each occasion the default value is needed.
An alternative way is to have the argument default to None. The function checks for this value, and replaces it with the current time (or other dynamically-computed value, as appropriate).
@@lawrencedoliveiro9104 Yes. That's what I've seen recommended elsewhere.
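To spell out both fixes for the current-time default pitfall (function and parameter names here are invented for the sketch):

import time
from typing import Callable, Optional

# Bug: the default is evaluated once, at function-definition time.
def log_event_bad(message: str, timestamp: float = time.time()) -> None:
    print(timestamp, message)  # same timestamp on every call

# Fix 1: None sentinel, computed inside the body.
def log_event(message: str, timestamp: Optional[float] = None) -> None:
    if timestamp is None:
        timestamp = time.time()
    print(timestamp, message)

# Fix 2: pass a callable that produces the value on each call.
def log_event_cb(message: str, now: Callable[[], float] = time.time) -> None:
    print(now(), message)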
Thanks for all these tips! As an extra tip, I would suggest writing short and clear functions that can be used to build bigger functions. Basically, this way you won't repeat any code you write, and each of your functions will contain little code, so they will be easy to read.
Isn't that the DRY principle?
I think the guide is very good for people who build a library or at least create code that other people use within their code.
In our company we have code that runs on our customers' servers, so the program must never crash and must always give some response our customer can send us. This means that we have certain places where we actually have to just catch every Exception that might come, especially those we don't expect.
I would include an exception traceback as part of “some response our customer can send us”. Your subsystem writes logfiles, doesn’t it? Any exception crash should end up in there; tell them to send those.
@arjoncodes can't stop watching your content. Truly amazing stuff. If you ever get the chance, could you make videos on API development and good practices around that topic please - thanks
My Python mentor.
Why didn't I find you sooner?
Keep up the great content.
I am really learning a lot.
Thank you, glad to hear the content is helpful to you.
Really appreciate your effort. Wondering if you could make more videos about this topic.
6:24 The argument is called “discount_perc”, but its interpretation is as a fraction, not as a percentage. I would say that is bad.
Great video, Arjan. Fully agree with points about properties and getters/setters.
The good MS / Java devs transition to Python well, but others tend to struggle, so this provides a credible, documented way to approach things.
Thank you for giving away your golden thoughts...
Hi Arjan, great video,
I have a question regarding 10:26, where you say that a setter is more appropriate here than a @property.setter, and your reasoning is complexity.
My question is: why can't you have more complex logic inside a property? Why do you consider it bad practice?
As it seems to me, you are writing less code because you don't need to write a getter for that property.
Many thanks,
Dan
Great tips! Thanks.
You’re welcome!
I’m of two minds regarding list comprehensions.
On the one hand, reading code that uses functions like map and filter for operations on collections can be easy to follow, but once you start stringing operations together, it starts looking ugly. (One solution, perhaps, is to use packages like toolz that provide functions for piping, currying, and composing functions, but how pythonic is that?)
On the other hand (and I'll admit that my affinity for functional programming is really showing here), I don't see that even a moderately complicated list comprehension is that confusing *if formatted in a way that's easy to understand.* After all, list comprehension syntax resembles set notation; placing each for expression on its own line clarifies what's going on quite a bit. I think that approach may make sense within a shop whose culture encourages thinking that way.
In either case, there comes a point where readability is compromised. In those cases I find it useful simply to write a function that returns a generator for the collection (using yield): sure, it's not as declarative (and I really prefer, non-pythonically perhaps, declarative over imperative code), but it might be easier for others to maintain.
Every time you abuse default argument values, God kills a kitten
11:44 The situation you're describing -- reading a variable from an outer scope and then assigning to it in the same function -- raises an UnboundLocalError at run time in Python, so it cannot silently lead to "unexpected behaviour". Python requires you to declare the variable as either "global" or "nonlocal" before it permits this sort of thing.
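A small demonstration of both sides of that, using a toy counter example:

def outer() -> int:
    count = 0

    def inner_broken() -> None:
        count += 1  # UnboundLocalError if called: the assignment makes 'count' local here

    def inner_fixed() -> None:
        nonlocal count  # explicitly rebind the variable from the enclosing scope
        count += 1

    inner_fixed()
    return count

print(outer())  # 1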
really interesting share Arjan! Great video man!
The nonlocal keyword fixes the way that lexical scoping works so that python is no longer broken.
2:04 I don’t have a problem with from-imports. If you search for “my_function”, eventually you will have to encounter either an actual definition or an import. That’s what your editor’s search function is for.
I've been writing Python at Google for a long time, and I gotta say this rule is my FAVORITE one. Yes, you're never more than one click away from the definition of anything, but the owning module is important information about what you're looking at -- it's effectively part of the name of the function or class, it gives context. Especially when you're reading code that someone else wrote, it is always more clear.
Just avoid wildcard imports, and you’ll be fine.
@@lawrencedoliveiro9104 do whatever you want, it's your code. But if you're working on a team, especially when a product like YouTube depends on your code being perfect, "you'll be fine" doesn't cut it.
@@TylerLarson I hope you’re not trying to offer up YouTube as an example of “perfect” code ...
@@lawrencedoliveiro9104 nothing is ever perfect, but we all try. Like I mentioned in another thread, I haven't had a single bug of my own make it to production in over six years, which I think is a pretty damn good accomplishment. I don't personally work on YT so I can't comment on how their rules affect things, but I can definitely say that my own code wouldn't be a 10th as solid as it is if it wasn't for the rules I had to learn to follow. Every last one of them was inconvenient, but looking back I can definitely see the wisdom. And today I wouldn't argue with any of the restrictions that my code has to follow.
Doesn't the 'default argument value' support conflict with the idea that 'explicit is better than implicit' as well?
I must admit that the default argument value feature does seem handy, especially when adding a new feature to a function (select a default that retains the previous behaviour), but I also know that adding more and more arguments to a function leads to code that's hard to reason about and maintain.
In theory, but if you're always supplying the same constants every time you call a function, then you're not being explicit, you're being verbose. Overriding a default should signal that you're overriding "normal" expectations.
Explicit is better than implicit, but practicality beats purity. If mostly the use of the function is with one value of the argument and only seldom with another, or if you don't want to make users bog down on what to pass to that when they start using the func, then default values make a lot of sense
8:50 Properties are useful for wrapping getter/setter functions when you are creating a Python wrapper around an external C library using ctypes.
Great, thanks! I was searching for this video.
Thank you Rahul, glad you liked the video!
We did this on my team with Google's JS style guide.
I am always wary of the tradeoff between readability (knowing where a function comes from) and importing only what you need.
I'm wondering... Is it possible to go:
from my_package import my_module.my_function?
Because if you can do that, then you still get the origin of the function without importing the entire module.
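That exact spelling is a SyntaxError; an import target can't contain a dot. Using os.path as a runnable stand-in for my_package.my_module, the valid forms look like this:

# SyntaxError if uncommented: the imported name can't be dotted.
# from os import path.join

from os import path           # module import: call path.join(...)
from os.path import join      # imports just the function (the whole module still loads)
import os.path as p           # explicit alias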
There is one piece of advice that has stuck with me as a software engineer: write code to be read. Yeah, it is a few more keystrokes, but you will save other people time trying to figure out where something came from.
I can understand the motivation for importing the module alone in large files, but that seems to me as overkill for smaller files (say
"Having readable code is more important than having efficient code". I definitely don't agree with this for my embedded world. Efficient code is often required. However, that list comprehension is horrible, I agree, but could easily be made readable with a comment. One of the few good uses of a comment, to explain something not readable, that has been done for performance.
Background music is good. Can you share its link as well?
Btw, great content : )
Awesome video... helpful for understanding those magical things happening in our code. This will help us review our code and keep it readable.
Glad it was helpful!
@@ArjanCodes Was a bit disappointed not to see any comment on try-else while discussing exceptions. Per Guido and the PEP, its entire purpose is to enable minimizing the code within the try statement. It's just a little something extra that helps the programmer cope with letting go of a bad behavior :D
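For readers who haven't used it, a minimal try/else sketch (the settings filename is invented for the example):

import json

def load_settings(path: str = "settings.json") -> dict:
    try:
        with open(path) as f:   # only the I/O that can fail lives in the try block
            raw = f.read()
    except FileNotFoundError:
        return {}
    else:
        # Runs only when the try block succeeded, so a JSON error here
        # is never mistaken for a missing file.
        return json.loads(raw)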
Great video!
Would be cool to see some more "Insights" into the python interpreter because a lot of silly mistakes like the default variables and variable nesting are due to the nature of Python being a dynamic interpreted language.
your videos are very good.
Glad you like them!
Besides the interesting technical topics: very clear language!
Thank you for the kind comment!
I still miss the fun quotes on the whiteboard! :-)
Properties should follow this rule in regular software development, but I feel that when you cross physical boundaries in firmware, properties should be allowed a bigger action task, e.g. exposing an I2C device's registers in human-readable form. I prefer the simplicity of a property that calls get_register / set_register / rmw_register to do the work. So device.some_device_property will create work on the I2C bus, but it is generally a small amount of work on the remote device. But not things like device.running = False to stop it from running; those should be verb functions, e.g. device.start()/device.stop().
Thanks - I have no experience myself with embedded software, but that's interesting.
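If I read the pattern right, a rough sketch of it (the device class, register addresses, and the dict standing in for the real bus are all hypothetical):

class I2CDevice:
    TEMP_REG = 0x05   # hypothetical register address
    CTRL_REG = 0x00   # hypothetical control register

    def __init__(self) -> None:
        self._bus = {self.TEMP_REG: 25, self.CTRL_REG: 0}  # stand-in for the real bus

    def get_register(self, addr: int) -> int:
        return self._bus[addr]        # real code would read over I2C

    def set_register(self, addr: int, value: int) -> None:
        self._bus[addr] = value       # real code would write over I2C

    @property
    def temperature_raw(self) -> int:
        # A cheap read of one register: reasonable as a property.
        return self.get_register(self.TEMP_REG)

    def start(self) -> None:
        # A state change gets a verb method, not `device.running = True`.
        self.set_register(self.CTRL_REG, 0x01)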
Great video! It would be nice to hear your thoughts on Google's doc block style guide. I've noticed in your videos you use the simpler one-line explanation docblock (which may be for keeping the lesson easier to understand). I come from a PHP / Java background, so when picking which doc block style guide to use, Google's felt familiar to me compared to phpdoc (summary, params/args, returns, throws/raises).
An interviewer suggested to me that doc blocks may be redundant if you use strict types, type hinting, and DAMP naming styles. I get his point, but it's hard to undo years of habit-formed documentation because it just doesn't look right anymore (you also can't automatically generate an API doc or provide extra info in IDEs when hovering methods etc.). Would be good to know where you stand on this :)
Google's docstring format pre-dates type checking, but it's worth noting that the most important part of documenting your parameters is that it's the only place you get to describe what they actually DO, not just the data type. That said, Google's doc format for Go doesn't document args explicitly; instead you write it out as prose if needed. Honestly, I think it was the lack of types in Python that originally made the doc format what it is, and now that we have IDE integration for it, it's just handy.
Great content. Keep it up
Thanks, will do!
I love being a mere mortal. By the way, thanks mate.
Cool video. Nice to go over best practices to avoid spaghetti code 🍜🍝
Thank you Kevin, glad you liked the video!
very insightful video
I have a doubt about the imports: while it is sometimes valuable to know where a class or function comes from, there are times when it seems obvious to me. For example, I will not import "from django.core import exceptions" just to use "except exceptions.ObjectDoesNotExist"... or "from modules.mymodule import serializers" to use "serializer_class = serializers.MySerializer" in my ViewSet. Sometimes I find it unnecessary to do this.
Google teaches R in their Google Data Analytics Specialization and merely talks about using Python. Python is the most preferred language, yet they taught the entire data analysis and visualization in R.
As for data science/ML pipelines: if, within a foo.py file, there are multiple functions that are called from a main one (called “run” and sitting at the bottom of the script), and the latter is called from main.py, should one define a class within foo.py containing all the functions and import that class, thus calling Foo().run(), or simply leave the functions separate and call foo.run() (as suggested in the video)? Thank you so much
If those functions are only called from the “main” function, why not make them local to that “main” function?
@@lawrencedoliveiro9104 for the sake of readability; there could be different types of functions and hundreds of lines of code
Have difficulty navigating nested blocks? This is why I have editor commands defined to quickly jump between lines with matching indentation.
Great stuff!
For getters and setters, could you show examples of when they can be used neatly with the @property decorator?
The thing about mutable default values in functions is actually useful for memoization. The parameter's name should indicate its purpose (eg "memo"). It can start off empty or contain a base case or two. Thoughts?
Interesting idea. I've assigned attributes to functions before just for that, where on the first read you have to catch the AttributeError and assign the initial value. Using the default value would make that a lot easier - if there isn't a catch to it that I don't see right now. 😬
I think I'd rather use a class for that; it seems more explicit to me, and the memo table could be either instance- or class-wide, giving some flexibility. Using the default argument of a function (at least to me) looks a bit hacky and confusing (is it really intentional or a complete accident? although this can be mitigated by appropriate naming). Also it relies on users not passing anything into the memo argument, which seems somewhat counterintuitive; you shouldn't be expected not to pass anything into an argument, in my opinion.
@@funkmedaddy The user could pass an already filled in memo table which would replace the default empty one. Passing anything else and the user gets what they deserve: undefined behaviour.
A potential problem is that the memo table is unreachable. But when would you need to access it? Anyway, a class would solve that.
Sounds dangerously easy to accidentally mess up. Better to follow common practice here. Cleverness is where bugs live.
Python readability admin at Google here. I did the recent re-wording of the "mutable global state" section.
1. It is cleaner to use a decorator to wrap up a function with memoization, rather than putting state into mutable default values.
2. This is still problematic, since the cache can keep heap objects alive that you wouldn't want to lifetime-extend like that. In particular, memoization caches ideally should be scoped. In some situations, one can fix this with weak references, but their precise semantics are a bit tricky. Personally, I'm not a big fan of "memoization with global cache".
3. functools.cached_property is a good thing.
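Concretely, the standard-library versions of points 1 and 3 (fib and Circle are just demo code):

import functools
import math

@functools.cache          # unbounded; functools.lru_cache(maxsize=...) bounds it
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

class Circle:
    def __init__(self, radius: float) -> None:
        self.radius = radius

    @functools.cached_property
    def area(self) -> float:
        # Computed once, then stored on the instance, so the cache's
        # lifetime is scoped to the object, per point 2.
        return math.pi * self.radius ** 2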
If you absolutely need to, using a property is the Pythonic way to deal with it!
Great video! I have a couple questions @ArjanCodes
Do you have any recommendations for doing engineering calculations while keeping units consistent (i.e. kg, km, N, J, etc.)? Especially for dealing with orbital mechanics?
Also (this one's a bit more complicated): how would you recommend setting up a function that can rearrange and solve itself based on the given inputs? For example (a very simple one), you have def calc_rectangle(area=None, width=None, length=None). You can pass any two parameters to the function and it would solve for the one unknown. You can put this into a conditional statement, but then you're rewriting the same equation multiple times to solve for each case. Is there a method that can solve this cleanly?
Here is a more realistic use case: to calculate an orbit's perigee radius (rp), there are several methods:
rp = a * (1 - e)
rp = 2 * a - ra
rp = ra * (1-e) / (1+e)
rp = (P^(2/3) mu^(1/3))/(2*pi)^(2/3) * (1 - e)
etc...
How would you create a function to solve for all these conditions without writing for every input case?
I'd create a function for each and then a mapper function to select which to use in a given case
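For the simple rectangle case, one sketch of that dispatch idea (exactly two of the three arguments must be supplied):

from typing import Optional

def calc_rectangle(area: Optional[float] = None,
                   width: Optional[float] = None,
                   length: Optional[float] = None) -> float:
    # Solve for whichever one of area/width/length was left as None.
    if [area, width, length].count(None) != 1:
        raise ValueError("provide exactly two of area, width, length")
    if area is None:
        return width * length
    if width is None:
        return area / length
    return area / width   # solving for length

print(calc_rectangle(width=3.0, length=4.0))  # 12.0
print(calc_rectangle(area=12.0, length=4.0))  # 3.0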
I've seen two different solutions to unit handling:
1) use a package like `units`, `pint`, `unyt`, or `unum`. This solution is more robust in that you end up dealing with compound objects (with both a magnitude and a unit), and thus you can do things like ensure incompatible dimensions cannot be added together (e.g., meters plus seconds). The drawbacks are that the code often ends up rather verbose, and your calculated values are now no longer primitive types, which can make it harder to pass results to other functions or libraries.
2) choose a standard set of units across your project, e.g., meters, seconds, kg, and define these as variables equal to 1.0. Then you can define other units like cm, inch, g, relative to the standards. Do this in a units.py file so that you don't pollute the namespace. Then every dimensionful quantity you define (or read in from a file, e.g.) MUST be multiplied by a unit (or use the default unit), e.g.:
# units.py
m = 1.0
kg = 1.0
s = 1.0
cm = 1e-2 * m
g = 1e-3 * kg
inch = 2.54 * cm
cm3 = cm * cm * cm
N = kg * m / (s*s)
# main.py
from . import units
x = 5.4 * units.m
m = 1.0 * units.g
An advantage here is that you can change what units are printed simply by dividing:
>>> print(f"x = {x/units.m} m or {x/units.cm} cm or {x/units.km} km or {x/units.inch} inches")
Or find ratios between units:
>>> units.ly / units.km # 9.461e12 km in a lightyear
This solution is a lot more flexible but a little more dangerous as it's up to the user to apply the units on definition/import and make sure incompatible dimensions are not being added. But you end up with regular floating point types that only ever get multiplied/divided, saving some cognitive overhead and interoperability concerns. Another caveat: it also won't work for unit conversions that are not just scalar multiples, e.g., temperature between C and F. In the end, this solution is probably useful for small-scale projects, but for larger projects, one of the aforementioned libraries may be better.
About this thing with the imports: most Python libraries (e.g. NumPy, Pandas, TensorFlow) expect tons of arguments in function calls, so you can't fit everything on one line, which I really don't like. And usually people import the module under a short alias, so there's basically no readability gain anyway from explicitly writing the shortened module name.
I'm really undecided about which way is better. For very similar APIs like TensorFlow vs NumPy, I'd say it can be very confusing to mix them up for sure, so I'd add the module name in that case. But for my own code or standard Python modules, I'm usually importing explicit functions because pressing F12 is feasible 😅
Definitely a good point. I think what also plays a role is the balance between keeping things consistent and doing things differently as an individual developer. The Google style guide is intended for Google's Python libraries and I'm sure quite a few different developers work on those libraries. So then I can definitely understand that they lean toward being more strict in their style guide (even if in some cases it's overkill), to ensure consistency and avoid discussion.
Very good point! I'm usually working in small projects, so it's fine for me when only I'm working on the code. But enforcing style-guides by linters is generally a good idea for co-working. It helps you keep the code clean 😂
IMO those tools should lint metrics that are more relevant to achieving a good software design and readability. Putting docstrings, having imports in the wrong order etc. is less significant than functions with too many params, classes with bad cohesion, high cyclomatic complexity, ... SonarLint usually helps me with that 😄
For long function/method calls, put the arguments one to a line, and give the whole expression a two-dimensional layout, e.g.:
aqsis_output = ReasonablePerms.subprocess.check_output \
(
args = ["aqsis"] + extra + [self._filename],
stdin = subprocess.DEVNULL,
stderr = subprocess.STDOUT,
universal_newlines = True,
cwd = self._parent._workdir,
timeout = self._parent.timeout
)
from package import module vs from package.module import function as module_function
Wouldn't the example on the right be better? You probably will not need everything from a module with hundreds of functions, just a few that can easily be imported individually and renamed as shown.
I'm just curious about that right now.
I'd use the first one since it's more clear in my opinion. Neither of these pollute the global scope and everything gets imported (you just can't use it in the second case)
was thinking the same - you can find the origin of the imported function very easily if you wanted to, and as another commenter pointed out things get muddied when you use aliases anyways. importing functions explicitly from a module makes more sense to me.
As someone who works in a large codebase: we have standard aliases for modules. I think that makes a huge difference, because no matter the file I am reading code from, if I see a certain alias, I know what module it is. In my opinion, aliases are problematic when they are not consistent; and that's what code reviews are for.
The list default arguments have been useful to me when writing recursive code, though. Or was I doing something very silly? It doesn't happen often, but tree traversal is one case.
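The catch is that the shared default leaks across top-level calls. A sketch with a hypothetical tree-walk (the None sentinel is the usual fix):

from typing import Optional

def collect(node: dict, acc: Optional[list] = None) -> list:
    if acc is None:       # fresh accumulator per top-level call
        acc = []
    acc.append(node["value"])
    for child in node.get("children", []):
        collect(child, acc)
    return acc

tree = {"value": 1, "children": [{"value": 2}, {"value": 3}]}
assert collect(tree) == [1, 2, 3]
assert collect(tree) == [1, 2, 3]   # with acc=[] as the default, the second call
                                    # would return [1, 2, 3, 1, 2, 3]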
Yay! Finally I'm the first to comment. Thanks a lot uncle Arjan. Very useful tips.
You win! :) Thanks, and enjoy the video.
Could you fix the automatic subtitles? The automatic subtitling is detecting the language as Korean.
Done!
@@ArjanCodes Thank you very much!!
What's a "physical computation"?
Doesn't it increase memory usage when you import an entire module and just one function is used? 🙃
I'm not entirely sure how well Python does tree shaking. But in addition to following this import guide, I would suggest to keep your modules relatively small. So if memory usage increases, I don't expect that to be a problem if you keep your modules at a manageable size.
From what I understand, when you import anything from a module it has to load the entire module even if you only want one specific part.
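That understanding is easy to verify: even a from-import executes and caches the whole module.

import sys

from os.path import join   # we only asked for one function...

print("os.path" in sys.modules)  # True: the entire module was loaded and cached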
Why did you choose Korean subtitles for your video?
Apparently YouTube thinks I'm Korean, haha. I've updated the captions to English.
Some time ago I watched one of your videos regarding DataClasses. As an example, you were developing a small financial application with Deposits, Withdrawals, etc. I am trying to find that video without any success. Can you help ?
Is it this one? ua-cam.com/video/FM71_a3txTo/v-deo.html
@@ArjanCodes Yes, thank you. And thank you, for the most informative videos.
I do exactly the opposite of what the Google mafia states. And I'm so happy with the result.
I'm guilty of importing my functions instead of their modules. I thought it was fine until it came back to bite me and I had to refactor old code. It was a mess, but hey, that's how we learn.
Isn't it better to keep things light rather than importing the entire module? I think this is ok in moderation
I got caught in the nested loops inside list comprehensions, lol
At 4:18, the example is actually of a dictionary comprehension not a set comprehension.
About minimizing the usage of try/except clauses when you don't care about the error: if you plan for another part of your code to deal with an error, I'm of the opinion that explicitly reraising the exception (with `except: raise`) is better than relying on Python's built-in behavior of propagating the exception. This way, you are clearly acknowledging that a part of your code can raise an error and that it is handled elsewhere.
If I'm looking at some code that calls some function which can raise an error and isn't inside a try/except block, I have no way of knowing if that error is caught elsewhere in the code or is just assumed to not raise an error.
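A minimal sketch of that explicit-reraise style (parse_record and the JSON input are invented for the example):

import json

def parse_record(line: str) -> dict:
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Deliberately not handled here: the caller's error handling owns this.
        raise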
My wife (a Google employee) thought I was watching a couples therapy video.
7:41 Sorry, but this is the first time ever that I've seen someone call a function "oops"! 😃
True: foo is pretty common tho! I usually call my test functions "test" :| ... I know 💤
I love your videos. I don't love the very distracting background "music".
It's only under a few parts, but noted.
I love generators. They are memory cheap.
What kind of Python is he using? I've never seen def foo(my_list: list, ...) -> int before.
Why is Python so terribly slow? ... Probably I got too used to the speed of PHP.
5:00 I'm gonna interrupt you there - understandable code is NOT more important than efficient code. Take for example the fast inverse square root - nobody understands it without an explanation, but it revolutionized computer graphics in video games. Efficiency is indeed way more important than short code! But there is a big difference between short and efficient :D
There are some things tho that make code much more readable that don't sacrifice efficiency. Reasonable variable names for instance! The compiler would bake them down anyway. Reasonable comments just as well!
True: code is more read than it is written! BUT: it is also MUCH, much more executed than it's read!
So I agree: readability SHOULD be sacrificed for efficiency! But I'd say it's rarely the case that one has to do that.
@@ewerybody agreed
Using the hackiest solution to a very specific problem, one that affected a single function in the handful of codebases that implemented it, as an argument against the general rule isn't very useful, like at all.
Supposing that most code is actually just waiting for another FISR to come along would make for an incredibly buggy, unreadable, and probably badly performing codebase as well. Write readable code first; then if, and only if, the program is too slow, measure what causes the slowdown and improve that. If in the end you write another FISR-like algorithm, then yeah, cool, keep it and document it well, but until then, don't write unreadable code.
Here’s another tip: use a two-dimensional layout to make the structure of complex expressions clearer. Example:
if (
not isinstance(cmds, (list, tuple))
or
not all(isinstance(elt, (list, tuple)) and len(elt) == 2 for elt in cmds)
or
not all
(
isinstance(elt[0], str)
and
elt[0] in valid_cmds
and
isinstance(elt[1], (list, tuple))
and
len(elt[1]) == len(steps)
for elt in cmds
)
) :
raise TypeError("cmds must be sequence of («cmd», «args_seq»).")
#end if
When an expression gets as complex as this one, a tip is to make it a function and add docstrings, so the developer wanting to change this piece of code in the future (be it you or someone else) has an easier time understanding what it is trying to achieve.
@@christoferberruzchungata2722 It’s already in a function.
@@christoferberruzchungata2722 It’s already a large chunk of the code in a single function. Splitting it off doesn’t seem very helpful, when the error message is already supposed to explain what the check is trying to achieve.
If you want to see that snippet in context, have a look here
/ldo/aleqsei , in the method Context.Rib.motion().
There is no perfect way to code. Just don't be sloppy. Google isn't the king of the world; they provide a service, and the style guide is probably informed by their particular business.
Downside of importing modules instead of classes and functions: worse performance and longer lines. And the "no" comprehension examples are badly chosen; all of them would look just as ugly written as for loops.
There's no real performance difference between importing and using a whole module or just a function from a module.
@@sadhlife there is a performance difference, and the more you import like this, the more "real" it is
@@clauseclause6640 prove it. I really, really doubt there's a difference. Post a timeit benchmark
@@sadhlife it's one of the basics of Python optimization, described many times in books: when you get an attribute of an object, under the hood you're calling getattr, and that costs you something. But you can continue to doubt it if you wish ;)
@@clauseclause6640 sure, there are extra dictionary lookups happening. But I doubt that makes even a 0.1% difference in any real program
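For anyone who wants the requested numbers, a benchmark along these lines (results vary by machine):

import timeit

t_attr = timeit.timeit("math.sqrt(2.0)", setup="import math", number=5_000_000)
t_name = timeit.timeit("sqrt(2.0)", setup="from math import sqrt", number=5_000_000)

print(f"math.sqrt: {t_attr:.3f}s  sqrt: {t_name:.3f}s")
# The attribute lookup does cost a little extra per call, but it rarely
# matters outside the very hottest loops.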