Don't Be a Bash BRAINLET! (Shell is all about streams!)

  • Published 5 Sep 2024
  • lukesmith.xyz
    Scripts: github.com/Luk...

COMMENTS • 164

  • @andrejshm · 6 years ago · +278

    You can actually do all the different shortcut generation stuff in parallel using tee:
    sed "/^#/d" "$folders" | tee >(awk_function_1) >(awk_function_2) >(awk_function_3) >(awk_function_4) > /dev/null
    This starts 4 separate awk processes and pipes a copy of the input stream into each of them at the same time, so you get real parallel processing (tee also copies the stream to its own stdout, hence the final > /dev/null).
    (I wrote awk_function_# because it's preferable to write the insides of those parentheses as functions so it's not too cluttered.) Don't forget to redirect (>) to a file in each one of them. You can do it the unreadable way, too, with >(awk ..... > ....)

    • @LukeSmithxyz · 6 years ago · +63

      Thanks, I'm glad you posted this because I was just trying to figure out tee after another conversation. I'm getting an error `/dev/fd/63: Permission denied` though. Know anything about this?

    • @LukeSmithxyz · 6 years ago · +52

      Scratch that, it was a silly mistake on my part. Thanks for your help!

    • @cole8097 · 6 years ago · +4

      Do you need a `wait` after that or will it block the parent process automatically?
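      On the wait question: with >(...) the pipeline returns as soon as tee exits, and bash does not automatically wait for process-substitution consumers, so their output files may still be incomplete when the next command runs. A minimal POSIX sh sketch of the same fan-out swaps process substitution for named pipes (mkfifo) plus background jobs, so a plain wait covers every consumer. The "shortcut folder" input format, the file names, and both awk output formats are hypothetical.

```sh
#!/bin/sh
# Fan one comment-filtered stream out to two awk consumers that run in
# parallel, then wait for both. Named pipes plus background jobs stand in
# for bash's >(...) so that a plain `wait` covers every consumer.
# Input format, file names and output formats are hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

folders="$tmp/folders"
printf '%s\n' '# shortcut folder' 'h ~' 'd ~/Documents' > "$folders"

mkfifo "$tmp/p1" "$tmp/p2"

# Start the readers first; each one blocks until tee opens its fifo.
awk '{ print "alias " $1 "=\"cd " $2 "\"" }' < "$tmp/p1" > "$tmp/aliases" &
awk '{ print $1 "\t" $2 }' < "$tmp/p2" > "$tmp/table" &

# Read the file once; tee copies every line into both fifos.
# tee's own stdout is discarded so nothing is printed to the terminal.
sed '/^#/d' "$folders" | tee "$tmp/p1" "$tmp/p2" > /dev/null

# Block until both background awk jobs have exited.
wait
```

      Starting the readers before tee matters: opening a fifo blocks until both ends are open, so the readers and tee rendezvous instead of deadlocking.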

    • @cole8097 · 6 years ago · +6

      Also, you just hinted at something I wanted to comment on: functionally oriented code makes complicated lines much more elegant if done properly.

    • @kas1987kas · 6 years ago · +22

      You can also remove sed.
      awk '/^[^#]/ {print}'
      awk '{ if (!match($0, /^#/)) print}'
      perl -ne '/^#/ || print'  # instead of awk
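      The awk alternatives above work because a pattern in front of the braces decides which lines the action runs on. A small sketch comparing the sed-plus-awk pipeline with awk alone on a hypothetical sample; it also shows that /^[^#]/ is not quite the same as !/^#/, because it drops blank lines too.

```sh
#!/bin/sh
# Compare a sed-plus-awk pipeline with awk alone. The pattern in front
# of the braces selects the lines the action runs on.
# The sample input is hypothetical.
set -eu

sample() { printf '%s\n' '# comment' 'h ~' '' 'd ~/Documents'; }

# Two filter processes: sed deletes comment lines, awk prints field 1.
two_procs=$(sample | sed '/^#/d' | awk '{ print $1 }')

# One filter process: awk skips comment lines itself.
one_proc=$(sample | awk '!/^#/ { print $1 }')

# /^[^#]/ needs at least one character, so it also skips the blank line.
no_blanks=$(sample | awk '/^[^#]/ { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$two_procs" "$one_proc" "$no_blanks"
```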

  • @mladentase · 5 years ago · +161

    I never thought that I would impulsively watch shell scripting videos

  • @Solarplexus0 · 6 years ago · +245

    Your meme competency is unmatched. The video titles are always fire

    • @mezcel953 · 6 years ago · +1

      bringing that heat

    • @uiopuiop3472 · 3 years ago

      @@mezcel953 my tervuren is in heat and i will do the thing to it again

  • @Sventimir · 4 years ago · +22

    These sed commands in the new version are superfluous, you know? Awk can ignore lines based on a regular expression just as easily: just put the condition "/^[^#]/" in front of the awk action. This not only saves starting another process, but also the piping of data between them, which is also I/O and therefore a relatively slow operation.

  • @hotscriptgg · 6 years ago · +26

    Props for Zdzisław Beksiński folder

  • @dr.mikeybee · 4 years ago · +7

    Probably the biggest general idea in performance tuning is getting rid of loops. For example, if you're writing SQL, run your biggest filters first.

  • @rexevan6714 · 6 years ago · +39

    Luke Smith YouTube marathon so far.

  • @AdmiralMaur · 4 years ago · +34

    Everyone else: Embarrassed to look up man cat
    Luke: Embarrassed to waste cpu instructions on his hotkey+alias generation script

  • @sirjofri · 6 years ago · +4

    I would also combine sed and awk into one awk call. With a regex you can filter out your comments. It's something like "awk '/^[^#]/{print...}' output"; the braces are executed for each line that matches the pattern. This code is untested, so maybe you need to adjust the regular expression, but you don't need sed anymore. Combine this with tee as someone suggested and it's highly efficient. I think I would've written the whole script in awk; the print in awk can be redirected, too. But thank you for sharing your learning process, especially as you are not an IT guy.
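    The "write the whole script in awk" idea relies on awk's output redirection: print > file truncates the file when awk first opens it and keeps writing to it for the rest of the run. A sketch under that assumption, with hypothetical file names and output formats:

```sh
#!/bin/sh
# One awk process that skips comments and writes two output files with
# print > file: awk truncates each file when it first opens it and keeps
# writing to it for the rest of the run. Names and formats are hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '# shortcut folder' 'h ~' 'd ~/Documents' > "$tmp/folders"

awk -v aliases="$tmp/aliases" -v pairs="$tmp/pairs" '
  /^#/ || NF < 2 { next }                      # skip comments and short lines
  { print "alias " $1 "=\"cd " $2 "\"" > aliases
    print $1 " " $2 > pairs }
' "$tmp/folders"
```

    No sed, no tee, and the input is read exactly once; the trade-off is that both outputs are produced sequentially inside one process rather than in parallel.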

  • @georgemachappy · 6 years ago · +66

    Update: see comments below; this just ain’t true! Use $() not ``.
    Great video. [Struck through by the author: I actually think the improvement is less about loops vs streams (after all, the program processing the stream is looping anyway) and more about the painful cost of repeatedly spawning and destroying child processes. Case study: the difference between $() and ``. I see a lot of idiomatic and style-guide-recommended use of $() because it implicitly acts like " " between the ( ) and therefore makes escaping easier. After all, almost every case where `` works, $() works just fine. This covers up the fact that the real difference is that $() forks a subshell and `` execs it in place. The difference between the two can be night and day.] Of course, not to say that loops vs streams isn't an important idiom: it is. The shell is there to set up the plumbing, not to do too much work by itself.
    Next optimization/video: use of &, wait, and mkfifo to do even heavier lifting? 👍

    • @LukeSmithxyz · 6 years ago · +20

      Good point. The potential performance difference between $() and `` never really hit me; I should try to be more mindful of overusing $(). 😎👍

    • @georgemachappy · 6 years ago · +9

      I forked the script to make some suggested edits, namely 1) not reading the file multiple times and 2) rewriting the awk sections in printf (which is a builtin and, again, doesn't require a child process per invocation). I finished it and then ran the two back to back against a synthetic config file with 2000 token/expansion lines, and the speedup is so marginal that the edits are worse than leaving it alone. Lesson #2 of the day: measure, measure, measure. Premature optimization is the root of all evil. Sure, you have to spawn awk three times for every line in the file, but if it isn't *too* expensive, sometimes it's not even worth it. Somewhere, someone is rewriting it in C ... 😂

    • @Ryndae-l · 5 years ago · +8

      Do you have a source on that? According to POSIX, `...` and $(...) both create a subshell environment (Shell Command Language, section 2.6.3, Command Substitution).
      pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_03

    • @Ryndae-l · 5 years ago · +7

      "The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution."
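      The quoted wording can be checked directly: if either form ran its command in the current shell, the assignment inside it would change x in the caller. A minimal sketch (the variable names are arbitrary):

```sh
#!/bin/sh
# Both forms of command substitution run in a subshell environment, so
# an assignment made inside either one does not leak into the caller.
x=outer
a=$(x=dollar_paren; echo "$x")
b=`x=backtick; echo "$x"`
echo "$x $a $b"    # prints: outer dollar_paren backtick

# The practical difference is quoting and nesting: $( ) nests without
# extra escaping, while an inner pair of backticks must be escaped.
nested=$(echo "$(echo inner)")
nested_bt=`echo \`echo inner\``
echo "$nested $nested_bt"   # prints: inner inner
```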

    • @aaronlippincott7385 · 5 years ago · +3

      keen to see some source, didn't know about that

  • @davidh.4944 · 3 years ago · +3

    A number of years ago I did a series of tests on text manipulation in bash, and I found that, for text blocks of up to about 100KiB, bash's internal tools were almost always faster to use than spawning a separate process for *awk*, *sed*, *grep*, or whatnot.
    The only exceptions were for things like doing global substitutions over a whole file (in which case *sed* or *ed* were more appropriate), or extracting very specific strings from unpredictable or hard-to-access locations within a text block (*grep*, *sed*, *awk*, or another tool that handles global regex).
    Since then, unless I have a special need for one of the above, I generally either just slurp the entire input into an array with *mapfile* (for whole lines), or use a *while read* loop (very good for splitting delimited values into multiple variables). Then I will use various parameter substitutions or *printf -v* to reformat them into the output I want. I avoid using external tools as much as possible. Even if the code comes out being a bit longer or more complex to read, it is usually faster to run.
    In this use case, I would probably simply use:
    while read -r shortcut target; do ...; done
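    A fuller version of that skeleton, as a sketch: a while read loop that skips comments and blank lines with case and formats each line with printf -v, which is a bash builtin (so this needs bash rather than sh). No external process is started per line. The gen_aliases name and the alias format are hypothetical.

```bash
#!/usr/bin/env bash
# Builtin-only formatting: read "shortcut target" lines, skip comments
# and blank lines, and build the output with printf -v (bash builtin).
# The function name and output format are hypothetical.
set -eu

gen_aliases() {
  local shortcut target line out=""
  while read -r shortcut target; do
    case $shortcut in
      ''|'#'*) continue ;;          # blank line or comment line
    esac
    printf -v line 'alias %s="cd %s"\n' "$shortcut" "$target"
    out+=$line                      # accumulate in memory, print once
  done
  printf '%s' "$out"
}

gen_aliases <<'EOF'
# shortcut folder
h ~

d ~/Documents
EOF
```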

  • @lucasleonardo2111 · 6 years ago · +14

    I think part of the performance improvement has to do with the old version processing your data and writing it to disk on each loop iteration, while now you're writing it after you've processed the whole bunch.

    • @dawkot6955 · 4 years ago · +7

      (1 year later)
      I think he's just not opening and closing the file on each iteration
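      That difference is easy to see in a loop: where the redirection is placed decides how many times the output file is opened. A sketch with hypothetical file names that produces the same file both ways:

```sh
#!/bin/sh
# Same output, different number of file opens. Input is hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'h ~' 'd ~/Documents' > "$tmp/folders"

# Redirection inside the body: "$tmp/per_line" is opened once per line.
: > "$tmp/per_line"
while read -r key dir; do
  printf 'alias %s="cd %s"\n' "$key" "$dir" >> "$tmp/per_line"
done < "$tmp/folders"

# Redirection on the whole loop: "$tmp/once" is opened a single time.
while read -r key dir; do
  printf 'alias %s="cd %s"\n' "$key" "$dir"
done < "$tmp/folders" > "$tmp/once"
```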

  • @gametree1307 · 6 years ago · +71

    TL;DW: be smart about I/O.
    Or: reading a file 4 times and writing 4 times is a lot faster than reading once and writing a bajillion times.

    • @GooogleGoglee · 4 years ago

      What is TL;DW? Can you give an example?

    • @bradleyhove4177 · 4 years ago · +1

      @@GooogleGoglee It stands for "Too long; didn't watch"

    • @GooogleGoglee · 4 years ago

      @@bradleyhove4177 Thanks. But I'm pretty sure he watched almost the whole video, since the guy used that word near the end.

    • @bradleyhove4177 · 4 years ago · +7

      @@GooogleGoglee No, the reason he said "TL;DR" is that it's basically a summary for people who won't watch the whole video. He watched the video and summarized it for anyone who doesn't want to watch the whole thing.

    • @GooogleGoglee · 4 years ago

      @@bradleyhove4177 Thanks

  • @SimGunther · 6 years ago · +21

    As a victim of the Haskell meme, I do parallel computations whenever I can by appending & to the end of a command when I know there is no shared mutable state while it runs.
    Also, be mindful: store the sed output in a local variable and use it in the accompanying awk commands below it, instead of running the same sed command in 2-4 different places.
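    Both suggestions can be combined, as a sketch: run sed once, keep its output in a variable, then start each independent awk pipeline in the background with & and wait for all of them. Each job writes its own file, so there is no shared mutable state. The file names and output formats are hypothetical.

```sh
#!/bin/sh
# Filter once, then run independent formatters in parallel.
# File names and output formats are hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '# shortcut folder' 'h ~' 'd ~/Documents' > "$tmp/folders"

# Run sed a single time and keep the result in memory.
filtered=$(sed '/^#/d' "$tmp/folders")

# Each background pipeline writes only to its own output file.
printf '%s\n' "$filtered" | awk '{ print "alias " $1 "=\"cd " $2 "\"" }' > "$tmp/aliases" &
printf '%s\n' "$filtered" | awk '{ print $1 " " $2 }' > "$tmp/pairs" &

wait   # returns once both background pipelines have exited
```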

    • @georgemachappy · 6 years ago · +7

      I have the same gut tendency as you do here, but I forked Luke's script, rewrote the sed pipeline to call sed only once, and rewrote the awk sections with printf (a builtin) and ... the speedup is so small you might as well just leave it. It's legitimately clearer just running it in duplicate. And that was with a synthetic test input file of 2000 lines. I didn't run background jobs, but I have a hunch that the parallelism won't start paying off until you have millions of lines. Always curious to see some test results though ... 😂

    • @AnastasisGrammenos · 6 years ago · +16

      Thanks, I was desperate for a way to handle my 3billion shortcuts.

    • @oskarlappi9593 · 6 years ago · +4

      You could actually write all the folder shortcuts as one sed script: since sed can write the pattern space to files, this would process everything in one go.
      Since sed works line by line, this feels like the most efficient solution.
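      A sketch of that single sed pass, using only POSIX sed commands: h saves each line in the hold space, the w flag of s writes the rewritten line to a file, and g restores the original line before the second rewrite. The output formats and file names are hypothetical.

```sh
#!/bin/sh
# One sed pass, two differently formatted output files.
# Output formats and file names are hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '# shortcut folder' 'h ~' 'd ~/Documents' > "$tmp/folders"

# -n: print nothing automatically; output goes only to the w files.
# Each w filename runs to the end of its -e argument.
sed -n \
  -e '/^#/d' \
  -e 'h' \
  -e 's/^\([^[:space:]]*\)[[:space:]]*\(.*\)$/alias \1="cd \2"/w '"$tmp/aliases" \
  -e 'g' \
  -e 's/^\([^[:space:]]*\)[[:space:]]*\(.*\)$/\1 \2/w '"$tmp/pairs" \
  "$tmp/folders"
```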

  • @strakhov · 2 years ago

    That's actually a pretty cool way of storing general aliases, which you could then hook up to the "entr" utility to auto-update your bashrc.
    Thanks for the video!

  • @KingZero69 · 6 years ago · +4

    his paused song is called “Infected Mushroom”...

  • @natemaia9237 · 6 years ago · +16

    Instead of using sed 'command' "file" | awk 'operation', you can do it all with awk alone and avoid the pipeline (|): awk 'command' "infile" >> "outfile"
    You can pattern match with awk just like you do with sed: awk '!/^#/ {print}'. The // acts as a pattern match on the current line just like in sed, and the ! negates it. You can also chain patterns together with && and ||: awk '!/^#/ && !/^$/ {print}' prints all lines except commented and blank ones.
    Cheers
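    A sketch of the chained patterns on a hypothetical sample. It also shows a field-based variant, NF && $1 !~ /^#/, which additionally drops whitespace-only lines and indented comments, because NF is 0 on a blank line and $1 is the first non-blank field.

```sh
#!/bin/sh
# Chained awk patterns vs a field-based filter. Sample is hypothetical.
set -eu

sample() {
  printf '%s\n' '# comment' 'h ~' '' '   ' '  # indented comment' 'd ~/Documents'
}

# Skip lines that start with # and lines that are completely empty.
chained=$(sample | awk '!/^#/ && !/^$/ { print }')

# NF is 0 for empty or whitespace-only lines, and $1 is the first
# non-blank field, so indented comments are skipped as well.
by_fields=$(sample | awk 'NF && $1 !~ /^#/ { print }')

printf '%s\n---\n%s\n' "$chained" "$by_fields"
```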

  • @tobias-arturnegrui7330 · 6 years ago

    I was actually learning about the bash shell, and after splitting things across multiple files (bash_aliases and bash_functions), sourcing became noticeably slower. Your videos drop in at the perfect moment. xd

  • @norcal6181 · 6 years ago · +2

    The for loop way is reminiscent of how many college classes and textbooks show how to do things. Definitely not efficient.

    • @LukeSmithxyz · 6 years ago · +4

      Yeah. I think that some of that is needed in the beginning to make programmatic thinking understandable, but unfortunately a lot of people don't move on. It's sort of like how everyone learns Python in CS 101 nowadays, imprints on it, and then wants to write full video games, websites and huge programs in it, only to find out that Python doesn't have the speed for it, etc.

  • @NicholasMaietta · 6 years ago

    Very nice video. I'm starting to pay more attention to how I write bash scripts so this video is nice to see. Thank you.

  • @wesleyrm · 3 years ago

    GREAT! Thanks! You and Fireship are the keepers of true YouTube. I will keep bloating the web with JavaScript though lol. Unless someone instructs me on an alternative... Web development can't just be halted.

  • @gcoolbud · 6 years ago · +1

    Hey Luke, love the vids man... One thing though: why don't you use ZSH? Combined with oh-my-zsh it's a bomb... just advising, not starting a shell war. Do check it out.

  • @lordadamson · 6 years ago · +13

    tl;dr: reduce I/O as much as you can; keep it in memory.
    thnx luke

    • @GooogleGoglee · 4 years ago

      Can you give an example? What is TL;DR?

    • @coffeedude · 4 years ago · +3

      @@GooogleGoglee it means "too long; didn't read"

  • @Swipe650 · 6 years ago · +9

    House tour next vid

  • @wil7vin · 6 years ago · +3

    Thanks bruh I have been a brainlet, I used python like a pleb

  • @christiansacks9198 · 4 years ago · +1

    Hi Luke, how did you go from the old script to the new one? Like what was your thought process and how did you discover the better way?

  • @apolloapostolos5127 · 1 year ago

    What you see here is the logic applied when using NIC firmware to filter packets instead of firewall software: filtering packets in the hardware itself (not a firewall) is quicker.
    That's a firewall method I learned on YouTube.
    You edit the stream, not a buffer.

  • @leonardocafferata6697 · 6 years ago · +53

    TL;DR: C programs are faster than shell ones...

    • @LukeSmithxyz · 6 years ago · +21

      I didn't read that anywhere from this video.

    • @leonardocafferata6697 · 6 years ago · +1

      Luke Smith haha. nice comeback

    • @0morq0 · 6 years ago · +4

      this shell script just calls C programs :)

    • @LukeSmithxyz · 6 years ago · +37

      Lol well it should be "calling C programs efficiently is better than calling C programs inefficiently."

    • @leonardocafferata6697 · 6 years ago · +17

      Luke Smith Actually, the issue here is the main issue with most bash scripts (or Python scripts, MATLAB scripts, etc.). You need to do most of the work with the 'primitives', leaving the scripting language with the least amount of code. In the case of bash, those are the actual programs, in this case awk. Instead, people who don't know the programs very well push that extra work onto a bash algorithm, which of course is several times slower than the command (awk).

  • @VincentM_01 · 6 years ago

    Influencing me to apply programming concepts in bash.

  • @233kosta · 2 years ago

    I mean the loop is still there, you can't really get away from it, but now it's being done by very efficient compiled code rather than the shell interpreter

  • @redactedredacted6784 · 5 years ago · +3

    Zdzisław Beksiński! Hella!

  • @repomansez · 6 years ago · +22

    I'm better off not writing shell scripts
    It helps with my mental sanity

  • @wemusthavechannelstocommen619 · 2 years ago

    >has a Beksinski directory
    muy basado

  • @TzLTriiCkzZ · 5 years ago

    I'm not proficient in sed, but surely there's a way to keep just the part of the line before the first # (or one that allows escaped hashes, like those in a string, or escapes them in another command). That way you can have comments at the end of a line.

    • @LukeSmithxyz · 5 years ago

      maybe someone demonstrated that in another video:
      ua-cam.com/video/QaGhpqRll_k/v-deo.html
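      For the end-of-line comment question, a POSIX sed sketch that cuts each line at the first # not preceded by a backslash, trims trailing blanks, and drops lines left empty. It is a heuristic: # inside quoted strings is still treated as a comment, and an escaped backslash followed by # (\\#) is misread. The strip_comments name and the sample input are hypothetical.

```sh
#!/bin/sh
# Strip comments that start at the first unescaped #; \# stays literal.
# Heuristic only: quoted strings and \\# are not handled.
strip_comments() {
  sed -e 's/^#.*//' \
      -e 's/\([^\\]\)#.*/\1/' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d'
}

printf '%s\n' '# full-line comment' 'h ~  # home' 'x ~/a\#b # note' 'd ~/Documents' |
  strip_comments
```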

  • @leeroyescu · 6 years ago · +3

    You know something doesn't sit well with me on the topic of working with computers. If coupling to other parts of the software is a code smell, coupling to your own brain must by definition be even worse! What I'm getting at is we must cease this practice of setting up, remembering and maintaining shortcuts and folder structures. The minute something happens to your system it's like you've lost a limb.
    The organization structure ought to be something external, published, established collectively or by authorities (think ontologies and vocabularies), and the retrieval flexible, fuzzy. Creating an experience of things just turning up where you expect them. Of course remembering *a → ~/Articles* isn't really a problem but we can get really lost in internalizing the wrong information - arbitrary tooling boilerplate instead of problem domain meat & potatoes.

  • @Linuxdirk · 1 year ago

    I think I have to update some of my scripts now ...

  • @TheRealFaceyNeck · 6 years ago · +2

    Damn, really impressive video, dude!
    You sure make a lot of great content related to computers for someone who doesn't like computers.
    I'm a musician, and I often said I didn't "like" to play them, it was just what I did. I wouldn't be a musician if I didn't play instruments and write music. Seems like you must've spent quite some hours learning how to do all of this. Do you consider computers a requisite for your life, and hence more of a chore than as a thing to enjoy using/hacking/customizing?

    • @AnastasisGrammenos · 6 years ago

      If you do any information processing at all, linux and a working knowledge of it is a must!

    • @TheRealFaceyNeck · 6 years ago

      I agree!
      ...I also didn't put out a video with the title of 'I Hate Computers!...'
      Like a certain **ahem LUKE ahem** did, for example.
      I love computers. I love Linux. There is simply no user-interface feature in Windblows that I enjoy. I just fucking hate everything about it.
      ...but I don't hate computers. I just hate Windblows.

  • @rasix86 · 6 years ago · +8

    26 times... gazillion times. whats the difference :)

  • @assombranceanderson6175 · 4 years ago

    Infected Mushroom, good stuff

  • @patrickprucha5522 · 10 months ago

    nicely done!!!!

  • @mattcargile · 2 years ago

    There has to be a loop somewhere in those programs or at least reading the file one time.

  • @JeremySmithBryan · 4 years ago

    Nice. I would imagine the sed and awk commands could be combined, but would have to look that up.

  • @taimurpathan5837 · 5 years ago · +1

    What's the terminal emulator you're using?

  • @maikarusan5098 · 5 years ago · +1

    you gotta fix your colors in i3 so you don't get that weird blue line on the right when you only have 1 window open my dude

  • @ahmadalwazzan384 · 6 years ago · +8

    Rule number 1 of System Administration: don't do premature optimization. Before you optimize, ask yourself: who is going to benefit if my script runs a little faster?

    • @alex2143 · 2 years ago

      Is that a rule for sysadmins or for developers?

  • @greob · 6 years ago · +2

    Can't you do what `sed` does with `awk` too anyway?

  • @kygr · 1 year ago

    In the current state of your script, you traded readability for execution speed.
    I never looked at awk before, but I will now, so thanks for that; still, your new script is less maintainable.
    So I would try to improve there, but it's still a good and informative video.

  • @sharktamer · 6 years ago · +1

    Is this still i3? I just don't understand how the split window works in terms of sizing and stuff, gotta figure it out.

    • @LukeSmithxyz · 6 years ago

      By default, i3 has a "resize mode" that temporarily rebinds keys to resize commands. Check the i3 config for these. I remap them to mod+shift+Y/U/I/O because I don't want to bother with the different mode.

    • @sharktamer · 6 years ago

      Ah thanks. It's not something I do too often so I've just been using the mouse.
      I guess what I should have been asking is how have you organised in a grid layout? I thought i3 could only do either rows, columns or tabs.

    • @LukeSmithxyz · 6 years ago

      I'm not sure what you mean. The "grid" is just two columns with two rows in each.

    • @rexevan6714 · 6 years ago · +3

      It's i3-gaps tho. Gaps will take care of the grid, if I understand your question correctly

  • @hammerheadlemon · 5 years ago

    Really useful, thanks.

  • @nxxxxzn · 4 years ago · +2

    omg don't `cat | grep`

  • @mtothem1337 · 6 years ago

    Will we get a "Don't be a Bash scrub (Coding is all about programming!)"?

  • @danke5356 · 6 years ago · +3

    *Lit* 👌🏻

  • @fatalshore5068 · 4 years ago

    YouTube brought me here after watching videos from the YouTube channel 'brainlet'. Lol....

  • @vkb967 · 3 years ago

    If you place your comments in a third column, you could get rid of the sed command.

  • @jessewilson8042 · 6 years ago · +1

    If you're already committed to minimizing your use of control structures in your shell scripts, you may want to consider writing portable shell with #!/bin/sh as your shebang, and switching your /bin/sh symlink over to dash, the Debian Almquist shell
    en.wikipedia.org/wiki/Almquist_shell
    wiki.archlinux.org/index.php/Dash
    You'll lose a lot of high level shell conveniences that you weren't planning on using anyway, and you'll get huge speed gains:
    unix.stackexchange.com/questions/148035/is-dash-or-some-other-shell-faster-than-bash
    This may even boost startup times (Ubuntu thinks it does) and the icing on the cake is that your shell scripts will be more universal - So I think this should be right up your alley. The only real downside is that your #!/bin/sh scripts won't be internationalized - but most of your bash scripts probably aren't internationalized anyway.

    • @LukeSmithxyz · 6 years ago · +1

      I've been gradually converting my scripts to be strictly posix compliant and sh compatible. If I ever do switch to dash it'll happen later though. There are some bashisms I feel it hard to live without.

    • @jessewilson8042 · 6 years ago

      Which ones?

    • @LukeSmithxyz · 6 years ago

      here-strings, for example, which I don't think exist in pure POSIX standards, but I may be wrong.

    • @jessewilson8042 · 6 years ago · +1

      Yes, it seems that here-documents are portable, but here-strings are not. I just bought the book "Classic Shell Scripting" with the intention of learning portable shell. After I finish reading that, I hope I'll be able to answer the question of whether there is anything truly indispensable about bashisms. My suspicion is that there is not, and that writing portable shell will turn out to be a win-win in >95% of cases.

    • @yash1152 · 1 year ago

      > _"After I finish reading that, I hope I'll be able to answer the question of whether there is anything truly indispensable about bashisms"_
      hey jesse! any updates?
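      On the here-string point: POSIX sh has here-documents but no <<< here-strings. A sketch of two portable ways to feed a variable to a command's standard input; the variable contents and the awk program are hypothetical.

```sh
#!/bin/sh
# Portable replacements for bash's here-string (cmd <<< "$var").
# The variable contents and the awk program are hypothetical.
var='h ~
d ~/Documents'

# bash only:  awk '{ print $1 }' <<< "$var"

# Portable option 1: pipe the variable through printf.
via_printf=$(printf '%s\n' "$var" | awk '{ print $1 }')

# Portable option 2: a here-document with the variable expanded in it.
via_heredoc=$(awk '{ print $1 }' <<EOF
$var
EOF
)

printf '%s\n---\n%s\n' "$via_printf" "$via_heredoc"
```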

  • @geoffl · 4 years ago

    1. make it work
    2. make it fast

  • @morra82 · 3 years ago

    I'm a bit late to the party, and possibly someone already suggested this or you changed your approach, but you could check for comments in awk instead of piping through sed: awk '$0 !~ /^#/ {...}' < "$folders" >> "$output"

  • @marinacabrera9319 · 6 years ago · +1

    bring the forum back

  • @GodforsakenHorizon · 5 years ago

    I'm kinda late to this discussion but why do you use awk at all?
    You could do all this with sed and something like -nE '/^#/d;s/^(\S*)\s*(\S*)$/\1 \2/gp', which deletes all comments and captures the first and second columns in separate groups that you can use directly for the output, the same way you use $1 and $2 in awk.

  • @yallaoui · 5 years ago

    Hi Luke
    How could you remove all comments from a LaTeX file with a script? Many thanks

  • @bobus_mogus · 6 years ago

    What OS do you use?

  • @okdoomer620 · 4 years ago · +1

    So of all the qualities a piece of code can have... you're picking speed as the important one for shell scripts?

    • @yash1152 · 1 year ago · +1

      Speed *is* important for shell scripts, because they are used repeatedly more than anything else. They are building blocks.

  • @fuzzybyte · 5 years ago · +6

    Bash is an abomination of a language

  • @ChozoSR388 · 9 months ago

    Did this dude really just call LaTeX "Lah-tek"? Yes, let me just put on my lah-tek gloves.

  • @or2kr · 5 years ago

    Tfw someone complains about slow scripts but doesn't bother using `time`!!

  • @michallasan3695 · 2 years ago

    Seeing this, I would not give you a job, writing such wide lines is against good conventions and readability, also you know little about performance optimization, many of your calls could be shrunk together: sed into awk, cat into grep, not even mentioning that you do not use coproc instead of subshells. You seem not to understand how big performance burden creating a subshell is. I also wonder: is your tab indeed wide enough?

  • @xBZZZZyt · 1 year ago

    me who use python3 scripts:

  • @hacker2ish · 6 years ago

    When will you ever need this script to be fast?

    • @flambo1500 · 6 years ago · +3

      That's what microsoft probably said when they were designing... anything really

  • @iLiokardo · 5 years ago

    I think it's difficult for you to talk because your keyboard has such heavy switches.

  • @junfever3657 · 6 years ago

    It seems honest people like me cannot make good comments if I've no memes.... 😭

  • @overclucker · 5 years ago

    Now rewrite it in a different language.

  • @peumofran2278 · 6 years ago · +2

    I have an abnormally sensitive perception of time and your new script was already too slow. This is meaningless.