Piping `find` to `while` - You Suck at Programming

  • Published Apr 29, 2024
  • You Suck at Programming
    Website → ysap.daveeddy.com
    #programming #devops #bash #linux
    #unix #software #terminal #shellscripting #tech #stem
  • Science & Technology

COMMENTS • 26

  • @declansnyder2281 · 1 month ago · +12

    Key takeaway here: don't put newlines in file names. What psychopath does that?

    • @yousuckatprogramming · 1 month ago · +4

      newline characters in file names on unix

    • @mehmetnejataydin6776 · 27 days ago · +2

      But someone might do it unintentionally or maliciously. Shell script writers must take into account those corner cases.

    • @lhpl · 23 days ago

      @mehmetnejataydin6776 So true! I would hate to be a user on a system where the sysadmin didn't take things like spaces, newlines and other weird stuff in filenames seriously.
      This includes Unicode UTF-8 normalization btw.
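To make the newline concern concrete, here is a small bash sketch (not from the video; the helper names and file names are hypothetical) that creates two files, one with a newline in its name, and counts them once with a newline-delimited loop and once with a NUL-delimited loop.

```shell
#!/usr/bin/env bash
# Sketch: newline-delimited vs NUL-delimited reading of find's output.
# count_by_lines/count_by_nul and the file names are hypothetical.

count_by_lines() {
    # One record per newline: a newline inside a name splits it in two.
    local n=0 f
    while IFS= read -r f; do
        ((++n))
    done < <(find "$1" -type f)
    echo "$n"
}

count_by_nul() {
    # One record per NUL byte, which can never occur inside a path.
    local n=0 f
    while IFS= read -r -d '' f; do
        ((++n))
    done < <(find "$1" -type f -print0)
    echo "$n"
}

dir=$(mktemp -d)
touch -- "$dir/plain.txt" "$dir/evil"$'\n'"name.txt"   # two files
echo "newline-delimited: $(count_by_lines "$dir")"      # newline-delimited: 3
echo "NUL-delimited:     $(count_by_nul "$dir")"        # NUL-delimited:     2
rm -rf -- "$dir"
```

The line-based loop sees three "files" because the newline in `evil…name.txt` looks like a record separator; the NUL-based loop sees the real two.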

  • @extrageneity · 1 month ago · +7

    Episode 4, Piping `find` to `while`.
    Key concepts from this episode:
    1. Continuing to show the "unintended consequences of word splitting" that previous episodes have centered around.
    2. Showing that one workaround for this is changing the delimiter your input command uses, and then specifying that delimiter in the 'read' command via the -d switch.
    3. Using the null character (the C end-of-string marker, and the one byte that can never appear in a Unix path) as a delimiter.
    4. Revisiting the 'pipe to a bash construct' example from episode 3, this time showing one of the common pitfalls of doing so: each part of a bash pipeline runs in its own subshell by default, and data modifications made in a subshell cannot be seen either in other subshells or in the parent shell. This problem isn't unique to bash; it happens between parent and child processes in any programming language that creates child processes with fork() (threads, by contrast, share their memory). fork() and what it does are discussed in Unix/Linux manuals alongside exec(), because a fork() is often immediately followed by an exec().
    fork() is a system call which causes the kernel to make a near-exact duplicate of the running process, with the same memory contents, code, open file descriptors, call stack, everything. Some key differences are: that the duplicate (the child) gets its own PID and has the original process as its parent; that the child's address space is a copy-on-write copy of the parent's, so memory pages are only physically duplicated when one side writes to them; and that fork() returns the child's PID in the parent process and 0 in the child process. (You often see in C or Perl code something like: if (pid = fork()) { ... parent process logic, often including a wait() on the pid } else { child process logic }. Robust code also checks whether fork() failed.)
    Separate (copy-on-write) memory is what we're seeing in play here. The child process (the bash subshell running the while loop, including the read builtin on each iteration) inherited the $i variable from the parent process with its value of 0. When the subshell modifies $i, the kernel gives that child its own private copy of the memory page holding $i, and the change is applied only to that copy. When the subshell exits and the parent shell carries on after the pipeline, the parent's $i is still 0, having never changed at all.
    Dave's main point here is that choosing the right approach in a bash script (in this case, getting a list of files via a glob instead of shelling out to find) lets you write a simpler script than the wrong approach would require. The theme of this channel is You Suck at Programming. Recognizing when it's time to refactor toward simplicity is one of the ways you can learn to suck less.
    But, for fun, let's talk about some of the other ways we could have preserved that find+while read solution while absorbing Dave's new requirement to count the printed files:
    1. Instead of piping the find command into the loop's stdin, we could have redirected that stdin from a process substitution, <(...), which runs find in a subshell while the loop itself runs in the current shell. This looks like:
    while IFS= read -r -d '' f
    do
        echo "file is $f"
    done < <(find files -type f -print0)
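The pitfall in item 4 and the process-substitution fix in item 1 can be run side by side. This bash sketch is an illustration, not Dave's code: the function names and the throwaway directory are assumptions. `$BASHPID` (bash 4+) is printed at the end only to show that the right-hand side of a pipeline really is a different, forked process.

```shell
#!/usr/bin/env bash
# Sketch: a counter incremented in a piped while loop vs. the same loop fed
# by process substitution. Function names are hypothetical.

count_with_pipe() {
    local i=0 f
    # Without lastpipe, each side of "|" runs in a forked subshell, so ++i
    # only changes the child's copy of i.
    find "$1" -type f -print0 | while IFS= read -r -d '' f; do
        ((++i))
    done
    echo "$i"
}

count_with_procsub() {
    local i=0 f
    # The loop runs in the current shell; only find runs in a child process.
    while IFS= read -r -d '' f; do
        ((++i))
    done < <(find "$1" -type f -print0)
    echo "$i"
}

dir=$(mktemp -d)
touch -- "$dir/a" "$dir/b" "$dir/c"
count_with_pipe "$dir"        # 0
count_with_procsub "$dir"     # 3

# $BASHPID is the PID of the current (sub)shell, unlike $$.
echo "shell:    BASHPID=$BASHPID"
echo | { echo "pipeline: BASHPID=$BASHPID"; }   # a different PID
rm -rf -- "$dir"
```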

    • @killua_148 · 1 month ago · +1

      Thank you! In the '''while read ... f)''' code I think you forgot the -print0 argument

    • @extrageneity · 1 month ago

      @@killua_148 You're right, I will edit the comment to reflect this.
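The missing `-print0` matters more than it might look. If the input is newline-terminated but `read -d ''` waits for a NUL, read reaches end of input before it ever sees its delimiter and returns a non-zero status, so the loop body never runs at all. A minimal bash sketch (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: read -d '' waits for a NUL byte. loop_count is a hypothetical name.

loop_count() {
    # Counts how many times the loop body runs for the records on stdin.
    local n=0 f
    while IFS= read -r -d '' f; do
        ((++n))
    done
    echo "$n"
}

printf 'a\nb\n' | loop_count    # 0: read hits EOF before any NUL, returns non-zero
printf 'a\0b\0' | loop_count    # 2: one iteration per NUL-terminated record
```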

  • @sampson623 · 1 month ago · +3

    Good channel. Thank you, YouTube recommendations.

  • @lollllloro · 1 month ago · +3

    I used to use for loops for a lot of this stuff, but find is *so much more flexible* and you can also eliminate the problem mentioned by executing the find in the subshell instead of the loop:
    while IFS= read -rd '' file; do ...; done < <(find ... -print0)

    • @lollllloro · 1 month ago

      That was meant to be ls -od -- "${files[@]}"
      The double hyphen is there to end ls's option parsing, so it still works if something in the files array starts with a hyphen.
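The reply refers to a `files` array that the scraped comment no longer shows. One way to fill such an array from `find` (an assumption, not necessarily the commenter's code) is `mapfile -d ''`, which needs bash 4.4 or newer; because no loop is involved, the array and its length are available in the current shell. With `find` output every path starts with the search root, so the `--` is a safeguard there; it matters most when the array comes from a glob such as `*`, where a file named `-od` would otherwise be parsed as options.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash >= 4.4 (for mapfile -d). The function name and the
# global "files" array are hypothetical, chosen to match the reply above.

collect_files() {
    # One array element per NUL-terminated path from find, so names with
    # spaces or newlines stay intact; the array lives in the calling shell.
    mapfile -d '' files < <(find "$1" -type f -print0)
}

dir=$(mktemp -d)
touch -- "$dir/plain.txt" "$dir/with space.txt" "$dir/with"$'\n'"newline.txt"

collect_files "$dir"
echo "found ${#files[@]} files"     # found 3 files
ls -od -- "${files[@]}"             # "--": nothing after it is parsed as an option
rm -rf -- "$dir"
```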

    • @yousuckatprogramming · 1 month ago · +2

      i love all of this - welcome to the channel. your comment is exactly the kinda knowledge i want to spread here

  • @marcsfeh · 1 month ago · +3

    very based and bash pilled. You made my gyatt programmaxx

    • @extrageneity · 1 month ago

      I gave this comment a like just because, as a bash enthusiast, I appreciate a little good mind poison. 🙂

    • @yousuckatprogramming · 29 days ago · +1

      we love the brainrot here

  • @BenjaminWheeler0510 · 29 days ago

    Ooh, that subshell trick caught me. Bash is certainly one of the shell scripting of all time.

  • @user-sd4dw6eb5i · 1 month ago · +5

    ok but find goes through nested subdirectories and the for loop does not

    • @extrageneity · 1 month ago

      The find will also match files beginning with a dot and the glob will not, unless you set the right shell option (dotglob, IIRC)
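Both differences can be narrowed from inside bash, and the option is indeed `dotglob`: `shopt -s globstar` (bash 4+) makes `**` match files and directories recursively, `dotglob` lets globs match names beginning with a dot, and `nullglob` makes a pattern with no matches expand to nothing. This sketch (helper names and the sample tree are hypothetical) counts regular files recursively with a glob and with `find` for comparison.

```shell
#!/usr/bin/env bash
# Sketch: recursive, dotfile-inclusive globbing vs. find.
# count_glob/count_find and the sample tree are hypothetical.

count_glob() (
    # Function body is a subshell, so these shopt settings don't leak out.
    shopt -s globstar dotglob nullglob
    n=0
    for f in "$1"/**; do
        # Regular files only; skip symlinks, which find -type f also skips.
        if [[ -f $f && ! -L $f ]]; then
            ((++n))
        fi
    done
    echo "$n"
)

count_find() {
    local n=0 f
    while IFS= read -r -d '' f; do
        ((++n))
    done < <(find "$1" -type f -print0)
    echo "$n"
}

dir=$(mktemp -d)
mkdir "$dir/sub"
touch -- "$dir/a" "$dir/.hidden" "$dir/sub/b" "$dir/sub/.c"
count_glob "$dir"     # 4
count_find "$dir"     # 4
rm -rf -- "$dir"
```

Without the `shopt` line, `"$dir"/*` would match only `a` and `sub`, which is exactly the gap the two comments above describe.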

  • @lizard450 · 1 month ago

    Fun series.

  • @md2perpe · 24 days ago

    Here you only have ordinary files directly inside one directory. For that case `find` is overkill. But when you want to find all files matching some condition in a directory tree, `find` is good, and then you cannot use `files/*`. Give an example of how to write that in the best way instead!
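One option for a recursive search with a condition, sketched here as a swapped-in technique rather than anything shown in the video, keeps the `find | while` shape but turns on bash's `lastpipe` option (bash 4.2+), which runs the last element of a pipeline in the current shell. It only takes effect while job control is off, which is the default in scripts. The function name and the `*.log` pattern are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: recursive search with a condition, keeping the "find | while" shape.
# Assumes bash >= 4.2; lastpipe only applies while job control is off, which
# is the default for scripts. count_matching and '*.log' are hypothetical.

shopt -s lastpipe

count_matching() {
    local n=0 f
    # Any find tests work here: -name, -size, -mtime, -newer, and so on.
    find "$1" -type f -name "$2" -print0 |
        while IFS= read -r -d '' f; do
            printf 'match: %q\n' "$f"
            ((++n))
        done
    # With lastpipe the loop ran in this shell, so n is still set.
    echo "total: $n"
}

dir=$(mktemp -d)
mkdir "$dir/logs"
touch -- "$dir/a.log" "$dir/logs/b.log" "$dir/notes.txt"
count_matching "$dir" '*.log'   # two "match:" lines, then "total: 2"
rm -rf -- "$dir"
```

The process-substitution form, `done < <(find ...)`, gives the same result without depending on a shell option; `lastpipe` is mainly useful when you want to keep the left-to-right pipeline layout.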

  • @andreichicu2799 · 28 days ago

    useless but I love it
    also, aren't bashisms bad?

    • @yousuckatprogramming · 26 days ago

      not inherently. they’re bad if you’re trying to be maximally portable to systems that may not have bash.. but they aren’t bad in themselves
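On the portability point: `read -d ''`, process substitution, and arrays are all bashisms. If plain POSIX sh is a requirement, one common pattern (a swapped-in technique, not something from the video) is to let `find` pass the paths as arguments with `-exec ... {} +`, so no delimiter parsing is needed at all. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: POSIX sh, no bashisms. Handles any file name, newlines included,
# because paths are passed as arguments instead of being parsed from a stream.
# print_files/count_files are hypothetical names.

print_files() {
    # find hands paths to sh -c in batches ("{} +"); the lone "sh" fills $0
    # and the paths arrive as "$@".
    find "$1" -type f -exec sh -c '
        for f do
            printf "file is %s\n" "$f"
        done
    ' sh {} +
}

count_files() {
    # printf reuses its format for every argument; "%.0s" prints none of the
    # path, so exactly one "." is written per file, then counted by wc -c.
    n=$(find "$1" -type f -exec printf '%.0s.' {} + | wc -c)
    echo $n    # unquoted on purpose: drops the padding some wc versions add
}
```

Usage would be along the lines of `print_files .` and `count_files .`. Counting inside the `sh -c` script is avoided because `find` may split a long file list across several batches.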