Why Files Become Bigger in Emails - Computerphile

  • Published 13 Jun 2024
  • To send binary files via a text based system, they'll need encoding. Dr Steve Bagley takes us through the attachment system used in email.
    name change, formerly "Why Attachments are Larger in Emails"
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

COMMENTS • 190

  • @deeef13
    @deeef13 3 years ago +126

    Haha, I experienced this exact problem many years ago when attaching a 20 MB file, clearly within Gmail's 25 MB limit, and it kept getting rejected. Fast-forward through hours of debugging, sending myself test e-mails, and analyzing headers... I discovered how grossly antiquated e-mail systems are, even to this day.

    • @otm646
      @otm646 3 years ago +23

      It's not antiquated, it's perfectly fit for purpose. The designers never imagined you would be sending such large files, it wasn't designed for that. Plus there have always been better, easier and faster ways for transferring large files even 20+ years ago.

    • @EVIL0033
      @EVIL0033 3 years ago +5

      Yeah, sometimes if the attachment is too big, I generate an S3 read-only signed URL and put that in the email instead of sending the attachment

    • @EVIL0033
      @EVIL0033 3 years ago

      @otm646 I really wish email supported the full set of CSS and HTML elements, sigh... right now, style formatting in email is table in table in table

    • @glitchy_weasel
      @glitchy_weasel 3 years ago +6

      Worst case, perhaps you can use some compression program that can split the file into chunks and send them in different emails. Of course, the recipient would have to use that same program.
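
As a rough sketch of the chunking idea above (not how any particular archiver does it): if each piece is base64-encoded on its own, every chunk has to be small enough that its encoded form still fits under the size limit. The function name `split_for_email` and the 400-character limit are made up for illustration.

```python
import base64

def split_for_email(data: bytes, encoded_limit: int) -> list[bytes]:
    """Split data so each chunk's base64 form is at most encoded_limit characters."""
    # Every 3 raw bytes become 4 base64 characters, so work backwards from the limit.
    raw_chunk = (encoded_limit // 4) * 3
    if raw_chunk == 0:
        raise ValueError("limit too small to hold any data")
    return [data[i:i + raw_chunk] for i in range(0, len(data), raw_chunk)]

payload = bytes(range(256)) * 4          # 1024 bytes of stand-in binary data
chunks = split_for_email(payload, encoded_limit=400)
print(len(chunks), [len(base64.b64encode(c)) for c in chunks])
```

The recipient only needs to decode each piece and concatenate them in order; real tools also add sequence numbers and checksums, which this sketch leaves out.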

    • @uganasilverhand
      @uganasilverhand 3 years ago

      Try writing your own mime64 partitioning decoder....

  • @jeromesnail
    @jeromesnail 3 years ago +113

    I've always wondered what those equal signs at the end of a base64 encoded string were :)

    • @markmcdonnell
      @markmcdonnell 3 years ago +2

      This ☝️

    • @jamess1787
      @jamess1787 3 years ago +5

      That's the only way I know that it's B64 encoded....
      Now I know!!!

    • @citizendot1800
      @citizendot1800 3 years ago +3

      New line is encoded as "Cg=="

    • @katanasteel
      @katanasteel 3 years ago +1

      They are padding, and signal that the corresponding "decoded bytes" are 0 and are to be discarded

    • @katanasteel
      @katanasteel 3 years ago +1

      @citizendot1800 Correction: it is 'Cg=='
      Because you need 3 bytes to start the encoding and end up with 4 bytes
      000010 10.0000 0000.00 000000
      C g = =
      On the other hand, if you started out with a newline and 2 bytes with value zero as your input, which has the same bit pattern as above, the base64 encoded string would be:
      'CgAA'
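
A quick way to check the two cases above, using Python's standard `base64` module (the choice of Python is just for illustration):

```python
import base64

newline = b"\n"                           # one byte, 0x0A
print(base64.b64encode(newline))          # b'Cg=='

# Same leading bits, but now the two zero bytes are real data, not padding.
print(base64.b64encode(b"\n\x00\x00"))    # b'CgAA'

# Decoding shows the difference: padding is dropped, real zero bytes are kept.
print(base64.b64decode("Cg=="))           # b'\n'
print(base64.b64decode("CgAA"))           # b'\n\x00\x00'
```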

  • @pasan.
    @pasan. 3 years ago +45

    Engineer - use email for text, ftp for files. User - I will put my cat videos in email thank you!

    • @Fiech00
      @Fiech00 3 years ago +6

      Well, what you can take away is to make your system as agnostic as can be. You'll never know what users want to use it for and what lurks in the future. Limiting themselves to text was of course easy in their respective framework, but basically made everything harder beyond the "intended use". And I put this into quotation marks for a reason. Convenience always wins. If I want to send cat videos via mail, that is what I am going to do. I'm not waiting for a new service to emerge that can also accommodate other file types.

  • @ZedaZ80
    @ZedaZ80 3 years ago +8

    I learned how to encode emails last week so that I could write a program to make it easier to attach files when sending emails from our server :D It was so satisfying to send a pdf and have it come out right.

  • @pev_
    @pev_ 3 years ago +30

    Yeah, I remember using uuencode/decode for some things in emails and newsgroups in my university years in the 90's. The Unix email programs did not separate between main text and attachments, so the uuencoded part usually was just seen after the end of the natural language part as a large block of random-looking characters of equal line length :)

    • @phizc
      @phizc 3 years ago

      The UU encoded part should have started with
      "begin ♤♡◇ filename" where ♤♡◇ is the Unix file mode (execute, read, write for owner group and others), e.g. 740.
      It should end with "`○end○" where ○ is new-line (Carriage Return, followed by Line Feed)
      But yeah, it does look like random characters 🙂.
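
For anyone who wants to see that layout, here is a minimal sketch built on Python's `binascii.b2a_uu`. The helper name `uuencode`, the default mode 644 and the file name are illustrative assumptions, and the sketch joins lines with a plain "\n" rather than CR LF.

```python
import binascii

def uuencode(data: bytes, filename: str, mode: str = "644") -> str:
    """Wrap data in the classic 'begin <mode> <name>' ... 'end' framing."""
    lines = [f"begin {mode} {filename}"]
    for i in range(0, len(data), 45):        # each uuencoded line carries at most 45 raw bytes
        encoded = binascii.b2a_uu(data[i:i + 45], backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")                         # zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"

print(uuencode(b"Cat", "cat.txt"))
```

The first character of each data line encodes how many raw bytes that line holds, which is why full lines all have the same length.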

  • @AlanCanon2222
    @AlanCanon2222 3 years ago +18

    This is a real conundrum. I've generated email attachments programmatically recently enough that I'm not sure I need this trip down memory lane... oh, who am I kidding? [presses play].

  • @mtheos
    @mtheos 3 years ago +9

    Base64 encoding makes the data 4/3 the size (33% larger). It bothers me that they wouldn't just say you can have attachments up to 20 MB in size but really allow 27 MB (or say you can have 25 MB and allow 34 MB).
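
The arithmetic behind those figures, as a small sketch. It counts only the padded base64 output, ignores line breaks and MIME headers, and assumes 1 MB = 1,000,000 bytes; `base64_len` is a made-up helper name.

```python
def base64_len(raw_bytes: int) -> int:
    """Length of padded base64 output for raw_bytes bytes of input."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)

for mb in (20, 25):
    raw = mb * 1_000_000
    print(f"{mb} MB raw -> {base64_len(raw) / 1_000_000:.1f} MB encoded")
```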

  • @EddyGurge
    @EddyGurge 3 years ago +7

    I could really have used this video some 30 or so years ago! I still really enjoyed it :)

  • @iammaxhailme
    @iammaxhailme 3 years ago +30

    I wish emails would go back to being 78 characters. I'd get a lot more work done if I didn't have to read so much junk.

  • @martinbean
    @martinbean 3 years ago +1

    I love these videos. Watch for one thing and accidentally learn how the Base 64 algorithm works. Used it for nearly 15 years, but never bothered to look into the algorithm under the hood. Summed up in an easy to understand explanation in seconds in this video!

  • @barrotem5627
    @barrotem5627 3 years ago +2

    I'm so glad I'm subscribed.

  • @orlovsskibet
    @orlovsskibet 3 years ago +6

    I use Base64 a lot in my daily job, but never cared to find out the details about it, or where it came from. Another nice video - thanks a lot.

  • @idjles
    @idjles 3 years ago +9

    I used base64 back in 1989, and I still use it every day in my job today to push binary via JSON or XML.

    • @iabervon
      @iabervon 3 years ago +3

      Ironically, given the title, I use base64 primarily when someone embeds a large XML file in JSON. We're doing it specifically for space reasons, because their file compresses to a binary much less than 3/4 the size, so it's very worthwhile to do that and then expand it by a factor of 4/3. It's also ironic that we're using base64 in a format with many many more than 256 characters available, but base64 is more compact than using characters above 127 in utf-8.
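
A minimal sketch of that compress-then-encode trick, using `zlib` as a stand-in compressor (the comment does not say which one is used) and a made-up, very repetitive XML payload. The field names `name` and `content` are illustrative.

```python
import base64
import json
import zlib

# Stand-in XML document; real files will compress less dramatically.
xml = ("<items>" + "<item id='1'>example</item>" * 200 + "</items>").encode("utf-8")

# Compress first, then base64 so the bytes can live inside a JSON string.
packed = base64.b64encode(zlib.compress(xml)).decode("ascii")
document = json.dumps({"name": "items.xml", "content": packed})

print(len(xml), len(packed))

# The receiver reverses the steps: JSON -> base64 -> decompress.
restored = zlib.decompress(base64.b64decode(json.loads(document)["content"]))
```

As long as compression shrinks the file to less than 3/4 of its size, the 4/3 base64 expansion still leaves it smaller than the raw text.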

    • @idjles
      @idjles 3 years ago

      @iabervon And today I just saw that my private key for Amazon Web Services is also base64 encoded.

    • @qwertyTRiG
      @qwertyTRiG 3 years ago

      @iabervon Would gzip and BSON work for your use case?

    • @iabervon
      @iabervon 3 years ago

      @qwertyTRiG It looks like it would be possible. On the other hand, it's part of a REST API, and it's nice that I can just look at the rest of the fields in it, and copy/paste the response to etherpad and back, and extract the file contents if I really want to.

    • @josephgaviota
      @josephgaviota 2 years ago

      @qwertyTRiG _Would gzip and BSON work for your use case_
      NO, because it's not 7-bit clean; the whole POINT of the video.

  • @banananaa
    @banananaa 2 years ago +1

    Could you do a video about the time when Napster was used to share software by encoding files into MP3s?

  • @Omnifarious0
    @Omnifarious0 3 years ago +1

    You need to talk about the matter transport mime type. Also, you have to worry about the email going through some really bizarre re-encodings. Sometimes, for example, email would go through a system that used one of the variants of EBCDIC. And you went on to mention that. 🙂

  • @An.Individual
    @An.Individual 3 years ago +7

    5:39 it really looks like he is reading the binary from a sheet in his lap

  • @_ingoknito
    @_ingoknito 3 years ago +7

    "don't make eye contact, don't make eye contact ..." 😆

  • @olivier2553
    @olivier2553 3 years ago +5

    Cool, I did not know about the = at the end, but I never cared to look either:) Thanks

  • @willemvdk4886
    @willemvdk4886 3 years ago +5

    Then how does Unicode work if mail only supports 7-bit ASCII? Same way? Encoding the UTF-16 into Base64 and sending the entire message as an attachment?

    • @Bozacar
      @Bozacar 3 years ago +3

      It usually uses utf-7 encoding

    • @willemvdk4886
      @willemvdk4886 3 years ago

      @Bozacar Haha, no, I mean UTF-16 or higher. I mean special characters, not the usual ASCII character set that's encoded in UTF-7/8 (which retains backward compatibility with ASCII, same values)

    • @Whelmed.
      @Whelmed. 3 years ago

      Works the same way. A 2-byte UTF-16 char, for example, is 16 bits. Those 16 bits are grouped into 6-bit chunks (as shown in the video) and converted to ASCII characters. Both the original UTF-16 char and the Base64 encoding are just representations of binary data.

    • @max_kl
      @max_kl 3 years ago +2

      I just tried it: There's a header "Content-Transfer-Encoding: quoted-printable" and the text then looks like this: "T=C3=A4st" (instead of "Täst"). It looks like the two UTF-8 bytes are encoded with an equals character and the byte value in hex.
      You can easily observe that yourself since most email programs/apps have something like "Show raw data", "Show headers" or similar

    •  3 years ago +1

      @willemvdk4886 Special characters can be encoded using UTF-7, like Bozacar said. UTF-16 encoding can't really be used in email (in theory you can, but in practice it's difficult (certainly at least one more encoding layer, like base64, would be needed, at which point it's starting to get wasteful)); generally you either use UTF-7, or you use UTF-8 encoded as Quoted-Printable or base64.
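
To make the options in this thread concrete, here is a small sketch that pushes the "Täst" example through quoted-printable, base64 and UTF-7 using Python's standard library. It only shows the byte-level encodings; a real mail client also sets the matching `Content-Type` and `Content-Transfer-Encoding` headers.

```python
import base64
import quopri

text = "Täst"
utf8 = text.encode("utf-8")

as_qp = quopri.encodestring(utf8)     # quoted-printable, as in the "T=C3=A4st" example
as_b64 = base64.b64encode(utf8)       # base64 over the same UTF-8 bytes
as_utf7 = text.encode("utf-7")        # UTF-7 is a character encoding in its own right

for label, encoded in (("quoted-printable", as_qp), ("base64", as_b64), ("utf-7", as_utf7)):
    print(label, encoded)
```

All three results contain only 7-bit ASCII bytes, which is the property the transport needs.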

  • @BasedPeter
    @BasedPeter 3 years ago +4

    Philip Seymour Hoffman still alive and even younger, I see!

  • @thisisthefoxe
    @thisisthefoxe 3 years ago

    Question: Does today's email still have a character limit? And if so, is your text auto-truncated somehow?
    Answer: Yes. In my test it was 75 characters and then a "=" sign, followed by a line break, and on the next line it just continues. The client just displays it as one paragraph and word-breaks automatically depending on window and text size.
    I would've loved to hear why, though. Why not just create a newer standard and have everyone adapt? Sure, backwards compatibility, but that didn't stop countless other improvements and updates... Why do we still use this email format?
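
What that test ran into looks like quoted-printable "soft line breaks". A minimal sketch with Python's `quopri` module, using a made-up 199-character line, shows the same pattern: no encoded line exceeds 76 characters, and a trailing "=" marks a break that the decoder removes again.

```python
import quopri

long_line = ("word " * 40).encode("ascii").rstrip()   # one long line, no newlines
encoded = quopri.encodestring(long_line)

for line in encoded.splitlines():
    print(len(line), line[-12:])

# Decoding stitches the pieces back into the original single line.
decoded = quopri.decodestring(encoded)
```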

  • @Fiech00
    @Fiech00 3 years ago +2

    Ok, but what's kind of missing here is the information that SMTP is transmitting 7-bit ASCII chars (but every char only carries 6 bits of the original binary data), which is actually what increases the file size by about 30%. The padding bytes alone would only account for at most 2 bytes. Unless I missed something in the video...

  • @geralt9036
    @geralt9036 3 years ago +8

    Understandable, have a great day!

  • @amaarquadri
    @amaarquadri 3 years ago +1

    How does the receiving email client know that a file has been sent in the first place? How would it distinguish the base64 encoding from the actual text of the email?

    • @RonJohn63
      @RonJohn63 3 years ago +4

      Part of the email hidden from the users specifies where the attachment starts, the name of the file, etc.

    • @Faladrin
      @Faladrin 3 years ago +2

      Ahhh, the magic of e-mail.
      The question isn't related to this video at all. This video is about the format of e-mails rather than the sending and receiving of e-mail.
      E-mail is a very flexible setup which could easily be its own video or maybe a few. For instance, a program that wants to send an e-mail typically will use a protocol called SMTP. Normally such a program would connect to the user's e-mail server and send the mail. That server would then connect to the destination user's e-mail server and pass the e-mail along, and then that server would store the message. The destination user's e-mail client would use another protocol to retrieve the e-mail (the standard protocol is POP3).
      The scheme technically allows the source program to connect directly to the destination user's mail server to send the message. In practice today this isn't usually allowed. Most mail servers will only accept mail from other mail servers, and usually only ones which can identify themselves via certificates (look up the videos on SSL as they are basically the same thing). Mail servers also keep track of incidents where a particular foreign mail server has been sending lots of junk or malicious mail and will block them. This puts a real-world cost behind sending too much bad mail, as it costs real money to purchase new certificates from trusted third-party vendors like Verisign. I'm not sure if Verisign, Thawte, etc. will eventually refuse to sell new certificates to known bad actors or not, but at least it does add some cost to that kind of behavior.
      Also, the sending program or the sender's mail server could connect to some other mail server completely and send the mail through them. The protocols allow for that, but it isn't typically allowed by how mail servers are coded. It might have made sense to do that in the past, when mail servers might go up and down a lot or network connections could be unreliable: you could set up mail servers to check with each other for mail they should retrieve for their users, so if you had an outage of a mail server you only really needed to get the mail to some mail server and it would eventually get to the right destination. But that isn't a very scalable setup and it is prone to abuse, so again servers don't really allow for that anymore.

    • @qwertyTRiG
      @qwertyTRiG 2 years ago

      Pick any email you've received which contains formatting or attachments, and view the mail source. (This may be under an option called "Show Original".) MIME is the magic which divides an email into sections and labels each section with a file name and type.
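
If you'd rather build such a message than dig one out of your inbox, here is a minimal sketch with Python's standard `email` package. The addresses, subject and file name are placeholders, and the 256 bytes of "attachment" are just stand-in binary data.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"        # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Cat picture"
msg.set_content("See attached.")

fake_image = bytes(range(256))            # stand-in for real binary data
msg.add_attachment(fake_image, maintype="application",
                   subtype="octet-stream", filename="cat.bin")

# Walk the MIME tree: a multipart container, the text part, then the attachment.
for part in msg.walk():
    print(part.get_content_type(), part.get_filename(),
          part["Content-Transfer-Encoding"])

print(msg.as_string())
```

The printed source shows the boundary lines that separate the parts and the base64 block that the receiving client turns back into a file.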

  • @Mr.Derpus
    @Mr.Derpus 3 years ago

    awe yeah

  • @chelseablues9980
    @chelseablues9980 3 years ago

    Bagley the Wise 🧙‍♂️

  • @misophoniq
    @misophoniq 3 years ago +2

    Ha, I already knew this! Feeling so smart right now! ;-) Don't worry, the feeling will probably be gone with the next Computerphile video... :-P

  • @angaj
    @angaj 3 years ago +2

    I understood the video. Is there any way I could test this out in person?
    Also, in future videos, if you could add a small real-world example it would be so great.

  • @chicoktc
    @chicoktc 3 years ago +3

    If H is 8 and K is 10, which letter did he kill? I or J?

    • @DavidLindes
      @DavidLindes 3 years ago

      I mean, since 10 is normally J (which I’m a little surprised he (in particular) didn’t know (I don’t generally expect this, I just somehow imagine his experience to overlap enough with my own that it’s surprising when I find differences), given my tendency to type ^J as a way to get a newline in some contexts), I figure J is the one that got killed. Though if it was killed with severe prejudice (SIGKILL), I suppose that might mean I? (Because of the 9s involved)? 🤣
      Glad I wasn’t the only one to notice this. :)

    • @chicoktc
      @chicoktc 3 years ago +1

      @DavidLindes I did not understand a thing you said haha
      But it was a funny blunder

    • @DavidLindes
      @DavidLindes 3 years ago

      @chicoktc Basically, on Unix and Unix-like operating systems, the control characters (e.g. control-C, which I’ll shorten as ^C, among others) are mapped to the ASCII values 1-26... so ^C is ASCII 3, ^D is ASCII 4, etc. That means ASCII 10, the newline character, is ^J. And ASCII 13, carriage return, is ^M. Etc. Because of this, and my many years of working on (sometimes-poorly-configured) such systems, somewhere along the line, I came to know that J was 10, without having to think about it or count letters or anything. Does that help? If not, just know this is probably unimportant for most people to understand, so it’s fine if you don’t. Just trying to make sure I’m being as clear as possible. :)
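
The mapping described above fits in a couple of lines. This is a sketch only; `control_code` is a made-up helper name.

```python
def control_code(letter: str) -> int:
    """ASCII value produced by Ctrl+<letter>: the letter's code with the top bits cleared."""
    return ord(letter.upper()) & 0x1F

for letter in "CDJM":
    print(f"^{letter} -> {control_code(letter)}")

# ^J really is the newline character.
assert chr(control_code("J")) == "\n"
```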

    • @chicoktc
      @chicoktc 3 years ago

      @DavidLindes It helps hahaha. I'm just happy my joke had more meaning than I anticipated

    • @ZipplyZane
      @ZipplyZane 3 years ago +1

      I just assume that his base32 example started with A=1, while his base64 example started with A=0. Or he made a mistake in the base32 section.

  • @MaxDiscere
    @MaxDiscere 3 years ago +214

    So, to sum up an 18 min video in 2 words: it's base64

    • @omkhard1833
      @omkhard1833 3 years ago +1

      is it identity content-encoding or gzip base64

    • @TheJamesM
      @TheJamesM 3 years ago +22

      I mean, if you want to be a smartarse, sure, but this also gives some background and explains how it works. Plenty of people don't know about base64, and of those who are aware of it there's got to be a good amount who don't know how it works.

    • @williamrutherford553
      @williamrutherford553 3 years ago +12

      I mean, even I know what base64 is but this video doesn't just say that, they go into why they chose it, how it works, etc. Still very helpful, even if you already have used base64 in the past. You might use PDF files all the time, doesn't mean you know the intricacies of how it's encoded.

    • @landsgevaer
      @landsgevaer 3 years ago +4

      Sum up years of computerphile: it's base2.
      Please move to the next channel, nothing to see here...
      ;-)

    • @IIARROWS
      @IIARROWS 3 years ago

      You just changed the title of the video XD

  • @Seltyk
    @Seltyk 3 years ago +1

    Is most base64 encoding A-Za-z0-9+/ or is most 0-9A-Za-z+/?

    • @phizc
      @phizc 3 years ago +2

      The first one. There are variants that have different symbols for the last two. If you had to make a filename with base64, you couldn't use / and + is iffy, so you'd use _ and - respectively. But all standard base64 variants use A-Za-z0-9 for values 0-61.
      UU encoding is different. It uses the ASCII characters with numerical values 32-95 instead (each 6-bit value plus 32). It avoids lookup tables at the cost of having a lot of hard-to-type characters instead, but since you'd use a program to encode and decode it's OK.
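
Python's `base64` module ships both alphabets, so the difference is easy to see. The two input bytes below are chosen only because they force values 62 and 63 to appear.

```python
import base64

data = b"\xfb\xff"                       # bit pattern that hits values 62 and 63
print(base64.b64encode(data))            # b'+/8='
print(base64.urlsafe_b64encode(data))    # b'-_8='
```

Everything except the last two symbols is identical, which is why the two variants are so easy to mix up.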

  • @juneguts
    @juneguts 3 years ago

    wait they are?

  • @NoEgg4u
    @NoEgg4u 3 years ago +6

    gpg has an:
    --armor
    option.
    How does that work?

    • @glitchy_weasel
      @glitchy_weasel 3 years ago

      Bump.

    • @fllthdcrb
      @fllthdcrb 3 years ago +1

      Base64 encoding. What, you can't tell? It doesn't even use a different alphabet (i.e. the set of characters in the lookup tables; some systems do change it, for example by changing the + and / to something else, etc.) compared to MIME. There are also some headers unique to PGP, but I don't think those are hard to understand.
      BTW, I don't think bumping is useful on YouTube. It doesn't sort by age of last reply.

  • @DarkLight748
    @DarkLight748 3 years ago

    So that's why when I wrote an email bot the messages had random new lines everywhere.

  • @buffuniballer
    @buffuniballer 3 years ago +12

    E-mail systems sent 7bit text. So binaries are (or were, I've not kept up with this) converted to 7 bit text characters. Three 8bit bytes become 4 7 bit text characters. Therefore, attachments expand when sent via e-mail.
    But that might be some "okay Boomer" as this is how it was back in the 1990s when I administered Unix based sendmail SMTP gateways.

    • @autohmae
      @autohmae 3 years ago +2

      Not much has changed, but it's good to mention MIME, multipart MIME and base64. They are all extensions of what came before. After that, add SSL/TLS for encryption and that's it. All the other stuff that is done is for spam/virus/scam filtering.

    • @Dsiefus
      @Dsiefus 3 years ago +5

      Every 3 bytes (24bits) is converted to 4 groups of 6 bits (not 7). And yes, this is still how email sends binary files.

    • @buffuniballer
      @buffuniballer 3 years ago +4

      @Dsiefus I was thinking of the end result, which is that 3 bytes turn into 4 bytes before transmission, but didn't say it well. Bottom line, attachments grow by about 33%
      Thanks

    • @romainpwn
      @romainpwn 3 years ago

      @Dsiefus There's the BDAT extension that can be used in coordination with BINARYMIME to transfer binary data, but a lot of servers and middle boxes don't like it and disable it.
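
The 3-bytes-into-4-groups-of-6-bits step discussed in this thread can be written out by hand. This is a teaching sketch rather than a replacement for the library; `b64_encode` is a made-up name, and the result is checked against Python's own `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)                       # 0, 1 or 2 missing bytes in the last group
        n = int.from_bytes(chunk + b"\x00" * pad, "big")   # 24 bits as one integer
        # Slice the 24 bits into four 6-bit values and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            chars[4 - pad:] = "=" * pad            # replace the all-padding positions with '='
        out.extend(chars)
    return "".join(out)

for sample in (b"", b"\n", b"Ma", b"Man", b"binary \x00\xff data"):
    assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", b64_encode(sample))
```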

  • @PrivateSi
    @PrivateSi 3 years ago +1

    What happens when you build standards up from legacy standards instead of starting from scratch.. Having said that, my 'Uni-Text' protocol wish is a compressed, tokenised format that extends Unicode by sacrificing control characters. I know Unicode tried to reuse them for glyphs and rightfully failed, but I reuse them to indicate the next few bytes until a byte >127 is found form a standard, global dictionary index.
    --
    The ironic thing is, all internet end user devices have built in dictionaries that take up less space than a 'font family' that includes all unicode glyphs. Unicode hugely bloated text transmissions, UNITEXT solves this, while enabling far faster word matching, spell checking and thesaurus in a GLOBAL TEXT STANDARD.. It's easier to encrypt too, and can still benefit from the predictive compression built into some network subsystems.. A local dictionary extension is also possible for more compression, by using extra control codes (0..31).
    --
    8 local, 8 global control characters provide the first 3 bits of the index
    5 extra used to indicate:
    'No Space' (as default is to add a space)... means 'overlay next glyph over last' if preceding a glyph code, not a word index
    Caps First, ALL CAPS, and No Space+Caps First, No Space + ALL CAPS..
    This is appended to the start of the word and stored in the local dictionary as a word index + char code.
    Words are stored backwards in a fast dictionary lookup tree which keeps the size of the tree down.

    • @qwertyTRiG
      @qwertyTRiG 2 years ago

      Do you have a full spec for UNITEXT somewhere?

    • @PrivateSi
      @PrivateSi 2 years ago +1

      I dug it out.. This extends / replaces the Unicode UTF-8 spec (same bit to indicate if the last/only char or one of a string.. It's cooler than I remembered.
      --
      0: Null / New Line (LF+CR or CR+LF of standard ANSI can be used as an alternative)
      1..16: Dictionary Word
      17..28: Latin diacritic overlays
      29: Tab
      30: Overlay over last
      31: RLE start (next byte is count, then code)
      Rest follows Ansi-128.
      --
      There are other versions that fit 'Tab' in better but they are more complicated and split the section up. This is more idealised.

    • @qwertyTRiG
      @qwertyTRiG 2 years ago

      @PrivateSi Huh. I do read tech specs for the fun of it sometimes, but it's been a while since I've dug into Unicode. I'm not sure I'm following this. But then, I'm half asleep today.

    • @PrivateSi
      @PrivateSi 2 years ago

      @qwertyTRiG .. Use 16 slots for dictionary slots, 12 for Latin accent overlays and the 4 modifiers I mentioned, then 96 standard ANSI alphas, chars and symbols.. The first 31 control codes are wasted except Tab and New Line (which I forgot to say is Null in the idealised version above).. If using Null-terminated strings with a New Line in them, then a standard double End of Line is used (CR+LF, 13, 10 or whatever the codes are)... RAMMED..

  • @matiasm.3124
    @matiasm.3124 3 years ago +2

    Well, most attachments now are encoded in base64... like 33% more size, approx.
    Edit: someone corrected me on the % of the size... but I don't see the reply in here.

    • @user-qf6yt3id3w
      @user-qf6yt3id3w 3 years ago +1

      Maybe YouTube has decided that criticism of Base64 by deniers and conspiracy theorists is too harmful to the public to be hosted on their platform and banned the comments. Base64 should be enough for anyone!

    • @Richardincancale
      @Richardincancale 3 years ago

      16:09 “it’s going to increase by about a third in size”. I managed to convert 1/3rd to 33% in my head!

    • @sodiboo
      @sodiboo 3 years ago +1

      Yeah, base64 is 33.33333...% more massive than the original file, since every 3 bytes of the original file is converted to 4 characters (= 4 bytes) of base64
      It's not always exactly 1/3, because if the original file is not a multiple of 3 bytes in size, then the last 1 or 2 bytes will always be converted to 2, 3, or 4 bytes (upwards of a 300% increase in size... for the last byte of the file) depending on if there's padding at the end or not (which is useful for concatenating base64), so yeah approximately 33%

    • @jimbolino
      @jimbolino 3 years ago +3

      And because of the 80 character per line limit in the spec, every 80 chars a linefeed has to be added. adding 1.25%
      Also on small files, the 130+ bytes of multipart boundary + Content-Type + Content-Transfer-Encoding headers would also increase the size %

    • @fllthdcrb
      @fllthdcrb 3 years ago +3

      @jimbolino _Carriage return and_ linefeed. It *must* be that sequence of two characters for every newline in email, regardless of what OS you use. So, actually 2.5%, at least. (For transmission, anyway. I suppose a local system could convert newlines for its own storage.)
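
The line-break overhead from this thread can be measured rather than estimated. The sketch below assumes the 76-character lines that Python's `base64.encodebytes` produces (the MIME convention, slightly shorter than the 80 mentioned above), uses 1 MB of zero bytes as stand-in data, and ignores MIME headers and boundaries.

```python
import base64

raw = bytes(1_000_000)                          # 1 MB of zero bytes as stand-in data

one_line = base64.b64encode(raw)                # no line breaks at all
wrapped = base64.encodebytes(raw)               # 76-character lines ending in "\n"
on_the_wire = wrapped.replace(b"\n", b"\r\n")   # email transmits CR LF line endings

for label, encoded in (("unwrapped", one_line), ("LF", wrapped), ("CRLF", on_the_wire)):
    print(f"{label}: {len(encoded) / len(raw) - 1:.1%} larger")
```

With CR LF after every 76 characters, the total growth lands a few points above the bare one-third figure.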

  • @barrybrevik9178
    @barrybrevik9178 3 years ago +4

    I feel that this video is perhaps useful, because base64 with MIME encoding is *still*, in 2021, the way that email attachments are sent.

  • @xxgn
    @xxgn 3 years ago

    I'll note that even without padding characters, base64 is unambiguous. A base64-encoded string always has 4N characters, so the padding characters could have been made implicit. The benefits of padding are small... but the cost of padding is also small.

    • @ZipplyZane
      @ZipplyZane 3 years ago

      I've never understood what the benefits are at all. I always strip the padding from any data URIs I use, as I've already spent the time optimizing the files themselves (usually PNGs) that it seems silly to then include the extra bytes.
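
A small sketch of why stripped padding is recoverable: the missing "=" characters follow directly from the length modulo 4. Python's `base64.b64decode` insists on padded input, so the made-up helper `decode_unpadded` just adds it back first.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    """Decode base64 whose trailing '=' characters were stripped."""
    return base64.b64decode(text + "=" * (-len(text) % 4))

for original in (b"a", b"ab", b"abc"):
    stripped = base64.b64encode(original).decode("ascii").rstrip("=")
    print(stripped, decode_unpadded(stripped))
```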

  • @TheBuilder
    @TheBuilder 3 years ago

    now I know everything

  • @maschwab63
    @maschwab63 3 years ago +1

    IBM punch cards had 80 columns.

  • @pierreabbat6157
    @pierreabbat6157 3 years ago +2

    I've noticed a bug in email handling: if a line in an email begins with "From ", a greater-than sign is prepended, even though I use Maildir, which stores emails in separate files. Every email begins with "From ", which is used in mbox format to mark the beginnings of messages stored together in one file.

    • @josephgaviota
      @josephgaviota 2 years ago

      That's not a bug.
      That is _required_ because in the mbox format a blank line followed by a line beginning with "From " marks the start of a new email message.
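
The quoting being discussed is simple enough to sketch. This follows the plain "prefix a >" convention mentioned above; the function name `escape_from_lines` is made up, and real mail software may use a slightly different variant.

```python
def escape_from_lines(body: str) -> str:
    """Prefix '>' to body lines starting with 'From ' so they can't be mistaken for a separator."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )

print(escape_from_lines("Hi,\nFrom here on it gets tricky.\nBye"))
```

Only lines that start with "From " followed by a space are touched, so a header-style "From:" line or the word appearing mid-sentence is left alone.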

  • @souravjha2146
    @souravjha2146 2 years ago

    Bring Computerphile to LinkedIn please

  • @thy_lyson0573
    @thy_lyson0573 1 year ago

    I actually thought emails can handle 8bit stuff, until I downloaded my email and realized it was encoded in base64

  • @akuunreach3260
    @akuunreach3260 3 years ago

    JfzJ 10 32 52 10 not sure why the values given are all off by 1, maybe he's counting from 0

  • @AmnonSadeh
    @AmnonSadeh 3 years ago +2

    Were you trying to hit some target video length? (Trying to satisfy the YT algorithm, which favors videos with certain properties.)
    I love when presenters go into detail, but this one felt excessively long.

  • @watchlistsclips3196
    @watchlistsclips3196 3 years ago

    Subtitles??

  • @ethanc94
    @ethanc94 3 years ago

    If we are blasted back into the Stone Age from ww3 I really would hope somebody at intel or tsmc has a physical instruction set in how to build pc parts from scratch. Because there is no chance that the average undergrad student at uni could ever come close to being as efficient and effective as we are currently…

    • @onground330
      @onground330 1 year ago

      Well, when all Chip producing machines and blueprints get destroyed, then we can start almost from the beginning.

  • @SteveMacSticky
    @SteveMacSticky 2 years ago

    J is the tenth letter

  • @paulvijoi
    @paulvijoi 3 years ago +2

    Just saying, Mike Pound is still the best at explaining computer stuff

    • @Henry-wk2zc
      @Henry-wk2zc 3 years ago +3

      I guess each to his own? I thoroughly enjoy all the videos on this channel, and I was so happy to see a new video by Bagley on my feed today :)
      Have learned a lot from him (and Pound too!) on my commutes.

  • @Max_Flashheart
    @Max_Flashheart 3 years ago +1

    Binary Solo

    • @AlanCanon2222
      @AlanCanon2222 3 years ago +1

      00000001
      00000011
      00000111
      00001111
      00010001

  • @kaylinfroehlich3293
    @kaylinfroehlich3293 3 years ago

    "the number '13' actually means ... *AD BEGINS* Surprise!"

  • @trading-university.
    @trading-university. 3 years ago +2

    in early. nice video!

  • @angeldude101
    @angeldude101 3 years ago

    Wow. A system designed for ASCII broke when run on a system that didn't use ASCII. You'd think they'd consider a better format in this case.
    Base64 seems to be the standard for maximizing density when converting binary data to ASCII. There are also other systems with different tradeoffs, like base16 to more closely match the underlying hardware, Bitcoin's choice of base58 to get a similar level of density while avoiding characters that appear too similar to each other, and base32, which IPFS has chosen for subdomains since they're case insensitive.
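
To put numbers on that tradeoff, here is a quick comparison of the encodings that Python's standard library covers (base58 is not in the standard library, so it is left out). The 30-byte input is chosen so that none of the encodings needs padding.

```python
import base64

data = bytes(range(30))     # 30 bytes: a multiple of both 3 and 5, so no padding is needed

for name, encode in (("base16", base64.b16encode),
                     ("base32", base64.b32encode),
                     ("base64", base64.b64encode)):
    encoded = encode(data)
    print(f"{name}: {len(encoded)} characters, {len(encoded) / len(data):.2f}x")
```

Each character carries 4, 5 or 6 bits respectively, so the output is 2x, 1.6x or about 1.33x the size of the input.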

  • @Trisks
    @Trisks 3 years ago

    you got mail

  • @williamrutherford553
    @williamrutherford553 3 years ago +3

    Want to see an example of base64? Look in your URL bar! After the watch?v= is the base64 encoded value, telling you/youtube what video you're watching. It's the URL bar, so it's a perfect example of needing to transmit data using writable characters!

    • @Doct0r0710
      @Doct0r0710 3 years ago +5

      Except it's a bit modified. This video has a - in its ID, which is not defined by the Base64 implementation shown in the video. It's probably a replacement for / which has a whole other meaning in URLs. + also gives a bit of trouble, (as it's also one way to represent spaces, try searching for something with a space in it) which is replaced in the video IDs by a _ character. Or the other way around.

    • @31redorange08
      @31redorange08 3 years ago

      An ID usually isn't binary.

    • @max_kl
      @max_kl 3 years ago +1

      that's not base64

    • @xGOKOPx
      @xGOKOPx 3 years ago

      @31redorange08 Literally every piece of data that exists within computers is represented with binary digits. And an ID is literally a number, so it's not even weird. YouTube video IDs are 66-bit numbers encoded in base64, except + is replaced with - and / is replaced with _.

    • @ZipplyZane
      @ZipplyZane 3 years ago

      @max_kl It is, though. It contains exactly 64 characters, the same ones used in base64 for URLs.

  • @Henrix1998
    @Henrix1998 3 years ago +7

    Seems like a huge waste of bandwidth just to keep it backwards compatible

    • @otm646
      @otm646 3 years ago +11

      That backwards compatibility is critical; you have to remember that corporate infrastructure is almost by definition running on legacy software. There's no reason you should be sending large files over email anyway. There are so many other secure services out there that make it effortless.

    • @RonJohn63
      @RonJohn63 3 years ago

      Never underestimate the inertia built into a large installed base.

    • @oldtwinsna8347
      @oldtwinsna8347 3 years ago

      @@otm646 That's the irony: the conservative agencies (usually government, but they could be commercial) wall off applications to the point that email is the only method of file transfer for many user activities.
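
To put a number on the bandwidth cost raised in this thread, here is a rough sketch of how large an attachment becomes once it is base64 encoded for email. It assumes the usual MIME layout of 76 encoded characters per line, each line ending in CRLF, and ignores headers and multipart boundaries; the function name `mime_base64_size` and the 20 MB figure are illustrative assumptions.

```python
import math


def mime_base64_size(raw_bytes: int, line_length: int = 76) -> int:
    """Estimate the size in bytes of a base64 body wrapped into lines."""
    # Every 3 input bytes (or final partial group) become 4 characters.
    encoded = 4 * math.ceil(raw_bytes / 3)
    # Each wrapped line carries a 2-byte CRLF terminator.
    lines = math.ceil(encoded / line_length)
    return encoded + 2 * lines


attachment = 20 * 1024 * 1024  # a hypothetical 20 MB file
on_the_wire = mime_base64_size(attachment)
print(f"{on_the_wire / (1024 * 1024):.1f} MB, "
      f"{on_the_wire / attachment:.2f}x the original")
```

The 4/3 factor comes from the encoding itself; the line breaks add a few percent more, which is how a file that fits under a size limit on disk can exceed it once attached.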

  • @trejkaz
    @trejkaz 3 years ago

    5322

  • @Danny-hj2qg
    @Danny-hj2qg 3 years ago +1

    Ten or Eleven? (5:46) (The value of 11 is written in the table.)

    • @VeProducctions
      @VeProducctions 3 years ago +1

      No, it isn't. 8 + 4 + 1 = 13 and 8+2 = 10

    • @DrSteveBagley
      @DrSteveBagley 3 years ago +6

      No, I just have very bad handwriting! :)

  • @spc67h
    @spc67h 3 years ago

    Why use decimal numbers when describing a BINARY file? Why not use hexadecimal (or octal)? Much easier IMHO

  • @IIARROWS
    @IIARROWS 3 years ago

    10:51 worst choice ever for a standard...

    • @MCLooyverse
      @MCLooyverse 3 years ago

      Yeah. I like '?' and '!'. But if you're gonna have '+', why not '-' or '*'?

    • @IIARROWS
      @IIARROWS 3 years ago

      @@MCLooyverse The question mark is a problem too, for the same reason, as it's a special character in url.

  • @jbcallv
    @jbcallv 3 years ago

    Fourth!

  • @ClayAlchemist
    @ClayAlchemist 3 years ago

    I believe J is the tenth letter.

  • @Ukitsu2
    @Ukitsu2 3 years ago

    Already liked the video, but to TRULY understand it I'll wait for subtitles (Google's or otherwise) because the accent kills me xD

  • @bluebird1422
    @bluebird1422 3 years ago

    why this dude got white hairs?

  • @blablubb1234
    @blablubb1234 3 years ago +1

    The 10th letter of the alphabet isn’t K, but J 😅

    • @opotime
      @opotime 3 years ago

      IT counts 0, 1, ...
      not 1, 2, ...
      Have a nice day
      Greetings from Germany

    • @spc67h
      @spc67h 3 years ago

      @@opotime They made that same mistake themselves... it should have been F and I instead of E and H when using 5 bits (cf 09:00)

    • @spc67h
      @spc67h 3 years ago

      @@opotime BTW: he should have said eleventh instead of tenth (14:58)... or that K represents the 6 bit code 001010

  • @-zer122
    @-zer122 3 years ago +1

    10th letter of alphabet .... Come on guys

    • @Whelmed.
      @Whelmed. 3 years ago +4

      0 = A
      ...
      10 = K

    • @kellerkind6169
      @kellerkind6169 3 years ago

      @@Whelmed. While that's actually pretty logical and probably how the system works, it doesn't sit well with H = 8 as stated earlier in the video

    • @spc67h
      @spc67h 3 years ago +1

      @@kellerkind6169 it should have been F and I instead of E and H (09:00)
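
The zero-based lookup being argued about here is easy to check. Below is a minimal sketch that builds the standard base64 alphabet (A-Z, a-z, 0-9, +, /) and maps a 6-bit value to its character; the names `ALPHABET` and `sextet_to_char` are made up for this illustration.

```python
import base64
import string

# Standard base64 alphabet, indexed from 0: A=0 ... Z=25, a=26 ... z=51,
# '0'=52 ... '9'=61, '+'=62, '/'=63.
ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def sextet_to_char(value: int) -> str:
    """Map a 6-bit value (0..63) to its base64 character."""
    if not 0 <= value < 64:
        raise ValueError("a sextet must be in the range 0..63")
    return ALPHABET[value]


for value in (0, 5, 8, 10):
    print(value, format(value, "06b"), sextet_to_char(value))

# Cross-check against the standard library: byte 0x28 is 00101000,
# so the first 6-bit group is 001010 (decimal 10).
print(base64.b64encode(b"\x28\x00\x00").decode("ascii"))  # KAAA
```

With counting starting at zero, 5 maps to F, 8 maps to I and 10 maps to K, so K is the eleventh letter but sits at index 10.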

  • @sasuke2910
    @sasuke2910 3 years ago +1

    What a horrible hack, I guess I see why Gmail uploads to Drive.

    • @RonJohn63
      @RonJohn63 3 years ago +1

      It's a damned clever way to get a tool designed for one purpose to do something else.

  • @GT-tj1qg
    @GT-tj1qg 3 years ago +1

    Bruh the title is about the least interesting part of this video

  • @vedanshuseedwan7095
    @vedanshuseedwan7095 3 years ago +3

    First. :D

  • @bowerhalls1990
    @bowerhalls1990 1 year ago

    Nerd

  • @TheGreatAtario
    @TheGreatAtario 3 years ago +1

    Not "forward slash". Just "slash"

    • @IIARROWS
      @IIARROWS 3 years ago +1

      It's not incorrect... It's to differentiate it from the back slash.

    • @TheGreatAtario
      @TheGreatAtario 3 years ago +1

      @@IIARROWS Slash _is_ different from backslash

    • @angeldude101
      @angeldude101 3 years ago

      @@TheGreatAtario forward slash and backslash are both slashes. One is forward, the other is backward.

    • @TheGreatAtario
      @TheGreatAtario 3 years ago +1

      @@angeldude101 No. There is no such thing as a forward nor a backward slash. There is only slash and backslash. Please stop doing this.

    • @angeldude101
      @angeldude101 3 years ago +1

      @@TheGreatAtario forward slash tends to be more common, so most people just call it "slash."
      What is a slash that is forward if not a forward slash?

  • @andljoy
    @andljoy 3 years ago +3

    RTF and HTML emails should be banned. Plain text only, or welcome to the spam folder.

    • @RonJohn63
      @RonJohn63 3 years ago

      That war was lost as soon as AOL implemented an Internet gateway.
