What would be a better way to store information? My company runs reports that scour our entire database (which is spaghetti code), which literally freezes production.
Doesn't your database solve that problem? Alternatively, you could make a program that reads and displays some serialized data, using something like FlatBuffers, Protocol Buffers, or even JSON if it's not that large
The largest pdf i've ever seen is the control flow of a game represented as a directed graph (i thought it would be useful when i made it). The graph has dimensions of 1km by 1m. Entirely impractical but quite funny when my code finished processing and generated the graph.
that game pdf takes me back to the "choose your own adventure" books. back before I knew about the internet, that was the most surprising book I found in the school library.
4,140-page DaVinci Resolve reference manual (for version 18.5, July 2023), 169 MB. It does not include the DaVinci Fusion reference manual, which is a separate additional 1,650-page PDF. I wanted to read those two PDFs on my tablet so I imported them into (of all the unlikely things) my tablet's sheet-music PDF reader (MobileSheets) so I could easily add removable scribbles, notes, bookmarks etc. Works surprisingly smoothly with no hiccups, guess the app just thinks they're really long music scores! (Lol I need all the help I can get with trying to learn DaVinci stuff.) Anyway, thanks for the entertaining video!
The absolute biggest PDF file I have ever seen is the full HTML specification. It is a massive monster that won’t even open on my phone. I don’t know where to find it any more, but it was a full documentation of the state machine and everything.
The fact that the HTML spec has gotten so big is why the web is such a bloated mess. So much unneeded crap got added to HTML over the years (although some of it absolutely makes sense like the ability to embed audio and video into HTML pages)
Thank you for adding audio tracks for the video, I hope this will allow many people to get acquainted with your interesting content. Greetings from Russia
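It's not really surprising to have a tiny PDF file generating an obscene amount of pages. PDF grew out of PostScript, which is a Turing-complete programming language, but PDF itself has no loops; as far as I know, these tiny files get there by having the page tree point at the same page objects over and over, so a handful of objects expands into zillions of pages...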
The largest PDF file I've ever seen is 174GB. It was generated by a bug in a PDF printer which just kept printing out a single PDF file endlessly, with the user completely unaware. It would've been bigger, but 174GB was all that remained on the file server at the time. We were able to respond and solve the problem in 15-20 minutes, but oh my god I've never seen anything like it before
While not a large document in page count at under 650 pages, the largest PDF I have in my fast archive is NTRS 19730024039 "Study of Alternate Space Shuttle Concepts Volume II Part I Concept Analysis and Definition" from Lockheed for NASA at over 820 megabytes. Its companion (Volume II Part II) is another 649 pages at nearly 250 megabytes.
Ah, I was waiting at the end to see if you gave those large technical document PDFs to your custom GPT and how it got on with processing them - any plans for a follow up video on that and custom GPTs in general?
At work we create PDFs of letters we want sent. Then give them to our letter vendor and they mail it out. Each page is 1 letter and the most that I know of that we sent out in 1 day is 400k. It is private though as it contains sensitive and personal data.
I recently downloaded the new edition of the (US Fed. Hwy. Adm.) Manual on Uniform Traffic Control Devices. It's something like 600 pages, with a lot of graphics. I think the file is 50 to 100 MB. I haven't looked at it in a couple weeks so I don't remember exact figures.
The largest i've seen and actually used was a report from a certification lab for a complex standard. The short version of the report was around 100 pages, and the detailed version was around 10 000 pages.
I still remember the biggest PDF I encountered in the wild. It was a database of prime numbers with special properties. In total, it was more than 300,000 numbers over 19,148 pages. Why it was distributed as a PDF, I'll never know.
When I was about 6, I remember using Word on a school computer and trying to see how big i could make a document by just [ctrl+A] [ctrl+c] [ctrl+v] over and over until I accidentally crashed the computer. I was so impressed because I was so curious and young, but anyways this video reminded me so I thought i'd share
Funnily enough I tried to do the same thing with the Unreal Engine Github repository since it was just small enough to fit within GPT's upload limits, and GPT kinda sucks for Unreal when compared to other programs since a lot of the documentation is restricted from viewing prior to agreeing to Epic's developer agreements. Needless to say it didn't really work since GPT would just give up after a few attempts when trying to search through such a massive repository to find the answers to rather broad topics.
If we're talking filesize, the biggest PDFs will be basically image files. I scanned some of my books and the filesizes are larger than most of the ones mentioned in this video despite having a fraction of the pages.
I was legit thinking you were going to go out and surprise us by instead going to page size. PDFs support ridiculous page dimensions. In Acrobat 7.0, you can change the UserUnit size of a PDF to 75000 at most, giving you a page dimension of 15 million by 15 million inches at most. That's enough to cover a sizable chunk (roughly 40%) of Germany
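For anyone who wants to check the page-size arithmetic, here is a rough sketch. The 75000 UserUnit cap and the 15 million inch result come from the comment above; the 14,400-unit maximum page side, the 72 units per inch, and the ~357,000 km² figure for Germany are my own assumptions and are not stated in the thread.

```python
# Rough check of the maximum PDF page size discussed above.
INCH_M = 0.0254            # metres per inch (exact by definition)
MAX_SIDE_UNITS = 14_400    # assumption: largest page side in default units
UNITS_PER_INCH = 72        # assumption: default user space is 1/72 inch
MAX_USER_UNIT = 75_000     # from the comment: largest UserUnit value
GERMANY_KM2 = 357_000      # assumption: approximate area of Germany

side_inches = MAX_SIDE_UNITS / UNITS_PER_INCH * MAX_USER_UNIT
side_km = side_inches * INCH_M / 1000
area_km2 = side_km ** 2

print(f"{side_inches:,.0f} inches per side")      # 15,000,000 inches per side
print(f"{side_km:.0f} km per side")               # 381 km per side
print(f"{area_km2:,.0f} km^2")                    # 145,161 km^2
print(f"{area_km2 / GERMANY_KM2:.0%} of Germany")  # 41% of Germany
```

Under those assumptions the page comes out at about 381 km per side, which is a bit under half of Germany's area.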
That PDF game made me think of the old "Choose Your Own Adventure" and the "Which Way" books that I saw in the library when I was in elementary school....
@PWingert1966 My biggest ones were my computer science assembly class and my neurobiology class. I concatenated all the lecture slides for open note tests when Zoom classes were mandatory. ctrl + f is my hero lol
They aren't public, so I can't share them, but I work with multi-thousand page PDFs at work every week. One project we have at work has surveys that get mailed out and I have to parse the PDF before printing. Those survey PDFs are anywhere between 3000 and 12000 pages depending on how many recipients we have that week.
So... a 200,000 page text adventure, cool. I gotta look into Undying Dusk. I have fond memories from many years ago of playing text adventures on my Commodore 64 (yes I'm that old)
Attorneys deal with large PDFs regularly. When you request production of communications between people, you often get PDF exports of emails from 10s of relevant people for years and years of entries. It's a nightmare having to sort through them to find evidence, so I always prefer to handle such large evidence in pst or msg or other native email files. I'm sure there are legal reference books that are millions of pages long out there. It's not uncommon for a legal reference series to span 10s of THICK books across multiple bookshelves.
A lot of textbooks easily reach the order of the thousands of pages. I think 10000 is the upper limit before they start to split them into volumes rather than a single book.
I once downloaded a txt file which was over 1gb in size. it contained the first 1 billion digits of pi. I'm not sure how many pages it was, but it was quite long.
Coming in hot with the useful practical tutorials 😤🤡
wait what 9 hours ago? this was 4 mins ago
Hi thio!
@@thatonehenward4275 you can make a video available to channel members earlier than normal viewers
@@thatonehenward4275 people who join ThioJoe's membership can access his videos early, that's how😂
Largest PDF I've used was the encyclopedia of industrial chemistry at around 28000 pages
Now all we need is someone to port Doom to PDF.
That's what I was thinking too.
impossible
Do it. Please 😂
PDF actually descends from PostScript, which is Turing complete. PDF itself isn't, though. PostScript is how people used to hack printers
cant wait for bad apple on pdf
The size efficiency of PDFs is always astounding to me. A small book with full typesetting and everything can be smaller than a phone camera jpeg.
Text is pretty small. PDFs can be hugely bloated but text with basic formatting doesn't require much at all. The text itself and a little bit of metadata about formats and positions and pages.
Books do have a lot of the same words that can be repeated with a character or symbol, like using abbreviations but in code.
@@GreatMossWater even without huffman codes / similar compression methods text is really small, each character is in 1 of 256 states, whereas with an image every pixel has 3 colour values which are each in 1 of 256 states. This means that with no compression a single pixel of an image is equivalent to 3 characters.
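To put rough numbers on that, here is a small sketch of the uncompressed comparison. The page length and photo resolution are made-up figures for illustration; only the "1 byte per character, 3 bytes per pixel" part comes from the comment.

```python
# Uncompressed sizes: 1 byte per character vs 3 bytes per RGB pixel.
BYTES_PER_CHAR = 1    # each character is one of 256 states
BYTES_PER_PIXEL = 3   # red, green, blue, each one of 256 states


def text_bytes(chars: int) -> int:
    """Size of `chars` single-byte characters with no compression."""
    return chars * BYTES_PER_CHAR


def image_bytes(width: int, height: int) -> int:
    """Size of a width x height RGB image with no compression."""
    return width * height * BYTES_PER_PIXEL


# Hypothetical figures: a 3000-character page and a 4000x3000 photo.
page = text_bytes(3000)
photo = image_bytes(4000, 3000)
print(photo)          # 36000000
print(photo // page)  # 12000
```

So with these assumed sizes, one raw photo takes as much space as about 12,000 pages of raw text.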
@hedgeearthridge6807
I used to print Excel files to PDF - my invoices were commonly about 50kb.
I then read the Adobe PDF v1.6 specification doc and worked out why these files were so large.
Wrote my own library in Javascript, removed Excel from the process entirely, reduced human-interaction to just data-entry (thus reducing the scope for human-error in the process) while speeding up the production of and reducing the final size of my pdfs.
My invoices are now around 2kb and are produced in about 1/4 of the time.
I think I enjoyed reading that spec document far too much! 😆
@@nathansnail ignoring that Unicode is a bit bigger, most people only write in a fraction of it, in repeating patterns we call words. And considering that PDF streams are usually compressed with Flate (zip-style deflate) or the older LZW, PDFs usually don't take much space even when fonts are included in the file, since most fonts are vector graphics which take up very little space and many PDFs only embed a fraction of the font
What's funny is that Adobe invented the PDF format yet Sumatra does better than their own flagship product
Adobe is like McAfee, they've devolved into malware. 😒
Adobe in a nutshell.
Also, in my experience, Dropbox viewer and GDrive viewer handle certain languages better than the official adobe pdf app
The thing about adobe is that it's massively bloated. It has a shit load of features, but most of them are useless to most of the users.
Many years ago I got tired of Adobe being so bloated and have been using SumatraPDF for years now :)
My largest PDF was 180GB due to me accidentally putting the entire photo gallery into it (863,727 pages)
I wonder how you did....
I loved the tacit recommendation of Sumatra PDF reader. I'm always on the lookout for quick "lightweight" apps for everyday stuff. I'd love a video about lightweight apps
FoxIt had the chance to be the go-to Adobe alternative, but they blew it. 🤦 Sumatra ftw (I keep both versions 2 and 3; I hate when a program changes too much, hence also versions 1 and 2 of CDisplay/EX).
Yes, this. A video on useful lightweight, open source programs for every day stuff would be nice.
I recently started down this lite program rabbit hole again. Right now I'm testing out lite spreadsheet programs for when I need a little more than the calculator to quickly compare some things (mostly for video games), but not open a full-blown program like Excel/Google Sheets.
zathura with poppler on top
A PDF game, now that's pretty unique!
People will try to make games on absolutely anything they can ^^
i've seen a ppt game too !
I've seen a game on desmos graphing calculator
they should turn it into a PowerPoint game next
Ever heard of Choose Your Own Adventure?
I've made PDFs for my work that were on the order of 200,000+ pages, and I ended up having to break the files down because Adobe bogged down and ran way too slow with that many pages. If you are wondering, they were conformity packages that contained the entirety of certs, inspections and testing for every single part that went on a plane for a small airplane manufacturer.
Cool!
"I ain't reading all that"
That's called tiktok brain.
@@runed0s86 read a 70k page pdf real quick
@@runed0s86 That's called a joke.
@@runed0s86 legit it's """I ain't reading all that""
I remember at my school library there were a bunch of choose your own adventure books called "Combat Heroes" which had book based maze adventures, and even allowed you to cross-reference pages and fight someone reading the opponent's book (White Warlord vs. Black Baron, or Scarlet Sorcerer vs. Emerald Enchanter). This PDF version is a smart recreation of the system.
that's so epic omg
There's a 1.6GB PDF file on our server. Discovery in a legal case. The best part is that the idiot who put it together attempted to redact things by drawing black boxes over things. Thank you kind person for showing us where the juicy stuff was, and allowing us to see it!
this is exactly the kind of stuff I like hearing about. it reminds me that the job I do isn't JUST expertise, or passion, or smarts, but the intersection of all of those things at once. the fact that someone's job can be "computers" in any capacity, and they're able to have such a misunderstanding of even the basic tech they're working with that they're able to *put a black box over some text without removing said text* and think "that oughta do it", it serves as a small, albeit powerful reminder for me, that even the most automatic, zero-thought professional impulses I have every day on the job are things that are absolutely worth being paid for. the best part is that experience just builds upon all that, and if I'm worth it as I am now, it's not going to get any less true.
rambling aside, long story short, I wish I could tell every self-taught IT kid out there the same. if you care about your work, and you're always seeking to expand your knowledge, and can accept when you're wrong and learn better, then you're already better than everyone your age who's just doing it because they think it's an easy salary.
The biggest PDF I've ever seen was the Intel manual, almost 5k pages
That's what I was thinking
i used to use IrfanView as my picture viewer app, because the one that came with windows was slow even on super fast hardware... but right now i don't think there is much of a difference....
The full vulkan documentation is over 5k pages long too
When I was in the uni, it had under 4k pages if I remember correctly. But of course I didn't read it all, just searched for instructions I needed.
I love how you have 78k pages of info on biztalk and are still like, "idk what it does really. Something about businesses talking to each other, I guess. No real info out there on it." I feel that.
Biggest I've seen is the RCMP's (Canadian FBI) firearms reference table. 225MB and 105,205 pages and growing.
It chokes out most PDF readers.
Yeah I would imagine some legal documents can be MASSIVE.
Crikey!!
Hey, I've seen this one on Libgen! Although they claim it's only 104,931 pages
@@TriglycerideBeware It's kinda weird. LEO/businesses have access to a web portal version that splits everything out into manageable indexed pages.
The .pdf is a public-facing version of the full archive that's built once a week when changes are made to the main system.
@@antiKhaos I see, interesting. That explains the discrepancy
The largest PDFs i've ever actually kinda sorta read were the PDFs for the C++ specification and the Vulkan specification. They're only in the ballpark of 3,000 pages!
Anyway, i happen to know that the largest _page_ size that one can make with Adobe Acrobat is 381km × 381km.
That's really useful for when I want to plan a new country in a single pdf page
Damn, next find the top amount of pages acrobat reader can comfortably support and calculate what's the most information a pdf can hold in km2.
I remember getting my rental history from movie gallery in PDF as an email I could download that was a little more than 25,000 pages then swapped from AOL to yahoo and lost it. It started out as a Mom & Pop then was bought by Movie Gallery, and I was on an account with my parents' and sibling's. The history went back to the late 70's early 80's with countless movies and games rented and bought from there.
That interactive PDF RPG game is the coolest thing I've seen in a long time. Thanks.
8:08 If that Character Info actually works on every screen (and the game is not 100% scripted/predicted), then this not only doubles the pages.
There is HP, MP and Gold and you would need a page for every (at this stage in the game) possible combination of those three values.
that explains the ridiculous size
4:48 Old-school sysadmins would call this one a zip bomb, xml bomb, or similar depending on the exact filetype or context used to generate this sort of recursive expansion of a file. Can cause plenty of havoc when your parser isn't designed to detect and shortcut cases like this.
3:14 No one man can ever know the depth of Visual Studio.
Years ago I moved to SumatraPDF because I was working with a 100,000 page PDF that always crashed acrobat after a couple of seconds. It was the technical reference manual for a system-on-a-chip.
My only gripe with sumatra pdf is that it can't edit forms, which i found out is apparently a somewhat proprietary feature, but it will happily open massive textbooks etc
Acrobat is as bad as McAfee at this point, installing all kinds of virus-like garbage.
The printer settings and preview are much better in Adobe Reader or nonexistent in Sumatra. For everything else that can be done with PDFs, Sumatra is probably the best choice.
A sheet of book paper is about 0.12mm thick, so if you print a pdf that contains 1 million pages as a double-sided book (two pages per sheet), that book would be 60 meters thick. That's just insane
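The 60 meter figure works out if the book is printed double-sided, which is my assumption rather than something the comment spells out. A quick sketch of the arithmetic:

```python
# Thickness of a printed book, assuming double-sided printing.
SHEET_MM = 0.12       # from the comment: thickness of one sheet of paper
PAGES_PER_SHEET = 2   # assumption: one page on each side of a sheet


def book_thickness_m(pages: int,
                     sheet_mm: float = SHEET_MM,
                     pages_per_sheet: int = PAGES_PER_SHEET) -> float:
    """Return the stack height in metres for a given page count."""
    sheets = -(-pages // pages_per_sheet)  # ceiling division
    return sheets * sheet_mm / 1000


print(f"{book_thickness_m(1_000_000):.1f} m")                     # 60.0 m
print(f"{book_thickness_m(1_000_000, pages_per_sheet=1):.1f} m")  # 120.0 m
```

Printed single-sided, the same PDF would be twice as thick.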
I found a pdf of every character in Unicode that was 2,700 pages long and thought that was the longest document ever, but that’s nothing compared to these.
I work for a company that has a product that does transformation and viewing of PDFs, and I can say that many companies love to combine (needlessly) many documents into a single PDF.
I really doubt that they will ever be seen or even searched. In my tenure I have seen files with well over 1 million pages, and with few pages but well over 500GB in size.
weird flex but ok
I had a legal PDF the other day that was just shy of 4GB that was several hundred thousand pages long. strangely enough the 32bit version of adobe kept crashing trying to open it. the 64bit version worked fine, once opened it was over 24Gb in ram.
Under normal circumstances, a 32 bit process can only access 4GB of RAM at most, so it isn't that strange that a 32 bit program would fail
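The 4GB figure is just what a 32-bit address works out to. A tiny sketch, treating the 24Gb from the comment above as 24 GiB for simplicity (that unit choice is my assumption):

```python
# How much memory a 32-bit address can reach, versus what the PDF needed.
ADDRESS_BITS = 32
GIB = 2 ** 30  # bytes in one gibibyte

address_space_bytes = 2 ** ADDRESS_BITS
print(address_space_bytes)        # 4294967296
print(address_space_bytes / GIB)  # 4.0

needed_gib = 24  # RAM use reported for the opened PDF, read as GiB
print(needed_gib * GIB > address_space_bytes)  # True
```

In practice a 32-bit process often gets less than the full 4 GiB, so a document needing around 24 GiB has no chance of fitting.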
I want u to go through every popular image editing software (Paint, GIMP, Photoshop, etc) and see how high a resolution the software can go. I remember making an approx. 72000x36000 image in MS Paint 😅
Somehow trying to print or export from IrfanView I got an error >10k px.
paint can load larger ones than paint 3d to my annoyance, i was trying to deal with a huge image but it was also transparent lol
The game-book is just a Choose-Your-Own-Adventure book in PDF form, which already exist, but this one is just bigger and more tedious to make because of the graphics and interface requiring a lot more pages. 🤷 (CYOA aren't the only ones like this, they were just the biggest and most famous; there was also the popular _Fighting Fantasy_ series, but also plenty of other game-books, including official D&D ones. I think I had one called DragonQuest-not the Anne McCaffrey book.)
I remember the Lone Wolf series written by Joe Dever. Still own "The Cauldron of Fear" in German.
I wrote a script to write out the permutation of 10 for my math class for extra credit. I tried to put it all into a text file then to a pdf but i accidentally made 1 page per number. I literally couldn't open the file so i just kept it as a txt file instead
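For a sense of scale, here is a minimal sketch of what such a script might look like. It assumes "the permutation of 10" means every ordering of the ten digits 0-9; the function name is made up for illustration and this is not the commenter's actual script.

```python
import itertools
import math


def permutation_lines(n: int):
    """Yield every ordering of the first n digits as a string, one per line."""
    digits = "0123456789"[:n]
    for perm in itertools.permutations(digits):
        yield "".join(perm)


# Small demo with 3 digits so the output stays readable.
print(list(permutation_lines(3)))
# ['012', '021', '102', '120', '201', '210']

# For 10 digits there are 10! lines, hence 10! pages at one page per number.
print(math.factorial(10))  # 3628800
```

At one page per line, that would be roughly 3.6 million pages, which would explain why the PDF refused to open.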
Can you, perhaps, add hundreds of massive pdfs to chatgpt and ask it to make smth with them?
Largest pdf that wasn't a gimmick for me was a 7200 page, 265 MB medical records pdf spanning 11 years. It was scans of scans of printed records lol. The OCR processing was a nightmare.
4:56 I've seen that PDF before. It actually said "Hello World" on all the pages, just most PDF viewers can't display that.
Wait until someone finds the 1.9 billion page Excel file
Chip hardware reference manuals over 10k pages are pretty common.
The comment on the print planet forum was from Dov Isaacs; he was one of the creators and scientists from Adobe who made the PDF format. Also, when preparing direct mail campaigns with variable data for print, it is quite common to make PDFs in the thousands or tens of thousands of pages, but it is literally repeated content with address details that change.
Once upon a time, early 1980s, I wrote a "one-line" BASIC program that was a Text Adventure game. The program had a one-line processing engine, and as many DATA statements as you like. Each DATA statement contained numerous fields and represented a room / situation.
I have a 10,0000-page PDF that is full of SCANNED pages. It's construction documents (blueprints and spec sheets for components etc) for a large building, and the file is just over a gigabyte. It's ridiculous. I cannot share it for various reasons.
"getting started with Visual Studio"
>40,000 pages long document
i found one that is like 700 pages only
Fun fact. TI Sitara errata sheet is longer than their datasheet. Of course, both in PDF
The Porsche workshop manual (the full dealer service manual) is 58,000 pages. Mostly torque-to-yield specs 😂
No way, I was also thinking of the ARM CPU architecture reference manual! I needed to use it in one of my computer science classes.
Try TriCore :D Part 1 and part 2 technical reference manual, both with over 4500 pages.
That file singlehandedly made me buy another RAM stick
The most pages I ever found was about 4k, it also was a reference manual for micro embedded stuff
My company has a tool which creates a PDF with GTIN barcode labels to stick on fashion items. One (tiny) page per item, for a big purchase order (i.e. what we are ordering from our suppliers, to later sell to customers). So when you order like 200 articles, and 1000 items for each, you easily get 200,000 pages. For smaller orders, people do print them (on a label printer). (For the big orders we usually got some third-party printing service where we deliver the data in a different way.) I think we should have used the pagegroup trick to make these files less huge, as many pages are just duplicated, but we haven't yet.
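Duplicated pages don't have to cost much space. Below is a minimal Python sketch, assuming identical pages are allowed to point at one shared content stream (this may or may not be the exact trick the video refers to). The function name `build_pdf` is made up for illustration; it hand-writes a bare-bones PDF with no third-party library.

```python
def build_pdf(num_pages, text="Hello World"):
    """Build a PDF in which every page reuses the same content stream."""
    content = f"BT /F1 24 Tf 72 700 Td ({text}) Tj ET".encode("ascii")
    # Object numbers: 1 catalog, 2 page tree, 3 font, 4 shared content,
    # 5 and up are the page objects themselves.
    kids = " ".join(f"{5 + i} 0 R" for i in range(num_pages))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Count {num_pages} /Kids [{kids}] >>".encode("ascii"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    page = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
            b"/Resources << /Font << /F1 3 0 R >> >> /Contents 4 0 R >>")
    objects.extend([page] * num_pages)

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf_bytes = build_pdf(1000)
print(len(pdf_bytes), "bytes for 1000 pages")
```

The page text is stored once, so each extra page only adds a small page object and a cross-reference entry. Labels that differ per item would still need their own content, so the saving applies only to the truly repeated pages.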
I got the Undying Dusk PDF Game down to ~27MB and compressed it with Power Archiver under 6MB
I wanna beat the PDF game!! How cool of an idea!
Well, not a PDF, but I generated a text file listing 0 to 2,147,483,647, each with "" for translation strings. It took forever to generate but, sadly, I cannot open it: the only program that could even partially open it was Notepad++. (There is a 2 GB filesize limit on opening.) The file is "22.9 GB (24,658,692,654 bytes)". I saved it to my external drive just to say I have it. :P
Try vim. I use it at work to look at log files and stuff (it doesn't limit the filesize you can open and tends to perform better on very large files).
Notepad++ only has a 2 GB size limit on the 32-bit version. Download the 64-bit version and you should be able to open it.
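If the goal is only to peek at a file that size, an editor isn't strictly needed. A minimal Python sketch (the helper name `head` and the small demo file are hypothetical) that reads lazily, so memory use stays small no matter how big the file is:

```python
import itertools
import os
import tempfile


def head(path, n=5):
    """Return the first n lines of a text file without loading the rest."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Demo on a small stand-in file; the same call works on a multi-gigabyte one.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    for i in range(1000):
        tmp.write(f"{i}\n")

demo_lines = head(tmp.name, 3)
os.remove(tmp.name)
print(demo_lines)
```

Iterating over the file object reads one line at a time, which is the same reason tools like vim cope better than editors that load everything up front.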
A worthy contender to my Architect's documentation at my work
Libgen Nonfiction has 5 English PDFs that are 100k+ pages. The largest one (by page count) is 144,218
What would be a better way to store information? My company runs reports that scour our entire database (which is spaghetti code), which literally freezes production.
Doesn't your database solve that problem?
And alternatively, making a program that reads and displays some serialized data, like FlatBuffers, Protocol Buffers, or even something like JSON if it's not that large
Turns out they have a public GitHub with the source code, so they did indeed do this programmatically. Quite an achievement!
The largest PDF I've ever seen is the control flow of a game represented as a directed graph (I thought it would be useful when I made it). The graph has dimensions of 1 km by 1 m. Entirely impractical, but quite funny when my code finished processing and generated the graph.
I wonder if we could upload the right documents to perfect emulators.
The performance of Sumatra is insane!
I was recently preparing a PDF with ~32,000 pages, including maps and images, with a resulting file size of 32 gigs. It was completely unusable.
the longest pdf i’ve seen is the 11k, which is now 27000 pages
That game PDF takes me back to the "Choose Your Own Adventure" books.
Back before I knew about the internet, that was the most surprising book I found in the school library.
The largest one I remember was about 500 pages for a piece of software.
4,140-page DaVinci Resolve reference manual (for version 18.5, July 2023), 169 MB. It does not include the DaVinci Fusion reference manual, which is a separate additional 1,650-page PDF. I wanted to read those two PDFs on my tablet so I imported them into (of all the unlikely things) my tablet's sheet-music PDF reader (MobileSheets) so I could easily add removable scribbles, notes, bookmarks etc. It works surprisingly smoothly with no hiccups; I guess the app just thinks they're really long music scores! (Lol I need all the help I can get with trying to learn DaVinci stuff.) Anyway, thanks for the entertaining video!
Just when I thought a PDF viewer couldn't possibly be a game engine
The absolute biggest PDF file I have ever seen is the full-on HTML specification. It is a massive monster that won’t even open on my phone. I don’t know where to find it any more, but it was a full documentation of the state machine and everything.
The fact that the HTML spec has gotten so big is why the web is such a bloated mess. So much unneeded crap got added to HTML over the years (although some of it absolutely makes sense like the ability to embed audio and video into HTML pages)
the PDF for my 73 Buick Centurion is pretty damn big... In total. I don't remember how big though.
Btw "SQL" is typically pronounced as _SeQueL_ by those who use it -- since it is both shorter and less clumsy to try to say than S-Q-L 😉
Thank you for adding audio tracks for the video, I hope this will allow many people to get acquainted with your interesting content. Greetings from Russia
The amount of work that goes in making chips is mind blowing. Definitely some of the smartest individuals on earth
Time to print those PDFs!
It's not really surprising to have a tiny PDF file generating an obscene amount of pages, because PDFs are PostScript, and PostScript is a Turing-complete programming language, so you basically have a loop that generates zillions of pages...
The largest PDF file I've ever seen is 174GB. It was generated by a bug in a PDF printer which just kept printing out a single PDF file endlessly, with the user completely unaware. It would've been bigger, but 174GB was all that remained on the file server at the time. We were able to respond and solve the problem in 15-20 minutes, but oh my god I've never seen anything like it before
While not a large document in page count at under 650 pages, the largest PDF I have in my fast archive is NTRS 19730024039 "Study of Alternate Space Shuttle Concepts Volume II Part I Concept Analysis and Definition" from Lockheed for NASA at over 820 megabytes. Its companion (Volume II Part II) is another 649 pages at nearly 250 megabytes.
You can just merge multiple PDFs together so as long as there is no defined limit, you can create PDFs as large as you like
Ah, I was waiting at the end to see if you gave those large technical document PDFs to your custom GPT and how it got on with processing them - any plans for a follow up video on that and custom GPTs in general?
At work we create PDFs of letters we want sent. Then give them to our letter vendor and they mail it out. Each page is 1 letter and the most that I know of that we sent out in 1 day is 400k. It is private though as it contains sensitive and personal data.
7:44, they did make it programmatically using PyFPDF/fpdf2.3.4; I checked with a metadata viewer
I recently downloaded the new edition of the (US Fed. Hwy. Adm.) Manual on Uniform Traffic Control Devices. It's something like 600 pages, with a lot of graphics. I think the file is 50 to 100 MB. I haven't looked at it in a couple weeks so I don't remember exact figures.
A million pages is a small library.
The largest I've seen and actually used was a report from a certification lab for a complex standard. The short version of the report was around 100 pages, and the detailed version was around 10,000 pages.
I still remember the biggest PDF I encountered in the wild. It was a database of prime numbers with special properties. In total, it was more than 300,000 numbers over 19,148 pages. Why it was distributed as a PDF, I'll never know.
When I was about 6, I remember using Word on a school computer and trying to see how big I could make a document by just [ctrl+A] [ctrl+c] [ctrl+v] over and over until I accidentally crashed the computer. I was so impressed because I was so curious and young. Anyway, this video reminded me, so I thought I'd share.
Biggest PDF I've seen is around 5m pages, containing printing work for an industrial printer. Works better than you would expect
Still shorter than many companies' terms and conditions
Funnily enough I tried to do the same thing with the Unreal Engine Github repository since it was just small enough to fit within GPT's upload limits, and GPT kinda sucks for Unreal when compared to other programs since a lot of the documentation is restricted from viewing prior to agreeing to Epic's developer agreements.
Needless to say it didn't really work since GPT would just give up after a few attempts when trying to search through such a massive repository to find the answers to rather broad topics.
If we're talking filesize, the biggest PDFs will be basically image files. I scanned some of my books and the filesizes are larger than most of the ones mentioned in this video despite having a fraction of the pages.
I definitely will check out undying dusk, looks like a cool game 😎
I think the biggest PDF I’ve ever worked with is the 900 page DC employee salary list
Which of course now is somehow leaked?
...and half of them are still working from home playing video games when they should be working.
@@johanponken nope, it’s public information. Looking up “DC Public Body Employee Salaries” should bring up the page.
Man you need to see my course lectures!
I was legit thinking you were going to go out and surprise us by instead going to page size. Pdfs support ridiculous page dimensions.
In Acrobat 7.0, you can change the UserUnit size of a PDF to 75,000 at most, giving you a page dimension of 15 million by 15 million inches at most. That's enough to cover a sizable chunk (roughly 40%) of Germany
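The arithmetic behind that figure can be checked with a short Python sketch. It assumes a per-side limit of 14,400 user units (the commonly cited Acrobat limit, not stated in the comment) and an approximate area for Germany of 357,600 km²; both values are assumptions added here for illustration.

```python
MAX_PAGE_UNITS = 14_400      # assumed per-side limit, in user units
MAX_USER_UNIT = 75_000       # maximum UserUnit quoted in the comment
DEFAULT_UNITS_PER_INCH = 72  # one default user unit is 1/72 inch
GERMANY_AREA_KM2 = 357_600   # approximate total area, assumed for comparison

side_inches = MAX_PAGE_UNITS * MAX_USER_UNIT / DEFAULT_UNITS_PER_INCH
side_km = side_inches * 0.0254 / 1000  # inches to metres to kilometres
area_km2 = side_km ** 2
fraction = area_km2 / GERMANY_AREA_KM2

print(f"{side_inches:,.0f} in per side = {side_km:,.0f} km; "
      f"{area_km2:,.0f} km^2 is about {fraction:.0%} of Germany")
```

Under these assumptions the page is about 381 km on a side, which covers a bit over 40% of Germany.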
My complete medical history is coming up on 1k pages soon 😅
I love how you credit the stock images and the AI prompts.
That PDF game made me think of the old "Choose Your Own Adventure" and the "Which Way" books that I saw in the library when I was in elementary school....
Those were so enjoyable back then. I liked a bit of control.
It kinda got out of control in school textbooks, jeje. 😊
@@v.prestorpnrcrtlcrt2096
I wonder if people still read them, and even write them??
Now I want to port Myst to PDF...
and i thought my 300 page lecture slide/notes pdf was huge
What subject?
@PWingert1966 My biggest ones were my computer science assembly class and my neurobiology class. I concatenated all the lecture slides for open note tests when Zoom classes were mandatory. ctrl + f is my hero lol
They aren't public, so I can't share them, but I work with multi-thousand page PDFs at work every week. One project we have at work has surveys that get mailed out and I have to parse the PDF before printing. Those survey PDFs are anywhere between 3,000 and 12,000 pages depending on how many recipients we have that week.
So... a 200,000 page text adventure, cool. I gotta look into Undying Dusk. I have fond memories from many years ago of playing text adventures on my Commodore 64 (yes I'm that old)
At work we have some PDFs with 140k pages. They literally take ages to load and crash like 7/10 times
Cool, a game in a PDF!
I bet Boeing or Airbus must have some really huge pdfs. The complexity of a modern plane is unimaginable.
I knew as soon as i saw the thumbnail that this had to be an export from MSDN 😂😂😂
Attorneys deal with large PDFs regularly. When you request production of communications between people, you often get a PDF export of emails from tens of relevant people, covering years and years of entries. It's a nightmare to have to sort through them to find evidence, so I always prefer to handle such large evidence in pst or msg or other native email files. I'm sure there are legal reference books that are millions of pages long out there. It's not uncommon for a legal reference series to span tens of THICK books across multiple bookshelves.
But if they need to produce all these documents for say a NTG or NTP, wouldn't they be individual files, not one massive combined pdf?
5 thousand pages. It's my Cpp Textbook.
A lot of textbooks easily reach the order of the thousands of pages. I think 10000 is the upper limit before they start to split them into volumes rather than a single book.
I once downloaded a txt file which was over 1 GB in size. It contained the first 1 billion digits of pi.
I'm not sure how many pages it was, but it was quite long.
I’m interested in hearing about ChatGPT’s troubleshooting performance with the specialised dataset. Worth it?
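A rough page count can be estimated with a few lines of Python. The figure of 3,000 characters per page is a hypothetical layout (about 50 lines of 60 characters), not something from the file itself.

```python
import math

DIGITS = 1_000_000_000
CHARS_PER_PAGE = 3_000  # hypothetical: about 50 lines of 60 characters

pages = math.ceil(DIGITS / CHARS_PER_PAGE)
size_gb = DIGITS / 1e9  # one byte per digit in plain ASCII

print(f"about {pages:,} pages, {size_gb:.1f} GB of text")
```

With that layout the digits alone would fill roughly a third of a million pages, and the one-byte-per-digit size lines up with the "over 1 GB" file.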