Every time I master a new topic and feel proud of myself, I see people like this and wonder how I can even tie my own shoes in the morning. Brilliant people!
This is the most fascinating Computerphile video I have seen so far because it directly relates to research I have been doing for over a year. I have the privilege to go to a university in the States that focuses on learning by application and provides lots of internship opportunities. I have been working with an unnamed corporation for over a year focusing on this very topic: code modernization. From COBOL to Python, and several other legacy languages such as NATURAL, to more modern languages. We take a different approach, using NLP models fine-tuned with our code translation data, but still, different means to the same goal. I'm so happy that Dr. Lano took some time to explain the process, so more people can know about the beauty in old languages. It's astounding to me that I've had the opportunity to work on this stuff as an undergrad.
20 years ago I was forced to manually read some old COBOL code in order to modernize it; I would have loved a tool that helped automate part of that process back then.
I mean this in the best of ways, I genuinely do, but I love when people's appearances match their professions. If you have an IT problem and this guy shows up, you'd immediately feel better.
This is really fascinating! I'd attempted to do almost exactly this a few years back in my org - trying to convert our legacy PL/SQL code to Java, using Antlr for parse tree generation. As a side project I didn't go too far but it's fascinating to see where this concept is going.
Probably a better fit would be PL/SQL to Pascal 😏 Interesting concept though... 4GL to 3GL. Having said that, PL/SQL for Oracle DB data manipulation is awesome, but you lose DB platform independence... being able to do both can be tricky, basically implementing abstracted SQL calls and SQL translations between DB platforms. But where PL/SQL really shines is DB stored procedures/packages... that's an even bigger task to implement externally.
@@custardtart1312 Of course SQL/PLSQL isn't legacy per se - it was our code that was "legacy". We had loads of business functionality written directly in PL/SQL which was a holdover from ancient days of Oracle Forms and D2K. Apart from the obvious platform dependence issue, using PL/SQL prevented development of capabilities that we take for granted with modern Java code (containerization and creation of REST APIs are the first points that pop into my head right now but I can probably name at least 15 more). I must say, though, that having code so closely coupled with the DB made it *blazing* fast for some specific cases.
Around Y2K, I worked as a contractor doing some code translation. The company had a parallel activity to rework incomprehensible COBOL to equally incomprehensible SAP. There also were some date routines whose source code had been lost. I prefer APL, which is (was) considered write-only.
I was just wondering if he's got a universal description of APL. I know of several million lines of APL (which would be a LOT more lines in some other language) that could use converting.
One of my first programming jobs after I graduated university was translating an actuarial program written in APL into (I think) Pascal. I had never seen APL before, and of course its programming paradigm was radically different from Pascal's imperative style. Fun times!
I'm still puzzled though, whether he conflated BASIC with Visual Basic, or just meant BASIC. I imagine there are billions of lines of code of "Visual Basic" from the 90s, when PCs were replacing the older mainframes and minicomputers. I imagine much was written from scratch, at cheapest rates possible, rather than getting experienced professionals to re-write code the right way. The first "Visual Basic" that Microsoft put out was for DOS. I don't think Windows Visual Basic was backwards compatible, but I'm not sure. And maybe there's just a lot more of the older dialects of BASIC than I know.
Maybe he means COBOL from the sixties and then also visual basic code (from the 90s). I mean both are definitely a problem, we've migrated quite some systems from VB to for example C# on .NET (which is sadly the current default language/platform specified by our company...).
1:1 translation is OK for some cases, but it is often smarter to start over and ditch old outdated business processes along the way... gone through that multiple times in my career.
Yes, this is why we're seeing modern startups like Monzo bank do so well. They're able to build systems faster, and be more flexible, in ways that the older banks can't.
tbh I'd imagine something like this will be mostly useful for getting rid of crusty hardware first and foremost, but also for unit testing its replacement. The most daunting part of 'just rewrite it', imo, is making sure that the core functions are even the same, or whether there are load-bearing off-by-one errors speckled around lol
@@ShinyQuagsire A big issue that I have encountered in legacy conversion projects is that there's sometimes no clear definition of what the core functions are and how they work precisely. It's all shoestrings and bubblegum mashed together after 30+ years of adding and removing features.
Nice to see that these types of people still exist: balanced, experienced and trustworthy. At least that is what I feel listening to the accurate, simple and efficient explanations of this problem, which we as professionals in the IT world are dealing with almost daily. As Dr. Kevin Lano says, the environment overall is changing, and that's forcing the creation of more powerful programming languages, closer to abstraction and to human language. The enterprises' "resistance" to change only makes it worse, as the time between "would be nice to do" and "has to be done" is being consumed at a fast pace. The real added value is to understand this and act in time, and not get confused by trends. Thanks for the insightful video.
When I started my current job, the first big task was converting over 100k lines of legacy code, and it was fun. I cheated by writing a translator and then manually reviewing/fixing the resultant code. One great thing that came out of it was finding and fixing a lot of long-standing bugs.
I have an uncle who tells a story about how he optimized his COBOL code by replacing function calls with GOTOs. It was a 10x speedup (late 80s). He's never complained since when working on bad code ;)
My job is MBSE, and while it's not the most thrilling engineering, if you can't characterise a system, it's hard to verify and maintain the configuration going into the future. A lot of these systems have to be designed with the aim of future-proofing them 10-20 years out.
Write your code to be testable (automatically testable), which means coding to interfaces, using SOLID principles, and using and inventing design patterns. Having unit tests, integration tests, user acceptance tests, and a full set of tests for every use case in a user story, and ensuring every user story is captured in the requirements, can help lower the cost and raise the maintainability of a software system. Software should be written as plug-and-play and not have tight coupling to other parts of the system, including a particular language. Programming is not just about a language; rather, it's about designing a high-quality, robust software system that is easy to maintain and extend. The benefit is a trusted system that can be updated and changed at low cost.
This advice only applies to very high level languages and general purpose software. The moment you have to write a hardware driver, programs embedded in tiny devices all of that has no meaning.
I don't believe for a second that the COBOL or BASIC equivalent in Java or any other language would be any easier to maintain and develop than its original form. Autogenerated things are usually a horrible mess, and even when they work they just don't make much sense to programmers. It is hard to make out from the video, but the 10 or so lines of COBOL exploded into at least 50 lines of weird-looking Java.
Auto generated code usually is a mess, but at least now you get some of the benefits of using a modern language. Like garbage collection, the ability to write tests, and a bigger pool of developers. So I guess at that point, it's time to start the traditional refactor job.
It's really all about the tooling; now replacing parts and writing tests is much easier, and development can be done on much newer platforms. There aren't really any modern COBOL IDEs you can use, but the amount of tooling for Java is nearly endless.
This! The legacy part is not due to COBOL; the legacy is using a datatype that overflows above 10k customers. Also the natural spaghettification of code if one does not refactor regularly or have a thought-out design.
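To make that "overflows above 10k customers" point concrete: COBOL numeric fields are fixed-width, so a counter declared as PIC 9(4) can only hold four digits and the 10,000th customer doesn't fit. A hedged Python sketch of that limit (the field and function names are invented for illustration; real COBOL would typically truncate silently rather than raise):

```python
def store_pic9_4(value):
    """Rough analogue of a COBOL PIC 9(4) field: four decimal digits,
    i.e. 0..9999. We raise instead of silently truncating so the
    legacy limit is visible."""
    if not 0 <= value <= 9999:
        raise OverflowError("value does not fit in PIC 9(4)")
    return value

store_pic9_4(9999)   # the last customer that fits; 10,000 would blow up
```

The bug isn't the language, it's the baked-in assumption about the data's size.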
You should give ChatGPT a try. I was quite impressed how easily readable the translated code was. It might take a few iterations to tweak the output, but I was able to go from COBOL to C++20 and back.
@@progammler That misses the point of this project because ChatGPT will give you no guarantee at all that the output is doing the same thing as the input. This project is about guarantees. By assumption there are no tests and so there is also no easy way to verify the result, it doesn't really matter whether the rewrite was done manually or by ChatGPT.
Before taking early retirement I spent 20 years working for a UK bank writing banking systems in IBM Mainframe Assembler (HLASM). We had ~13 million lines of code, and there had been attempts in the past to convert to C; this resulted in a few C modules while the rest was still HLASM. We had problems because we were all assembler developers (several hundred of us) who knew the assembler code well and never had trouble supporting and maintaining it. The C, however, we couldn't maintain, because there was so little of it that cross-training wasn't practical. In the end the C was converted to assembler (success!!). This highlights a major issue with translating code from one language to another. While it might be easy to get someone who can develop in the target language, it is much harder to get people who understand what a program actually does; reading the code is only part of the solution. The knowledge of the original program, its structure and in-depth functionality, is lost when it is translated to a new language.
I once worked on a system where some of the code had been machine-translated from PL/1 to Ada; I made sure I never went anywhere near that. Instead I was reverse engineering a system that was originally written in PL/C and rewriting it in C so that it could interact with the translated Ada code. We did not have the source, so I had to inspect the data flows in and out. I got it working, too.
Wow this has been part of my day job for years, for VB to Java there are commercial tools. The biggest problem I find is UI code so that is just redesigned. Unit Tests are a life saver in these endeavours.
This is why documentation, comments, flowcharts, and additional information on functionality is crucial. The source code + docs + data has to be passed on. "It works because it works" is no excuse for an institution like government or the finance sector.
I've worked on software where they didn't even have all the source code. Literally ended up rewriting the whole thing in a more modern language to make a minor change.
Uncommented code really bothers me. I don't care what language comments are written in as long as it can be reasonably translated, there needs to be comments.
@@DrewTNaylor JavaScript coders don’t know it’s possible to add comments to their code. Or they think it will be too big and run too slow, while the actual code is poorly optimised and badly designed.
I changed from engineering to programming jobs via my first moonlight assignment: translating legacy systems from HP and Data General to an IBM PC. I underestimated how long it would take just to proofread 250,000 lines of code! Although they were 'just' versions of Basic, they were actually very different in the display (single line vs full screen), Fortran-style print statements, and plug-in cartridges that solved matrix equations and sorting. I soon realised I could write an interpreter of the source and write out a new program that could be tested on the original inputs and outputs, luckily one line at a time, because that is how BASIC worked. At least GW-BASIC had functions instead of GOTOs! It was insanely successful in preserving the intellectual property the companies had developed over years. I did one job for $10,000, and I later found the next quote to rewrite from scratch was $250,000.
One feels this fellow is reinventing the wheel. If you want to modernize COBOL, PL/I and other enterprise software assets then talk to the experts at Micro Focus and/or Heirloom Computing. For example, Micro Focus' COBOL compilers can convert to C, Java and .NET already.
I feel this is a much more formal and more scientific research like approach (hence the UML output), more about understanding compared just making a transpiler for cobol as a tool.
@@EraYaN Micro Focus has tools that do this sort of detailed analysis within their product suite. In fact, these tools go beyond just a program and look at the whole system (code, job control, data, forms etc.). To be fair, Micro Focus started this sort of holistic approach as part of their Y2K solution in the 1990s.
Corporates and the like (such as a university) don't like you installing random open source programs, because it requires effort to verify it's not a threat to their network. Unlikely that Grady could get them to allow it for a YouTube video.
@@MrGeekGamer If it's windows you can record the screen just by pressing win, alt and R. For Linux you'll need Kazam or similar but if you're a Linux user you're installing six hundred and thirty three apps a day and changing your distro once a fortnight so I'm sure you'll get away with one more.
@@MrGeekGamer It's installed by default. The question is whether corporations actively go in and spend time and effort uninstalling it. I'm sure there are some that do, the vast majority wouldn't bother. It's considered safe and turning background stuff like this off in windows can even cause issues and give you a non-standard installation which is just a headache. Although, in a high security environment where you want to prevent recording of sensitive internal zoom meetings and the like. Sure, that would make sense.
@@jklax COBOL was created back in the 50s as a business-oriented, supposedly manager-readable language that would make plain what the mysterious computer geeks were doing inside their cubicles and therefore make their product more manageable and maintainable. Unfortunately it didn't address the problems of poor specification, poor documentation, bad coding habits, and other issues that were more the cause of the difficulties. When these concepts were taught in colleges, it was in connection with languages that enforced some of the better coding practices, but many of the businesses were loaded with mountains of black-box COBOL code that wasn't much clearer than the assembly that preceded it, and everybody was afraid to touch the jury-rigged edifice and maybe bring it crashing to the ground. Short answer - COBOL didn't solve the problems it was created to address, and was abandoned by academia in favor of more robust languages and codes of practice. Business didn't follow the trend because updating their software structure presented too much risk. Same reason airports use equipment and code that is sometimes decades out of date.
Early in my software career the bane of my life was being given reams of uncommented spaghetti code in Basic, Pascal, etc. to somehow comprehend and then modify to integrate new features. Sometimes it was easier to just rebuild it from scratch in a better language. Tools like this reverse engineering engine would have been so useful back then, but at that time even UML was in its infancy.
This is super interesting. I was always told that COBOL was well suited for describing business logic because it was fairly abstract. But I guess that was by the standards of half a century ago. The COBOL I've seen comes across as barely above machine code to my modern sensibilities.
I think this might depend a lot on the age/version of COBOL as well as the programming style used to write the application in question. In the company I work at, we have a fairly large, mainframe-based, vendor legacy COBOL system used to administer financial products. The system was originally written using a post-COBOL II version of the language and has since been upgraded to IBM's latest version of the compiler. The system is made up of about 3000 programs, with the size of the procedure divisions ranging anywhere from 5000 to 15,000 lines of code. The licenses with the vendor allow my company to make significant changes to the source code in order to support the types of financial products being developed. One key feature of this legacy system is how it was originally designed. It uses a very strict structured programming philosophy and takes advantage of the cleaner COBOL coding style that became possible after (I think) COBOL II was introduced, way back in the 80s. As a result, structured programming concepts are prevalent throughout all programs within this system - i.e. in-paragraph loops are accomplished through PERFORM/UNTIL/END-PERFORM statements; conditional statements are always bound by END-IF clauses; and complex case logic is handled by EVALUATE/WHEN/END-EVALUATE statements (some of these structures were not available in much older versions of COBOL). Because of the strict coding standard that was applied to this system, spaghetti-like, "GOTO rich" logic is entirely missing (GOTOs in this application are strictly only used for abnormal paragraph exits in the event of errors being detected). This makes the code much easier to read in a top-down fashion (the programs are literally designed to mirror the business flow that describes how the financial products should work).
As a result, it makes it easier to identify which programs need to be changed (and where the change should be applied) when new bus requirements get sent to the development team. However, I completely understand that not all legacy systems are written this way. In my career I've also encountered legacy mainframe COBOL programs that were designed/written using much older versions of the compiler that did not support some of the syntax described above....and as a result in these other older systems you end up having to deal with code that might be much harder to follow.
@@DC-id2ih that's super interesting, thank you for sharing. The COBOL war stories I know were mostly from my dad, who worked in EDP/IT auditing in the 70s, 80s, and onwards. COBOL wasn't exactly a new language at that time either, but it was clear that a well-maintained COBOL application wasn't the worst thing his auditing team could encounter (unlike APL, which apparently needed a special keyboard even to write). It makes sense that it would have evolved and improved, as you say. One of the first languages I was taught, around the 2000s, was Java, and modern Java is a different beast altogether, with many new features. Same deal with COBOL, I'm sure.
@@jens256 A quick search says, "Classes and interfaces have been in COBOL since 2002. Classes have factory objects, containing class methods and variables, and instance objects, containing instance methods and variables. Inheritance and interfaces provide polymorphism." The Wikipedia article on COBOL is worth a quick look. Lots of interesting stuff, both tech and cultural.
Actually COBOL is quite the opposite of abstract. It's also very, very, very, far from machine code. COBOL is an attempt to make something similar to the English Language and this has long since been determined to be a bad idea. It is indeed primarily intended for business programs. The term business logic just means that it's describing arbitrary rules such as tax code as opposed to something not arbitrary like the laws of physics like one might do in scientific languages.
I'm quite interested in this topic as it always bugs me that there are so many targets for code and languages which aren't cross-usable. Often it would be amazing to be able to integrate two things without having to resort to separate services or processes.
this is something to consider when writing anything mission critical, choose languages and frameworks that will be around for the long-term - or at least supported for the long term. otherwise your grand-children will be left with a major headache.
This is why I chose JavaScript, unironically. It's event driven by nature, asynchronous by design, and used by literally everyone who's ever bothered to learn programming in the last 30 years. It is the most widely developed language, and the one with the most money thrown at it. I laughed at myself when I came to the conclusion that my core applications should be written in the language I scoffed at for most of my life, but it is what it is. It's the best language available today for general maintainability, and ease of access to talent.
Part of my full time job is to convert legacy C++ code to modern Typescript front end code. It's challenging sometimes since as you know it isn't always one to one, you have to be clever and understand what the C++ code is doing and think how would JavaScript or TypeScript do it.
For those wondering: the app is written in C++ and the company wants a more modern, browser-based look and feel. So I take the modules that were written in C++, go through them, and write the Angular/TypeScript code based on what the C++ does in the legacy app. Sometimes the C++ does things that TypeScript of course can't do, so I will sometimes need to write C# code and reference it from the TypeScript via a controller. It could be a DLL wrapper or something like that.
Can this handle assembly language as input? If so, it might be useful for making open-source firmware replacements. Dump the ROM from a device, run the binary through a disassembler, run that through the specification-generator, then use the resulting specification to write a functional replacement.
Anybody, even someone who's never seen COBOL before, can look at those few lines and instantly see what it's doing. Yes, it's an old language but it is not some lost cryptic notation that only an archaeologist could decipher. It's damned simple, even if it requires more discipline to avoid spaghetti code than modern languages do. You can still write spaghetti code in any language if you want to.
This is why the Y2K bug was potentially a huge problem. People don't understand this and think it was all a big hoax but a lot of work goes into forward compatibility that non-technical people can't even begin to appreciate
If Y2K had not been warned about so broadly in advance, a significant amount of software would probably have failed. Tech people understand, but what about C-suite people, who probably first ask how much the fix costs and then forget about it until sh.. hits the fan?
There's no evidence that not addressing Y2K would have caused any real issues. Lots of money was wasted all over the world checking that the receptionist's 486 was Y2K compliant. There were some legit issues, but 98% of it was computer companies scamming government entities. 2038 will cause much bigger problems. But that's decades away, right?
@@tbird-z1r Just because there was a lot of overexaggeration and fraud doesn't mean that it wasn't a real problem. There was massive fraud during the last health crisis and lots of overcounting and overexaggeration but I don't think any sane person would deny that it was a real problem. We don't know for sure what could have happened if some of those systems were not corrected. Software has a tendency to break in really unexpected ways when something as simple as a buffer overflow can cause an entire system to crash. I used the word "potentially" simply because I don't have the infinite wisdom or some alternate dimension to know for sure that it was but I'm on the side of an abundance of caution when it comes to dealing with critical systems. The fraud issues are something else entirely that has to do with how screwed up our political system is. But that's outside the scope of computerphile
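For anyone wondering what the actual bug looked like: storing years as two digits makes date arithmetic wrap at the century boundary. A minimal sketch of the problem and the common "windowing" fix (the pivot value 50 is an arbitrary illustrative choice; real systems picked pivots to suit their data):

```python
def expand_year(yy):
    """Classic Y2K 'windowing' fix: two-digit years below the pivot
    are read as 20xx, the rest as 19xx."""
    return 2000 + yy if yy < 50 else 1900 + yy

# Naive two-digit arithmetic: a record from '99 compared against '01...
assert 1 - 99 == -98                           # looks like 98 years in the past
assert expand_year(1) - expand_year(99) == 2   # windowed: really 2 years apart
```

Windowing only postpones the problem, of course; here anything from 2050 on would be misread as 19xx.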
Very interesting! 😊Would love to know more about the text format mentioned that ANTLR parses into. I also wonder if doing an AST based translation would not result in bugs being ported from source language into the destination language? 🤔
I say you actually *want* to port bugs in case they are required for the function of some systems, like API consumers who use a workaround that would break if the bug were fixed.
I use OCL (Object Constraint Language), a standard language that is part of the UML, so a kind of universal software specification language. And yes, bugs are translated unchanged - it would be an extra step to check and remove them, ideally before target code generation.
Don't know about Cobol, but in BASIC, you have no structured control flow. You can basically (no pun intended) jump around from anywhere to anywhere which makes it quite hard if not impossible to automatically transfer it into a modern language with structured control flow without making a huge mess. I would be interested to see how they deal with that issue.
I think COBOL allows GOTOs (you can see it in the video itself), but most of the time large systems will have most of their GOTOs structured as de-facto loops or other modern control flow structures, just because it made writing these systems much, much easier. Our modern control flow largely derives from these folk practices anyway, and they became formalised once it was proven that loops and if-else statements were sufficient to enact any control flow required.
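When the GOTOs don't fall into loop-like patterns, one fallback translators can use is a "label dispatch" loop that emulates the jumps directly in a structured language. A hypothetical sketch (the labels mimic BASIC line numbers; the program they encode just sums 0..4):

```python
def run(start="L10"):
    """Emulate unstructured GOTOs with a state machine: each label
    becomes a branch, and 'goto X' becomes 'label = X'."""
    label, total, i = start, 0, 0
    while label is not None:
        if label == "L10":      # 10 LET I = 0
            i = 0
            label = "L20"
        elif label == "L20":    # 20 TOTAL = TOTAL + I : I = I + 1
            total += i          # 30 IF I < 5 GOTO 20
            i += 1
            label = "L20" if i < 5 else "L30"
        elif label == "L30":    # 40 END
            label = None
    return total
```

The result is correct but ugly, which is exactly why recovering real loops and conditionals, as discussed in the video, is preferable whenever possible.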
I remember reading some sci-fi novel where there were some sort of "digital archaeologists"... In the far future they replaced real programmers... Instead of writing new code they were looking for ancient code that did the job...
@@ChristopherHailey The silly thing is that the interactive elements were originally written in Javascript. People kept telling me to convert it all to Flash / Actionscript as it would "look better". I deleted the original Javascript thinking I wouldn't need it.
I've recently got a job with legacy Java code 20+ years old (JSP/Servlet era). There isn't the slightest chance of rewriting it or making it more modern :)
When I saw legacy code in the title, I wasn't expecting legacy languages, but the more common usage meaning poorly written/convoluted/unmaintainable OG code, usually written by someone in another department who didn't know better before hiring proper devs, which everyone's afraid to touch. A tip for future devs: the most important thing in maintainability is readability and ease of understanding. Organize your classes to make sense to humans. Name your functions and variables so they read close to natural language, i.e. if (operationHasFinished) { ... }
I mean those are basically the same thing in this context. It's just that by unmaintainable OG code, we mean REAL OG written back before people stopped using gotos, having a string type was decadent, and you had to worry about your variable names taking up too much disk space in the source code.
I guess it's always there (on windows systems), and it might have some features, like saving to a specific character encoding scheme, that notepad doesn't have. It might have the same frankenstein pattern engine for search and replace that's used in Word.
I used WordStar too at one time, for assembly code. It caused a bug that took a long time to track down - it set bit 7 of a character, which looked and printed out just fine, but the assembler took one look, said "Invalid character" and simply ignored the line - no error or warning!
It is so clear to me that some programmers do what they do for entirely different reasons than I do. Which is great! But man... as a games programmer I program to invent stuff, not to program. Programming is a means to an end. To create. This, taking another person's creation and then translating it into another language...? He is a clever translator. I would not know how to do what he does.
They're a different breed of person. Hyper turbo autismos, but talented beyond compare. I've met several of them, and they are extremely strange creatures. Nearly one dimensional except for the odd esoteric hobby. For most of them, the idea of sport, or outdoorsmanship is horrifying, and socially they're abysmal. But give them a technical task, a mundane one that must be solved procedurally, and they excel in ways that make my skin crawl. I'd say they were fictional had I not met many of them.
99% of the code that current legacy business applications consist of wasn't written by people with passion; they were written by people (many of them, spanning generations) who just needed to pay their bills. How much passion can you have for your task to write COBOL that generates a monthly bank account statement? Add numbers in a column, add summary at the end, and format it all in a simple ASCII (or rather EBCDIC) table?
Can confirm on all counts. 2,000 lines of VB6 is big enough to make a nice proof-of-concept yet small enough for a programmer to validate the result independently, but business-types will remain skeptical. Yes, py2 -> py3 takes an unreasonably long time when you're worried about stability at all costs. And yes, banks still employ COBOL programmers. IDENTIFICATION DIVISION.!
I would never write COBOL code like this; it's not even safe or maintenance-friendly. Even if there were source, I would still transform the object code, because (as we all know) the source rarely matches the object, so the object will always give the "actual spec". If the compiler is known, then recompiling either the generated or the original source should result in the same functionality. I believe, therefore, that a COBOL disassembler for each machine the code is to run on would be a better investment. Most compilers have this ability nowadays. Plus this provides the ability to compare code and convert the object for different compilers on the same machine. I used to use this method to show people that the ICL 2900 used hardware BCD arithmetic and the data didn't need to be converted into a COMP for arithmetic processing.
I was about to comment the same. I still support and develop VB6 applications to this day for my day job, they're far too big and complex to completely rewrite or translate. We do use modern languages for components where possible though.
@@Wiggs1979 yep, it's still completely interoperable with .NET, even the IDE still runs on modern Windows with some tweaks. there are even one or two alternative compilers (far from complete)
you can think of semantics as the WHAT you can express in a language as opposed to syntax which is the HOW you express it. For example in Python compared to C, classes are a semantic feature that the C language does not have.
@@Kersich86 _I_ know what semantics are; I’d like to see a video explaining why they’re important in a manner that a wider audience can understand and appreciate. Particularly with regard to how they relate to the notion of program specification and proof.
The syntax is the structure of the language and what it looks like:
- in English, for example, every sentence must end with a "." and you can only use words in the dictionary and made-up names
- in C, for example, every statement must end with a ";" and you can only use variable names that you declared
I recommend the Computerphile video *"Parsing Explained - Computerphile"* about syntax.
The semantics is what the syntax means or does:
- in English, for example, you can look up the meaning of words in a dictionary; words have a different meaning depending on the language
- in C, for example, you could look up the meaning of a statement in a book like "The C Programming Language"
I recommend the Udacity video *"Syntax Vs Semantics - Programming Language"* about semantics.
I don't remember any video from Computerphile about both, but it would be cool to see such a video.
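A tiny Python illustration of the distinction: the "+" below is the same piece of syntax in every line, but its semantics, what the program actually does, depends on the operand types:

```python
# Same syntax, different semantics:
assert 1 + 2 == 3           # numeric addition
assert "1" + "2" == "12"    # string concatenation

# Syntactically well-formed, but semantically a type error in Python:
try:
    "1" + 2
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

assert mixed_allowed is False
```

This is exactly what makes translation hard: a translator must preserve semantics, not just map one syntax onto another.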
@@xybersurfer i have a copy of The C programming language and can assure you it does not have a description of the semantics of the language. The C language specification has some, but not really enough to guarantee a single consistent meaning to even simple programs written in C.
@@xurtis i am aware that it falls short. i was never really a fan of the book to be honest. i'm not really expecting OP to go read that book. it was merely meant as an example, to give a rough idea. sorry for making it look like i was endorsing that book
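A tiny illustration of the syntax/semantics distinction discussed in this thread (the example values are invented): the two expressions below have the same syntactic shape, `a + b`, but the *meaning* of `+` differs with the operand types.

```python
# Same syntax ("a + b"), different semantics depending on type:
num_sum = 1 + 2        # arithmetic addition
str_sum = "1" + "2"    # string concatenation

assert num_sum == 3
assert str_sum == "12"

# A syntax error vs. a semantic error:
# `1 +` is rejected by the parser (malformed syntax), while
# `"1" + 2` parses fine but has no defined meaning in Python.
try:
    "1" + 2
except TypeError:
    pass  # syntactically valid, semantically a type error
```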
Said with all respect, Dr Lano has such distinguishing features that he could have, or maybe still could, make a killing as a performance character! What a unique face!
I'd be curious to know if this technique could be used to more easily change code dependencies. My job would be a lot easier if, for example, this could automatically replace a legacy framework with a more modern one, e.g. CORBA with gRPC.
Back in the 1990s it was easier to teach a C expert FORTRAN than it was to maintain C code generated by F2C. (In my experience, YMMV, etc.) Hopefully the tools are better now.
We still use visual fox pro at our company and would like to convert it to a new language, but the effort would be so great starting new would be cheaper than trying to rewrite almost 30 years of code with many many bad design decisions and bugs ..
The great thing about old code is often it is doing things for reasons entirely invisible. We used to say COBOL was an Italian coding language, because it always looked like spaghetti 🍝
Goto's are great - when used sparingly, in limited situations. Well placed GOTO can increase readability of the code, for example as an exit to a common destination from different layers of multiple nested loops . Also goto=jmp is used all over the place in assembly. Don't dis goto, just don't abuse it.
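In goto-less languages, the multi-level exit that a well-placed GOTO gives (as described above) is often expressed by extracting the nested loops into a function and using `return`. A minimal sketch with invented names:

```python
def find_pair(matrix, target):
    """Return (row, col) of the first cell equal to target, else None.

    Putting the nested loops in a function lets `return` perform the
    multi-level exit that a well-placed GOTO provides elsewhere.
    """
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)
    return None

assert find_pair([[1, 2], [3, 4]], 4) == (1, 1)
assert find_pair([[1, 2], [3, 4]], 9) is None
```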
Do these tools preserve comments? I don't know how it is in COBOL, but in many programming languages comments can go anywhere and aren't part of the AST.
This didn't happen overnight; these companies have this problem from years of deliberate underinvestment in the people and departments that manage/maintain their code and systems. All this does is empower them to do it all over again, but not to worry, soon AI will maintain it for them.
I maintain a heterogeneous environment of Windows and Linux servers, and if I'm on Windows, I often use WordPad because it's always there and it seamlessly handles UNIX-style line endings (unlike Notepad).
It's like any other format, decimal numbers (zoned or packed) represent numbers, internal and external representations are up to the program. A common example would be money, e.g. USD. You can represent dollar amounts as pennies, e.g. $1.23 would be "123". When you format the output you would insert a decimal point to represent as dollars, "123" would display as "1.23". This would be fixed point and is used because it's exact. On the other hand, floating point is an approximation, and COBOL supports fp data types.
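A minimal sketch of the pennies-based fixed-point idea described above (the function name is invented for illustration): store money as integer cents for exact arithmetic, and insert the decimal point only when formatting.

```python
def format_dollars(cents: int) -> str:
    """Render an integer number of cents as a dollar string.

    Storing money as integer cents keeps arithmetic exact, in the
    spirit of COBOL's fixed-point (zoned/packed decimal) data.
    """
    sign = "-" if cents < 0 else ""
    cents = abs(cents)
    return f"{sign}{cents // 100}.{cents % 100:02d}"

assert format_dollars(123) == "1.23"   # "123" displays as "1.23"
assert format_dollars(5) == "0.05"
assert 10 * 123 == 1230                # exact, unlike binary floats
```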
I agree, sometimes it's pretty awkward to get out of a switch statement by cascading through and break; sometimes you want different endpoints. It's still pretty nasty though if people start misusing it way out of scope.
@@rickyrico80 I've seen some horrors lol. I certainly get what you mean. :D It's a last resort tool when other options have been exhausted. Kinda like the volatile keyword. Rarely is it actually needed :)
Languages that have fewer lines of code do not necessarily run faster. You might find that one line translates to 100 lines of machine language but in the other language is tends to represent 10 lines of machine language. COBOL tends to represent numbers in decimal which is no longer directly handled by most modern hardware so COBOL tends to run very, very slowly on modern machines. IBM computers like the Series Z do support decimal for this reason.
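Python's `decimal` module gives a quick feel for why the decimal arithmetic mentioned above matters for money code (a hedged sketch; this says nothing about how any particular mainframe implements it):

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly...
assert 0.1 + 0.2 != 0.3

# ...while decimal arithmetic, in the spirit of COBOL's
# packed-decimal data, keeps such values exact.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```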
Yes, you're not a *real* programmer unless you know how to indent (or outdent) source code using an IBM 029 card punch. :) (Hint for the youngsters: it involves deploying a wet finger.)
Not taking anything away from the man, but as a very honest question: could AI also analyze legacy code, convert it to a modern language, and perhaps even optimize the same program, too?
Key thing to remember is Artificial Intelligence isn't Real Intelligence. It just looks so much like real intelligence that we can fool ourselves into giving it more credit than it deserves.
I can't get over using WordPad as a code editor.
lol
(this man clearly knows more than most of us about legacy code, just a decision that made me double take)
It's a step up from using VIM!
@@carlospulpo4205 it really isn't...
@@jens256 notepad++ for the win
Every time I master a new topic and feel proud of myself, I see people like this and wonder how I can even tie my own shoes in the morning.
Brilliant people!
This is the most fascinating Computerphile video I have seen so far because it directly relates to research I have been doing for over a year. I have the privilege to go to a university in the States that focuses on learning by application and provides lots of internship opportunities. I have been working with an unnamed corporation for over a year focusing on this very topic: code modernization. From COBOL to Python, and several other legacy languages such as NATURAL, to more modern languages. We take a different approach, using NLP models fine-tuned with our code translation data for our process, but still, different means to the same goal. I'm so happy that Dr. Lano took some time to explain the process of this, so more people can know about the beauty in old languages. It's astounding to me that I've had the opportunity to work on this stuff as an undergrad.
The bigger legacy conversion that needs to happen here is the code editor :)
20 years ago I was forced to manually read some old COBOL code to modernize it; I would have loved a tool that helped automate part of that process back then
Behold, WordPad XD
This could only get better if he used an actual notepad to do code editing…
which we did when I started programming....
I continue to this day, to code in vi.
wordpad is worse, in my opinion
Notepad? You mean PAINT
dude has 30 boxes of printing paper, so maybe he does?
1 second in and I see code in WordPad 👌
Or "modern tooling" as it's known in COBOL environments.
I mean this in the best of ways, I genuinely do, but I love when people's appearances match their professions.
If you have an IT problem and this guy shows up, you'd immediately feel better
This is really fascinating! I'd attempted to do almost exactly this a few years back in my org - trying to convert our legacy PL/SQL code to Java, using Antlr for parse tree generation. As a side project I didn't go too far but it's fascinating to see where this concept is going.
SQL and PL/SQL is not legacy, and performs certain tasks beautifully.
Probably a better fit would be pl/sql to Pascal 😏
Interesting concept though... 4gl to 3gl.
Having said that, pl/sql for Oracle Db data manipulation is awesome, but you lose db platform dependence.... being able to do both can be tricky, basically Implementing abstracted SQL calls and sql translations between db platforms.
But where pl/sql really shines is db stored procedures/packages... that's an even bigger task to implement externally.
@@custardtart1312 Of course SQL/PLSQL isn't legacy per se - it was our code that was "legacy". We had loads of business functionality written directly in PL/SQL which was a holdover from ancient days of Oracle Forms and D2K. Apart from the obvious platform dependence issue, using PL/SQL prevented development of capabilities that we take for granted with modern Java code (containerization and creation of REST APIs are the first points that pop into my head right now but I can probably name at least 15 more). I must say, though, that having code so closely coupled with the DB made it *blazing* fast for some specific cases.
Around Y2K, I worked as a contractor doing some code translation. The company had a parallel activity to rework incomprehensible COBOL to equally incomprehensible SAP. There also were some date routines whose source code had been lost. I prefer APL, which is (was) considered write-only.
I was just wondering if he's got a universal description of APL. I know of several million lines of APL (! LOTS of lines in some other language) that could use converting.
One of my first programming jobs after I graduated university was translating an actuarial program written in APL into (I think) Pascal. I had never seen APL before, and of course its programming paradigm was radically different from Pascal's imperative style. Fun times!
The superior IDE
At 1:32 he says Visual Basic is from "the early sixties". Visual Basic is actually from 1991; I guess he meant BASIC (not Visual Basic), which is from 1964.
Came here to say exactly this.
He's so professional he uses WordPad to edit code and was already using Visual Basic back in the 60s. What a pro.
I'm still puzzled though, whether he conflated BASIC with Visual Basic, or just meant BASIC. I imagine there are billions of lines of code of "Visual Basic" from the 90s, when PCs were replacing the older mainframes and minicomputers. I imagine much was written from scratch, at cheapest rates possible, rather than getting experienced professionals to re-write code the right way. The first "Visual Basic" that Microsoft put out was for DOS. I don't think Windows Visual Basic was backwards compatible, but I'm not sure. And maybe there's just a lot more of the older dialects of BASIC than I know.
@@squirlmy don't overthink it, it just means he doesn't really know what he's talking about.
Maybe he means COBOL from the sixties and then also visual basic code (from the 90s). I mean both are definitely a problem, we've migrated quite some systems from VB to for example C# on .NET (which is sadly the current default language/platform specified by our company...).
that is an interesting dude i have to say ... a true character in this blunt time ... bro got real passion
1:1 translation is OK for some cases, but it is often smarter to start over and ditch old outdated business processes along the way... gone through that multiple times in my career.
Yes, this is why we're seeing modern startups like Monzo bank do so well. They're able to build systems faster, and be more flexible, that the older banks can't do
tbh I'd imagine something like this will be mostly useful for like, getting rid of crusty hardware first and foremost, but also for unit testing its replacement. That's kinda the most daunting part of 'just rewrite it' imo is making sure that the core functions are even the same, or if there's load-bearing off-by-one errors speckled around lol
@@ShinyQuagsire A big issue that I have encountered in legacy conversion projects is that there's sometimes no clear definition of what the core functions are and how they work precisely. It's all shoestrings and bubblegum mashed together after 30+ years of adding and removing features.
Visual Basic was first released in 1991. BASIC dates from the 60's
I'm glad someone else spotted this.
I was thinking "no way VB existed in the 60's?"
not just me that noticed that then. Remember Visual Basic v1?...for DOS, I remember having to fix some of the "GUI" code due to bugs
Tells you something about how bad VB is.
Nice to see that these types of people still exist: balanced, experienced and trustworthy. At least that is what I feel listening to the accurate, simple and efficient explanations of this problem that we as professionals in the IT world are dealing with almost daily. As Dr. Kevin Lano states, the environment overall is changing, and that's forcing the creation of more powerful programming languages, moving toward abstraction and a human-language perspective. The enterprises' "resistance" to change is only making it worse, as the time between "would be nice to do" and "has to be done" is being consumed at a fast pace. The real added value is to understand this and act on time, and not get confused by trends. Thanks for the insightful video.
When I started my current job, my first big task was converting over 100k lines of legacy code, and it was fun. I cheated by writing a translator and then manually reviewing/fixing the resultant code.
One great thing that came out of it was finding and fixing a lot of long-standing bugs
I've got an uncle who told a story about how he optimized his COBOL code by replacing function calls with GOTOs. It was a 10x speedup (late 80s). I've never complained since when working on bad code ;)
My job is MBSE and while it’s not the most thrilling engineering, if you can’t characterise a system, it’s hard to verify and maintain the configuration going into the future. A lot of this have to be designed with the aim of future proofing them out 10-20 years.
"How do you like to organize your code?"
"Why, In a bunch of single WordPad windows, of course!"
Write your code to be testable (automatically testable), which means coding to interfaces, using SOLID principles, and using and inventing design patterns. Having unit tests, integration tests, user acceptance tests, and a full set of tests for every use case in a user story, and ensuring every user story is captured in the requirements, can help with lowering the cost and raising the maintainability of a software system. Software should be written as plug-and-play and not have tight coupling to other parts of a software system, including a particular language. Programming is not just about a language; rather, it's about designing a high-quality, robust software system that is easy to maintain and extend. The benefit is a trusted system that can be updated and changed at low cost.
This advice only applies to very high level languages and general purpose software. The moment you have to write a hardware driver, programs embedded in tiny devices all of that has no meaning.
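The "coding to interfaces" advice in this thread can be sketched in a few lines (the `Clock` interface and the "older than 20 years is legacy" rule are invented purely for illustration): depending on an interface rather than the real system clock makes the rule trivially testable.

```python
from typing import Protocol

class Clock(Protocol):
    """Interface: business code depends on this, not the real clock."""
    def now_year(self) -> int: ...

def is_legacy(deployed_year: int, clock: Clock) -> bool:
    """Hypothetical rule under test: 20+ year old code is 'legacy'."""
    return clock.now_year() - deployed_year >= 20

class FakeClock:
    """Test double satisfying Clock by structural typing."""
    def __init__(self, year: int) -> None:
        self.year = year
    def now_year(self) -> int:
        return self.year

# The fake lets tests pin the 'current' year deterministically.
assert is_legacy(1990, FakeClock(2024))
assert not is_legacy(2020, FakeClock(2024))
```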
I don't believe for a second that the COBOL or BASIC equivalent in Java or any other language would be any easier to maintain and develop than its original form. Autogenerated things are usually a horrible mess, and even if it works it just doesn't make much sense to programmers. It is hard to make out from the video, but the 10 or so lines of COBOL exploded into at least 50 lines of weird-looking Java.
Auto generated code usually is a mess, but at least now you get some of the benefits of using a modern language. Like garbage collection, the ability to write tests, and a bigger pool of developers.
So I guess at that point, it's time to start the traditional refactor job.
It's really all about the tooling: now replacing parts and writing tests is much easier, and development can be done on much newer platforms. There aren't really any modern COBOL IDEs you can use, but the amount of tooling for Java is nearly endless.
This! The legacy part is not due to COBOL, the legacy is using a datatype that overflows above 10k customers.
Also the natural spaghettification of code if one does not refactor regularly and keep a thought-out design.
You should give ChatGPT a try. I was quite impressed how easily readable the translated code was. Might take a few iterations to tweak the output but I was able to go from Cobol to c++20 back and forth.
@@progammler That misses the point of this project because ChatGPT will give you no guarantee at all that the output is doing the same thing as the input. This project is about guarantees. By assumption there are no tests and so there is also no easy way to verify the result, it doesn't really matter whether the rewrite was done manually or by ChatGPT.
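Lacking the formal guarantees this thread is about, one pragmatic (and decidedly non-exhaustive) check for any rewrite, manual or LLM-assisted, is differential testing: run the old and new implementations on the same inputs and compare. A toy sketch with made-up stand-in routines:

```python
import random

def legacy_interest(balance_cents: int) -> int:
    """Stand-in for the original routine (hypothetical)."""
    return balance_cents + balance_cents * 5 // 100

def translated_interest(balance_cents: int) -> int:
    """Stand-in for the translated routine (hypothetical)."""
    return balance_cents * 105 // 100

# Differential test: cheap evidence, not proof, of equivalence.
random.seed(0)
for _ in range(10_000):
    x = random.randrange(0, 10**9)
    assert legacy_interest(x) == translated_interest(x), x
```

A mismatch pinpoints a concrete input where the translation diverges, which is exactly the failure mode a ChatGPT-style rewrite can silently introduce.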
Before taking early retirement I spent 20 years working for a UK bank writing banking systems in IBM Mainframe Assembler (HLASM). We had ~13 million lines of code and there had been attempts in the past to convert to C, this resulted in a few C modules while the rest was still HLASM. We had problems because we were all assembler developers (several hundred of us) who knew the assembler code well and never had problems supporting and maintaining the code, the C however we couldn't maintain because there was so little of it that cross training wasn't practical. In the end the C was converted to assembler (success!!).
This highlights a major issue with translating code in one language to another. While it might be easy to get someone who can develop in the target language, it is much harder to get people who understand what a program actually does; reading the code is only part of the solution. The knowledge of the original program, its structure and in-depth functionality, is lost when translated to a new language.
Imagine converting code that was converted from assembler into COBOL (auto generated!), then converting that to PL/SQL. Now that was a pain.
I once worked on a system where some of the code had been machine-translated from PL/1 to Ada; I made sure I never went anywhere near that. Instead I was reverse engineering a system that was originally written in PL/C and rewriting it in C so that it could interact with the translated Ada code. We did not have the source, so I had to inspect the data flows in and out. I got it working, too.
Fascinating. The thought of so much VB6 legacy code is terrifying. But then I remembered I wrote some even older VB business code. A big problem.
Ah yes, ms wordpad, the premiere IDE for based gigachads
I mean is there something better for COBOL? It's not like you just get JetBrains Coboland and off you go 😅
Wow this has been part of my day job for years, for VB to Java there are commercial tools. The biggest problem I find is UI code so that is just redesigned.
Unit Tests are a life saver in these endeavours.
This is why documentation, comments, flowcharts, and additional information on functionality is crucial. The source code + docs + data has to be passed on. "It works because it works" is no excuse for an institution like government or the finance sector.
I've worked on software where they didn't even have all the source code. Literally ended up rewriting the whole thing in a more modern language to make a minor change.
Uncommented code really bothers me. I don't care what language comments are written in as long as it can be reasonably translated, there needs to be comments.
@@DrewTNaylor JavaScript coders don’t know it’s possible to add comments to their code. Or they think it will be too big and run too slow, while the actual code is poorly optimised and badly designed.
And how often do those docs reflect the actual state of the code?
# TODO: fix this bit properly
I changed from engineering to programming jobs with my first moonlight assignment: translating legacy systems from HP and Data General to an IBM PC. I underestimated how long it would take just to proofread 250,000 lines of code! Although they were 'just' versions of Basic, they were actually very different in the display (single line v full screen), Fortran-style print statements, and plug-in cartridges that solved matrix equations and sorting. I soon realised I could write an interpreter of the source and write out a new program that could be tested on the original inputs and outputs, luckily one line at a time, because that is how BASIC worked. At least GW-BASIC had functions instead of GOTOs!
It was insanely successful in preserving the intellectual property of the companies developed over years. I did one job for $10,000 and I later found the next quote to rewrite from scratch was $250,000.
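The line-at-a-time translation approach described above can be sketched in miniature; the toy BASIC dialect and rewrite rules here are invented purely for illustration:

```python
import re

def translate_line(basic_line: str) -> str:
    """Translate one toy-BASIC line to Python (illustrative rules only)."""
    line = re.sub(r"^\d+\s*", "", basic_line)      # drop the line number
    m = re.match(r"PRINT\s+(.*)", line)
    if m:
        return f"print({m.group(1)})"
    m = re.match(r"LET\s+(\w+)\s*=\s*(.*)", line)
    if m:
        return f"{m.group(1)} = {m.group(2)}"
    return f"# UNTRANSLATED: {line}"               # flag for manual review

assert translate_line('10 PRINT "HELLO"') == 'print("HELLO")'
assert translate_line("20 LET X = 5") == "X = 5"
```

Flagging anything the rules don't cover, instead of guessing, is what makes the manual review pass tractable.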
One feels this fellow is reinventing the wheel. If you want to modernize COBOL, PL/I and other enterprise software assets, then talk to the experts at Micro Focus and/or Heirloom Computing. For example, Micro Focus's COBOL compilers can convert to C, Java and .NET already.
I feel this is a much more formal and scientific, research-like approach (hence the UML output), more about understanding compared to just making a transpiler for COBOL as a tool.
Whooosh
@@EraYaN Micro Focus has tools that do this sort of detailed analysis within their product suite. In fact, these tools go beyond just a program and look at the whole system (code, job control, data, forms etc.). To be fair, Micro Focus started this sort of holistic approach as part of their Y2K solution in the 1990s.
0:30 It's very hard to read the code. Could you screen record in future please? OBS works well, dead easy to use.
That would be a good upgrade!
Corporates and the like (such as a university) don't like you installing random open source programs, because it requires effort to verify it's not a threat to their network. Unlikely that Grady could get them to allow it for a YouTube video.
@@MrGeekGamer If it's windows you can record the screen just by pressing win, alt and R. For Linux you'll need Kazam or similar but if you're a Linux user you're installing six hundred and thirty three apps a day and changing your distro once a fortnight so I'm sure you'll get away with one more.
@@tophat593 You're joking right? You think that "XBox Game Bar" is installed on corporate installations?
Spoiler: It is not.
@@MrGeekGamer It's installed by default. The question is whether corporations actively go in and spend time and effort uninstalling it. I'm sure there are some that do, the vast majority wouldn't bother. It's considered safe and turning background stuff like this off in windows can even cause issues and give you a non-standard installation which is just a headache.
Although, in a high security environment where you want to prevent recording of sensitive internal zoom meetings and the like. Sure, that would make sense.
COBOL is still going strong in the banking industry; I develop mostly payment processing applications running on an IBM z/OS mainframe.
Why isn't COBOL taught these days?
@@jklax It is very application-specific. I was taught on the job; I have more experience programming in C++.
@@jklax COBOL was created back in the 50s as a business-oriented, supposedly manager-readable language that would make plain what the mysterious computer geeks were doing inside their cubicles and therefore make their product more manageable and maintainable. Unfortunately it didn't address the problems of poor specification, poor documentation, bad coding habits, and other issues that were more the cause of the difficulties. When these concepts were taught in colleges, it was in connection with languages that enforced some of the better coding practices, but many of the businesses were loaded with mountains of black-box COBOL code that wasn't much clearer than the assembly that preceded it, and everybody was afraid to touch the jury-rigged edifice and maybe bring it crashing to the ground.
Short answer - COBOL didn't solve the problems it was created to address, and was abandoned by academia in favor of more robust languages and codes of practice. Business didn't follow the trend because updating their software structure presented too much risk. Same reason airports use equipment and code that is sometimes decades out of date.
Early in my software career the bane of my life was being given reams of uncommented spaghetti code in Basic, Pascal, etc. to somehow comprehend and then modify to integrate new features. Sometimes it was easier to just rebuild it from scratch in a better language. Tools like this reverse engineering engine would have been so useful back then, but at that time even UML was in its infancy.
I think WordPad has been deprecated as a code editor and Microsoft suggests using Office Word 2022
That would be a problem, as WordPad is free to use and Word is not.
@@DrewTNaylor I think your sarcasm detector might not be working.
@@ChristopherHailey Oh, I didn't know it was sarcasm.
Ah, I remember those days
"Hello, it looks like you're writing a mission critical business application", Clippy would say.
Please use screenshots or screen capture software. It would be nice to read the code along with the video.
This is super interesting. I was always told that COBOL was well suited for describing business logic because it was fairly abstract. But I guess that was by the state of the art half a century ago. The COBOL I've seen comes over as barely above machine code to my modern sensibilities.
I think this might depend a lot on the age/version of COBOL as well as the programming style used to write the application in question. In the company I work at, we have a fairly large, mainframe-based, vendor legacy COBOL system used to administer financial products. The system was originally written using a post-COBOL II version of the language and has since been upgraded to IBM's latest version of the compiler. The system is made up of about 3000 programs, with the size of the procedure divisions ranging anywhere from 5000 to 15,000 lines of code. The licenses with the vendor allow my company to make significant changes to the source code in order to support the types of financial products being developed. One key feature of this legacy system is how it was originally designed. It uses a very strict structured programming philosophy and takes advantage of the cleaner COBOL coding style that became possible after (I think) COBOL II was introduced (way back in the 80s, I think). As a result, structured programming concepts are prevalent throughout all programs within this system - i.e. in-paragraph loops are accomplished through PERFORM/UNTIL/END-PERFORM statements; conditional statements are always bound by END-IF clauses; and complex case logic is handled by EVALUATE/WHEN/END-EVALUATE statements (some of these structures were not available in much older versions of COBOL). Because of the strict coding standard that was applied to this system, spaghetti-like/"GOTO rich" logic is entirely missing (GOTOs in this application are strictly only used for abnormal paragraph exits in the event of errors being detected). This makes the code much easier to read in a top-down fashion; the programs are literally designed to mirror the business flow that describes how the financial products should work.
As a result, it makes it easier to identify which programs need to be changed (and where the change should be applied) when new bus requirements get sent to the development team. However, I completely understand that not all legacy systems are written this way. In my career I've also encountered legacy mainframe COBOL programs that were designed/written using much older versions of the compiler that did not support some of the syntax described above....and as a result in these other older systems you end up having to deal with code that might be much harder to follow.
@@DC-id2ih that's super interesting, thank you for sharing. The COBOL war stories I know were mostly from my dad, who worked in EDP/IT auditing in the 70s, 80s, and onwards. COBOL wasn't exactly a new language at that time either, but it was clear that a well-maintained COBOL application wasn't the worst thing his auditing team could encounter (unlike APL, for which you apparently needed a special keyboard to even write it). It makes sense that it would have evolved and improved, as you say. One of the first languages I was taught, around the 2000s, was Java, and modern Java is a different beast altogether, with many new features. Same deal with COBOL, I'm sure.
@@jens256 A quick search says, "Classes and interfaces have been in COBOL since 2002. Classes have factory objects, containing class methods and variables, and instance objects, containing instance methods and variables. Inheritance and interfaces provide polymorphism." The Wikipedia article on COBOL is worth a quick look. Lots of interesting stuff, both tech and cultural.
Actually COBOL is quite the opposite of abstract. It's also very, very, very, far from machine code. COBOL is an attempt to make something similar to the English Language and this has long since been determined to be a bad idea. It is indeed primarily intended for business programs. The term business logic just means that it's describing arbitrary rules such as tax code as opposed to something not arbitrary like the laws of physics like one might do in scientific languages.
Used COBOL a little bit maintaining an inventory system; it's not bad, and very easy to read.
I'm quite interested in this topic as it always bugs me that there are so many targets for code and languages which aren't cross-usable. Often it would be amazing to be able to integrate two things without having to resort to separate services or processes.
this is something to consider when writing anything mission critical: choose languages and frameworks that will be around for the long term - or at least supported for the long term. otherwise your grandchildren will be left with a major headache.
This is why I chose JavaScript, unironically. It's event driven by nature, asynchronous by design, and used by literally everyone who's ever bothered to learn programming in the last 30 years. It is the most widely developed language, and the one with the most money thrown at it. I laughed at myself when I came to the conclusion that my core applications should be written in the language I scoffed at for most of my life, but it is what it is. It's the best language available today for general maintainability, and ease of access to talent.
There simply wasn't the choice back then. If COBOL is all you had, the future was COBOL.
Part of my full time job is to convert legacy C++ code to modern Typescript front end code. It's challenging sometimes since as you know it isn't always one to one, you have to be clever and understand what the C++ code is doing and think how would JavaScript or TypeScript do it.
For those wondering, the app is written in C++ and the company wants a more modern look and feel to it and be browser based. So I take the modules that were written in C++ and I go through it and write the Angular/typescript code based on what the C++ does in the legacy app. Sometimes the C++ does things of course TypeScript can't do, so with that I will sometimes need to write C# code and reference that with the TypeScript via a controller. It could be a DLL wrapper or something like that, that of course TypeScript can't do.
Can this handle assembly language as input? If so, it might be useful for making open-source firmware replacements. Dump the ROM from a device, run the binary through a disassembler, run that through the specification-generator, then use the resulting specification to write a functional replacement.
Anybody, even someone who's never seen COBOL before, can look at those few lines and instantly see what it's doing. Yes, it's an old language but it is not some lost cryptic notation that only an archaeologist could decipher. It's damned simple, even if it requires more discipline to avoid spaghetti code than modern languages do. You can still write spaghetti code in any language if you want to.
My man is using a chisel while everyone else has a CNC machine at home
I guess it also makes sense to have some kind of blue green deployment in order to make sure the application still does the same.
We are in the middle of re-writing programs at work that were in Algol-60, and Z80 Assembler.
This is why the Y2K bug was potentially a huge problem. People don't understand this and think it was all a big hoax but a lot of work goes into forward compatibility that non-technical people can't even begin to appreciate
Year 2038 bug, here we come!
If Y2K had not been warned about so broadly in advance, a significant amount of software would probably have failed. Tech people understand, but what about C-suite people, who probably first ask how much the fix costs and then forget it until sh.. hits the fan.
It's like someone with epilepsy taking medication and complaining that it was a waste of money because you didn't have a seizure.
There's no evidence that not addressing y2k would have caused any real issues.
Lots of money wasted all over the world, checking that the receptionist's 486 was y2k compliant.
There were some legit issues, but 98% of it was computer companies scamming government entities.
2038 will cause many more problems. But that's decades away, right?
@@tbird-z1r Just because there was a lot of overexaggeration and fraud doesn't mean that it wasn't a real problem.
There was massive fraud during the last health crisis and lots of overcounting and overexaggeration but I don't think any sane person would deny that it was a real problem.
We don't know for sure what could have happened if some of those systems had not been corrected. Software has a tendency to break in really unexpected ways, when something as simple as a buffer overflow can crash an entire system. I used the word "potentially" simply because I don't have the infinite wisdom, or some alternate dimension, to know for sure, but I'm on the side of an abundance of caution when it comes to critical systems.
The fraud issues are something else entirely that has to do with how screwed up our political system is. But that's outside the scope of computerphile
Fascinating, would like to see more of Dr Kevin Lano Reader.
@@MenaceInc "Reader in X" is a common academic position in UK universities (and perhaps elsewhere but I mainly know the UK)
Zarquan! Back in the 00s the GNU people had GnuCOBOL, which was a transcompiler to C. Not sure what this does differently, other than making it "Java".
Very interesting! 😊 Would love to know more about the text format mentioned that ANTLR parses into. I also wonder whether an AST-based translation would result in bugs being ported from the source language into the destination language? 🤔
I say you actually *want* to port bugs, in case they are required for the functioning of some systems, like API consumers who use a workaround that would break if the bug were fixed.
I use OCL (Object Constraint Language), a standard language that is part of the UML, so a kind of universal software specification language. And yes, bugs are translated unchanged - it would be an extra step to check and remove them, ideally before target code generation.
Don't know about COBOL, but in BASIC you have no structured control flow. You can basically (no pun intended) jump from anywhere to anywhere, which makes it quite hard, if not impossible, to automatically translate into a modern language with structured control flow without making a huge mess. I would be interested to see how they deal with that issue.
I think COBOL allows gotos (you can see it in the video itself), but most of the time large systems will have most of their gotos structured as de-facto loops or other modern control-flow structures, simply because it made writing these systems much, much easier. Our modern control flow largely derives from these folk practices anyway, and they became formalised once it was proven that loops and if-else statements were sufficient to express any control flow required.
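A rough sketch of what that restructuring looks like (hypothetical pseudocode and invented names, not the actual tool's output): a backward GOTO that forms a de-facto loop can be recognised and rewritten as a while loop with the same behaviour.

```python
# Hypothetical goto-style pseudocode (BASIC flavour), a de-facto loop:
#
#   10 LET I = 1
#   20 LET I = I + 1
#   30 IF I < 5 THEN GOTO 20
#   40 PRINT I
#
# A translator can spot the backward jump at line 30 and restructure
# it as a loop with the same semantics:

def count_up(limit: int) -> int:
    i = 1                # line 10
    i += 1               # line 20
    while i < limit:     # the backward GOTO becomes a loop condition
        i += 1
    return i             # line 40

print(count_up(5))  # -> 5
```

The hard cases are the jumps that don't form such a tidy pattern, which is where translation can get messy.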
visual basic 1.0 came out in 1990-91 - Dartmouth BASIC (quite a different language) is from the 60s
I remember reading some sci-fi novel where there were some sort of "digital archaeologists"... In the far future they replaced real programmers... Instead of writing new code they were looking for ancient code that did the job...
Right now I'm converting one of my old websites from Actionscript 2 to Javascript.
A medal is on its way in the post.
That's only slightly more modern than Cuneiform
@@ChristopherHailey The silly thing is that the interactive elements were originally written in Javascript. People kept telling me to convert it all to Flash / Actionscript as it would "look better". I deleted the original Javascript thinking I wouldn't need it.
@@combatking0 That's pretty funny.
I didn't know Petelgeuse was a programmer!
I've recently got a job with legacy Java code 20+ years old (Jsp/Servlet era).
There isn't the slightest chance of rewriting it or making it more modern :)
This seems like an excellent example of using an LLM to modernise and rewrite code
When I saw "legacy code" in the title, I wasn't expecting legacy languages, but the more common usage: poorly written, convoluted, unmaintainable OG code, usually written by someone in another department who didn't know better before proper devs were hired, which everyone's afraid to touch.
A tip for future devs: the most important thing in maintainability is readability and ease of understanding. Organize your classes to make sense to humans. Name your functions and variables so they read close to natural language, e.g. if (operationHasFinished) { ... }
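To make the tip concrete, a small before/after sketch (all names invented for illustration):

```python
# Before: terse names force the reader to decode intent.
def chk(u, t):
    return u["exp"] < t

# After: the call site reads close to natural language.
def session_has_expired(user: dict, current_time: float) -> bool:
    expiry_time = user["exp"]
    return expiry_time < current_time

user = {"exp": 100.0}
if session_has_expired(user, current_time=200.0):
    print("please log in again")
```

Both functions compute the same thing; only the second one tells the next maintainer *why*.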
I mean those are basically the same thing in this context. It's just that by unmaintainable OG code, we mean REAL OG written back before people stopped using gotos, having a string type was decadent, and you had to worry about your variable names taking up too much disk space in the source code.
First time I have seen someone code in Microsoft Word.
I guess it's always there (on windows systems), and it might have some features, like saving to a specific character encoding scheme, that notepad doesn't have. It might have the same frankenstein pattern engine for search and replace that's used in Word.
It is not "Microsoft Word", he is using WordPad.
39 years ago, we were preparing our COBOL code in wordstar.
I used WordStar too at one time, for assembly code. It caused a bug that took a long time to track down: it set bit 7 of a character, which looked and printed out just fine, but the assembler took one look, said "Invalid character", and simply ignored the line, with no error or warning!
It is so clear to me that some programmers do what they do for entirely different reasons than I do. Which is great! But man... as a games programmer I program to invent stuff, not to program. Programming is a means to an end: to create. This, taking another person's creation and translating it into another language...? He is a clever translator. I would not know how to do what he does.
They're a different breed of person. Hyper turbo autismos, but talented beyond compare. I've met several of them, and they are extremely strange creatures. Nearly one dimensional except for the odd esoteric hobby. For most of them, the idea of sport, or outdoorsmanship is horrifying, and socially they're abysmal. But give them a technical task, a mundane one that must be solved procedurally, and they excel in ways that make my skin crawl. I'd say they were fictional had I not met many of them.
99% of the code that current legacy business applications consist of wasn't written by people with passion; they were written by people (many of them, spanning generations) who just needed to pay their bills. How much passion can you have for your task to write COBOL that generates a monthly bank account statement? Add numbers in a column, add summary at the end, and format it all in a simple ASCII (or rather EBCDIC) table?
Can confirm on all counts. 2,000 lines of VB6 is big enough to make a nice proof-of-concept yet small enough for a programmer to validate the result independently, but business-types will remain skeptical. Yes, py2 -> py3 takes an unreasonably long time when you're worried about stability at all costs. And yes, banks still employ COBOL programmers. IDENTIFICATION DIVISION.!
I would never write COBOL code like this. And it's not even safe or maintenance friendly.
Even if there was source, I would still transform the object code because (as we all know) the source rarely matches the object, and so the object will always give the "actual spec". If the compiler is known, then recompiling either the generated or original source should result in the same functionality.
I believe, therefore, that a COBOL disassembler for each machine the code is to run on would be a better investment. Most compilers have this ability nowadays. Plus, this provides the ability to compare code and convert the object for different compilers on the same machine. I used to use this method to show people that the ICL 2900 used hardware BCD arithmetic and the data didn't need to be converted into a COMP for arithmetic processing.
All code is legacy from the moment it was written ...
It's also been said all code is in beta
Fantastic video as always
First version of Visual Basic is from 1991 (VB6 is 1998). He must mean some other BASIC.
Indeed. BASIC was around for decades before Microsoft made it visual.
@@SteveGouldinSpain but he said Visual Basic.
I was about to comment the same. I still support and develop VB6 applications to this day for my day job, they're far too big and complex to completely rewrite or translate. We do use modern languages for components where possible though.
@@Wiggs1979 yep, it's still completely interoperable with .NET, even the IDE still runs on modern Windows with some tweaks. there are even one or two alternative compilers (far from complete)
@@Wiggs1979 How big is too big and how complex is too complex?
Visual Basic dates back to the early 60s?
It does not, I was very confused by what he meant by this
@@brazni I guess he meant just BASIC - not visual basic. BASIC as such dates back to 1963.
Doing the COBOL to Java conversion for a critical process at the bank I work at now. Still a long ways to go
What hardware is COBOL on at banks?
@@jklax 100% chance he is not allowed to tell you about the system environment.
@@wadu7205 You're most likely right.
Would love to see a video on what is meant by the ‘semantics’ of a program or language
you can think of semantics as the WHAT you can express in a language, as opposed to syntax, which is the HOW you express it. For example, compared to C, classes in Python are a semantic feature that the C language does not have.
@@Kersich86 _I_ know what semantics are; I’d like to see a video explaining why they’re important in a manner that a wider audience can understand and appreciate. Particularly with regard to how they relate to the notion of program specification and proof.
the syntax is the structure of the language and what it looks like:
- in English for example every sentence must end with a "." and you can only use words in the dictionary and made up names
- in C for example every statement must end with a ";" and you can only use variable names that you declared
i recommend the Computerphile video *"Parsing Explained - Computerphile"* about syntax
the semantics is what the syntax means or does:
- in English for example you can lookup the meaning of words in a dictionary. words have a different meaning depending on the language
- in C for example you could look up the meaning of a statement in a book like "The C Programming Language"
i recommend Udacity video *"Syntax Vs Semantics - Programming Language"* about semantics
i don't remember any video from Computerphile about both, but it would be cool to see such a video.
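a tiny illustration of the split (a Python sketch, since the examples above are language-agnostic): these lines all share one syntactic shape, expr + expr, but the semantics of "+" differ with the operand types.

```python
# one syntactic shape, three different semantics for "+":
print(2 + 3)        # numeric addition      -> 5
print("2" + "3")    # string concatenation  -> 23
print([2] + [3])    # list concatenation    -> [2, 3]

# a parser (syntax) accepts all three identically;
# only their meaning (semantics) differs.
```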
@@xybersurfer i have a copy of The C programming language and can assure you it does not have a description of the semantics of the language. The C language specification has some, but not really enough to guarantee a single consistent meaning to even simple programs written in C.
@@xurtis i am aware that it falls short. i was never really a fan of the book to be honest. i'm not really expecting OP to go read that book. it was merely meant as an example, to give a rough idea. sorry for making it look like i was endorsing that book
Opening code in Wordpad. Exactly my type of humour.
What are the modern engineering approaches he mentioned (specifications, models, etc.)?
Said with all respect, Dr Lano has such distinguishing features that he could have, or maybe still could, make a killing as a performance character! What a unique face!
GPT models must be revolutionizing legacy code conversion....all of a sudden the output potential of humans is hugely leveraged by machine learning.
@5:59 Dr Kevin Lano talks about CSTL, in case anyone else also confused it with CSDL as it appears in the CC captions.
I'd be curious to know if this technique could be used to more easily change code dependencies. My job would be a lot easier if, for example, this could automatically replace a legacy framework with a more modern one, e.g. CORBA with gRPC.
COBOL will be around for another 50 years.
Back in the 1990s it was easier to teach a C expert FORTRAN than it was to maintain C code generated by F2C. (In my experience, YMMV, etc.) Hopefully the tools are better now.
Very interesting
We still use Visual FoxPro at our company and would like to convert it to a new language, but the effort would be so great that starting from scratch would be cheaper than trying to rewrite almost 30 years of code with many, many bad design decisions and bugs...
amazing video, thank you so much for sharing
I wonder if it's possible to dump it in Chatgpt and ask it to rewrite it in whatever language
Amazing work!
The great thing about old code is often it is doing things for reasons entirely invisible. We used to say COBOL was an Italian coding language, because it always looked like spaghetti 🍝
Gotos are great, when used sparingly, in limited situations. A well-placed goto can increase the readability of the code, for example as an exit to a common destination from different layers of multiple nested loops. Also, goto (jmp) is used all over the place in assembly. Don't diss goto, just don't abuse it.
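For comparison, the structured workaround in a language without goto (a Python sketch with invented names): wrap the nested loops in a function and return, which gives the same single common exit point.

```python
def find_value(grid, target):
    # "return" plays the role of the well-placed goto here:
    # it exits every layer of nesting to one common destination.
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == target:
                return (row_index, col_index)
    return None

grid = [[1, 2], [3, 4]]
print(find_value(grid, 4))   # -> (1, 1)
```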
Do these tools preserve comments? I don't know how it is in COBOL, but in many programming languages comments can go anywhere and aren't part of the AST.
This didn't happen overnight. These companies have this problem from years of deliberate underinvestment in the people and departments that manage and maintain their code and systems. All this does is empower them to do it all over again, but not to worry, soon AI will maintain it for them.
I feel sorry for the poor person who eventually needs to reengineer code I have written
Careful - that person might be you.
I've heard of some OG programmers still using notepad.exe but I had completely forgotten wordpad existed.
I maintain a heterogeneous environment of Windows and Linux servers, and if I'm on Windows, I often use WordPad because it's always there and it seamlessly handles UNIX-style line endings (unlike Notepad).
I wonder how you deal with language specific unusual behaviour like code being tied to clock cycles or something weird like that.
How do you deal with COBOL doing arithmetic in decimal fractions?
Make some kinda weird tuple that behaves the same way
It's like any other format, decimal numbers (zoned or packed) represent numbers, internal and external representations are up to the program. A common example would be money, e.g. USD. You can represent dollar amounts as pennies, e.g. $1.23 would be "123". When you format the output you would insert a decimal point to represent as dollars, "123" would display as "1.23". This would be fixed point and is used because it's exact. On the other hand, floating point is an approximation, and COBOL supports fp data types.
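The two schemes described above can be sketched in a few lines of Python (using the standard decimal module; an illustration, not how any particular translator handles it):

```python
from decimal import Decimal

# Binary floating point can't represent 0.1 exactly:
assert 0.1 + 0.2 != 0.3

# Decimal arithmetic, like COBOL's zoned/packed decimal, is exact:
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

# The integer-pennies scheme: store cents, insert the decimal
# point only when formatting for display.
cents = 123
print(f"{cents // 100}.{cents % 100:02d}")   # -> 1.23
```

This is why translating COBOL money handling straight to the target language's binary floats would subtly change results.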
Amazing video
Is there a lot of COBOL software running in production in banks today?
goto still has its uses. Fight me :P
I agree, sometimes it's pretty awkward to get out of a switch statement by cascading through with break; sometimes you want different endpoints. It's still pretty nasty, though, if people start misusing it way out of scope.
@@rickyrico80 I've seen some horrors lol. I certainly get what you mean. :D It's a last resort tool when other options have been exhausted. Kinda like the volatile keyword. Rarely is it actually needed :)
Really good task for LLMs
Man knows the code.
Wait... 10 lines of COBOL became several pages of Java? Which runs faster?
Languages that have fewer lines of code do not necessarily run faster. You might find that one line translates to 100 lines of machine language, while in the other language it tends to represent 10 lines of machine language. COBOL tends to represent numbers in decimal, which is no longer directly handled by most modern hardware, so COBOL tends to run very, very slowly on modern machines. IBM computers like the Series Z do support decimal for this reason.
No, the 10 lines had a lot of GOTO, which went to other parts of the program which weren't shown.
I am not a COBOL expert, but that code is the equivalent of:
a = 4
b = 2
Very informative. My thanks.
I've done a few of those...
I actually wrote COBOL programs back in the 80s on punch cards! Never going back.
I wonder how much money you could command with your experience...
Do you still experience PTSD? :)
Did you ever drop the stack?
Yes, you're not a *real* programmer unless you know how to indent (or outdent) source code using an IBM 029 card punch. :)
(Hint for the youngsters: it involves deploying a wet finger.)
Fascinating
…That COBOL program was a fraction of the size of the resulting translated versions.
Not taking anything away from the man, but as a very honest question: could AI also analyze legacy code, convert it to a modern language, and perhaps even optimize the same program, too?
Key thing to remember is Artificial Intelligence isn't Real Intelligence. It just looks so much like real intelligence that we can fool ourselves into giving it more credit than it deserves.