A Very Basic Background on COBOL
COBOL, or Common Business-Oriented Language, is an old programming language found on IBM mainframes, where it powers banking, finance, and airline reservation systems. It was designed by committee, so that all hardware and software sold to the US government would be more or less interoperable. In an effort to make computer programming easier to learn, COBOL used "natural language programming", where the programming language is made as close to regular speech as possible. As part of this initiative, periods rather than semicolons are used to end sentences of code. One of the first things you'll probably notice is how easy it is to read. This was important so that people like accountants and auditors, without much special training, could confirm that the programs were doing what was asked of them.
Common Objections
COBOL doesn't work like modern languages.
Well, sure. COBOL is a procedural language, while functional and object-oriented languages are more popular now. But is it so different from other modern procedural languages?
COBOL can't be object-oriented
Modern versions of COBOL have supported object-oriented programming since the early 2000s.
COBOL is terrible to read, spaghetti code.
COBOL was created so that people without any background in computers could program them with minimal instruction. Is COBOL code really worse than first-semester Python programs?
COBOL code is a crazy mess, with tens or hundreds of thousands of lines!
Imagine, for a moment, that the same 4 people worked on a C project together for 40 years. They knew none of them would leave and that new people wouldn't come in. They knew exactly what they were doing and why. Don't you think that they would write a lot of code? Don't you think they would build on top of old assumptions? Do you really think they'd be writing comments and documentation on anything? For goodness sake, they started coding together before the idea of refactoring was even named, before unit testing was invented, before version control existed.
Being all caps makes COBOL hard to read.
There are modern implementations of COBOL that accept the lower-case style we prefer now.
You can't work with COBOL on modern computers
You absolutely can. VS Code handles COBOL well, and you can compile to .exe with GnuCOBOL.
COBOL has no libraries it can use
Many modern COBOL implementations can make good use of both Java and C libraries. They also have native support for MySQL.
COBOL can't use functions
Nearly all modern implementations support user-defined functions.
COBOL is weird because indexing starts at 1.
COBOL was designed for normal people without a specific programming education. You might not remember it anymore, but when you first learned about indexing, you found it weird and unintuitive for the first element of an array to be the "0" index, the 2nd element to be "1", the 3rd element to be "2", and so on. COBOL relies on the normal human intuition that the first item will be labeled with "1". COBOL is not alone in this. Lua does it too.
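To make the indexing point concrete, here is a minimal sketch of a three-entry table. The program name and the data names (FRUITS, FRUIT-NAME) are made up for illustration, and the layout assumes a compiler that accepts free-format source, for example GnuCOBOL with `cobc -x -free`.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. INDEXDEMO.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> A table (array) of three 10-character entries.
*> FRUITS and FRUIT-NAME are illustrative names only.
01  FRUITS.
    05  FRUIT-NAME  PIC X(10) OCCURS 3 TIMES.

PROCEDURE DIVISION.
    MOVE "Apple"  TO FRUIT-NAME (1)
    MOVE "Banana" TO FRUIT-NAME (2)
    MOVE "Cherry" TO FRUIT-NAME (3)
    *> The first entry is subscript 1, not 0.
    DISPLAY "First entry: " FRUIT-NAME (1)
    STOP RUN.
```

The DISPLAY line should show the first entry, "Apple", padded with spaces out to the 10 characters the field reserves.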
COBOL is weird because it displays loops differently.
COBOL was written before loop constructs had settled into a standard form. Visually, loops look a bit strange in COBOL, but their use in modern implementations is pretty straightforward.
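As a rough illustration of what a loop looks like, the sketch below uses the inline PERFORM form. The names LOOPDEMO and LOOP-INDEX are invented for the example, and it assumes free-format source (for instance `cobc -x -free` with GnuCOBOL).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. LOOPDEMO.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> LOOP-INDEX is an illustrative name for the loop counter.
01  LOOP-INDEX  PIC 9(2) VALUE 0.

PROCEDURE DIVISION.
    *> Roughly "for (i = 1; i <= 3; i++)" in a C-style language.
    PERFORM VARYING LOOP-INDEX FROM 1 BY 1 UNTIL LOOP-INDEX > 3
        DISPLAY "Pass number " LOOP-INDEX
    END-PERFORM
    *> Repeat a block a fixed number of times.
    PERFORM 2 TIMES
        DISPLAY "Repeated line"
    END-PERFORM
    STOP RUN.
```

The condition after UNTIL is the exit condition, which is the opposite of the "keep going while" test used in most other languages. That is most of what makes it look strange at first.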
COBOL is weird because it ends things with periods, rather than using curly brackets "{}" or a semicolon ";".
When COBOL was written, the semicolon was already in use, but it wasn't yet dominant. There were quite a few other languages at the time that used the period "." to make it easier for untrained people to take up programming quickly.
The 80 character line limit in COBOL is annoying
Many modern implementations of COBOL do not have 80 character limits. The limitation originally existed because COBOL was a physical language: if you used more than 80 characters per line, the data wouldn't fit on the punch card! COBOL maintained backwards compatibility with that for a long time. But with modern implementations, there's no reason you cannot use more contemporary line lengths.
There are no good IDEs for COBOL!
Depends on what you mean by "IDE". If you're referring to editors, VS Code supports COBOL. There are also actual IDEs available, about half of which are expected to run on mainframes.
There's no native GUI support for COBOL!
There isn't any for Python either, but when has that stopped anyone?
COBOL doesn't support standard operators like "*/+-="! You have to type using words instead.
I'm not entirely sure if this was ever true, but it definitely hasn't been true since the 1980s. All modern COBOL implementations accept something like "COMPUTE C = (A ** 2 + B ** 2) ** 0.5". Note that the verb is COMPUTE and that the operators need spaces around them.
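Here is a small sketch that puts the word style and the operator style side by side. The program and data names (HYPOT, SIDE-A, SIDE-B, SIDE-C, SIDE-SUM) are assumptions for the example, and it expects free-format source (for instance `cobc -x -free` with GnuCOBOL).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. HYPOT.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> All data names here are illustrative.
01  SIDE-A    PIC 9(3)    VALUE 3.
01  SIDE-B    PIC 9(3)    VALUE 4.
01  SIDE-C    PIC 9(3)V99 VALUE 0.
01  SIDE-SUM  PIC 9(3)    VALUE 0.

PROCEDURE DIVISION.
    *> Word style: arithmetic spelled out in English.
    ADD SIDE-A TO SIDE-B GIVING SIDE-SUM
    DISPLAY "Sum (ADD):     " SIDE-SUM
    *> Operator style: the same sum using COMPUTE.
    COMPUTE SIDE-SUM = SIDE-A + SIDE-B
    DISPLAY "Sum (COMPUTE): " SIDE-SUM
    *> Pythagoras, using ** for powers. ROUNDED guards against a
    *> fractional-exponent result landing a hair under the true value.
    COMPUTE SIDE-C ROUNDED = (SIDE-A ** 2 + SIDE-B ** 2) ** 0.5
    DISPLAY "Hypotenuse:    " SIDE-C
    STOP RUN.
```

Both sum lines should print the same value. How the hypotenuse field is shown (with or without a visible decimal point) depends on the compiler, so treat the exact formatting as implementation-specific.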
Quirks
Some of the quirks of COBOL come from the fact that the language is so old that it used to run on punch cards. For this reason, the amount of space from the left-hand margin is important for the computer to understand the instructions you give. Some parts are mandatory, while others are completely ignored by the computer. IDEs will make your job easier by displaying vertical lines so that you can place characters in the correct position. Many modern implementations have a "free" mode, where you can ignore these column restrictions completely. This has become somewhat popular.
Characters | Used For |
---|---|
1-6 | Optional Line Numbers |
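7 | Indicator, denoting something special about the row (for example, "*" marks a comment) |
8-11 | Area A: headings (division, section, and paragraph names) and readability |
12-72 | Area B: normal code |
73-80 | Free space! Ignored by the compiler |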
Note, however, that in many modern implementations you can have as many characters per line as you please, as long as they begin on the correct part of the line. Other quirks are a result of the fact that many common concepts and ways of doing things in computer programs hadn't even been invented yet, like the standard way of structuring a FOR loop.
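To see the column rules in action, here is a minimal fixed-format sketch. The program name COLDEMO and the paragraph name MAIN-PARA are made up for the example, and it assumes a compiler running in its traditional fixed-format mode (with GnuCOBOL, a plain `cobc -x` without the `-free` flag).

```cobol
000100 IDENTIFICATION DIVISION.
000200 PROGRAM-ID. COLDEMO.
000300* Column 7 holds the indicator; "*" makes this line a comment.
000400 PROCEDURE DIVISION.
000500 MAIN-PARA.
000600     DISPLAY "Statements start in column 12".
000700     STOP RUN.
```

The six digits at the start of each line sit in columns 1-6, the headings and the paragraph name start in column 8, and the two statements start in column 12.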
Structure
The structure of COBOL is hierarchical, starting with divisions, then moving down into sections, paragraphs, and sentences, and ending with statements. From smallest to largest:
- Statements: These are simple instructions like ADD A TO B.
- Sentences: A collection of one or more statements ended by a period.
- Paragraphs: A readability feature to help you group sentences that work together. These can be called by name.
- Sections: Groups of paragraphs. Some sections are user-defined, while others are mandatory.
- Divisions: There are 4 divisions in each program, always in the order below. Older versions of COBOL required all four, while newer versions make some of them optional.
  - Identification: Contains information about the program and who wrote it.
  - Environment: What type of computer this will run on, and "mapping".
  - Data: All of the data that will be used, where it will be stored, and what will happen at the end.
  - Procedure: The normal part of the program.
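The hierarchy above can be sketched as a tiny complete program. The names SKELETON, MAIN-PARAGRAPH, and TOTAL-AMOUNT are placeholders I picked for the example, and the layout assumes free-format source (for instance `cobc -x -free` with GnuCOBOL).

```cobol
*> Division 1: who and what the program is.
IDENTIFICATION DIVISION.
PROGRAM-ID. SKELETON.

*> Division 2: left empty here, since this sketch uses no files.
ENVIRONMENT DIVISION.

*> Division 3: every variable is declared up front.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  TOTAL-AMOUNT  PIC 9(4) VALUE 0.

*> Division 4: the instructions themselves.
PROCEDURE DIVISION.
*> MAIN-PARAGRAPH is a paragraph: a named group of sentences.
MAIN-PARAGRAPH.
    *> One sentence made of two statements, ended by a single period.
    ADD 5 TO TOTAL-AMOUNT
    DISPLAY "Total: " TOTAL-AMOUNT.
    *> A second sentence containing one statement.
    STOP RUN.
```

Because TOTAL-AMOUNT is declared as a 4 digit number, the program displays "Total: 0005" with leading zeros.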
Variables or "Datanames"
All variables and their hierarchy are specified in the data division.
Requirements:
- 30 characters or fewer
- Can only contain letters, numbers, and dashes
- You must declare the type of variable (ex. letters, numbers, or a mix of both)
- You must specify how many characters the variable takes up. If you don't, it defaults to one.
The requirements for the datanames are set in the picture or "pic" area. For example, "pic 9(4)" would mean a 4 digit number field.
- 9 accepts digits. NOTE: because "." is used to end sentences, the letter V marks where the decimal point goes, so "pic 9(3)V99" holds three digits before the decimal point and two after it.
- A accepts alphabetic characters.
- X accepts any ASCII (or EBCDIC) characters.
Under normal circumstances, numeric fields can only contain up to 18 digits, while alphabetic ones can reach 255.
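As a sketch of how these picture clauses look in practice, the program below declares one field of each kind. Every name in it (PICDEMO, CUSTOMER-RECORD and its fields) is invented for illustration, and it assumes free-format source (for instance `cobc -x -free` with GnuCOBOL).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. PICDEMO.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> The 01 level is a group; the 05 items are the fields inside it.
01  CUSTOMER-RECORD.
    *> 9(4): exactly four digits.
    05  CUSTOMER-ID      PIC 9(4)    VALUE 42.
    *> A(10): ten alphabetic characters.
    05  CUSTOMER-NAME    PIC A(10)   VALUE "Alice".
    *> X(7): seven characters of any kind.
    05  POSTAL-CODE      PIC X(7)    VALUE "AB1 2CD".
    *> 9(5)V99: five digits, an implied decimal point, two digits.
    05  ACCOUNT-BALANCE  PIC 9(5)V99 VALUE 123.45.

PROCEDURE DIVISION.
    DISPLAY "ID:      " CUSTOMER-ID
    DISPLAY "Name:    " CUSTOMER-NAME
    DISPLAY "Postal:  " POSTAL-CODE
    DISPLAY "Balance: " ACCOUNT-BALANCE
    STOP RUN.
```

Numeric fields are padded with leading zeros and text fields with trailing spaces, so the ID shows up as 0042. The V takes up no storage of its own, and whether a decimal point appears when the balance is displayed varies between compilers.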
Procedure Division
Paragraphs are named: try to give them good, descriptive names to help you understand them when you reference them later.
Paragraphs can be placed before or after what calls them, but they should remain nearby so that they are easy to find.
COBOL can jump between different parts of its code, but this is done sparingly. Be careful to avoid using "GO TO" unless you understand exactly how it works, and you're sure that it is the only option.
Control & Manipulating Control
Probably the single most important thing to understand in COBOL, which does not operate in a way immediately recognizable to programming beginners, is the idea of control. This concept is not really mentioned in any depth by people who attempt to teach about COBOL, and so I'll explain it to the best of my ability.
Control is, basically, the idea of who is causing what to occur. COBOL will start at the very top of the procedure division (the "real" code) and execute every single line without fail, unless told otherwise. The process of "telling it otherwise" is the idea of manipulating control.
One of the most basic forms of manipulating control is the PERFORM paragraph-name command.
Classic COBOL without manipulation

Code:
    PROCEDURE DIVISION.
        DISPLAY "Hello, world!".
        DISPLAY "What is your name?".
        ACCEPT YOUR-NAME.
        DISPLAY "Hello, " YOUR-NAME.
        DISPLAY "Goodbye, " YOUR-NAME.

Output:
    Hello, world!
    What is your name?
    >John
    Hello, John
    Goodbye, John

Failed control manipulation: paragraph HELLO-NAME is performed successfully, but since there is no STOP RUN, execution falls through and HELLO-NAME is repeated.

Code:
    PROCEDURE DIVISION.
        DISPLAY "Hello, world!".
        PERFORM HELLO-NAME.
        DISPLAY "Goodbye, " YOUR-NAME.
    HELLO-NAME.
        DISPLAY "What is your name?".
        ACCEPT YOUR-NAME.
        DISPLAY "Hello, " YOUR-NAME.

Output:
    Hello, world!
    What is your name?
    >John
    Hello, John
    Goodbye, John
    What is your name?
    >John
    Hello, John

Failed control manipulation: paragraph HELLO-NAME is never called, and YOUR-NAME is never entered.

Code:
    PROCEDURE DIVISION.
        DISPLAY "Hello, world!".
        DISPLAY "Goodbye, " YOUR-NAME.
        STOP RUN.
    HELLO-NAME.
        DISPLAY "What is your name?".
        ACCEPT YOUR-NAME.
        DISPLAY "Hello, " YOUR-NAME.

Output:
    Hello, world!
    Goodbye,

Failed control manipulation: GO TO fully transfers control, with no return, so the goodbye line is never reached.

Code:
    PROCEDURE DIVISION.
        DISPLAY "Hello, world!".
        GO TO HELLO-NAME.
        DISPLAY "Goodbye, " YOUR-NAME.
        STOP RUN.
    HELLO-NAME.
        DISPLAY "What is your name?".
        ACCEPT YOUR-NAME.
        DISPLAY "Hello, " YOUR-NAME.

Output:
    Hello, world!
    What is your name?
    >John
    Hello, John

Successful control manipulation: the code performs HELLO-NAME, then returns control to the statement after the PERFORM.

Code:
    PROCEDURE DIVISION.
        DISPLAY "Hello, world!".
        PERFORM HELLO-NAME.
        DISPLAY "Goodbye, " YOUR-NAME.
        STOP RUN.
    HELLO-NAME.
        DISPLAY "What is your name?".
        ACCEPT YOUR-NAME.
        DISPLAY "Hello, " YOUR-NAME.

Output:
    Hello, world!
    What is your name?
    >John
    Hello, John
    Goodbye, John
Examples of control manipulation include GO TO, PERFORM paragraph-name, PERFORM UNTIL end-condition ... END-PERFORM, IF ... ELSE ... END-IF, EVALUATE variable ... WHEN condition ... END-EVALUATE, EXIT PARAGRAPH, STOP RUN, and GOBACK.
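A few of these can be seen working together in the sketch below. The names CTRLDEMO, MAIN-PARA, SHOW-ATTEMPT, and ATTEMPTS are invented for the example, and it assumes free-format source (for instance `cobc -x -free` with GnuCOBOL).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. CTRLDEMO.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> ATTEMPTS is an illustrative counter.
01  ATTEMPTS  PIC 9 VALUE 0.

PROCEDURE DIVISION.
MAIN-PARA.
    *> Loop until the end condition becomes true.
    PERFORM UNTIL ATTEMPTS = 3
        ADD 1 TO ATTEMPTS
        *> Hand control to SHOW-ATTEMPT, then get it back.
        PERFORM SHOW-ATTEMPT
    END-PERFORM
    *> Without this, control would fall into SHOW-ATTEMPT again.
    STOP RUN.

SHOW-ATTEMPT.
    *> EVALUATE picks one branch based on the value.
    EVALUATE ATTEMPTS
        WHEN 1
            DISPLAY "First try"
        WHEN 2
            DISPLAY "Second try"
        WHEN OTHER
            DISPLAY "Last try"
    END-EVALUATE
    IF ATTEMPTS = 3
        DISPLAY "Done"
    ELSE
        DISPLAY "Going around again"
    END-IF.
```

Read top to bottom, this should print "First try", "Going around again", "Second try", "Going around again", "Last try", and finally "Done", each on its own line.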
Reserved Words and Functions
There are a lot of reserved words in COBOL. Between different implementations, the count ranges from 180 to over 300. Understanding how all these work together is important for creating a program that works the way that you intended.
GOTO
More than any other reserved word, GOTO (written GO TO in actual code) is the least loved, and it is even banned by many organizations. The argument is best understood by reading one of the foundational papers that raised awareness of the issue: Edsger Dijkstra's "Go To Statement Considered Harmful". The short of it is that GOTO statements do not return control. They move control to a new region of the code, which is then executed line after line without stopping until a STOP RUN or the end of the program is reached. This introduces some issues. When PERFORM is used, the contents of the next paragraph are completely unimportant, because control returns as soon as the performed paragraph ends. Under GOTO, however, the next paragraph will likely be executed as well, leading to confusing bugs that are hard to isolate. This was worked around in the past with "exit paragraphs" that would catch stray GOTOs before they got too far. Overlapping GOTOs also made it difficult to safely insert new paragraphs into the code, as a new paragraph might accidentally affect GOTOs that you didn't realize would run through that part of the code.
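The fall-through problem is easiest to see in a tiny program. This is only a sketch: the names GOTODEMO, MAIN-PARA, PRINT-TOTAL, and CLEANUP are hypothetical, and it assumes free-format source (for instance `cobc -x -free` with GnuCOBOL). Some compilers may warn about the unreachable lines, which is rather the point.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. GOTODEMO.

PROCEDURE DIVISION.
MAIN-PARA.
    DISPLAY "Start".
    *> Control jumps to PRINT-TOTAL and never comes back here.
    GO TO PRINT-TOTAL.
    DISPLAY "This line is never reached".
    STOP RUN.

PRINT-TOTAL.
    DISPLAY "Total printed".
    *> Nothing stops execution at the end of this paragraph...

CLEANUP.
    *> ...so control falls straight into this one as well.
    DISPLAY "Cleanup ran too, by falling through".
    STOP RUN.
```

The program should print "Start", "Total printed", and then the cleanup line. If the GO TO were replaced with PERFORM PRINT-TOTAL, only PRINT-TOTAL would run before control returned to MAIN-PARA, and CLEANUP would not run at all.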