Memories of COBOL

5 October, 2025

Tags:

Cobol

This post is sort of a therapy session where I dump all of the weird things that I have seen when working with several huge projects in COBOL 85.

It might be useful if you have a morbid curiosity about this language, or if you (for some reason) are deeply involved with the language and want to see how it looks from an outsider’s perspective.

As of now, I’m a developer, mostly in the web, mostly in the backend/data engineering side of things. While I spend a lot of time in Python, I might have to explain things in terms of other languages, depending on the context. I also expect at least some experience with programming and won’t go into basics of how modern computers and languages work.

record scratch Yep, that’s me

Imagine a young student, who got offered their first SWE job in the third year of university. Accepting a first job is a no-brainer, and most of the conditions can be ignored, even though the impact your first job has can be lasting.

So what was the job?

We were working with old COBOL codebases. Like, “last modification of this file was done in 1992” old. Like, “the first version of this file was written in 1979” old. This automatically means a couple of things:

No VCS, since the concept was not really a thing at the time. Versioning of files works by keeping a changelog in comments at the top of each program.
No/minimal technical specification. People really treated programming and planning differently back then.
No real way to run the code locally in any capacity*.
Most of the people that wrote the code in the first place are 100% not available, since they likely retired 10+ years ago.

I will explore implications of working with an ancient codebase, in a language that resists being written in, using technologies that can’t really be obtained by normal people. Even though in some sense this is a nightmare, I got a set of skills that have proven to be useful for my whole career.

Bear in mind that I might be forgetful about some details of the language - I’m lucky to not have practiced any of it for about a decade at this point. The general gist of my complaints is true, but details about syntax can have mistakes. I also refuse to compile and check any of my code examples, but will appreciate feedback to fix mistakes, if any.

Anatomy of a fall COBOL

While this is not a full COBOL tutorial, it would help to briefly describe what a generic COBOL program looks like. Here’s how one would write one:

      * this is a comment
       identification division.
        program-id. hello-world.
       environment division.
       data division.
       working-storage section.
      * OCCURS is not allowed on an 01, so the table lives one level down
       01 USERS-TABLE.
         02 USERS OCCURS 5 TIMES.
           03 NAME PIC X(20).
           03 ZIPCODE PIC 9(5).
      * a REDEFINES has to directly follow the field it reinterprets
           03 ZIPCODE-TEXT REDEFINES ZIPCODE PICTURE X(5).
           03 COUNTRY PIC XXX.
       01 USER-IDX PIC 9 VALUE 1.
       procedure division.
        001-main-s section.
        001-main.

            display 'hello world'.
            perform 005-display-user
                varying user-idx from 1 by 1 until user-idx > 5.

        001-main-exit.
            stop run.

        005-display-s section.

        005-display-user.
            display 'name: ' NAME(USER-IDX).
            display 'country: ' COUNTRY(USER-IDX).
        005-exit.
            exit.
       end program hello-world.

Now, let’s dissect this part by part:

Basics

A COBOL program is made of divisions. There can only be one instance of each, and most of them are optional (except for identification). The most used ones are:

identification - defines program metadata, author, version etc.
data - defines all the variables that can be used. In 85, you don’t get the privilege of allocating memory dynamically - everything is pre-allocated based on what’s in the data division.
environment - mostly used to define files accessed
procedure - the actual program code that gets executed.

In summary, you get all of the perks of a language so old that it was one of the first to have the concept of structures:

everything in a program is global
everything is mutable. You fake named constants with text replacement (COPY ... REPLACING or the REPLACE statement), which works similarly to the C preprocessor.
no concept of functions, only “paragraphs” and “sections”. In the sample above, 001-main is an example of a paragraph. The main use of a paragraph is as a label for a GO TO or PERFORM statement, the latter being something like GOTO, a for loop and a while loop in a trench coat.
you get all of the wonders of C-style pointer arithmetic, more on that later.

We’ve never left the punch card land

Some of you might have already noticed that every line of the example program has suspicious indentation on the left. This is because the first 7 characters of each line are reserved. The reason for this is extremely simple: on a typical punch card the first 6 columns are reserved for a sequence number, and the 7th column is an indicator: if it is empty, the line is processed by the compiler as code, a * turns the line into a comment, and a - marks a continuation of the previous line. I assume this was kept for backwards compatibility long after punch cards stopped being used.

In the “modern” code, those symbols were usually left unused, sometimes used for marking specific lines, as additional comment space.
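For readers who think in Python, here is a rough sketch of how a tool could slice one fixed-format line into those areas. The names `SourceLine` and `split_fixed_format` are made up for this post, and the sketch assumes the classic layout where anything past column 72 is ignored.

```python
from typing import NamedTuple


class SourceLine(NamedTuple):
    sequence: str     # columns 1-6: sequence number area
    indicator: str    # column 7: '*' or '/' comment, '-' continuation
    code: str         # columns 8-72: the part the compiler reads as code
    is_comment: bool


def split_fixed_format(line: str) -> SourceLine:
    """Split one fixed-format source line into its punch-card areas."""
    # Pad short lines so the slices below never run out of characters.
    padded = line.rstrip("\n").ljust(72)
    indicator = padded[6]
    return SourceLine(
        sequence=padded[:6],
        indicator=indicator,
        code=padded[7:72].rstrip(),
        is_comment=indicator in "*/",
    )


comment = split_fixed_format("      * this is a comment")
statement = split_fixed_format("000100     display 'hello world'.")
print(comment.is_comment, repr(statement.sequence), statement.code.strip())
```

Everything a parser needs to know about a line is decided by character position alone, which is exactly why the indentation in the example cannot be removed.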

yeah we’ve got structs, no big deal

While not the first language to introduce the concept of “let’s pack a group of variables into one blob that contains those values”, COBOL certainly has some …unique features. Let’s start with basics: every variable has a number called “level”. If the level is larger than level of previous declaration, that field belongs to the previous one, making previous declaration a struct. For example:

       01 USER.
        02 USER-NAME PIC X(10).
        02 USER-SURNAME PIC X(10).

This defines a memory area 20 bytes long, where the first 10 bytes are taken by USER-NAME, and the next 10 bytes are taken by USER-SURNAME.
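The closest thing in Python is probably the `struct` module. The sketch below mirrors the two-field USER record; `USER_LAYOUT`, `pack_user` and `unpack_user` are hypothetical helpers, and it uses ASCII instead of EBCDIC purely to keep the output readable.

```python
import struct

# Hypothetical Python mirror of the USER record: two 10-byte text fields.
USER_LAYOUT = struct.Struct("10s10s")


def pack_user(name: str, surname: str) -> bytes:
    """Build the 20-byte record, space-padding each field like PIC X(10)."""
    return USER_LAYOUT.pack(
        name.ljust(10).encode("ascii"),
        surname.ljust(10).encode("ascii"),
    )


def unpack_user(record: bytes) -> tuple:
    """Split the 20 bytes back into the two fields, dropping the padding."""
    name, surname = USER_LAYOUT.unpack(record)
    return name.decode("ascii").rstrip(), surname.decode("ascii").rstrip()


record = pack_user("ADA", "LOVELACE")
print(len(record), unpack_user(record))
```

The record has no field names, separators or length prefixes inside it: the declaration is the only thing that says where one field ends and the next begins.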

The level number should be a positive integer (obviously), with value from 1 to 49(what?). I’m guessing someone decided that 49 levels is enough for everyone.

Additionally, there are special level values: 66, 77, and 88:

66 level

This is the only level that allows you to rename (RENAMES) one or more variables within a struct. Essentially, you get a new name that points to the memory area of an existing field, or of a range of adjacent fields:

       01 USER.
        02 USER-NAME PIC X(10).
        02 USER-SURNAME PIC X(10).
        02 USER-CREDIT-CARD PIC 9(12).

       66 USER-FULL-NAME RENAMES USER-NAME THRU USER-SURNAME.
      * a 66 cannot rename the 01 itself, so span its first to last field
       66 USER-BLOB RENAMES USER-NAME THRU USER-CREDIT-CARD.

This means that USER-FULL-NAME points to a 20-byte range inside the USER struct, while USER-BLOB points to the whole USER structure. Yes, including the number - which in this case will be a 12-byte-long character sequence, but more on that later.
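In Python terms, a renamed range behaves a lot like a `memoryview` slice over the same buffer. This is only an analogy: the variable names are invented for the sketch and the bytes are ASCII rather than EBCDIC.

```python
# Hypothetical 32-byte USER record: 10 + 10 + 12 bytes, ASCII for readability.
user = bytearray(b"ADA".ljust(10) + b"LOVELACE".ljust(10) + b"123456789012")
view = memoryview(user)

user_name = view[0:10]
user_surname = view[10:20]
user_full_name = view[0:20]   # two adjacent fields under one extra name
user_blob = view[0:32]        # every field of the record, card number included

# No bytes are copied: writing through one name changes what the others see.
user_name[:] = b"GRACE".ljust(10)
print(bytes(user_full_name))
print(bytes(user_blob))
```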

77 level

For most purposes, this is the same as the 01 level, but it forbids creating underlying fields, so a 77 is always a single standalone variable.

88 level

This is a curious one. Similarly to 66, this is a fake variable, and it sort of reminds me of enum in a more modern language. 88 levels always have boolean type. They work as a get* method, that looks at the variable one level above, and returns true if it matches the predefined constant:

       01 USER-CARD-TYPE PIC X(3).
           88 USER-DEBIT-CARD VALUE 'DEB'.
           88 USER-CREDIT-CARD VALUE 'CRE'.

If you test USER-DEBIT-CARD in the program, it will be TRUE only if USER-CARD-TYPE currently holds the string 'DEB'. As far as language features go, this is actually useful and interesting: I can easily imagine having something similar in a modern program, just implemented without all of the substructuring nonsense. Good job, COBOL!
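Something similar in a modern program could be a handful of read-only properties over one real field. A minimal sketch, where the `UserCard` class and its attribute names are invented for illustration:

```python
class UserCard:
    """Hypothetical holder for USER-CARD-TYPE and its two 88-level names."""

    def __init__(self, card_type: str = "   ") -> None:
        self.card_type = card_type  # the only real storage: 3 characters

    @property
    def is_debit_card(self) -> bool:      # 88 USER-DEBIT-CARD VALUE 'DEB'
        return self.card_type == "DEB"

    @property
    def is_credit_card(self) -> bool:     # 88 USER-CREDIT-CARD VALUE 'CRE'
        return self.card_type == "CRE"


card = UserCard("DEB")
print(card.is_debit_card, card.is_credit_card)
card.card_type = "CRE"
print(card.is_debit_card, card.is_credit_card)
```

The properties take no storage of their own, they only answer a question about the field they hang off.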

insert a picture pun here

Another thing I’ve omitted up until now are pictures, usually written as PIC. A picture is the byte layout configuration of an individual variable. Pictures can only be defined on leaf nodes, and not on 66/88 levels. They define both the underlying memory layout and the string representation of a variable when it’s printed, in a report for example. Here are some highlights of what can be done with them:

XXX, X(3) - alphanumeric fields of length 3
99V99 - numeric field with 2 digits before and after a fixed point. The value is stored as the EBCDIC (of course it wouldn’t be ASCII) representation of each digit, 1 byte per digit. The V only marks where the point is implied and takes no space in memory, so the variable is 4 bytes long. Importantly, there’s no loss of precision here as it happens for IEEE-754 floats, and COBOL 85 lets you spread up to 18 digits around the point.
99.99 - same as above, but the . is an actual character stored in the middle, so the variable is now 5 bytes long. This form is meant for printing rather than arithmetic. You can also insert a , in the same way.
99 COMP-3, or PACKED-DECIMAL - numeric value that is stored as a binary-coded decimal integer (see wiki). In short, every digit takes 4 bits instead of a whole byte. On IBM mainframes the last 4 bits hold the sign, so two digits A and B take 2 bytes with the layout 0000AAAA BBBBSSSS. Yes, endianness is a factor here, and it’s usually different from what we have in current systems.
COMP-1/2 under the hood are single and double-precision floating point.
There are also COMP-4/5/6/7/8, because why wouldn’t there be. To my memory, they didn’t do anything, except were possibly aliases for other computational types.
S can be used to save a sign for the variable, but the sign can also be specified by a separate clause, and you can specify if it should be saved at the start or the end of the byte layout.
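To make the byte layouts less abstract, here is a small Python sketch of two of them: plain EBCDIC digits with an implied decimal point, and packed decimal. It assumes the IBM-style packed layout where a trailing half-byte holds the sign (C for plus, D for minus) and uses code page `cp037` as one common EBCDIC variant. The function names are mine, not anything standard.

```python
from decimal import Decimal


def decode_zoned(raw: bytes, scale: int) -> Decimal:
    """Decode unsigned EBCDIC digits (e.g. PIC 99V99) with an implied point."""
    digits = raw.decode("cp037")          # cp037 is one common EBCDIC code page
    return Decimal(int(digits)).scaleb(-scale)


def pack_comp3(value: int, digits: int) -> bytes:
    """Pack an integer as 4-bit digits followed by one 4-bit sign."""
    sign = 0xD if value < 0 else 0xC      # assumed IBM convention: C=+, D=-
    text = str(abs(value)).rjust(digits, "0")
    if len(text) % 2 == 0:                # digits + sign must fill whole bytes
        text = "0" + text
    nibbles = [int(ch) for ch in text] + [sign]
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


def unpack_comp3(raw: bytes) -> int:
    """Reverse of pack_comp3: split bytes into nibbles and read the sign."""
    nibbles = [n for byte in raw for n in (byte >> 4, byte & 0x0F)]
    magnitude = int("".join(str(n) for n in nibbles[:-1]))
    return -magnitude if nibbles[-1] == 0xD else magnitude


print(decode_zoned("1234".encode("cp037"), scale=2))
print(pack_comp3(42, digits=2).hex(), unpack_comp3(pack_comp3(-123, digits=3)))
```

Note that the decimal point never shows up in the stored bytes of the first format. Only the declaration knows where it is, so reading the same bytes with a different picture silently gives a different number.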

well that does not seem that bad right?

The overarching theme of COBOL’s problems is that every feature of the language has to be implemented as its own special little case in a special statement, with 30% of the grammar for that feature being optional. As for what this means for structure definitions, here’s a short list of things you can do right there, at the declaration:

Repeat any of the structures on any level, making arrays
- At the same time, you can also add another integer variable to specify it will be used as an index for this array.
Define structs that specify layout of one record in a file. Reading from a file fills that structure by default.
Declare varying-length array, the length of which depends on a different variable. No, you still don’t allocate memory: you specify a range of values for length, and you allocate the maximum.
Specify that empty values in a string have to be filled by leading or trailing spaces or zeroes
“Redefine” a variable: do a reinterpret_cast of a different variable with any picture you want
Define a pointer variable that can interpret underlying memory as specific layout
say that if a numeric value is zero, display should print it as spaces
similar to how it works in Pascal, you can say that an array is indexed not from 1 to len, but from an arbitrary integer to another arbitrary integer.
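The “redefine” item is the one that bites the most, so here is a rough Python picture of it: one buffer, two ways of reading it. The sketch assumes a hypothetical 5-byte zip code field stored as EBCDIC digits (code page `cp037`), and the helper names are invented.

```python
# Hypothetical 5-byte ZIPCODE field, stored as EBCDIC digits (code page cp037).
storage = bytearray("02134".encode("cp037"))


def zipcode_as_number(buf: bytearray) -> int:
    """Read the bytes the way a numeric PIC 9(5) declaration would."""
    return int(buf.decode("cp037"))


def zipcode_as_text(buf: bytearray) -> str:
    """Read the very same bytes the way a PIC X(5) redefinition would."""
    return buf.decode("cp037")


print(zipcode_as_number(storage), repr(zipcode_as_text(storage)))

# Overwriting through the "text" reading changes what the "numeric" one sees.
storage[:] = "90210".encode("cp037")
print(zipcode_as_number(storage))
```

Nothing stops the text reading from putting letters into those bytes, at which point the numeric reading is garbage. The language trusts you to keep both interpretations in sync.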

The grammar for just a single level declaration takes more than one page. Grammar for the rest of declarations, including file record types (of which there are at least 3) takes another 5 pages.

Well, why?

A lot of bloat in the grammar comes from the idea that the language should be written by humans, so it should “flow” like English to allow onboarding of non-technical people. I’ll give a couple of examples of reserved keywords in the language (some of these are equivalent, some are absolutely not):

THROUGH, THRU
END-OF-PAGE, EOP
ALPHABET, ALPHABETIC
COMP, COMPUTATIONAL
PIC, PICTURE
INPUT-OUTPUT, I-O, INPUT, OUTPUT
LIMIT, LIMITS
OVERFLOW, OVERFLOWS
EQUAL, EQUALS
RECORD, RECORDING, RECORDS
REPORT, REPORTING, REPORTS
SPACE, SPACES, BLANK, LOW-VALUE, LOW-VALUES, VALUE, VALUES
END-IF, END-ADD and about 20 other variations of END clause for every statement

and 400+ more!

where are the statements, Lebowski?

It’s not possible to cover the vast landscape of COBOL statements in one blogpost. I’ll just mention that due to the non-existence of functions and procedures as we know them, most of the things we now expect to be part of a standard library API are grammar constructs baked into the compiler. This includes:

reading files. This includes statements such as OPEN, READ, WRITE, REWRITE, CLOSE
sorting file records
merging multiple files into one
searching for an item in array
splitting string into parts by separator
gather parts of a string/array into a new string
counting lengths of substrings, after split by previous statement
DIVIDE, ADD, MULTIPLY, SUBTRACT are separate statements, all with their own special cases in grammar - for modular arithmetic, error handling and all that. For cases where we need to combine more than one operation type, we have the COMPUTE statement, with its own set of grammar and semantics
member-by-member assigning of struct values to other structs. The semantics of cases where you have two structs with different layouts are mysterious and have a ton of edge cases: you can automatically assign a struct to array of strings, array of structs to a single struct, individual members of nested structs can be assigned by name, or by copying memory blobs per-byte. You do that by using MOVE and SET commands, which are obviously completely different commands and have absolutely unique semantics.
huge subset of commands can accept paragraph IDs to work like a callback hook.
a separate set of commands to generate reports - which are somewhat similar to modern HTML template language like Jinja. You can define headers there, print spreadsheets, and do a bunch of other things.
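To give a feel for the by-name flavour of struct assignment mentioned in the list above, here is a deliberately simplified Python sketch using nested dicts as records. `move_corresponding` is a made-up helper: it only copies fields that have the same name and the same shape on both sides, and it ignores the many edge cases the real statement has.

```python
def move_corresponding(source: dict, target: dict) -> None:
    """Copy only the fields whose names exist in both records, in place."""
    for name, value in source.items():
        if name not in target:
            continue                      # no field of that name: skip it
        both_groups = isinstance(value, dict) and isinstance(target[name], dict)
        both_leaves = not isinstance(value, dict) and not isinstance(target[name], dict)
        if both_groups:
            move_corresponding(value, target[name])   # recurse into sub-structs
        elif both_leaves:
            target[name] = value


user = {"NAME": "ADA", "ADDRESS": {"CITY": "LONDON", "ZIP": "00000"}, "AGE": 36}
report_line = {"NAME": "", "ADDRESS": {"CITY": ""}, "TOTAL": 0}
move_corresponding(user, report_line)
print(report_line)
```

Fields that exist on only one side are silently left alone, which is convenient right up to the moment someone renames a field in one of the two structs.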

In total, the COBOL-85 standard, which describes the grammar with a shallow description of semantics, spans over 800 pages. For more details on the logic of individual clauses and how exactly the edge cases are handled, you need a different ANSI document that is another 1000+ pages long. I’ll go over some of the features that stand out to me.

fall-through behaviors

Let’s look again at the example from the beginning:

        001-main-s section.
        001-main.

            display 'hello world'.
            perform 005-display-user
                varying user-idx from 1 by 1 until user-idx > 5.

        001-main-exit.
            stop run.

The important detail is that the flow of the program naturally goes to the first section, first paragraph, and then falls through into the next paragraph.

This makes paragraphs similar to labels in assembly. However, unlike assembly, you can do PERFORM <paragraph-or-section>, which automatically returns you to the point of the call once the end of that paragraph or section is reached. EXIT itself does nothing, it is only a marker that conventionally sits in a final, empty paragraph. You can also specify a range of paragraphs and execute them in sequence, like PERFORM 005-do-a THRU 005-do-b. All of these have slightly different semantics about where exactly the control flow ends up if you jump around in the middle of a loop, like if you do PERFORM <something> 5 TIMES. And yes, there is GO TO, which also interacts differently with paragraphs, EXIT statements and all that.
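Here is a toy Python model of that control flow, in case it helps. Paragraphs are plain functions kept in source order: running the program falls through them one after another, while `perform` runs a named range and comes back. The `Program` class and everything in it is invented for this sketch, and it covers none of the nasty corner cases.

```python
from typing import Optional


class Stop(Exception):
    """Raised by a paragraph to emulate STOP RUN."""


class Program:
    def __init__(self) -> None:
        self.paragraphs = {}   # dicts keep insertion order, like source order
        self.log = []

    def paragraph(self, func):
        """Decorator: register a function as the next paragraph in the source."""
        self.paragraphs[func.__name__] = func
        return func

    def run(self) -> None:
        """Start at the first paragraph and fall through until STOP RUN."""
        try:
            for body in list(self.paragraphs.values()):
                body()
        except Stop:
            pass

    def perform(self, first: str, thru: Optional[str] = None) -> None:
        """Run paragraphs first..thru in source order, then return to caller."""
        names = list(self.paragraphs)
        start, end = names.index(first), names.index(thru or first)
        for name in names[start:end + 1]:
            self.paragraphs[name]()


prog = Program()


@prog.paragraph
def main():
    prog.log.append("hello world")
    for _ in range(1, 6):                 # the PERFORM ... VARYING loop, 1 to 5
        prog.perform("display_user")


@prog.paragraph
def main_exit():
    raise Stop                            # without this, control falls through


@prog.paragraph
def display_user():
    prog.log.append("user")


prog.run()
print(prog.log)
```

Delete the `raise Stop` and `display_user` runs a sixth time purely because it happens to come next in the file. That is the fall-through behaviour in a nutshell.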

naming conventions

When dealing with programs written for mainframes (yes, just like in the movies), you often have to deal with quirks like “oh, our system has a limit on file name length of 7 symbols”. Now imagine that you have 3 million lines of code spread across 1000+ programs. This forces these programs to have descriptive names like NW101PW, FW5000X and so on. Most of these programs do not have any documentation for what they do. Each such program can call another program, and the name of the callee can be set dynamically (fetched from an external file, for example).

standard implementation

I’ve mentioned that the code you deal with is running on a mainframe. This implies several things:

It uses hardware with a completely different architecture from what you are using
Most likely that mainframe uses a proprietary operating system, for example z/OS by IBM
The COBOL code was written for a proprietary implementation of COBOL that can’t be obtained. And if it can, it still wouldn’t be able to produce binaries for your precious Arch.
The main way to test things out is to send the file to the mainframe and run tests there remotely. That might not be available, since we are likely dealing with sensitive data and the system is air-gapped, so the other option is to either send code to a person who’s physically there and has access to the machines, or just deal with it and move on
While there are open source alternatives like GnuCOBOL available, they are incomplete
In fact, to my knowledge, no production implementation of COBOL implements the ANSI standard 100% (similar to how it works with C++, I think): every vendor adds their own cool little extensions to work with their favourite proprietary tools such as the DB2 RDBMS by IBM, CICS, and so on.
This means that without having a compiler to test out your code, you are forced to read 2 to 5 different tech specifications, each at least 600 pages long, which sometimes contradict each other. Rarely, but it happens.

indexed files

Indexed files are a semi-proprietary file format that is described by an ANSI standard, but there are no details as to how it can/should be implemented. The standard only says what you can technically do through an interface, which consists of several special COBOL statements. Namely:

Each file contains records of the same size, and each record has to contain a unique primary key.
You can quickly search for a record in a file by its key
You can sequentially read through all of the records. This will access them in the order of ascending key
You can read only a subset of records by searching with constraints
You can do partial matching on keys: if you search for a record 100000, it will scroll past all records with keys <100000 and then end up on the first one with key larger than or equal to 100000
You can have a secondary key that can also be used for searching, but it can allow duplicates in some cases.
Files are modifiable: you can add, overwrite and delete data in it.
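Since the standard only describes the interface, any structure that keeps keys sorted would do. Below is a minimal in-memory Python sketch of that interface built on `bisect`. The `IndexedFile` class is hypothetical, it skips secondary keys entirely, and it says nothing about how a real mainframe lays these files out on disk.

```python
import bisect


class IndexedFile:
    """In-memory stand-in for an indexed file: records ordered by unique key."""

    def __init__(self) -> None:
        self._keys = []       # primary keys, always kept sorted
        self._records = {}    # key -> record
        self._cursor = 0      # position used by sequential reads

    def write(self, key: int, record: str) -> None:
        if key in self._records:
            raise KeyError(f"duplicate primary key {key}")
        bisect.insort(self._keys, key)
        self._records[key] = record

    def read(self, key: int) -> str:
        """Random access by exact key."""
        return self._records[key]

    def start(self, key: int) -> None:
        """Position on the first record whose key is >= the given key."""
        self._cursor = bisect.bisect_left(self._keys, key)

    def read_next(self):
        """Sequential read in ascending key order; None at end of file."""
        if self._cursor >= len(self._keys):
            return None
        key = self._keys[self._cursor]
        self._cursor += 1
        return key, self._records[key]

    def delete(self, key: int) -> None:
        del self._records[key]
        self._keys.remove(key)


f = IndexedFile()
for key in (300000, 50000, 150000):
    f.write(key, f"record-{key}")
f.start(100000)
print(f.read_next())
```

The partial matching from the list above is the `start` call: asking for 100000 lands on 150000, the first key that is not smaller, and sequential reads continue from there.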

why do we even care about byte layout

To put it simply: I haven’t seen a language before or since, that cares so much about byte layout. There are several reasons for it:

It’s just what people cared about back in ‘85, when memory was expensive
Most COBOL programs generate reports. When you are bound to the output format of a punchcard (of course this is also some sort of legacy restriction), you have to care about your numbers fitting into the report, all the names fitting the specific schema and so on.
Since the language gives you limited tools, pointer arithmetic and equivalents are abundant, if not overused. This is usually done to access the same data with a different layout, and doing reinterpret casts makes you care about every bit of data that you have.

cics

CICS is this weird ultimate GUI toolkit that allows people to make interactive software in COBOL. The most relevant example of this is likely some sort of old ATM machine, or maybe you have seen some old system for ticket ordering, where they’ve used something like it. I don’t have actual screenshots on my hands, but a very close replication can be found here in README images.

why do we even care about COBOL?

Back when I was working at that place, it was estimated that COBOL was powering more than half of the banking and government systems around the world. Someone has to care, because these are the systems that count your taxes, send you bills, and do all of the things that nowadays we tend to use Java for and laugh off as the most boring enterprise language that exists. Trust me, Java is not that bad. At the very least,

the silver lining

Working with that codebase taught me a couple of things:

I’ve learned to read the documentation - not the tutorials in Google, which were wrong or incomplete 60% of the time, but actual language specification. Turns out, it’s not that hard, and is rather useful
This gave me a good perspective on what an unmaintainable codebase looks like
