Don't ever work on a COBOL modernization project. It is a career killer with little to no upside. COBOL systems and the people directly using them are Chernobyl.

Both government regulators and operations personnel have made careers out of trusting black boxes that no one fully understands anymore. Regulatory specifications are ambiguous and disorganized. Imagine working with someone who knows that you are trying to shed light on the technology that they were responsible for, yet do not fully understand. How cooperative do you think these people will be with your efforts? How do you think the managers responsible for this work survive the politics of the project's inevitable failure? Blame rolls downhill. Modernization projects require strong leadership and collaboration across many teams. That's a huge problem because such an environment very likely does not exist in the world today. No one understands how the algorithms work or even how they should work! Forget how challenging it is to work with COBOL, which someone could eventually figure out. Whatever you've written has to be tested, and you don't have a trusted source of business logic from which to verify how it ought to work. You only have the black box.

If you've been given this task, start interviewing at other companies. You were given a suicide mission by people who are well aware of that.

My experience with this was with regulatory margin trading systems supporting a multi-billion dollar market.

qz2

There’s a lot of that out there. I have avoided COBOL but I have seen stuff written in VB6/COM which was also in the same state. Eventually the one guy who knows anything about it leaves or dies. In the name of “progress” someone wraps it in something else and sticks a web API around it but everyone is afraid to touch that bit of VB code. Now a third party calls this and wonders why it takes 10 seconds for an HTTP API call to run and they can’t be issued in parallel or it throws a 500 error. Then the third party asks to send 20,000 of these a day and then finger pointing and stuff occurs at management level between both companies.

I have been on both sides of this and it sucks. I came up with Rug Driven Development as a title. All the risks and future problems are swept under the rug quickly while maintaining a happy smile and pretending everything is just fine and dandy.

yourapostasy

Rug Driven Development causes the majority of technical debt that I see in most of my clients, and it happens at the small scale, too. Just a couple weeks ago, a junior engineer was faced with a key process that segfaulted about once a minute. No worries though, they reasoned; the process immediately respawned and because it managed a stateless procedure, nothing was lost. Except the 1 out of ~30,000 times it didn't respawn. No worries though, they reasoned; they put in a request to the monitoring team to look for the absence of the process for longer than three minutes and auto-spawn it. Ta da! Problem solved!

They let this go on for about six years before I arrived and saw it. I pointed out that there is a decent chance this is caused by the process relying upon something in the OS kernel that isn't quite aligned to the kernel documentation but is good enough, and that the immediate respawn behavior might rely upon that in turn. If the kernel ever is "fixed", then this audit-compliance-related process suddenly stops and doesn't respawn, or worse, won't start at all, and becomes top of mind for all the senior management. It is always cheaper to fix problems in the small before they become problems no one can ignore.

The urge to sweep problems under the rug and move on is very powerful within our industry. It stays that way until you've done the same yourself and been bitten so many times that your scar tissue twitches every time you see the same pattern again. These days, I treat code problems like I treat cleaning up messes while cooking: I clean up as I go along. My scar tissue thanks me now. There is a delicate balance between addressing these problems in the small and bikeshedding, though.

BTW, I performed an strace that revealed a SIGKILL just pops in without anyone or any known process issuing it. The application developers suspect something in the OS, so we're now engaged with the application support team, OS support team, and our own internal OS support team to trace it down and beat it into submission.
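
For anyone with a similar blind respawn loop: a minimal sketch, in Python, of having whatever restarts the process also record how it died, so a SIGKILL shows up in a log instead of only in an strace. The helper names `describe_exit` and `run_and_report` are hypothetical, and the self-killing child is just a stand-in for the real process. This only tells you that a signal ended the process, not who sent it.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable cause of death."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not in the enum (e.g. some real-time signals).
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run cmd to completion and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Hypothetical stand-in for the flaky process: a child that SIGKILLs itself.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]
    print(run_and_report(child))
```

Logging that string on every respawn at least separates "segfaulted again" from "something killed it", which is the distinction that took six years to surface here.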

rwmj

If it's Linux, there are various ways to find out who is sending that signal. One is killsnoop (part of bcc: https://github.com/iovisor/bcc); another is a trivial systemtap script (https://www.ibm.com/support/pages/systemtap-kill-who-killed-...).