Perhaps start by explaining what ETL stands for and what it is.

It seems like the article is written by someone just starting to get into the data engineering subfield and they thought they were going to be writing python (pyspark is my guess) to support some kind of ML effort, but they got saddled with a bunch of SQL/data warehousing stuff to support business intelligence/analytics instead. I'd say normally what you say makes sense especially when you're pulling in abbreviations that are not related to the topic at hand or you're introducing new people to the field, but ETL is a pretty basic concept in data engineering and it's a web search away (should be the top result), so I'm not sure if it would really add all that much to their article to start with definitions.

It sounds to me like the author got thrown to the wolves in an environment of what data engineering looked like before "big data" and ML took off (and before it was even really called data engineering). There are a lot of enterprises that are still working in this mode because they are not Google and they don't have the same level of sophistication and automation when it comes to this stuff.

There is some bad information no doubt in the article, but if we're being charitable, it feels like it's someone who took a wrong turn somewhere and is struggling to find their feet in an unfamiliar place without the proper guidance and mentorship and that's a bit admirable at least that they're trying on their own.

There is no direct bearing on ETL in the article, aside from the focus on SQL queries and data validation hints that they might be talking about ELT (Extract-Load-Transform) as the level beyond ETL, but it's not clearly explained. It's clear to me that they are at the start of their journey and they are gonna learn things the hard way without guidance from someone more experienced.

kurinikku

> There is some bad information no doubt in the article

Could you share more specific details? Happy to look over / revise where needed.

More broadly is the issue of the gap of what you think the role is, and what the role actually is when you join. There are definitely cases where this is accidental. The best way I can think of to close the gap is to maybe do a short-term contract, but may be challenging to do under time constraints etc.

smif

> Could you share more specific details? Happy to look over / revise where needed.

Sure thing! I'd say first off, the solutions may look different for a small company/startup vs. a large enterprise. It can help if you explain the scale at which you are solving for.

On the enterprise side of things, they tend to buy solutions rather than build them in-house. Things like Informatica, Talend, etc. are common for large enterprises whose primary products are not data or software related. They just don't have the will, expertise, or the capital to invest in building and maintaining these solutions in-house so they just buy them off the shelf. On the surface, these are very expensive products, but even in the face of that it can still make sense for large enterprises in terms of the bottom line to buy rather than build.

For startups and smaller companies, have you looked at something like `dbt` (https://github.com/dbt-labs/dbt-core) ? I understand the desire to write some code, but often times there are already existing solutions for the problems you might be encountering.

ORM's should typically only exist on the consumer-side of the equation, if at all. A lot of business intelligence / business analysts are just going to use tools like Tableau and hook up to the data warehouse via a connector to visualize their data. You might have some consumers that are more sophisticated and may want to write some custom post-processing or aggregation code, and they could certainly use ORM's if they choose, but it isn't something you should enforce on them because it's a poor place to validate data since as mentioned there are different ways/tools to access the data and not all of them are going to go through your python SDK.

Indeed in a large enough company, you are going to have producers and consumers that are going to use different tools and programming languages, so it's a little bit presumptuous to write an SDK in python there.

Another thing to talk about, and this probably mostly applies to larger companies - have you looked at an architecture like a distributed data mesh (https://martinfowler.com/articles/data-mesh-principles.html)? This might be something to bring to the CTO more than try to push for yourself, but it can completely change the landscape of what you are doing.

> More broadly is the issue of the gap of what you think the role is, and what the role actually is when you join. There are definitely cases where this is accidental. The best way I can think of to close the gap is to maybe do a short-term contract, but may be challenging to do under time constraints etc.

Yeah this definitely sucks and it's not an enviable position to be in. I guess you have a choice to look for another job or try to stick it out with the company that did this to you. It's possible there is a geniune existential crisis for the company and a good reason why they did the bait-and-switch. Maybe it pays to stay, especially if you have equity in the company. On the other hand, it could also be the case that it is the result of questionable practices at the company. It's hard to make that call.