I have a question about OCR. Is there a project that lets you define fields for OCR? I am thinking of scanning invoices and defining that this line is the invoice number, this one is the item name, this one is the item rate, etc.

Many OCR programs can scan an invoice but cannot "understand" it well enough to use the data. ABBYY has something like this for scanning invoices, so is there something for the FOSS folks?

While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...

The project I had in mind was similar to this one but I can't remember the name currently: https://github.com/tabulapdf/tabula

However, if you're looking for an ML-based, invoice-specific project, it looks like the other reply to your comment might be more useful.
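
For what it's worth, the "define fields" idea from the question can be roughed out without any ML if the invoices share a fixed layout. Below is a minimal sketch, assuming the OCR step (Tesseract or similar) has already produced plain text. The field names, the regex patterns, the `extract_fields` helper, and the sample invoice are all made up for illustration and do not come from any of the projects linked above.

```python
import re

# Hypothetical field definitions: one regex per field, one capture group each.
# Real invoices would need patterns tuned to their actual layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

# Assumed line-item layout: "<item name> <quantity> <rate>" on one line.
LINE_ITEM = re.compile(r"^(?P<name>.+?)\s+(?P<qty>\d+)\s+(?P<rate>\d+\.\d{2})$")


def extract_fields(ocr_text):
    """Map raw OCR text to named fields using the patterns above."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None

    items = []
    for line in ocr_text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append(
                {
                    "name": match["name"],
                    "qty": int(match["qty"]),
                    "rate": float(match["rate"]),
                }
            )
    fields["items"] = items
    return fields


# Made-up OCR output, used only to exercise the sketch.
SAMPLE = """ACME Supplies
Invoice No: INV-1042
Date: 2023-05-17
Widget 3 4.50
Gadget, large 1 19.99
Total: $33.49
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

This only works for layouts you have written patterns for, and OCR noise will break naive regexes quickly, which is presumably why the ML-based projects exist.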