What does HackerNews think of git-filter-repo?
Quickly rewrite git repository history (filter-branch replacement)
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
So the other option is: `git filter-repo --invert-paths --path path/to/file`
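A minimal sketch of that option, assuming `git-filter-repo` is installed and on your PATH. The helper name `drop_path_from_history` and the default clone directory are made up for illustration; only the `filter-repo` flags come from the comment above. It works on a fresh clone because filter-repo refuses to run on anything else unless you pass `--force`.

```shell
#!/bin/sh
# Hypothetical helper: remove one path from every commit of a repository.
# Usage: drop_path_from_history <repo-url> <path-to-remove> [work-dir]
drop_path_from_history() {
    repo_url=$1
    doomed_path=$2
    work_dir=${3:-scrubbed-clone}   # assumed directory name for the fresh clone

    # Rewrite history in a fresh clone rather than in a working copy
    # that may hold local changes.
    git clone "$repo_url" "$work_dir" || return 1

    # --path selects the file; --invert-paths keeps everything *except* it.
    git -C "$work_dir" filter-repo --invert-paths --path "$doomed_path"
}
```

After the rewrite, filter-repo removes the `origin` remote from the clone, so you would need to add it back before force-pushing the result.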
Let's use a common scenario as an example. Let's say someone accidentally committed a password into your codebase six months ago and pushed to remote.
You can't delete just that commit because Git is a graph: every commit records the hash of its parent, so you have to rewrite the commit that contains the password _and every other commit after it_.
So you discover `git filter-branch`, which runs a shell script against matching SHAs. You run `man git-filter-branch` or `git filter-branch --help` and discover that (a) it can use special environment variables, none of which are documented there, and (b) there are different types of filters, e.g. `env-filter`, `index-filter`, and `tree-filter`, all of which require intimate knowledge of how Git works to use.
(I train junior engineers on Git often. Once you mention the "staging area" and "index", you'll usually lose them. I can't blame them...they just want to track their work!)
You then try to run a `git filter-branch` command with the `--index-filter` since this can work against individual SHAs. Immediately, you get this:
$: git filter-branch --index-filter ''
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
And then it just _hangs._ Since you know that going through six months of history can take a while, you think that it's working in the background. Nope, it just hangs there, waiting for you to CTRL-C instead of exiting with a non-zero exit code like EVERY OTHER CLI IN EXISTENCE.
You go back into the `man` page to better understand this warning. Surprise surprise, `FILTER_BRANCH_SQUELCH_WARNING` isn't documented!
So you provide the environment variable (if you know how, which isn't a given for junior engineers) and it finally runs...except you have no idea whether commits were rewritten or not (it does tell you whether the commits were rewritten, but you kind of have to understand Git to understand the message).
Now let's say you actually powered through and got it to run...but it caused you to lose work! (Remember, you have to `git push --force` or `git push --force-with-lease` to apply your changes, so this is stupid dangerous already, but it must be done.) To get it back, you then need to learn about `git reflog` and `git reset --hard`, but since `git filter-branch` runs against every commit that matched your pattern, you have to spend time scrolling back through the reflog to find the entry you want to reset to.
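One way to avoid digging through the reflog is to give the pre-rewrite state a name before touching anything. A minimal sketch in a throwaway repository: the `git commit --amend` merely stands in for a real history rewrite, and the branch name `pre-rewrite-backup` and the demo identity are arbitrary choices, not anything Git requires.

```shell
#!/bin/sh
set -e
# Throwaway repository so the demo cannot touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Remember the pre-rewrite state under a name, not just in the reflog.
before=$(git rev-parse HEAD)
git branch pre-rewrite-backup "$before"

# Stand-in for a history rewrite: amending creates a new commit object.
git commit -q --amend -m "Add notes (rewritten)"
after=$(git rev-parse HEAD)

# Undo: move the current branch back to the saved commit.
git reset -q --hard pre-rewrite-backup
restored=$(git rev-parse HEAD)
echo "before=$before after=$after restored=$restored"
```

`git filter-branch` also keeps the original refs under `refs/original/`. As far as I know, `git filter-repo` expires reflogs and repacks by default once it finishes, so a named backup or a separate clone is the safer bet there.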
`bfg` exists to make this "simpler," but (a) it's a Java application, and you need to install the JRE to make it work, and (b) IT DOESN'T COME WITH GIT.
Again, since this is a really common use-case with Git, it would be really nice if `git` had something more user-friendly, like:
`git remove-references-to --regexp "PASSWORD" SHA_PATTERN`
and a convenient message after the deed is done, like:
85 commits rewritten; run "git reset --hard [SHA_BEFORE_THE_CHANGES]" to undo.
(Note: You must run `git push --force-with-lease` to sync changes with your HTTPS-backed remote, "NAME_OF_REMOTE".)
However, since Git is maintained primarily for Linux development (afaict), this would most likely have to come from a third-party Git CLI. (I can already feel the rage from Linus for a proposal like this.)
Funny enough, making things easier against the desires of protocol/system maintainers is how decentralized systems eventually centralize. I can see a world where everyone uses `gh` instead of `git` or has `git` aliased to `gh`, which means that a single company has de-facto control over the future of Git...
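For what it's worth, `git filter-repo` (the tool the warning above points to) gets fairly close to the wished-for command with its `--replace-text` option, which takes a file of expressions to replace in file contents. A rough sketch, assuming filter-repo is installed; the helper name `scrub_secret` is hypothetical, and the rule file uses the `literal:` prefix from the filter-repo manual.

```shell
#!/bin/sh
# Hypothetical helper: replace a literal secret in every blob of history.
# Usage: scrub_secret <repo-dir> <secret>
scrub_secret() {
    repo_dir=$1
    secret=$2
    rules=$(mktemp) || return 1

    # One rule per line. "literal:" matches the text exactly; with no
    # "==>replacement" suffix, filter-repo substitutes ***REMOVED***.
    printf 'literal:%s\n' "$secret" > "$rules"

    git -C "$repo_dir" filter-repo --replace-text "$rules"
    rc=$?
    rm -f "$rules"
    return $rc
}
```

This only rewrites file contents, not commit messages, and the secret should still be rotated, since anyone who already fetched the old history keeps a copy.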
It's written in Python, so it runs out of the box on pretty much any *nix system.
`git filter-repo --analyze` will generate a report of blobs stored in the repo at `.git/filter-repo/analysis/blob-shas-and-paths.txt`, and it's very easy to sort them by filesize and strip them out from there.
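A sketch of the sorting step, assuming each data row of the report holds the blob SHA, unpacked size, packed size, and path, in that order. Check the header lines of your own report before trusting the column positions. The function name `largest_blobs` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest blobs from the analysis report.
# Usage: largest_blobs <report-file> [count]
largest_blobs() {
    report=$1
    count=${2:-10}
    # Keep only rows whose second column is numeric (this also drops the
    # header lines), sort by unpacked size descending, show the top N.
    awk '$2 ~ /^[0-9]+$/' "$report" |
        sort -k2,2nr |
        head -n "$count"
}

# Typical use after running `git filter-repo --analyze` in the repository:
#   largest_blobs .git/filter-repo/analysis/blob-shas-and-paths.txt 20
```

The paths it prints can then be fed to `git filter-repo --invert-paths --path ...`. If size is the only criterion, filter-repo's `--strip-blobs-bigger-than` option may be the shorter route.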
git filter-repo --force --name-callback 'return b"Name to use"' --email-callback 'return b"[email protected]"'
[0]: https://github.com/newren/git-filter-repo
I don't remember the specifics, but the method used here didn't produce the results we were looking for when migrating long histories and lots of branches.