What does HackerNews think of git-filter-repo?

Quickly rewrite git repository history (filter-branch replacement)

Language: Python

I am not familiar with a project that does that specifically. However, you can easily perform that type of operation with git-filter-repo and its --commit-callback flag, which lets you modify commit objects with a custom Python snippet.

https://github.com/newren/git-filter-repo
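
To make the callback idea concrete, here is a minimal sketch of the kind of snippet you might pass to --commit-callback. It assumes the callback sees a `commit` object whose `message` field is bytes; the `SECRET` pattern, the `scrub_message` helper, and the stand-in commit object are all hypothetical and only there so the logic can be tried without touching a repository.

```python
import re
from types import SimpleNamespace

# Hypothetical pattern for illustration; adjust to whatever you need to scrub.
SECRET = re.compile(rb"password=\S+")

def scrub_message(commit):
    """Mimic a commit callback body: mutate `commit` in place."""
    # Commit fields are assumed to be bytes, so the pattern is bytes too.
    commit.message = SECRET.sub(b"password=***", commit.message)

# Stand-in for the object the tool would pass in; only `message` is modelled.
fake_commit = SimpleNamespace(message=b"Add config with password=hunter2\n")
scrub_message(fake_commit)
print(fake_commit.message)
```

The body of `scrub_message` is the part that would go inside the flag's quoted argument; trying it against a stand-in first is a cheap way to check the regex before rewriting real history.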

this is great! I used to delete the entire repo and make a new one to get around accidentally checking in something huge. But it does give:

    WARNING: git-filter-branch has a glut of gotchas generating mangled history
             rewrites.  Hit Ctrl-C before proceeding to abort, then use an
             alternative filtering tool such as 'git filter-repo'
             (https://github.com/newren/git-filter-repo/) instead.  See the
             filter-branch manual page for more details; to squelch this warning,
             set FILTER_BRANCH_SQUELCH_WARNING=1.

So the other option is:

    git filter-repo --invert-paths --path path/to/file
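
If you need to drop several files at once, a small wrapper can assemble the same command. This is only a sketch: `remove_paths_command` and `run_in_repo` are hypothetical helper names, and the only filter-repo flags used are the `--invert-paths` and `--path` shown above.

```python
import subprocess

def remove_paths_command(paths):
    """Build the argv for deleting `paths` from all of history."""
    argv = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        # Each path gets its own --path flag; --invert-paths turns
        # "keep only these paths" into "drop these paths".
        argv += ["--path", path]
    return argv

def run_in_repo(argv, repo_dir):
    # This rewrites history in place, so try it on a fresh clone first.
    subprocess.run(argv, cwd=repo_dir, check=True)

print(remove_paths_command(["path/to/file"]))
```

Building the argv as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.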

Regarding git filter-branch, that big warning does point you to try git filter-repo [1] instead. In my experience, every case that I've needed git filter-branch for (or BFG for a bit) has been handled easily and well (including performance-wise) by git filter-repo. Don't let the fact that it needs Python and is still a separate download from the git suite (in part because it needs Python and that is not otherwise a git suite dependency) dissuade you from using an easier and faster alternative to git filter-branch.

[1] https://github.com/newren/git-filter-repo

The problem is that outside of those five or six commands, Git gets really hard REALLY FAST.

Let's use a common scenario as an example. Let's say someone accidentally committed a password into your codebase six months ago and pushed to remote.

You can't delete just that commit because Git is a graph, so you have to modify the parent commit that contains the password _and every other commit after it_.
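
Why every later commit has to change can be shown with a toy model. The sketch below is a simplification, not Git's real object format: it assumes a commit ID is a SHA-1 over the parent's ID plus the commit's content, which is enough to show the ripple effect. `commit_id` and `build_history` are hypothetical helpers.

```python
import hashlib

def commit_id(parent_id, content):
    # Simplified model: real Git hashes a full commit object (tree, parents,
    # author, message), but the parent ID is part of the hashed data there too.
    body = b"parent " + parent_id.encode() + b"\n\n" + content
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def build_history(contents):
    """Return the chain of commit IDs for a linear history."""
    ids, parent = [], ""
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids

original = build_history([b"init", b"add password=hunter2", b"feature A", b"feature B"])
rewritten = build_history([b"init", b"add password=***", b"feature A", b"feature B"])

# Indices of commits whose ID changed after scrubbing commit 1.
changed = [i for i, (a, b) in enumerate(zip(original, rewritten)) if a != b]
print(changed)  # [1, 2, 3]
```

Only the second commit's content was edited, yet both commits after it get new IDs because each one hashes its parent's ID.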

So you discover `git filter-branch`, which runs a shell script against matching SHAs. You run `man git-filter-branch` or `git filter-branch --help` but discover that (a) it can use special environment variables, none of which are documented there, and (b) there are different types of filters, i.e. `env-filter`, `index-filter`, `tree-filter`, all of which require intimate knowledge of how Git works to use.

(I train junior engineers on Git often. Once you mention the "staging area" and "index", you'll usually lose them. I can't blame them...they just want to track their work!)

You then try to run a `git filter-branch` command with the `--index-filter` since this can work against individual SHAs. Immediately, you get this:

    $: git filter-branch --index-filter ''
    WARNING: git-filter-branch has a glut of gotchas generating mangled history
             rewrites.  Hit Ctrl-C before proceeding to abort, then use an
             alternative filtering tool such as 'git filter-repo'
             (https://github.com/newren/git-filter-repo/) instead.  See the
             filter-branch manual page for more details; to squelch this warning,
             set FILTER_BRANCH_SQUELCH_WARNING=1.

And then it just _hangs._ Since you know that going through six months of history can take a while, you think that it's working in the background. Nope, it just hangs there, waiting for you to CTRL-C instead of exiting with a non-zero exit code like EVERY OTHER CLI IN EXISTENCE.

You go back into the `man` page to better understand this warning. Surprise surprise, `FILTER_BRANCH_SQUELCH_WARNING` isn't documented!

So you provide the environment variable (if you know how, which isn't a given for junior engineers) and it finally runs...except you have no idea whether commits were rewritten or not (it does tell you whether the commits were rewritten, but you kind of have to understand Git to understand the message).
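
For anyone unsure how to "provide the environment variable": in a shell it is a `FILTER_BRANCH_SQUELCH_WARNING=1` prefix on the command, and from a script it could look roughly like the sketch below. `squelched_env` and `run_filter_branch` are hypothetical helper names; the variable name is the one printed in the warning.

```python
import os
import subprocess

def squelched_env(base=None):
    """Copy an environment and add the variable named in the warning."""
    env = dict(os.environ if base is None else base)
    env["FILTER_BRANCH_SQUELCH_WARNING"] = "1"
    return env

def run_filter_branch(args, repo_dir):
    # Only this child process sees the variable; your shell is unaffected.
    subprocess.run(
        ["git", "filter-branch", *args],
        cwd=repo_dir,
        env=squelched_env(),
        check=True,
    )

print(squelched_env({"PATH": "/usr/bin"}))
```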

Now let's say you actually powered through and got it to run...but it caused you to lose work! (Remember, you have to `git push --force` or `git push --force-with-lease` to apply your changes, so this is stupid dangerous already, but it must be done.) To get it back, you then need to learn about `git reflog` and `git reset --hard`, but since `git filter-branch` runs against every commit that matched your pattern, you have to spend time scrolling back through the reflog to find the one you want to reset to.

`bfg` exists to make this "simpler," but (a) it's a Java application, and you need to install the JRE to make it work, and (b) IT DOESN'T COME WITH GIT.

Again, since this is a really common use-case with Git, it would be really nice if `git` had something more user-friendly, like:

`git remove-references-to --regexp "PASSWORD" SHA_PATTERN`

and a convenient message after the deed is done, like:

    85 commits rewritten; run "git reset --hard [SHA_BEFORE_THE_CHANGES]" to undo.
    (Note: You must run `git push --force-with-lease` to sync changes with your HTTPS-backed remote, "NAME_OF_REMOTE")

However, since Git is maintained primarily for Linux development (afaict), this would most likely have to come from a third-party Git CLI. (I can already feel the rage from Linus for a proposal like this.)

Funny enough, making things easier against the desires of protocol/system maintainers is how decentralized systems eventually centralize. I can see a world where everyone uses `gh` instead of `git` or has `git` aliased to `gh`, which means that a single company has de-facto control over the future of Git...

Similarly there's also git-filter-repo: https://github.com/newren/git-filter-repo

It's written in Python, so it runs out of the box on pretty much any *nix system.

You’d have to rewrite the commit history for each of those repos and force push those changes. https://github.com/newren/git-filter-repo would be a decent starting point.

For a minute I confused this with another external tool: git-filter-repo [0]. It's recommended by the official manual as a replacement for git-filter-branch [1].

[0] https://github.com/newren/git-filter-repo/

[1] https://git-scm.com/docs/git-filter-branch

For 1), try using git-filter-repo (https://github.com/newren/git-filter-repo). This is the currently recommended alternative to previous tools like filter-branch, and it is much more user-friendly.

`git filter-repo --analyze` will generate a report of blobs stored in the repo at `.git/filter-repo/analysis/blob-shas-and-paths.txt`, and it's very easy to sort them by filesize and strip them out from there.
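
Sorting that report can be scripted. The sketch below assumes each data line in `blob-shas-and-paths.txt` has the shape "sha, unpacked size, packed size, filename(s)" separated by whitespace, with a couple of header lines on top; check your own report before relying on it. `parse_blob_report` is a hypothetical helper, and the sample text is made up.

```python
import re

SHA = re.compile(r"^[0-9a-f]{40}$")

def parse_blob_report(text):
    """Return (packed, unpacked, sha, names) rows, largest packed size first."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip headers and anything that does not start with a 40-hex SHA.
        if len(parts) != 4 or not SHA.match(parts[0]):
            continue
        sha, unpacked, packed, names = parts
        if not (unpacked.isdigit() and packed.isdigit()):
            continue
        rows.append((int(packed), int(unpacked), sha, names))
    rows.sort(reverse=True)
    return rows

# Made-up sample in the assumed layout.
sample = (
    "=== Files by sha and associated pathnames in reverse size ===\n"
    "Format: sha, unpacked size, packed size, filename(s) object stored as\n"
    f"  {'a' * 40}   10   5 small.txt\n"
    f"  {'b' * 40}   9000  4000 assets/big.bin\n"
)
rows = parse_blob_report(sample)
for packed, unpacked, sha, names in rows:
    print(packed, names)
```

The paths at the top of the result are the candidates to feed into `--invert-paths --path ...`.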

FYI, your commits are signed with "Your Name ". You'll want to fix your git config. For these existing commits, you can use git-filter-repo [0] to rewrite the author name and email.

    git filter-repo --force --name-callback 'return b"Name to use"' --email-callback 'return b"[email protected]"'

[0]: https://github.com/newren/git-filter-repo
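
If the repo also has commits from other people, you may want a variant that only rewrites the placeholder identity instead of every commit. A minimal sketch, assuming the name and email callbacks receive a bytes value (`name` / `email`) and return bytes, as in the one-liner above; `OLD_EMAIL` and `NEW_EMAIL` are hypothetical placeholder values.

```python
# Hypothetical placeholder values; substitute the real old and new identity.
OLD_NAME = b"Your Name"
NEW_NAME = b"Name to use"
OLD_EMAIL = b"old@example.com"
NEW_EMAIL = b"new@example.com"

def name_callback(name):
    # Only rewrite the placeholder name; leave everyone else's commits alone.
    return NEW_NAME if name == OLD_NAME else name

def email_callback(email):
    return NEW_EMAIL if email == OLD_EMAIL else email

print(name_callback(b"Your Name"), name_callback(b"Someone Else"))
print(email_callback(b"old@example.com"), email_callback(b"other@example.com"))
```

On the command line, each function body would become the quoted argument to the corresponding callback flag.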

Having done this migration, I recommend you look into filter-repo, https://github.com/newren/git-filter-repo

I don't remember the specifics but the method used here didn't produce the results we were looking for when migrating long histories and lots of branches.