I recently had the task of moving some files from one Git repository into a new one. We wanted to move a service out of its current repository and into its own dedicated repository while preserving the Git history for its files.

After some investigation, I discovered that git filter-branch with the index-filter option was the command to use. We had just over 260 files to remove across more than 8000 commits, so naturally I wrapped the commands in a shell script and left it to run. After 27 hours it finally finished! Thankfully there is a faster way, so in this article I show you how to speed up git filter-branch.

What is git filter-branch?

As per the official Git documentation, git filter-branch:

lets you rewrite Git revision history by rewriting the branches mentioned in the <rev-list options>, applying custom filters on each revision. Those filters can modify each tree (e.g. removing a file or running a perl rewrite on all files) or information about each commit. Otherwise, all information (including original commit times or merge information) will be preserved.

So for me, that meant I could remove the files that I didn’t want in the new repository, leaving only the service that we did want. The filter-branch option required to do this was index-filter. The other option is tree-filter, but that checks out every commit into a working directory before running the filter, whereas index-filter only rewrites the index. That makes tree-filter the slower of the two, so I stuck with index-filter.

The git command for this was:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch my-service/src/main/java/com/mycompany/myproject/service/MyService.java' my_service_module

As I said in the introduction, I scripted this command to remove just over 260 files from a repository with over 8000 commits. It took just over 27 hours!
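For illustration, here is a minimal sketch of how such a script could fold many paths into a single index-filter, so that the history only has to be walked once. It is not my original script: the function name build_index_filter and the list file files-to-remove.txt (one path per line) are hypothetical, and it assumes that no path contains a single quote.

```shell
#!/bin/sh
# Sketch: build ONE index-filter command that covers every path in a list file.
# Assumptions (not from the original script): one path per line, and no path
# contains a single quote or a newline.
build_index_filter() {
    list_file=$1
    cmd='git rm --cached --ignore-unmatch'
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines in the list.
        [ -n "$path" ] || continue
        # Single-quote each path so spaces survive when the filter is evaluated.
        cmd="$cmd '$path'"
    done < "$list_file"
    printf '%s\n' "$cmd"
}

# Usage, run inside the clone that is being rewritten:
#   git filter-branch -f --index-filter "$(build_index_filter files-to-remove.txt)" my_service_module
```

The function only prints the filter string, so it can be inspected before anything is rewritten.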

So, how to speed up git filter-branch?

This is where the BFG comes in.

What, as in the giant?

No, not the Big Friendly Giant by Roald Dahl but the BFG Repo-Cleaner by Roberto Tyley.

The BFG focuses on removing files in a much more efficient way. As a result, this reduced the task above to just 2 hours!

I followed the advice on the BFG site, so I first cloned the remote repository containing the existing service as a mirror. I then removed the origin remote so that nothing could be pushed back to the original repository by accident. Finally, I created a new branch so that the rewritten history could be pushed to the new Git repository as a unit of work and go through the standard PR process.

$ git clone --mirror http://<github-server>/myRepo myRepo
$ cd myRepo
$ git remote rm origin
$ git branch my-service-module

I then performed the history rewrite to remove the unwanted files by updating my existing shell script to use the BFG, e.g.:

java -jar bfg-1.12.8.jar --delete-files MyService.java myRepo
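The BFG's --delete-files option takes a glob that is matched against file names only, not paths within the repository, and its documentation shows brace alternatives such as '*.{txt,log}'. That suggests the whole list of files could be handed over in one run. The sketch below is one way to build such a glob. The function name build_bfg_glob and the list file files-to-remove.txt are hypothetical, and I am assuming that a brace group made up of complete file names is accepted. Because matching is by name only, any other file that shares a name with one on the list would be deleted too.

```shell
#!/bin/sh
# Sketch: turn a list of paths into one brace glob of unique file names,
# e.g. {Alpha.java,Beta.java}, for use with the BFG's --delete-files option.
# Assumption: file names contain no commas, braces, or newlines.
build_bfg_glob() {
    list_file=$1
    names=$(
        while IFS= read -r path || [ -n "$path" ]; do
            # Skip blank lines in the list.
            [ -n "$path" ] || continue
            # The BFG matches on file name only, so drop the directory part.
            basename "$path"
        done < "$list_file" | LC_ALL=C sort -u | paste -sd, -
    )
    printf '{%s}\n' "$names"
}

# Usage (hypothetical list file), with the same jar and repository as above:
#   java -jar bfg-1.12.8.jar --delete-files "$(build_bfg_glob files-to-remove.txt)" myRepo
```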

Once this process was done, we let Git tidy up after itself:

git reflog expire --expire=now --all && git gc --prune=now --aggressive

Now that that is complete, we can push the new branch to the new repository. First, let’s create a new local Git repository:

$ cd ..
$ mkdir myNewRepo
$ cd myNewRepo
$ git init

Now we can add this new repository as a remote on the existing repository:

$ cd ..
$ cd myRepo
$ git remote add origin /<full_path>/<to>/myNewRepo

Then finally we can push the branch:

$ git push origin my-service-module

We can then go back to our new repository and check out that branch:

$ cd ..
$ cd myNewRepo
$ git checkout my-service-module

We should then find that the only files in this branch are those from the myRepo repository that we wanted to keep, including their complete history. All the files that we removed are no longer present, and there is no history for them either.
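One way to check this is to count the commits that still touch a given path. The helper below is a small sketch rather than part of the process above. The name commits_touching is hypothetical. It prints 0 when no commit on any ref mentions the path.

```shell
#!/bin/sh
# Sketch: print how many commits, across all refs, touch a given path.
# A result of 0 means the path has no history left in that repository.
commits_touching() {
    repo=$1
    path=$2
    # "--" separates the path from revisions, so a missing path is not an error.
    git -C "$repo" log --all --oneline -- "$path" | wc -l | tr -d ' '
}

# Usage, run from the directory that contains myNewRepo:
#   commits_touching myNewRepo my-service/src/main/java/com/mycompany/myproject/service/MyService.java
```

A file that was kept should report at least one commit, while a removed file should report 0.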