This was an issue that came up a couple of years ago while I was working at IMC. These are the leftovers of my notes, so apologies for some missing details / links.
I don’t have much Bazel experience; this is written from the perspective of someone responsible for maintaining an internal GitLab repository while another team was working on a Bazel build pipeline.
Context
The brotli repository was forked into an internal GitLab instance to serve as a local copy, and was periodically updated from upstream.
Bazel was complaining that the contents of a tar.gz file from brotli did not match the checksum, despite the checksum having been updated recently. After the checksum was updated again, Bazel was able to download the file but then complained about missing files (I forget which ones). Downloading the archive from source and extracting it manually showed that some files were missing even though they were present in the repo.
The Bazel code was using http_archive to retrieve brotli.
Digging into the GitLab code, I found that the archive was generated via git archive and cached for a short amount of time.
Git attributes
The missing files were due to export-ignore entries in .gitattributes. This meant that an archive generated by git archive would differ from one produced by running tar on the source repo (which is what we were expecting).
Note that git archive can produce tar-format output, but it excludes any paths marked export-ignore in .gitattributes.
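The effect of export-ignore can be reproduced in a throwaway repository. The following is a minimal sketch, assuming git is installed and on the PATH; the file names (main.c, tests/data.txt) and the helper archive_members are made up for illustration and are not the paths that were actually missing from the brotli archive.

```python
import io
import shutil
import subprocess
import tarfile
import tempfile
from pathlib import Path


def archive_members(attributes: str) -> list[str]:
    """Create a throwaway repo and return the file names `git archive` emits.

    `attributes` is written verbatim to .gitattributes before committing.
    All file names here are hypothetical.
    """
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)

        def git(*args: str) -> bytes:
            # Identity and signing are set inline so the sketch does not
            # depend on the user's global git configuration.
            cmd = [
                "git",
                "-c", "user.name=demo",
                "-c", "user.email=demo@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ]
            result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True)
            return result.stdout

        git("init", "-q")
        (repo / "main.c").write_text("int main(void) { return 0; }\n")
        (repo / "tests").mkdir()
        (repo / "tests" / "data.txt").write_text("fixture\n")
        (repo / ".gitattributes").write_text(attributes)
        git("add", "-A")
        git("commit", "-q", "-m", "initial")

        # git archive reads .gitattributes from the tree being archived.
        tar_bytes = git("archive", "--format=tar", "HEAD")
        with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
            return sorted(m.name for m in tar.getmembers() if m.isfile())


if __name__ == "__main__" and shutil.which("git"):
    print("no export-ignore:  ", archive_members(""))
    print("with export-ignore:", archive_members("tests export-ignore\n"))
```

With the export-ignore line present, tests/data.txt is tracked in the repo but absent from the archive, which matches the symptom of files "going missing" after download.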
Git archive
We also found that the hash provided to download the archive was a tree ID.
git archive behaves differently when given a tree ID as opposed to a commit ID or tag ID. When a tree ID is provided, the current time is used as the modification time of each file in the archive. On the other hand, when a commit ID or tag ID is provided, the commit time as recorded in the referenced commit object is used instead.
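The difference in modification times can be checked directly. This is a rough sketch, again assuming git is available on the PATH; COMMIT_TIME, file.txt and archive_mtimes are hypothetical names chosen for the demo. It commits one file with a fixed committer date, then archives the same content once via the commit and once via its tree.

```python
import io
import os
import shutil
import subprocess
import tarfile
import tempfile
from pathlib import Path

COMMIT_TIME = 1_500_000_000  # arbitrary fixed timestamp for the demo commit


def archive_mtimes() -> tuple[int, int]:
    """Return (mtime when archiving by commit ID, mtime when archiving by tree ID)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        # Pin both dates so the commit time is known in advance.
        env = dict(
            os.environ,
            GIT_AUTHOR_DATE=f"{COMMIT_TIME} +0000",
            GIT_COMMITTER_DATE=f"{COMMIT_TIME} +0000",
        )

        def git(*args: str) -> bytes:
            cmd = [
                "git",
                "-c", "user.name=demo",
                "-c", "user.email=demo@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ]
            result = subprocess.run(
                cmd, cwd=repo, env=env, check=True, capture_output=True
            )
            return result.stdout

        git("init", "-q")
        (repo / "file.txt").write_text("hello\n")
        git("add", "-A")
        git("commit", "-q", "-m", "initial")

        def mtime_of(treeish: str) -> int:
            data = git("archive", "--format=tar", treeish)
            with tarfile.open(fileobj=io.BytesIO(data)) as tar:
                return int(tar.getmember("file.txt").mtime)

        # HEAD resolves to a commit; HEAD^{tree} resolves to that commit's tree.
        return mtime_of("HEAD"), mtime_of("HEAD^{tree}")


if __name__ == "__main__" and shutil.which("git"):
    by_commit, by_tree = archive_mtimes()
    print("commit ID mtime:", by_commit)
    print("tree ID mtime:  ", by_tree)
```

The first value should equal COMMIT_TIME, while the second should be the time at which the script ran.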
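Because a tree ID was given, the archive's hash changed every time the archive was regenerated: the tar format preserves file metadata, including modification times, so a new timestamp produces different bytes. That would explain why Bazel's checksum stopped matching once GitLab's cached copy expired.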
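To see why a changing modification time alone is enough to break a checksum, here is a small standalone sketch using only Python's standard library. It does not call git; it simulates the situation by building the same one-file tar twice with different mtimes. The member name brotli/README and the helper tar_digest are hypothetical.

```python
import hashlib
import io
import tarfile


def tar_digest(mtime: int) -> str:
    """Build a one-file tar in memory with the given mtime and return its SHA-256."""
    payload = b"identical contents\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="brotli/README")  # hypothetical member name
        info.size = len(payload)
        info.mtime = mtime  # the only field that varies between calls
        tar.addfile(info, io.BytesIO(payload))
    return hashlib.sha256(buf.getvalue()).hexdigest()


if __name__ == "__main__":
    # Same contents, same name, timestamps one second apart.
    print(tar_digest(1_600_000_000))
    print(tar_digest(1_600_000_001))
```

The two digests differ even though the file contents are byte-for-byte identical, which is the same effect a checksum-pinned download would run into if the archive were rebuilt with a fresh timestamp.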