diff hggit/git_handler.py @ 709:5c7943ca051f

hg2git: start incremental conversion from a known commit

Previously, we'd spin up the Mercurial incremental exporter from the null commit and build up state from there. This meant that for the first exported commit, we'd have to read all the files in that commit and compute Git blobs and trees based on that.

The current Mercurial to Git conversion scheme makes most sense with Mercurial's current default storage format, where manifests are diffed against the numerically previous revision. At some point in the future, the default will switch to generaldelta, where manifests would be diffed against one of their parents. In that world it might make more sense to have a stateless exporter that diffed each commit against its generaldelta parent and calculated dirty trees based on that instead. However, more experiments need to be done to see what export scheme is best.

For a repo with around 50,000 files, this brings down an incremental 'hg gexport' of one commit from 18 seconds with a hot file cache (and tens of minutes with a cold one) to around 2 seconds with a hot file cache.
author Siddharth Agarwal <sid0@fb.com>
date Fri, 14 Mar 2014 20:45:09 -0700
parents 4f0a154ae374
children 268b9f6ed1c8
--- a/hggit/git_handler.py	Fri Mar 14 19:18:19 2014 -0700
+++ b/hggit/git_handler.py	Fri Mar 14 20:45:09 2014 -0700
@@ -363,8 +363,24 @@
 
         # By only exporting deltas, the assertion is that all previous objects
         # for all other changesets are already present in the Git repository.
-        # This assertion is necessary to prevent redundant work.
-        exporter = hg2git.IncrementalChangesetExporter(self.repo)
+        # This assertion is necessary to prevent redundant work. Here, nodes,
+        # and therefore export, are in topological order. By definition,
+        # export[0]'s parents must be present in Git, so we start the
+        # incremental exporter from there.
+        pctx = self.repo[export[0]].p1()
+        pnode = pctx.node()
+        if pnode == nullid:
+            gitcommit = None
+        else:
+            gitsha = self._map_hg[hex(pnode)]
+            try:
+                gitcommit = self.git[gitsha]
+            except KeyError:
+                raise hgutil.Abort(_('Parent SHA-1 not present in Git '
+                                     'repo: %s') % gitsha)
+
+        exporter = hg2git.IncrementalChangesetExporter(
+            self.repo, pctx, self.git.object_store, gitcommit)
 
         for i, rev in enumerate(export):
             self.ui.progress('exporting', i, total=total)
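
The parent lookup added in this hunk can be read in isolation as the sketch below. It is an illustration under stated assumptions, not the hg-git implementation: plain dicts stand in for `self._map_hg` and `self.git`, and the names `resolve_start_commit` and `MissingParentError` are hypothetical. As in the diff, a Mercurial node missing from the map is not caught and surfaces as a plain `KeyError`.

```python
from binascii import hexlify

NULLID = b"\0" * 20  # Mercurial's null node is 20 zero bytes


class MissingParentError(Exception):
    """Hypothetical stand-in for hgutil.Abort in this sketch."""


def resolve_start_commit(parent_node, hg_to_git, git_objects):
    """Return the Git commit to seed the exporter with, or None for a root.

    parent_node: 20-byte binary node of the first exported changeset's p1.
    hg_to_git:   dict of hex Mercurial node -> Git SHA-1 (assumed stand-in
                 for the handler's _map_hg).
    git_objects: dict of Git SHA-1 -> commit object (assumed stand-in for
                 the Git repository).
    """
    if parent_node == NULLID:
        # No parent: the exporter has to start from an empty state.
        return None
    git_sha = hg_to_git[hexlify(parent_node).decode("ascii")]
    try:
        return git_objects[git_sha]
    except KeyError:
        raise MissingParentError(
            "Parent SHA-1 not present in Git repo: %s" % git_sha)
```

Seeding the exporter with the returned commit means only the differences relative to that parent need to be converted, rather than every file in the first exported changeset.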