annotate hggit/hg2git.py @ 709:5c7943ca051f

hg2git: start incremental conversion from a known commit

Previously, we'd spin up the Mercurial incremental exporter from the null commit and build up state from there. This meant that for the first exported commit, we'd have to read all the files in that commit and compute Git blobs and trees based on that.

The current Mercurial to Git conversion scheme makes most sense with Mercurial's current default storage format, where manifests are diffed against the numerically previous revision. At some point in the future, the default will switch to generaldelta, where manifests would be diffed against one of their parents. In that world it might make more sense to have a stateless exporter that diffed each commit against its generaldelta parent and calculated dirty trees based on that instead. However, more experiments need to be done to see what export scheme is best.

For a repo with around 50,000 files, this brings down an incremental 'hg gexport' of one commit from 18 seconds with a hot file cache (and tens of minutes with a cold one) to around 2 seconds with a hot file cache.
author Siddharth Agarwal <sid0@fb.com>
date Fri, 14 Mar 2014 20:45:09 -0700
parents d5facc1be5f8
children 623cb724c3d0
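
The change described above seeds the exporter's directory-to-tree map from an existing Git commit instead of rebuilding it from the null commit. The following is a minimal sketch of that tree walk, not the hg-git implementation: the Entry tuple, the dict-based store and the init_dirs name are hypothetical stand-ins for dulwich objects, used only so the example runs on its own.

```python
import stat
from collections import namedtuple

# Hypothetical stand-in for a Git tree entry; this is not dulwich's API.
Entry = namedtuple("Entry", "path mode sha")


def init_dirs(store, root_sha):
    """Map every directory path under root_sha to its tree (a list of entries)."""
    dirs = {}
    if root_sha is None:
        # Nothing to seed from: the exporter starts at the null commit.
        return dirs
    todo = [("", store[root_sha])]  # depth-first; the order is arbitrary
    while todo:
        path, tree = todo.pop()
        dirs[path] = tree
        for entry in tree:
            if entry.mode == stat.S_IFDIR:
                # Join with '/' only below the root, so keys look like
                # 'src/lib' and match what os.path.dirname() returns.
                child = entry.path if path == "" else path + "/" + entry.path
                todo.append((child, store[entry.sha]))
    return dirs


# A toy object store: the root holds README and src/, src/ holds lib/.
store = {
    "root": [Entry("README", stat.S_IFREG | 0o644, "b1"),
             Entry("src", stat.S_IFDIR, "t1")],
    "t1": [Entry("lib", stat.S_IFDIR, "t2")],
    "t2": [Entry("a.py", stat.S_IFREG | 0o644, "b2")],
}
dirs = init_dirs(store, "root")
```

Only tree objects are read during the walk; no file contents are touched, which is consistent with the speedup the commit message reports for large repositories.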
rev   line source
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
1 # This file contains code dealing specifically with converting Mercurial
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
2 # repositories to Git repositories. Code in this file is meant to be a generic
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
3 # library and should be usable outside the context of hg-git or an hg command.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
4
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
5 import os
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
6 import stat
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
7
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
8 import dulwich.objects as dulobjs
707
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
9 from dulwich import diff_tree
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
10
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
11 import util
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
12
671
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
13 def parse_subrepos(ctx):
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
14 sub = util.OrderedDict()
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
15 if '.hgsub' in ctx:
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
16 sub = util.parse_hgsub(ctx['.hgsub'].data().splitlines())
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
17 substate = util.OrderedDict()
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
18 if '.hgsubstate' in ctx:
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
19 substate = util.parse_hgsubstate(
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
20 ctx['.hgsubstate'].data().splitlines())
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
21 return sub, substate
71fb5dd678bc hg2git: move parse_subrepos to top level
Siddharth Agarwal <sid0@fb.com>
parents: 649
diff changeset
22
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
23 class IncrementalChangesetExporter(object):
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
24 """Incrementally export Mercurial changesets to Git trees.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
25
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
26 The purpose of this class is to facilitate Git tree export that is more
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
27 optimal than brute force.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
28
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
29 A "dumb" implementation of Mercurial to Git export would iterate over
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
30 every file present in a Mercurial changeset and would convert each to
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
31 a Git blob and then conditionally add it to a Git repository if it didn't
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
32 yet exist. This is suboptimal because the overhead associated with
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
33 obtaining every file's raw content and converting it to a Git blob is
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
34 not trivial!
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
35
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
36 This class works around the suboptimality of brute force export by
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
37 leveraging the information stored in Mercurial - the knowledge of what
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
38 changed between changesets - to only export Git objects corresponding to
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
39 changes in Mercurial. In the context of converting Mercurial repositories
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
40 to Git repositories, we only export objects Git (possibly) hasn't seen yet.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
41 This prevents a lot of redundant work and is thus faster.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
42
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
43 Callers instantiate an instance of this class against a mercurial.localrepo
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
44 instance. They then associate it with a specific changeset by calling
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
45 update_changeset(). On each call to update_changeset(), the instance
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
46 computes the difference between the current and new changesets and emits
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
47 Git objects that haven't yet been encountered during the lifetime of the
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
48 class instance. In other words, it expresses Mercurial changeset deltas in
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
49 terms of Git objects. Callers then (usually) take this set of Git objects
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
50 and add them to the Git repository.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
51
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
52 This class only emits Git blobs and trees, not commits.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
53
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
54 The tree calculation part of this class is essentially a reimplementation
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
55 of dulwich.index.commit_tree. However, since our implementation reuses
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
56 Tree instances and only recalculates SHA-1 when things change, we are
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
57 more efficient.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
58 """
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
59
709
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
60 def __init__(self, hg_repo, start_ctx, git_store, git_commit):
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
61 """Create an instance against a mercurial.localrepo.
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
62
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
63 start_ctx is the context for a Mercurial commit that has a Git
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
64 equivalent, passed in as git_commit. The incremental computation will be
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
65 started from this commit. git_store is the Git object store the commit
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
66 comes from. start_ctx can be repo[nullid], in which case git_commit
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
67 should be None.
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
68 """
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
69 self._hg = hg_repo
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
70
637
23d7caeed05a hg2git: store ctx instead of rev
Siddharth Agarwal <sid0@fb.com>
parents: 636
diff changeset
71 # Our current revision's context.
709
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
72 self._ctx = start_ctx
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
73
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
74 # Path to dulwich.objects.Tree.
709
5c7943ca051f hg2git: start incremental conversion from a known commit
Siddharth Agarwal <sid0@fb.com>
parents: 707
diff changeset
75 self._init_dirs(git_store, git_commit)
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
76
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
77 # Mercurial file nodeid to Git blob SHA-1. Used to prevent redundant
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
78 # blob calculation.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
79 self._blob_cache = {}
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
80
707
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
81 def _init_dirs(self, store, commit):
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
82 """Initialize self._dirs for a Git object store and commit."""
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
83 self._dirs = {}
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
84 if commit is None:
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
85 return
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
86 dirkind = stat.S_IFDIR
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
87 # depth-first order, chosen arbitrarily
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
88 todo = [('', store[commit.tree])]
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
89 while todo:
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
90 path, tree = todo.pop()
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
91 self._dirs[path] = tree
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
92 for entry in tree.iteritems():
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
93 if entry.mode == dirkind:
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
94 todo.append((path + '/' + entry.path if path else entry.path, store[entry.sha]))
d5facc1be5f8 hg2git: implement a method to initialize _dirs from a Git commit
Siddharth Agarwal <sid0@fb.com>
parents: 672
diff changeset
95
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
96 @property
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
97 def root_tree_sha(self):
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
98 """The SHA-1 of the root Git tree.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
99
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
100 This is needed to construct a Git commit object.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
101 """
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
102 return self._dirs[''].id
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
103
636
0ab89bd32c8e hg2git: rename ctx to newctx in update_changeset
Siddharth Agarwal <sid0@fb.com>
parents: 598
diff changeset
104 def update_changeset(self, newctx):
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
105 """Set the tree to track a new Mercurial changeset.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
106
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
107 This is a generator of 2-tuples. The first item in each tuple is a
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
108 dulwich object, either a Blob or a Tree. The second item is the
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
109 corresponding Mercurial nodeid for the item, if any. Only blobs will
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
110 have nodeids. Trees do not correspond to a specific nodeid, so it does
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
111 not make sense to emit a nodeid for them.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
112
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
113 When exporting trees from Mercurial, callers typically write the
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
114 returned dulwich object to the Git repo via the store's add_object().
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
115
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
116 Some emitted objects may already exist in the Git repository. This
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
117 class does not know about the Git repository, so it's up to the caller
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
118 to conditionally add the object, etc.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
119
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
120 Emitted objects are those that have changed since the last call to
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
121 update_changeset. If this is the first call to update_changeset, all
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
122 objects in the tree are emitted.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
123 """
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
124 # Our general strategy is to accumulate dulwich.objects.Blob and
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
125 # dulwich.objects.Tree instances for the current Mercurial changeset.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
126 # We do this incrementally by iterating over the Mercurial-reported
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
127 # changeset delta. We rely on the behavior of dulwich to lazily
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
128 # calculate a Tree's SHA-1 when we modify it. This is critical to
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
129 # performance.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
130
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
131 # In theory we should be able to look at changectx.files(). This is
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
132 # *much* faster. However, it may not be accurate, especially with older
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
133 # repositories, which may not record things like deleted files
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
134 # explicitly in the manifest (which is where files() gets its data).
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
135 # The only reliable way to get the full set of changes is by looking at
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
136 # the full manifest. And, the easy way to compare two manifests is
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
137 # localrepo.status().
638
f828d82c35dc hg2git: call status on newctx, not newctx.rev()
Siddharth Agarwal <sid0@fb.com>
parents: 637
diff changeset
138 modified, added, removed = self._hg.status(self._ctx, newctx)[0:3]
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
139
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
140 # We track which directories/trees have been modified in this update and we
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
141 # only export those.
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
142 dirty_trees = set()
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
143
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
144 subadded, subremoved = [], []
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
145
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
146 for s in modified, added, removed:
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
147 if '.hgsub' in s or '.hgsubstate' in s:
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
148 subadded, subremoved = self._handle_subrepos(newctx)
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
149 break
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
150
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
151 # We first process subrepo and file removals so we can prune dead trees.
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
152 for path in subremoved:
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
153 self._remove_path(path, dirty_trees)
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
154
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
155 for path in removed:
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
156 if path == '.hgsubstate' or path == '.hgsub':
649
53423381c540 hg2git: call _handle_subrepos when .hgsubstate is removed
Siddharth Agarwal <sid0@fb.com>
parents: 648
diff changeset
157 continue
53423381c540 hg2git: call _handle_subrepos when .hgsubstate is removed
Siddharth Agarwal <sid0@fb.com>
parents: 648
diff changeset
158
645
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
159 self._remove_path(path, dirty_trees)
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
160
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
161 for path, sha in subadded:
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
162 d = os.path.dirname(path)
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
163 tree = self._dirs.setdefault(d, dulobjs.Tree())
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
164 dirty_trees.add(d)
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
165 tree.add(os.path.basename(path), dulobjs.S_IFGITLINK, sha)
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
166
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
167 # For every file that changed or was added, we need to calculate the
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
168 # corresponding Git blob and its tree entry. We emit the blob
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
169 # immediately and update trees to be aware of its presence.
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
170 for path in set(modified) | set(added):
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
171 if path == '.hgsubstate' or path == '.hgsub':
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
172 continue
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
173
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
174 d = os.path.dirname(path)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
175 tree = self._dirs.setdefault(d, dulobjs.Tree())
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
176 dirty_trees.add(d)
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
177
636
0ab89bd32c8e hg2git: rename ctx to newctx in update_changeset
Siddharth Agarwal <sid0@fb.com>
parents: 598
diff changeset
178 fctx = newctx[path]
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
179
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
180 entry, blob = IncrementalChangesetExporter.tree_entry(fctx,
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
181 self._blob_cache)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
182 if blob is not None:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
183 yield (blob, fctx.filenode())
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
184
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
185 tree.add(*entry)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
186
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
187 # Now that all the trees represent the current changeset, recalculate
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
188 # the tree IDs and emit them. Note that we wait until now to calculate
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
189 # tree SHA-1s. This is an important difference between us and
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
190 # dulwich.index.commit_tree(), which builds new Tree instances for each
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
191 # series of blobs.
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
192 for obj in self._populate_tree_entries(dirty_trees):
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
193 yield (obj, None)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
194
637
23d7caeed05a hg2git: store ctx instead of rev
Siddharth Agarwal <sid0@fb.com>
parents: 636
diff changeset
195 self._ctx = newctx
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
196
645
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
197 def _remove_path(self, path, dirty_trees):
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
198 """Remove a path (file or git link) from the current changeset.
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
199
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
200 If this leaves the containing tree empty, that tree is removed too."""
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
201 d = os.path.dirname(path)
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
202 tree = self._dirs.get(d, dulobjs.Tree())
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
203
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
204 del tree[os.path.basename(path)]
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
205 dirty_trees.add(d)
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
206
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
207 # If removing this file made the tree empty, we should delete this
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
208 # tree. This could result in parent trees losing their only child
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
209 # and so on.
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
210 if not len(tree):
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
211 self._remove_tree(d)
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
212 else:
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
213 self._dirs[d] = tree
104f536be5c7 hg2git: factor out remove path logic into a separate function
Siddharth Agarwal <sid0@fb.com>
parents: 638
diff changeset
214
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
215 def _remove_tree(self, path):
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
216 """Remove a (presumably empty) tree from the current changeset.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
217
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
218 A now-empty tree may be the only child of its parent. So, we traverse
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
219 up the chain to the root tree, deleting any empty trees along the way.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
220 """
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
221 try:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
222 del self._dirs[path]
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
223 except KeyError:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
224 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
225
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
226 # Now we traverse up to the parent and delete any references.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
227 if path == '':
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
228 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
229
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
230 basename = os.path.basename(path)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
231 parent = os.path.dirname(path)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
232 while True:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
233 tree = self._dirs.get(parent, None)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
234
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
235 # No parent entry. Nothing to remove or update.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
236 if tree is None:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
237 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
238
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
239 try:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
240 del tree[basename]
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
241 except KeyError:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
242 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
243
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
244 if len(tree):
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
245 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
246
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
247 # The parent tree is empty. So, we can delete it.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
248 del self._dirs[parent]
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
249
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
250 if parent == '':
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
251 return
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
252
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
253 basename = os.path.basename(parent)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
254 parent = os.path.dirname(parent)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
255
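The upward pruning done by `_remove_tree` can be sketched with plain dictionaries standing in for dulwich `Tree` objects. This is a simplified illustration under stated assumptions, not the hg-git code: `prune_empty_dirs` and the dict-of-dicts layout are hypothetical, and paths are assumed to use `/` separators (hence `posixpath`).

```python
import posixpath


def prune_empty_dirs(dirs, path):
    """Delete the tree at `path`, then any ancestors left empty by that.

    `dirs` maps a directory path ('' is the root) to a dict of its
    entries; it is a stand-in for the Tree objects held in self._dirs.
    """
    # Nothing to do if the directory is not tracked at all.
    if dirs.pop(path, None) is None:
        return

    while path != '':
        basename = posixpath.basename(path)
        parent = posixpath.dirname(path)
        tree = dirs.get(parent)

        # No parent entry, or the parent does not reference us: stop.
        if tree is None or basename not in tree:
            return

        del tree[basename]

        # The parent still has other children, so it must stay.
        if tree:
            return

        # The parent is now empty too; remove it and keep climbing.
        del dirs[parent]
        path = parent


dirs = {
    '': {'a': 'tree', 'README': 'blob'},
    'a': {'b': 'tree'},
    'a/b': {},
}
prune_empty_dirs(dirs, 'a/b')
print(dirs)
```

Here removing `a/b` empties `a`, which is removed as well; the root survives because it still holds `README`.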
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
256 def _populate_tree_entries(self, dirty_trees):
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
257 self._dirs.setdefault('', dulobjs.Tree())
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
258
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
259 # Fill in missing directories.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
260 for path in self._dirs.keys():
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
261 parent = os.path.dirname(path)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
262
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
263 while parent != '':
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
264 parent_tree = self._dirs.get(parent, None)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
265
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
266 if parent_tree is not None:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
267 break
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
268
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
269 self._dirs[parent] = dulobjs.Tree()
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
270 parent = os.path.dirname(parent)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
271
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
272 for dirty in list(dirty_trees):
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
273 parent = os.path.dirname(dirty)
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
274
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
275 while parent != '':
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
276 if parent in dirty_trees:
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
277 break
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
278
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
279 dirty_trees.add(parent)
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
280 parent = os.path.dirname(parent)
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
281
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
282 # The root tree is always dirty but doesn't always get updated.
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
283 dirty_trees.add('')
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
284
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
285 # We only need to recalculate and export dirty trees.
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
286 for d in sorted(dirty_trees, key=len, reverse=True):
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
287 # Only happens for deleted directories.
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
288 try:
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
289 tree = self._dirs[d]
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
290 except KeyError:
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
291 continue
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
292
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
293 yield tree
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
294
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
295 if d == '':
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
296 continue
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
297
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
298 parent_tree = self._dirs[os.path.dirname(d)]
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
299
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
300 # Accessing the tree's ID is what triggers SHA-1 calculation and is
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
301 # the expensive part (at least if the tree has been modified since
598
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
302 # the last time we retrieved its ID). Also, assigning an entry to a
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
303 # tree (even if it already exists) invalidates the existing tree
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
304 # and incurs SHA-1 recalculation. So, it's in our interest to avoid
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
305 # invalidating trees. Since we only update the entries of dirty
792955be68dd Only export modified Git trees
Gregory Szorc <gregory.szorc@gmail.com>
parents: 596
diff changeset
306 # trees, this should hold true.
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
307 parent_tree[os.path.basename(d)] = (stat.S_IFDIR, tree.id)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
308
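The dirty-tree handling in `_populate_tree_entries` has two parts: every ancestor of a dirty directory becomes dirty, and trees are emitted longest path first so that a child's ID is known before its parent is hashed. The sketch below isolates that ordering logic. `dirty_closure` is a hypothetical helper, not part of hg-git, and it assumes `/`-separated paths. Sorting by string length is enough here because a child's path is always strictly longer than its parent's path.

```python
import posixpath


def dirty_closure(dirty_trees):
    """Return the dirty directories plus all their ancestors, deepest first."""
    dirty = set(dirty_trees)

    for d in list(dirty):
        parent = posixpath.dirname(d)
        # Stop early once we reach an ancestor that is already marked.
        while parent != '' and parent not in dirty:
            dirty.add(parent)
            parent = posixpath.dirname(parent)

    # The root tree is always treated as dirty.
    dirty.add('')

    # Longer paths first, so children are processed before their parents.
    return sorted(dirty, key=len, reverse=True)


order = dirty_closure({'src/lib/util', 'docs'})
print(order)
```

For the input above the result is `['src/lib/util', 'src/lib', 'docs', 'src', '']`: unrelated directories may interleave, but no directory ever precedes one of its own descendants.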
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
309 def _handle_subrepos(self, newctx):
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
310 sub, substate = parse_subrepos(self._ctx)
647
3ceacdd23abe hg2git: add 'new' prefix to _handle_subrepos variables
Siddharth Agarwal <sid0@fb.com>
parents: 646
diff changeset
311 newsub, newsubstate = parse_subrepos(newctx)
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
312
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
313 # For each path, the logic is described by the following table. 'no'
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
314 # stands for 'the subrepo doesn't exist', 'git' stands for 'git
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
315 # subrepo', and 'hg' stands for 'hg or other subrepo'.
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
316 #
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
317 # old new | action
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
318 # * git | link (1)
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
319 # git hg | delete (2)
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
320 # git no | delete (3)
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
321 #
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
322 # All other combinations are 'do nothing'.
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
323 #
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
324 # git links without corresponding submodule paths are stored as subrepos
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
325 # with a substate but without an entry in .hgsub.
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
326
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
327 # 'added' is both modified and added
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
328 added, removed = [], []
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
329
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
330 def isgit(sub, path):
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
331 return path not in sub or sub[path].startswith('[git]')
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
332
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
333 for path, sha in substate.iteritems():
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
334 if not isgit(sub, path):
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
335 # old = hg -- will be handled in next loop
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
336 continue
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
337 # old = git
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
338 if path not in newsubstate or not isgit(newsub, path):
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
339 # new = hg or no, case (2) or (3)
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
340 removed.append(path)
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
341
647
3ceacdd23abe hg2git: add 'new' prefix to _handle_subrepos variables
Siddharth Agarwal <sid0@fb.com>
parents: 646
diff changeset
342 for path, sha in newsubstate.iteritems():
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
343 if not isgit(newsub, path):
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
344 # new = hg or no; the only cases we care about are handled above
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
345 continue
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
346
648
bd63cdfbc1de hg2git: make _handle_subrepos worked in the removed case
Siddharth Agarwal <sid0@fb.com>
parents: 647
diff changeset
347 # case (1)
672
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
348 added.append((path, sha))
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
349
fbfa6353d96c hg2git: fix subrepo handling to be deterministic
Siddharth Agarwal <sid0@fb.com>
parents: 671
diff changeset
350 return added, removed
596
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
351
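The decision table in `_handle_subrepos` can be exercised in isolation with plain dictionaries. The sketch below is an illustration only: `classify_subrepos` is a hypothetical name, the four arguments stand in for the results of `parse_subrepos` on the old and new contexts (path to source from `.hgsub`, path to revision from `.hgsubstate`), and the example paths, URLs and revision strings are made up.

```python
def classify_subrepos(sub, substate, newsub, newsubstate):
    """Return (added, removed) following the old/new table above."""

    def isgit(s, path):
        # A substate entry with no .hgsub entry counts as a bare git link.
        return path not in s or s[path].startswith('[git]')

    # Cases (2) and (3): old = git, new = hg or missing.
    removed = sorted(
        path for path in substate
        if isgit(sub, path)
        and (path not in newsubstate or not isgit(newsub, path))
    )

    # Case (1): new = git, whatever the old state was.
    added = sorted(
        (path, sha) for path, sha in newsubstate.items()
        if isgit(newsub, path)
    )
    return added, removed


old_sub = {'vendor/a': '[git]https://example.invalid/a.git',
           'vendor/h': 'https://example.invalid/h'}
old_state = {'vendor/a': 'aaa', 'vendor/h': 'hhh', 'vendor/c': 'ccc'}
new_sub = {'vendor/a': 'https://example.invalid/a-hg',
           'vendor/n': '[git]https://example.invalid/n.git'}
new_state = {'vendor/a': 'a2', 'vendor/n': 'nnn'}

added, removed = classify_subrepos(old_sub, old_state, new_sub, new_state)
print(added, removed)
```

In this example `vendor/a` switches from git to hg and `vendor/c` disappears, so both are removed, while `vendor/n` is a new git subrepo and is linked. The results are sorted here purely to make the output deterministic.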
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
352 @staticmethod
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
353 def tree_entry(fctx, blob_cache):
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
354 """Compute a dulwich TreeEntry from a filectx.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
355
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
356 A side effect is that the blob's ID is stored in the passed cache.
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
357
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
358 Returns a 2-tuple of (dulwich.objects.TreeEntry, dulwich.objects.Blob or None).
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
359 """
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
360 blob_id = blob_cache.get(fctx.filenode(), None)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
361 blob = None
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
362
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
363 if blob_id is None:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
364 blob = dulobjs.Blob.from_string(fctx.data())
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
365 blob_id = blob.id
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
366 blob_cache[fctx.filenode()] = blob_id
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
367
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
368 flags = fctx.flags()
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
369
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
370 if 'l' in flags:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
371 mode = 0120000
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
372 elif 'x' in flags:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
373 mode = 0100755
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
374 else:
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
375 mode = 0100644
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
376
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
377 return (dulobjs.TreeEntry(os.path.basename(fctx.path()), mode, blob_id),
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
378 blob)
d6b9c30a3e0f Export Git objects from incremental Mercurial changes
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff changeset
379
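The two ideas in `tree_entry`, caching blob IDs by file node and mapping Mercurial flags to Git file modes, can be sketched without dulwich or Mercurial. Everything below is a stand-in written for illustration: `git_blob_id`, `git_mode` and `entry_for` are hypothetical names, the blob ID is computed directly with `hashlib` using Git's `blob <size>\0<data>` object format, and the octal literals use Python 3 syntax (`0o100644`) rather than the Python 2 form in the listing.

```python
import hashlib


def git_blob_id(data):
    """SHA-1 of a Git blob object: header 'blob <size>\\0' plus content."""
    header = b'blob %d\x00' % len(data)
    return hashlib.sha1(header + data).hexdigest()


def git_mode(flags):
    """Map Mercurial file flags to a Git tree-entry mode."""
    if 'l' in flags:
        return 0o120000  # symbolic link
    if 'x' in flags:
        return 0o100755  # executable file
    return 0o100644      # regular file


def entry_for(filenode, data, flags, blob_cache):
    """Return (mode, blob_id, is_new) and record new IDs in the cache."""
    blob_id = blob_cache.get(filenode)
    is_new = blob_id is None
    if is_new:
        # Only hash the content when this file node has not been seen.
        blob_id = git_blob_id(data)
        blob_cache[filenode] = blob_id
    return git_mode(flags), blob_id, is_new


cache = {}
first = entry_for('node1', b'', 'x', cache)
second = entry_for('node1', b'', '', cache)
print(first, second)
```

The second call hits the cache, so no hashing is done and `is_new` is false, which mirrors `tree_entry` returning `None` for the blob when the file node was already converted.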