[openstack-dev] [git-upstream] [Duplicate Changes] How to view only commits applied since last import

Darragh Bailey - mailing lists dbailey at hpe.com
Fri Nov 18 15:27:50 UTC 2016




On 17/11/16 12:47, Paul Bourke wrote:
> Hi Darragh / git-upstream community,
> 
> I've been looking at a way to easily view a log of what commits have been
> made since the last upstream import when managing a branch with git-upstream.
> Right now this can be hard to do - something like 'git log
> upstream/master..HEAD' shows a lot of duplicate commits for reasons I don't
> understand well enough to explain.


As mentioned, I thought it might be worth addressing this piece
separately, and then hopefully refactoring what's here into the
git-upstream docs.


To start with, let's step through what things look like, and then I'll
try to explain why git-upstream merges history in this way.



Assume there are two changes being carried locally that have not yet
been accepted upstream. They were initially added to a local/mitaka
branch, two imports were subsequently performed, and the changes have
still not landed on stable/mitaka:


       ----X---Y---N--------O     local/mitaka
      /           /        /
     /       X'--Y'       /       (rebase of X & Y onto G)
    /       /            /
   /       /       X''--Y''
  /       /       /
-E-------G-------I                upstream's stable/mitaka branch


When the latest changes were first synced from upstream, stable/mitaka
was at G. This resulted in git-upstream replaying X and Y onto G (as X'
and Y') and then creating a merge commit N whose tree has exactly the
same contents as the tree at Y'.
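To make that first import concrete, here is a minimal sketch that mimics
it with plain git rather than git-upstream itself: X and Y are
cherry-picked onto G, and N is recorded with 'git commit-tree', taking
its tree verbatim from Y' and listing the old local tip (Y) and Y' as
parents. It assumes a POSIX shell and git; the throwaway repository, the
'commit' helper and the file names are hypothetical, and git-upstream's
real implementation may build the merge differently.

```shell
#!/bin/sh
# Hypothetical sketch: mimic a single git-upstream import with plain git.
# This is not git-upstream's implementation; names follow the diagram above.
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/stable/mitaka
git config user.name "Example User"
git config user.email "user@example.com"

commit() {                     # commit a file named after the label
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "[$1] Adding $1.txt"
}

commit E; e=$(git rev-parse HEAD)            # upstream base
git checkout -q -b local/mitaka              # carry X and Y locally
commit X; commit Y; y=$(git rev-parse HEAD)
git checkout -q stable/mitaka; commit G      # upstream moves on to G

# Replay X and Y onto G on a detached HEAD, producing X' and Y'.
git checkout -q --detach stable/mitaka
git cherry-pick "$e..$y" >/dev/null
yp=$(git rev-parse HEAD)

# N: first parent is the old local tip (Y), second parent is Y', and its
# tree is copied verbatim from Y' rather than computed by a merge.
n=$(git commit-tree -p local/mitaka -p "$yp" -m "[N] Merging Y1 into Y" "$yp^{tree}")
git update-ref refs/heads/local/mitaka "$n"

git log --oneline --graph local/mitaka
```

Because N's tree is copied from Y', 'git diff local/mitaka^2 local/mitaka'
should show no differences, while the original X and Y remain in N's
first-parent history.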


When you look at the history of N you'll see:

     ----X---Y---N
    /           /
   /       X'--Y'
  /       /
-E-------G

And then looking at O shows:


       ----X---Y---N--------O     local/mitaka
      /           /        /
     /       X'--Y'       /
    /       /            /
   /       /       X''--Y''
  /       /       /
-E-------G-------I


At this point you'll see X, X' & X'', and Y, Y' & Y''. Obviously this
causes a bit of confusion when listing the changes using:

 git log --oneline --graph E~1..local/mitaka

you see something like the following (see [1] for how I created this):

*   899cb6e [O] Merging Y2 into N
|\
| * 3c08f48 [Y] Adding tmp7wtzvo69  <--- really Y''
| * db8d2c3 [X] Adding tmpnxot0u9s
| * 97cc90c [I] Adding tmps2xhxp2f
* |   9ea35c3 [N] Merging Y1 into Y
|\ \
| * | f361e9f [Y] Adding tmp7wtzvo69  <--- really Y'
| * | 90d58eb [X] Adding tmpnxot0u9s
| |/
| * ed973e6 [G] Adding tmpb443aabz
| * 74cd9b8 [E] Adding tmpwcrm4bxi
* 3cc85cf [Y] Adding tmp7wtzvo69  <--- original Y
* e93f6cb [X] Adding tmpnxot0u9s


If this is limited further to:

 git log --oneline --graph stable/mitaka..local/mitaka

*   899cb6e [O] Merging Y2 into N
|\
| * 3c08f48 [Y] Adding tmp7wtzvo69  <--- really Y''
| * db8d2c3 [X] Adding tmpnxot0u9s
*   9ea35c3 [N] Merging Y1 into Y
|\
| * f361e9f [Y] Adding tmp7wtzvo69  <--- really Y'
| * 90d58eb [X] Adding tmpnxot0u9s
* 3cc85cf [Y] Adding tmp7wtzvo69  <--- original Y
* e93f6cb [X] Adding tmpnxot0u9s

This looks like the same 2 changes have been applied three times, and in
a way, they have.

This can obviously be confusing, hence your proposed change to provide a
way to display only the interesting commits, so that you would instead see:

* 899cb6e [O] Merging Y2 into N
* 3c08f48 [Y] Adding tmp7wtzvo69  <--- really Y''
* db8d2c3 [X] Adding tmpnxot0u9s

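One way to get that filtered view with plain git, assuming the most
recent import merge is still the tip of local/mitaka and (as in the graph
above) its first parent is the previous local tip, is to ask for the
commits reachable from O but not from its first parent N and not from
stable/mitaka. The sketch below rebuilds the example history in a
throwaway repository and then runs that query; the helper functions and
file names are hypothetical, and the imports are mimicked with
cherry-pick and 'git commit-tree' rather than run through git-upstream.
Adding --no-merges and --count to the same query gives the number of
carried patches.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild the X/Y/N/O history above with plain git
# (mimicking git-upstream's imports, not using git-upstream itself), then
# list only the commits brought in by the most recent import.
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/stable/mitaka
git config user.name "Example User"
git config user.email "user@example.com"

commit() {                     # commit a file named after the label
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "[$1] Adding $1.txt"
}

# Replay X and Y onto the current stable/mitaka tip, then record a merge
# into local/mitaka whose tree is copied from the replayed tip.
do_import() {
    git checkout -q --detach stable/mitaka
    git cherry-pick "$e..$y" >/dev/null
    tip=$(git rev-parse HEAD)
    merge=$(git commit-tree -p local/mitaka -p "$tip" -m "$1" "$tip^{tree}")
    git update-ref refs/heads/local/mitaka "$merge"
}

commit E; e=$(git rev-parse HEAD)
git checkout -q -b local/mitaka
commit X; commit Y; y=$(git rev-parse HEAD)

git checkout -q stable/mitaka; commit G
do_import "[N] Merging Y1 into Y"        # creates X', Y' and N

git checkout -q stable/mitaka; commit I
do_import "[O] Merging Y2 into N"        # creates X'', Y'' and O

# Reachable from O, but not from its first parent (N) nor from upstream:
git log --oneline --topo-order "local/mitaka^1..local/mitaka" "^stable/mitaka"

# Number of carried patches from the latest import (merge commits excluded):
git rev-list --count --no-merges "local/mitaka^1..local/mitaka" "^stable/mitaka"
```

If local changes have landed on top of the import merge, local/mitaka^1
is no longer N, so the merge commit would first need to be located (for
example by its commit message); that lookup is not covered here.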

> Thanks in advance for anything that might help cut through some of the
> confusion.
> 
> Cheers,
> -Paul

Back to the question as to why git-upstream does it this way:

I looked at the possibility of using merges instead: just land patches
against the tree and then, on a regular basis, attempt to merge in the
latest from upstream.

* It's possible (if a little awkward at the moment) to extract
information on how many local patches are being carried. Merging would
mean that you wouldn't see the same patches from the series duplicated,
but you would have multiple commits in your local history that are
likely not identifiable as cherry-picks of the final change accepted
upstream (how many patches are accepted without any changes?).

* Re-applying changes allows conflicts to be resolved in the relevant
patch. Otherwise the conflicts are resolved inside a merge commit, which
can be quite hard to review. Subsequently, when the changes are accepted
upstream and merged in on the next sync, the conflicts generated will
frequently be different, since the change accepted upstream will be
slightly different. A patch series approach allows the duplicate changes
to be dropped automatically and an updated series to be re-applied,
whereas using a manual merge commit with conflict resolution likely
means someone is going to have to spend more time resolving conflicts.

* We want to avoid rebasing published branches. As far as developers
inside a company are concerned, the local branches are published
history, and when they need to co-operate on fixing issues so that
internal testing/releases can continue, rewriting history is likely to
make life more difficult. Rebasing a published branch can work, and I
know some Linux kernel developers have special published branches that
are regularly rebased, so it can be made to work.

* Avoid needing a dedicated maintainer. When merges between branches
with multiple divergent commits are used to resolve conflicts, it
frequently means the person doing the merge needs to know a lot of the
codebase or be really good with git, and in turn you need someone who
can review all the conflict resolutions sensibly in Gerrit. Assuming
you're planning on contributing the changes upstream, someone is going
to have to resolve the conflict for each change anyway and upload it for
review against upstream. Having a dedicated maintainer can work within
some projects, but many grow to the point that different people know
different sections better. And if your goal is collaboration and
promoting the exchange of knowledge, it seems better to avoid siloing
related knowledge. If instead the conflicts are resolved for each
carried change, it's possible to re-review the updated patch in the
context of what that change was intended to resolve (and submit an
updated patch upstream).

It seemed unlikely to me that it would be possible to maximise automated
syncing, since the odds of a conflict seemed high:
* locally landed changes are likely to exist because there is a problem
and something needs to keep working
* getting a change accepted upstream means it should do the right thing

The correct approach could differ either superficially or logically;
potentially the correct fix was to amend some code elsewhere. If we
merged from upstream and resolved a conflict, and then the correct fix
appeared upstream, how would we automatically drop the local changes?
They can't simply be reverted, because the conflict resolution in the
merge would conflict with the revert of the earlier change. That seems
likely to require a lot more manual effort.


I also looked previously at using a dedicated branch and rebasing the
changes from the previous branch onto an updated one:

* Any local changes not yet landed need to be re-targeted
* Any changes submitted against the outdated branch have to be blocked
* Developers need to be provided with a tool that knows which branch is
the correct one and re-targets the commit before it is uploaded for
review each time.



Other alternatives include using a patch series (guilt,
git-buildpackage), or producing a different target branch or rewriting
the history in a way that would cause issues if the branch were
published (I think topgit and stgit work this way). I'm not familiar
enough with git-series to know which set to include it in, and there is
also "pry", which I think takes a list of patch IDs to retrieve from
OpenStack Gerrit and apply on top, and which may be used by Rackspace?


So there are a few solutions in this area, using different approaches
with different benefits and trade-offs.

YMMV, and I'm almost certain I'm forgetting some reasons as well ;-)

--
Regards,
Darragh Bailey
IRC: electrofelix
"Nothing is foolproof to a sufficiently talented fool" - Unknown


