---
title: "Advanced version control"
author: Franklin Bristow
---

Advanced version control
========================

::: outcomes

* [X] Clone an existing remote version control repository.
* [X] Create a new remote version control repository.
* [X] Push an existing local repository to a remote version control repository.
* [X] Use branching and merging with a version control repository.
* [X] Synchronize repositories between computers for a personal project.

:::

Cloning an existing repository
------------------------------

We're not the first people to initialize repositories and use version control. In fact, vast online repositories of different projects exist on sites like [GitHub], [GitLab], [SourceHut], and more.

Cloning an existing repository from a website is straightforward using the `git clone` subcommand.

::: example

Let's clone a specific repository. The specific repository contains the source code that you downloaded on the command-line last week (the hello project).

Start by opening your terminal and connecting to Aviary, then run:

```bash
git clone https://code.cs.umanitoba.ca/cs-lab-course/hello
```

`git` is going to start connecting to a remote web server that's hosting some source code in a `git` repository, and you should see output that looks similar to the following:

```
Cloning into 'hello'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 8 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (8/8), done.
```

There's a lot of output here that (honestly) isn't that important in terms of being able to use `git` for tracking changes to your files.

Just like when you created your own `git` repository last week, the folder `hello` now contains another repository. This repository contains the code that you downloaded last week as a `zip` file!
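
If you'd like to convince yourself that a clone really is a complete repository (files *and* history), here's a minimal sketch. It assumes a throwaway local folder named `upstream` stands in for the remote server so that it runs without a network connection; the folder name, file name, and commit message are all made up for illustration.

```shell
# Assumption: a throwaway local repository stands in for the remote server,
# so this sketch does not need a network connection.
workdir="$(mktemp -d)"
cd "$workdir"

# Build a tiny stand-in "remote" repository with one commit (hypothetical contents).
git init --quiet upstream
cd upstream
echo 'echo "hello"' > hello.sh
git add hello.sh
git -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "Add hello script"
cd ..

# Clone it into a folder named `hello`, the same way you would clone from a URL.
git clone --quiet upstream hello

# The clone has its own .git folder, the files, and the commit history.
ls -a hello
git -C hello log --oneline
```

The same two commands at the end (`ls -a` and `git log --oneline`) should work inside the `hello` folder that you cloned from the server.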
:::

[GitHub]: https://github.com/
[GitLab]: https://about.gitlab.com/
[SourceHut]: https://sr.ht/

### Finding a repository's address

Similar to connecting to a remote computer, we have to tell `git` where to find the repository by giving it an address or location. In the example above, I provided you with the complete command to clone the repository, but sometimes you'll find a repository that you're interested in cloning (including your own!) and you'll need to find its address yourself.

::: example

[Standard Ebooks] is a web site that's curated and published many free, high quality public domain e-books. The team at Standard Ebooks uses GitHub extensively to collaborate and track changes to the files that are used in the creation of their e-books.

You can find a collection of their repositories on GitHub here: https://github.com/standardebooks

Scroll down a little bit to the "Repositories" section and pick an e-book. The titles (unfortunately) are pretty messy here, but they are readable (if you can't pick an e-book, I'll pick one for you: [Alice's Adventures in Wonderland]).

Once you click on a link to an e-book, you'll be taken to the main page on GitHub for that repository. The main table in the center of the page is a listing of the files and folders that are in that repository. Just above and to the right of that table is a green button labelled "Code":

![The green "Code" button in question.](github-code.png)

Click the green "Code" button, then copy the text under "HTTPS". This is the address of the repository that you're going to use to tell `git` where it should look when it's cloning the repository.

![The location of the repository that you can copy.](github-https-url.png)

Now you can paste this link into your terminal as part of a `git clone` command:

```bash
git clone https://github.com/standardebooks/lewis-carroll_alices-adventures-in-wonderland_john-tenniel.git
```

Now you've cloned a repository!
:tada:

[Standard Ebooks]: https://standardebooks.org/
[Alice's Adventures in Wonderland]: https://github.com/standardebooks/lewis-carroll_alices-adventures-in-wonderland_john-tenniel

:::

Create a new remote version control repository
----------------------------------------------

Cloning existing repositories is something that you will do *sometimes*, but not actually that often. You will typically want to clone an existing repository in one of two situations:

1. You've previously created a remote repository and you want to clone it into a new location, or
2. You're downloading someone's software for the purpose of compiling it and using it.

Let's make our own new remote repository.

Creating a new remote repository is different from initializing a repository on the command line with `git init`, but they are related to one another. When you initialize a new repository on the command line, `git` is creating that `.git` directory that's filled with all the [stuff `git` needs] to keep track of changes to your files. When you create a new remote repository, you're really creating a well-known other place (that's not on your computer or on Aviary) where you can duplicate and manage your files.

We're going to be doing this not with GitHub, but with a free, open-source, and self-hostable product called [GitLab]. The U of M CS Department hosts an instance of GitLab (there's a web site you can go to that's dedicated for students taking COMP courses at the U of M) at https://code.cs.umanitoba.ca

Open that link, then we're going to go through a few things to create a new repository.

[stuff `git` needs]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

### Sign up for GitLab

Before we can do *anything* with this instance of GitLab, you're going to need to sign up for a new account. You can skip this step if you've previously signed up for a GitLab account here.

On [the log in page], click on the "Register now" link; it's just below the big blue "Sign in" button.
When you sign up, you can enter whatever you want for your "First name", "Last name", and "Username" (this is not audited by our tech staff), but you **must** use your `@myumanitoba.ca` e-mail address; no other e-mail addresses are permitted to sign up.

::: aside

The account you're creating here is not at all connected to your account on Aviary, so you don't (and shouldn't) use the same password for both of these systems.

This is a real aside: consider starting to use a [password manager] to generate and securely store your passwords. I can personally recommend [KeePassXC], but some people like [1password]. Others prefer to manage their passwords on the command line and can use tools like [pass]. Main advice: don't use the same password for everything; don't write your password on your hand.

[pass]: https://www.passwordstore.org/
[1password]: https://1password.com/
[KeePassXC]: https://keepassxc.org/
[password manager]: https://en.wikipedia.org/wiki/Password_manager

:::

Just after you sign up, GitLab will ask you what you think your "role" should be. Ultimately this is helping GitLab decide how it should organize itself in terms of what you see when you first log in. I would suggest that you choose "**Software Developer**" here.

Congrats! You just signed up for a version control repository service! :tada:

[the log in page]: https://code.cs.umanitoba.ca/users/sign_in

### Create a new repository

Now we get to the part that we're really interested in: actually creating a remote repository to upload our code to.

Just after you finish creating your account, and each time you log in to GitLab, you're going to be at [your dashboard].

![The GitLab dashboard.](dashboard.png)

Click on "Create a project". There are several options you can choose from to create a project, but for now you should click on "Create blank project". Ultimately our goal is to copy an existing repository to this remote repository.

Now you get to be creative by picking a project name.
The project name that you enter here will become part of the project's URL (the address or location of the repository). If you're following along with the examples from last week, you should pick something like `my-project`. If you want to start putting your course documents into this repository (like you might in an assignment), you should pick something like a course name (e.g., `SCI2000`).

![Selecting a project name.](project-name.png)

The only option you have for visibility is "Private", and you should leave it that way.

![Project visibility level (you can only choose Private).](visibility-level.png)

::: aside

Some repository hosting software (like GitHub) provides the (default) option to have public repositories. While public repositories are important, in circumstances where you're planning to put your course documents into the repository (which you should be!), you don't want to risk your course documents being publicly available and accessible.

In short: avoid academic misconduct situations by keeping your repository private.

:::

You'll also see some options for project configuration. You should **deselect everything** for your first repository; we're eventually going to be uploading an existing repository to this remote repository, and we don't want any file conflicts.

![Project configuration options.](project-configuration.png)

Finally, click the "Create project" button. If everything's worked out, you should see a new (and empty) project page:

![A fresh, new, empty project.](new-empty-project.png)

You just created a new remote repository! :tada:

[your dashboard]: https://code.cs.umanitoba.ca/dashboard/projects

Pushing to a remote repository
------------------------------

Right, that repository is there, but it's... empty. Our goal now is to take an existing repository and "push" the repository's contents to our new empty remote repository.
Open your terminal, connect to Aviary, and navigate to the folder that contains a `git` repository (one of the ones you created last week). Alternatively, create a new folder and initialize an entirely new repository, and `add` and `commit` some files to that repository.

All of the examples below this section assume that your present working directory (`pwd`) is a Git repository and contains a `.git` folder.

### Configuring your repository

Before you can push to a remote repository, you have to tell Git where it's supposed to push to by configuring your repository. This is a **one-time** setup: you only need to do this once per repository.

You can configure the remote repository in Git using the `git remote` subcommand:

::: example

Starting from a repository (your present working directory should be one that has a `.git` directory), we first need to tell `git` *where* we want to push our code to. This is going to be the remote repository's address. We'll do this using the `git remote` command:

```bash
git remote add origin https://code.cs.umanitoba.ca/you/your-project
```

You can get the location or address of the repository in GitLab in any of the following ways:

* Copy and paste the address from your web browser,
* Find the address in the instructions GitLab shows you for setting up your repository, or
* Click on the blue "Clone" button in GitLab and copy the address from there.

![The blue "Clone" button in GitLab](gitlab-clone.png)

When you run this command with your own repository's address, `git` shouldn't print any output, and you're ready to push your repository to the remote!

:::

### Pushing to the remote repository

Now that your repository has a remote, we want to push all the commits that you've made (and thus all the files) to the remote repository. Pushing to the remote repository is something that you should do often; our goal is to make sure that the local repository you have is approximately synchronized with the remote repository on the server.
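
Before your first push, it can be worth confirming that the remote you configured points where you expect. The sketch below is one way to check, and it assumes a throwaway repository plus placeholder addresses (`you/your-project` and `you/my-project` are hypothetical); nothing in it contacts the server.

```shell
# Assumption: a throwaway repository and placeholder addresses;
# these commands only edit local configuration.
repo="$(mktemp -d)"
git -C "$repo" init --quiet

git -C "$repo" remote add origin https://code.cs.umanitoba.ca/you/your-project

# List the configured remotes along with their addresses.
git -C "$repo" remote -v

# If the address turns out to be wrong, change it in place
# instead of removing and re-adding the remote.
git -C "$repo" remote set-url origin https://code.cs.umanitoba.ca/you/my-project
git -C "$repo" remote get-url origin
```

In your own repository you can drop the `-C "$repo"` part and just run `git remote -v` from inside the project folder.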
You can push all commits to the remote repository using the `git push` subcommand:

::: example

The first time you push to the remote repository, you have to tell Git which "branch" goes to the remote (more on branches below):

```bash
git push -u origin main
```

Any subsequent time you want to synchronize your local repository with the remote repository, you just need to run:

```bash
git push
```

Regardless of which one you're running here, `git` is going to ask you to enter a username and password. The username and password that you enter here are the ones that you chose when you signed up for your GitLab account.

You should see output similar to the following:

```
[you@bird my-project]> git push -u origin main
Username for 'https://code.cs.umanitoba.ca': you@myumanitoba.ca
Password for 'https://you@myumanitoba.ca@code.cs.umanitoba.ca':
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 225 bytes | 225.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://code.cs.umanitoba.ca/csstud1/my-project.git
 * [new branch] main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
```

Now refresh your browser window that has the remote repository, and you should see some files! :tada:

:::

Branches
--------

So far we've been using the basic features of Git to track changes to files that we have in a repository. We can visualize these changes as a linear list:

![A linear sequence of commits.](linear-commits.jpg)

We can literally see this if we use the `git log` subcommand, and we can also get back to previous commits using the `git checkout` subcommand.

::: example

Take a look at the commit history from one of your own repositories:

```bash
[you@bird my-project]> git log
commit 3d7739db044efa992bbfdb2d613a0fc025f39d71 (HEAD -> main, origin/main)
Author: Franklin Bristow
Date:   Mon Sep 26 14:07:53 2022 -0500

    Updating TODO.

commit 943ae855084a66fbdee8b7469f81be284e04cf65
Author: Franklin Bristow
Date:   Mon Sep 26 13:58:54 2022 -0500

    Writing more about version control.

    Looking back at log messages is cool, so let's use `git log` to see them.
```

Those long numbers that you see after the word "commit" are unique identifiers for that commit. You can get back to the state that your project was in at that commit by using the `git checkout` command:

```bash
[you@bird my-project]> git checkout 3d7739d
```

You can either copy and paste the complete commit ID, or you can use the first 7 or 8 characters.

When you run that command (with the appropriate commit ID!), `git` will print out some information about what you can do, including how to get back to the `main` branch.

You can quickly get back to the "head" (the last commit that you made) using the `git switch` command:

```bash
[you@bird my-project]> git switch main
```

:::

This is good enough for the most part, but there's more to Git than just keeping track of a linear sequence of changes over time. One of the most powerful concepts behind Git is that it can actually keep track of an entire *graph* of changes over time, possibly coming from many authors and contributors.

We're going to take a look at a feature in Git called "branches". Branches are something that we can use to track multiple independent sequences of commits to a repository simultaneously. When a repository has multiple branches, its commit history can look something like this:

![Branches in a repository.](commits with branches.jpg)

Branches are very useful when you're working in a team of people, who are each independently working on (for example) a feature in a bigger project. All of these people will want to independently make changes and commits, but not interfere with each other's work until they're finished.

::: aside

You will almost certainly not use branches on projects that you're working on by yourself.
You *might*, and you're more than welcome to use them, but don't worry too much about forcing this into your version control workflow.

Because of that, we're going to be looking mostly at the mechanical "How do I make a branch, and how do I merge a branch back into the `main` branch?" instead of the philosophy of best practices with branches.

:::

### Creating a new branch

You can create a new branch in your `git` repository using the `git branch` subcommand.

::: example

From anywhere inside your `git` repository, you can run `git branch` to create a new branch in your repository:

```bash
git branch my-new-branch
```

Typically when you make a new branch, the name of the branch should represent something about what you're intending to work on in that branch (e.g., "new-feature").

:::

### Listing and switching branches

We just created a branch (:tada:), but... that's not very useful in and of itself. We want to be able to see which branches are currently in a repository, and we want to be able to *switch* to that branch so that we can track changes.

We're going to use the `git branch` and `git switch` commands.

::: example

We can ask `git` to tell us which branches are in a repository by using the `git branch` subcommand without specifying a branch name:

```bash
[you@bird my-project]> git branch
* main
  my-new-branch
```

We can ask `git` to switch us to that branch using the `git switch` subcommand:

```bash
[you@bird my-project]> git switch my-new-branch
[you@bird my-project]> git branch
  main
* my-new-branch
```

:::

Once we're on this new branch, we can use `git` the same way that we were before, `add`ing and `commit`ing changes, then switch back to our `main` branch using `git switch`.

Try creating several branches, making changes, staging (adding) and committing those changes, then switching between branches to get a sense of how commits apply to branches.

### Pushing branches to remote

You can also push branches that you have in your local repository to the remote repository.
You actually already did this the first time you pushed to a remote repository with your `main` branch when you did `git push -u origin main`.

You can also push local branches to remote with `git push`:

::: example

To push the new `my-new-branch` to the remote repository, we would use the `push` subcommand with the `-u origin` option:

```bash
git push -u origin my-new-branch
```

:::

### Merging branches

OK great! Now we have one repository that has several different commit histories. If you want to think in terms of data structures (and maybe you don't want to think in terms of data structures), we've effectively built a tree :tree: of commit histories.

Having these multiple, independent commit histories is fine, but if we're working on a project with multiple people, and those multiple people each have their own branch, we need to be able to take all the work that they've independently done and put it back into the `main` branch so that we have a single project that has *all* the commits and features.

To do this, we're going to have to "merge" the branches back together; take all the commits from one branch, and put them into another branch.

We're going to use the `git merge` subcommand to do this.

::: example

Start by listing the branches that you have in your repository with `git branch`:

```bash
[you@bird my-project]> git branch
  main
  my-new-branch
* another-branch
```

You can merge multiple branches together by switching to the "target" branch (usually `main`), then running `git merge` with the name of the branch that you want to bring into that branch.

```bash
[you@bird my-project]> git switch main # switch back to the main branch
[you@bird my-project]> git merge my-new-branch
```

If everything goes smoothly, `git` will tell you about what it's done to your files by telling you how many files were changed, how many "insertions" it made (additions to your files) and how many "deletions" it made (lines or changes that removed content):

```
Updating 8e7c6d6..ca7accf
Fast-forward
 updated-file | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
```

:::

#### Merge conflicts

[Things may not go smoothly]. When two people are working independently on two different, independent histories within a single repository, they may need to make changes to the same file. `git` does a pretty good job of automatically "merging" multiple changes to the same file, but sometimes it can't and you'll get a "merge conflict".

::: example

Let's start again with a different branch, switch to `main`, then `merge` changes again:

```bash
[you@bird my-project]> git switch main # switch back to the main branch
[you@bird my-project]> git merge my-other-new-branch
```

When things don't go the way you expected, `git` is going to report that it couldn't automatically merge your changes:

```
Auto-merging updated-file
CONFLICT (content): Merge conflict in updated-file
Automatic merge failed; fix conflicts and then commit the result.
```

Git tried its best (it really did!), but it couldn't figure out how it should merge the changes from the different histories.

When Git can't merge changes, it's in a state where you either must abandon the merge, or help Git figure out the correct way to merge things.

If you just want to bail out (give up on merging and get back to a good state in `main`), you can use the `--abort` option:

```bash
[you@bird my-project]> git merge --abort
```

At that point, you're back to the last commit you had on the `main` branch.

You probably want to help Git with the merge, though.
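
If more than one file is involved, it helps to ask Git which files are still in conflict. The sketch below reproduces a conflict in a throwaway repository and then lists the unresolved files with `git diff --name-only --diff-filter=U`. Every name in it (the temporary folder, `updated-file`, the commit messages) is made up for illustration, and it reads the starting branch name from Git rather than assuming it's `main`.

```shell
# Assumption: a throwaway repository with hypothetical file and branch names,
# built only so that a conflict can be reproduced safely.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"

echo "original line" > updated-file
git add updated-file
git commit --quiet -m "Add updated-file"
target="$(git branch --show-current)"   # main or master, depending on settings

# Change the same line on two different branches.
git switch --quiet -c my-other-new-branch
echo "line from the other branch" > updated-file
git commit --quiet -am "Change on the other branch"

git switch --quiet "$target"
echo "line from the target branch" > updated-file
git commit --quiet -am "Change on the target branch"

# The merge stops with a conflict and a non-zero exit status; carry on anyway.
git merge my-other-new-branch || true

# List only the files that still have unresolved conflicts.
conflicted="$(git diff --name-only --diff-filter=U)"
echo "$conflicted"
```

Running `git status` in the same state gives a longer, more descriptive version of the same information.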
The way you help Git with the merge is to open the file that's listed in the conflict and take a look. The file's contents are going to look something like:

```
<<<<<<< HEAD
the content that's in the main branch
=======
the content that's in the branch you were trying to merge
>>>>>>> my-other-new-branch
```

Git is highlighting the change that it couldn't merge. Your job to help Git is to replace this entire block (everything from `<<<<<<<` to `>>>>>>>`, including the marker lines themselves) with what is the correct way to "merge" this.

The "correct" way to merge this, of course, depends on the files that you're tracking the history of. If you're tracking Java files, for example, you'll have to figure out and then write whatever the actually correct and compilable version of code should be for this section.

Once you've done that and you're satisfied with the changes, you should `add` the changes that you made, then `commit`:

```bash
[you@bird my-project]> git add .
[you@bird my-project]> git commit -m "Merge with my-other-new-branch"
```

:tada:! You just resolved a merge conflict!

:::

[Things may not go smoothly]: https://www.youtube.com/watch?v=yJxCdh1Ps48

Synchronizing projects between computers
----------------------------------------

Doing `scp` is, honestly, a little tedious. Even using graphical tools is pretty tedious.

One way (among many!) that you can keep your folders synchronized between two different computers is to use a remote `git` repository, cloning and pulling on the "secondary" computer (Aviary in this case).

You've already got your repository on Aviary, so let's clone the repository onto your own personal computer.

### Install Git

Git is originally a command-line tool (how you've been using it). You can install `git` on all of Windows, macOS, and Linux. How you install it depends on which of those you're using.
#### Installing Git on macOS

You can install Git in several ways on macOS, but probably the most straightforward way is to install [Xcode]. This honestly feels a bit like [overkill]; we're installing an entire programming environment and multiple programming language compilers just to get Git, but it really is the most straightforward way to get Git.

The easiest way to install Xcode is from the [Mac App Store].

If you're feeling adventurous, you can take an alternative approach and install [Homebrew]. If you take this approach, I'll give you a link to the [macOS install page on Git's website], but you're otherwise on your own.

[Xcode]: https://developer.apple.com/xcode/
[overkill]: https://en.wikipedia.org/wiki/Overkill_(term)
[Mac App Store]: https://apps.apple.com/us/app/xcode/id497799835?mt=12
[Homebrew]: https://brew.sh/
[macOS install page on Git's website]: https://git-scm.com/download/mac

#### Installing Git on Windows

You can install Git for Windows by downloading the installer on [Git's "Download for Windows" page]. You almost certainly want to download the **Standalone Installer**, and you almost certainly want to download the "64-bit Git for Windows Setup".

Download the installer, launch it, and follow the instructions on-screen. You should be able to accept all default options in the install process.

If you're feeling adventurous, you can take an alternative approach and install either [winget] or [Chocolatey] and install Git using a package manager. If you choose to take this approach, you're on your own.

[Git's "Download for Windows" page]: https://git-scm.com/download/win
[winget]: https://learn.microsoft.com/en-us/windows/package-manager/winget/
[Chocolatey]: https://chocolatey.org/

#### Installing Git on Linux

The best way to install Git on your Linux machine is from the command line, using your package manager. We're not going to talk much about package managers (yet), but [Git's "Download for Linux and Unix" page] lists the exact command that you need to run for most popular Linux distributions. Find yours and run it.

You do need to run these commands as the `root` user. If you've used `sudo` before, this is what you should use to run these commands. If you're not sure what to do here, you should ask for help.

[Git's "Download for Linux and Unix" page]: https://git-scm.com/download/linux
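
Whichever route you took, a quick check that the install worked is to ask Git for its version. This is a small sketch that assumes a POSIX-style shell (Terminal on macOS, a Linux terminal, or the Git Bash program that comes with Git for Windows); the `git_version` variable name is just for illustration.

```shell
# Check that the shell can find a `git` program at all.
if command -v git >/dev/null 2>&1; then
    # Print the installed version; the exact number differs between machines.
    git_version="$(git --version)"
    echo "$git_version"
else
    echo "git was not found; try reopening your terminal or reinstalling."
fi
```

If you see a line starting with `git version`, you're ready to clone.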
### Cloning your repository

Once you've got a `git` client installed you can actually follow the instructions in [Cloning an existing repository].

### Workflow

You can generally keep using `git` the way that you've been using it up until now: make changes to your files, stage the changes with `add`, then `commit` the changes to history.

Now you should also `push` to your remote repository occasionally:

```bash
[you@bird my-project]> git push
```

You *don't* need `-u origin` here; that was only needed the first time you pushed the branch.

To get the changes on your other computer, you should use `git pull`:

```bash
[you@your-computer my-project]> git pull
```

Depending on what you've done with your repositories, you may get merge conflicts when you pull (if you edited a file on your local computer *and* on Aviary). You can deal with merge conflicts as described above.

Further reading
===============

This is **a lot**. Git is an incredibly big tool with lots of ways to use it, and we couldn't possibly look at every possible way to use it. Instead, here are some additional resources for using Git that you can check out:

* The [Documentation page] on Git's homepage. This actually has several sub-resources that you should check out: a complete reference manual for Git, "Pro Git" (a book), and a series of videos introducing basic Git.
* [Learn GitLab with tutorials] is a pretty comprehensive overview of **GitLab** itself. There's a lot about Git in this set of tutorials, but the target for these tutorials is specifically how to use GitLab. Some of the skills that you learn here would be transferable to something like GitHub.
* [Learn Enough Git to be Dangerous] is a good resource for version control basics. Part of it is free, but it isn't entirely free.
* [GitHub Skills] is a suite of resources similar to "Learn GitLab with tutorials", but targeting GitHub specifically.
[Documentation page]: https://git-scm.com/doc
[Learn GitLab with tutorials]: https://docs.gitlab.com/ee/tutorials/
[Learn Enough Git to be Dangerous]: https://www.learnenough.com/git-tutorial
[GitHub Skills]: https://skills.github.com/