Kyle Banks

Unity3D: Panning and Zooming (pinch-to-zoom) Your Camera With Touch and Mouse Input

Written by @kylewbanks on Nov 26, 2016.

While developing Byter for GitHub Game Off 2016, I wanted to let players pan and zoom the camera on touch devices such as Android and iOS (tap and drag to pan, pinch to zoom), as well as with the mouse (click and drag to pan, scroll the mouse wheel to zoom) on the desktop and WebGL versions of the game. The camera for Byter has a fixed angle, meaning players cannot rotate it, but zooming and panning are important for collecting the Lost Packets in the game, and are a nice way to add interaction to a mostly static clicker-style game.

Unity3D Camera Panning and Zooming (pinch-to-zoom) using Touch and Mouse Input

The source code is available in the game’s open-source repository, github.com/KyleBanks/ggo16-byter, and a slightly generalized version is provided at the bottom of this post. I figured it could be pretty useful for other Unity developers, so I’d like to walk through the code and explain how it works.

Camera Settings

The Camera setup is pretty standard: I’m simply using perspective projection (which will be important for how we handle zooming) with a fixed rotation to give a slightly tilted field of view.

Unity3D Camera Settings for Panning and Pinch-to-Zoom

The only thing really noteworthy here is the CameraHandler script attached to the camera, which is where the input handling to control panning and zooming will be located.

The Script

All of the code below is contained within the CameraHandler script, and there is no need for logic outside of this class unless you want further customization.

Required Variables

The first thing I did was define some boundaries for panning and zooming, and the speed at which the camera moves and zooms. These allow me to restrict the camera to a certain area of the scene, and to limit how much you can zoom the camera. These values will be different depending on your game and desires, but here’s how I defined them:

private static readonly float PanSpeed = 20f;
private static readonly float ZoomSpeedTouch = 0.1f;
private static readonly float ZoomSpeedMouse = 0.5f;

private static readonly float[] BoundsX = new float[]{-10f, 5f};
private static readonly float[] BoundsZ = new float[]{-18f, -4f};
private static readonly float[] ZoomBounds = new float[]{10f, 85f};

The boundaries are defined as float arrays each with a length of two. The first value represents the lower bound, and the second represents the upper bound. For the case of BoundsX for example, I’m keeping the camera between -10 and 5 on the x-axis. Similarly, BoundsZ will control the z-axis, and ZoomBounds will be used for the zooming. For Byter there is no y-axis movement, so there’s no need to define bounds for that.
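If you would rather tune these values from the Inspector than recompile, the same settings can be sketched as serialized fields. This is only a variation on the constants above, not what Byter uses: the class name CameraHandlerSettingsSketch, the field names, the ClampToBounds helper, and the choice of Vector2 to hold each (lower, upper) pair are all assumptions made for illustration.

```csharp
using UnityEngine;

// Hypothetical variant of the settings block: serialized fields instead of
// static readonly constants, so the values can be edited in the Inspector.
public class CameraHandlerSettingsSketch : MonoBehaviour {

    [SerializeField] private float panSpeed = 20f;
    [SerializeField] private float zoomSpeedTouch = 0.1f;
    [SerializeField] private float zoomSpeedMouse = 0.5f;

    // Each Vector2 stores a (lower bound, upper bound) pair in (x, y).
    [SerializeField] private Vector2 boundsX = new Vector2(-10f, 5f);
    [SerializeField] private Vector2 boundsZ = new Vector2(-18f, -4f);
    [SerializeField] private Vector2 zoomBounds = new Vector2(10f, 85f);

    // Clamp a world position to the pan bounds, leaving y untouched.
    public Vector3 ClampToBounds(Vector3 pos) {
        pos.x = Mathf.Clamp(pos.x, boundsX.x, boundsX.y);
        pos.z = Mathf.Clamp(pos.z, boundsZ.x, boundsZ.y);
        return pos;
    }

    // Clamp a field of view (in degrees) to the zoom bounds.
    public float ClampZoom(float fieldOfView) {
        return Mathf.Clamp(fieldOfView, zoomBounds.x, zoomBounds.y);
    }
}
```

The trade-off is that the values are now per-instance and live in the scene or prefab rather than in code, which may or may not suit your project.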

Next up we’re going to need some instance variables to keep track of the camera state between frames. We’ll also want to keep a reference to the actual Camera so we can modify its properties as the user interacts:

private Camera cam;

private Vector3 lastPanPosition;
private int panFingerId; // Touch mode only

private bool wasZoomingLastFrame; // Touch mode only
private Vector2[] lastZoomPositions; // Touch mode only

Let’s break down these properties one-by-one:

  • cam will simply store a reference to our Camera.
  • lastPanPosition is the location of the user’s finger or mouse during the last frame where they were panning the camera.
  • panFingerId tracks the ID of the finger being used to pan the camera, in touch-mode only. This is not used at all for mouse controls, as there is typically only one mouse being used.
  • wasZoomingLastFrame is used in touch-mode only to determine if the camera was being zoomed in the last frame.
  • lastZoomPositions, like the lastPanPosition, tracks the location of the user’s fingers during the last frame where they were zooming the camera. This property, unlike the lastPanPosition, is only applicable in touch-mode, and is not used for mouse controls.

Awake

Next up we have the Awake function where we’ll grab a reference to the camera:

void Awake() {
    cam = GetComponent<Camera>();
}

This should be familiar if you’ve used Unity before: we’re simply grabbing the Camera component of the GameObject. You could also replace this with cam = Camera.main, but I prefer GetComponent because it keeps working if you ever add another camera to your game or remove the MainCamera tag from your camera. In any case, this is all we’ll need in our Awake function, and we’re ready for some actual logic.
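Since GetComponent returns null when the script is attached to a GameObject without a Camera, one optional safeguard is Unity’s RequireComponent attribute. The sketch below is an addition of mine rather than part of the Byter source, and the class name CameraHandlerAwakeSketch is hypothetical.

```csharp
using UnityEngine;

// RequireComponent makes the Editor add a Camera automatically when this
// script is attached, and blocks removing the Camera while the script exists.
[RequireComponent(typeof(Camera))]
public class CameraHandlerAwakeSketch : MonoBehaviour {

    private Camera cam;

    void Awake() {
        // Safe to assume non-null because of the attribute above.
        cam = GetComponent<Camera>();
    }
}
```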

Update

In the Update loop we’ll want to check if we should handle touch or mouse controls, and we’ll define two empty functions (for now) that we can call in either case:

void Update() {
    if (Input.touchSupported && Application.platform != RuntimePlatform.WebGLPlayer) {
        HandleTouch();
    } else {
        HandleMouse();
    }
}

void HandleTouch() {

}

void HandleMouse() {

}

For my purposes, I simply check if touch is supported using the Input.touchSupported property, and ensure the game isn’t running in a WebGL player. This is important because the WebGL player reports touch support even on desktop. However, as mobile is not currently supported for Unity in WebGL, I’m assuming that all WebGL players are playing in their desktop browser and not on a mobile device.

It’s also worth noting that for Byter I wanted to disable camera panning and zooming when a menu was open, so at the top of the Update function prior to checking the platform and calling a Handle___ function, I have something like so:

// OPTIONAL
if (isMenuOpen) {
    return;
}

This is entirely optional, and you may have your own cases where you want to disable camera movement. If so, the top of the Update function is a good place to verify that you’re in a state where the camera should be interactive.
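Put together, the guard and the platform check might look like the sketch below. The isMenuOpen field is a stand-in: how Byter actually tracks its menu state isn’t shown in this post, so treat the public bool (and the class name CameraHandlerUpdateSketch) as assumptions to replace with whatever your own UI code exposes.

```csharp
using UnityEngine;

public class CameraHandlerUpdateSketch : MonoBehaviour {

    // Hypothetical flag: set this from your own menu code.
    public bool isMenuOpen;

    void Update() {
        // Skip all camera input while a menu is open.
        if (isMenuOpen) {
            return;
        }

        if (Input.touchSupported && Application.platform != RuntimePlatform.WebGLPlayer) {
            HandleTouch();
        } else {
            HandleMouse();
        }
    }

    void HandleTouch() {
        // Touch panning and zooming goes here.
    }

    void HandleMouse() {
        // Mouse panning and zooming goes here.
    }
}
```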

HandleMouse

Alright, time to make some moves. We’ll implement the HandleMouse function first as it’s the easier of the two to test (the Unity Editor will be using this), and it’s also simpler to implement than the touch controls.

void HandleMouse() {
    // On mouse down, capture its position.
    // Otherwise, if the mouse is still down, pan the camera.
    if (Input.GetMouseButtonDown(0)) {
        lastPanPosition = Input.mousePosition;
    } else if (Input.GetMouseButton(0)) {
        PanCamera(Input.mousePosition);
    }

    // Check for scrolling to zoom the camera
    float scroll = Input.GetAxis("Mouse ScrollWheel");
    ZoomCamera(scroll, ZoomSpeedMouse);
}

First we check if the mouse was clicked this frame using Input.GetMouseButtonDown and simply store the current mouse position as the lastPanPosition. If the mouse was not clicked this frame, we check if it is still clicked using Input.GetMouseButton and execute the PanCamera method, which we’ll get to shortly.

Next, we handle zooming using the scroll wheel. Using Input.GetAxis and providing “Mouse ScrollWheel” as the axis, we get the distance that the scroll wheel has been scrolled since the last frame. Next we call ZoomCamera with the scroll returned from Input.GetAxis, and the speed at which we want to zoom. Here’s where I’m using the ZoomSpeedMouse constant that I defined above, and you can probably guess where we’ll be using the ZoomSpeedTouch constant.

PanCamera and ZoomCamera

Before we get to the touch input, let’s implement the PanCamera and ZoomCamera functions. Neither of these is particularly complicated, but let’s take a look:

void PanCamera(Vector3 newPanPosition) {
    // Determine how much to move the camera
    Vector3 offset = cam.ScreenToViewportPoint(lastPanPosition - newPanPosition);
    Vector3 move = new Vector3(offset.x * PanSpeed, 0, offset.y * PanSpeed);
    
    // Perform the movement
    transform.Translate(move, Space.World);  
    
    // Ensure the camera remains within bounds.
    Vector3 pos = transform.position;
    pos.x = Mathf.Clamp(transform.position.x, BoundsX[0], BoundsX[1]);
    pos.z = Mathf.Clamp(transform.position.z, BoundsZ[0], BoundsZ[1]);
    transform.position = pos;

    // Cache the position
    lastPanPosition = newPanPosition;
}

void ZoomCamera(float offset, float speed) {
    if (offset == 0) {
        return;
    }

    cam.fieldOfView = Mathf.Clamp(cam.fieldOfView - (offset * speed), ZoomBounds[0], ZoomBounds[1]);
}

PanCamera takes the new position of the mouse (or finger) and creates an offset based on the previous position of the mouse (or finger). This is the distance, in viewport coordinates, that the mouse (again, or finger) has moved since the last time PanCamera was called, likely the previous frame. Next, a move Vector3 is constructed that maps the x and y components of the offset onto the world x- and z-axes (again, for my purposes there was no y-axis movement), and multiplies them by the PanSpeed.

Next, we use transform.Translate to move the camera in world space. Once this is executed, the camera will have actually been moved. However, we need to ensure that the camera remains within bounds, so we grab a reference to the camera’s position and Clamp the x and z coordinates within the appropriate bounds we defined above. This means that, for example, if we moved the x-axis to a value lower than BoundsX[0], we’ll replace it with BoundsX[0] ensuring it never goes too far left, and same with the upper bound ensuring we never go too far right. The same logic is then applied to the z-axis, and again since there is no y-axis movement we don’t need to worry about that. This clamped position is then applied directly to the camera transform and we have set our final camera position. All that’s left to do is to cache the newPanPosition in the lastPanPosition variable to be used during the next pan, and we’re all set.
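One detail worth a hedge: PanCamera passes the difference of two screen positions into ScreenToViewportPoint. That works fine when the camera renders to the full screen, as it does in Byter. If your camera’s Viewport Rect does not start at the bottom-left corner, my understanding is that the conversion also subtracts the rect’s origin, which would skew a raw difference. A more defensive variant converts each point first and subtracts afterwards. The helper below is an illustrative sketch (PanMathSketch and PanMove are names I made up), not code from the game.

```csharp
using UnityEngine;

public static class PanMathSketch {

    // Returns the world-space move for a drag from lastScreenPos to
    // newScreenPos. Each point is converted to viewport space before
    // subtracting, so any viewport origin offset cancels out.
    public static Vector3 PanMove(Camera cam, Vector3 lastScreenPos,
                                  Vector3 newScreenPos, float panSpeed) {
        Vector3 offset = cam.ScreenToViewportPoint(lastScreenPos)
                       - cam.ScreenToViewportPoint(newScreenPos);

        // Viewport x drives world x, viewport y drives world z; y stays 0.
        return new Vector3(offset.x * panSpeed, 0f, offset.y * panSpeed);
    }
}
```

Inside PanCamera this would replace the two lines that compute offset and move, for example: transform.Translate(PanMathSketch.PanMove(cam, lastPanPosition, newPanPosition, PanSpeed), Space.World).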

For zooming the camera we have the ZoomCamera function, which takes an offset and a speed. The reason we take a speed parameter here is that the mouse and touch controlled zooming are significantly different, and a different speed will be applied to each. You saw that we provided the ZoomSpeedMouse constant when calling ZoomCamera from HandleMouse, and we’ll be providing the ZoomSpeedTouch constant from HandleTouch below.

This method is significantly simpler than PanCamera because we only need to modify a single property, which is the camera’s fieldOfView. The fieldOfView defines, essentially, how much the camera can see (vertically) in degrees. What’s useful about this, as you can experiment with in the Unity Editor by dragging the camera’s Field of View slider, is that a lower fieldOfView means the camera can see less vertically, and therefore appears to move closer. Likewise, the higher the fieldOfView, the further away the camera appears to be. As you can see in the GIF below this effect is pretty powerful, and yet the camera never actually moves.

Unity3D Camera fieldOfView for Zooming

Anyways, knowing this, we simply Clamp the fieldOfView to a new value which is the current fieldOfView minus the offset times speed, or the lower or upper zoom bounds if the new value happens to go out of bounds.
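To make the arithmetic concrete, here is the same calculation pulled out into a plain C# function with no Unity dependency, so it can be run in any console project. The names ZoomMathSketch and NextFieldOfView and the sample input values are assumptions chosen for illustration; only the formula and the 10 to 85 bounds come from the script above.

```csharp
using System;

public static class ZoomMathSketch {

    // Minimal stand-in for Mathf.Clamp so the sketch runs outside Unity.
    static float Clamp(float value, float min, float max) {
        if (value < min) return min;
        if (value > max) return max;
        return value;
    }

    // Same formula as ZoomCamera: current FOV minus offset times speed,
    // clamped to the zoom bounds.
    public static float NextFieldOfView(float current, float offset,
                                        float speed, float min, float max) {
        return Clamp(current - (offset * speed), min, max);
    }

    public static void Main() {
        // Fingers move 20 pixels apart: the FOV drops by about 2 degrees.
        Console.WriteLine(NextFieldOfView(60f, 20f, 0.1f, 10f, 85f));

        // A large zoom-in near the lower bound is clamped to 10.
        Console.WriteLine(NextFieldOfView(12f, 100f, 0.1f, 10f, 85f));

        // A large zoom-out near the upper bound is clamped to 85.
        Console.WriteLine(NextFieldOfView(80f, -100f, 0.1f, 10f, 85f));
    }
}
```

A positive offset (scrolling up, or fingers spreading apart) lowers the fieldOfView and zooms in, while a negative offset raises it and zooms out.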

Alright, at this point you should be ready to test out the mouse controls in the Unity Editor. Run your game and click and drag to move the camera around, scroll in and out to zoom the camera, and modify the constants defined at the top to suit your needs.

HandleTouch

Finally we’re going to implement the touch controls. I saved this for last because it’s the toughest to test since you’ll need a real device to actually test it out, which requires full application builds and results in a slower process. Since we know the panning and zooming works well on desktop, we should be able to safely assume we’ve developed some solid enough logic that implementing touch won’t take a whole lot of trial-and-error.

Here’s a look at the function:

void HandleTouch() {
    switch(Input.touchCount) {

    case 1: // Panning
        wasZoomingLastFrame = false;
        
        // If the touch began, capture its position and its finger ID.
        // Otherwise, if the finger ID of the touch doesn't match, skip it.
        Touch touch = Input.GetTouch(0);
        if (touch.phase == TouchPhase.Began) {
            lastPanPosition = touch.position;
            panFingerId = touch.fingerId;
        } else if (touch.fingerId == panFingerId && touch.phase == TouchPhase.Moved) {
            PanCamera(touch.position);
        }
        break;

    case 2: // Zooming
        Vector2[] newPositions = new Vector2[]{Input.GetTouch(0).position, Input.GetTouch(1).position};
        if (!wasZoomingLastFrame) {
            lastZoomPositions = newPositions;
            wasZoomingLastFrame = true;
        } else {
            // Zoom based on the distance between the new positions compared to the 
            // distance between the previous positions.
            float newDistance = Vector2.Distance(newPositions[0], newPositions[1]);
            float oldDistance = Vector2.Distance(lastZoomPositions[0], lastZoomPositions[1]);
            float offset = newDistance - oldDistance;

            ZoomCamera(offset, ZoomSpeedTouch);

            lastZoomPositions = newPositions;
        }
        break;
        
    default: 
        wasZoomingLastFrame = false;
        break;
    }
}

The function is really broken down into two core sections based on the number of fingers touching the screen. If only a single finger is touching, we handle panning, and if two fingers are touching, we handle zooming. Let’s break down the logic of panning first, and then move on to zooming.

For panning, we first set the wasZoomingLastFrame boolean to false, which we’ll come back to shortly. Next we check the phase of the single Touch and act accordingly. If the touch began this frame, we store the position and the finger ID to be used in subsequent frames when the finger has moved. Speaking of which, we next check if the current single-touch finger matches the finger being used to pan the camera, and if it has moved we go ahead and call PanCamera, passing the finger’s position.

Next in the zooming section (where there are two fingers), we store the position of the two fingers in a Vector2[] called newPositions. If we were not zooming during the last frame we simply store the newPositions array in our lastZoomPositions array, and set wasZoomingLastFrame to true. If you recall we set this to false when we’re panning so we know to restart the zoom when there are two fingers on the screen. If we were zooming in the last frame, we calculate the distance between the fingers in the current frame, and the distance between the fingers in the previous frame, and then the offset based on these two distances. This basically tells us which direction the fingers are moving (are they getting closer together or further away). We then provide this offset along with the ZoomSpeedTouch to ZoomCamera which will handle the fieldOfView modification for us. Finally, we cache the newPositions in the lastZoomPositions variable and we’re done with the zooming logic.

Finally in the default case, meaning there are either zero fingers or more than two fingers on the screen, we simply set wasZoomingLastFrame to false.
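Because the pinch offset is measured in pixels, the same physical gesture produces a larger offset on a high-density screen than on a low-density one. If that turns out to matter for your game, one possible adjustment is to divide by Screen.dpi so the offset is expressed in inches. This is a sketch of an idea, not something Byter does: PinchMathSketch, PinchOffsetInches, and the fallback value of 160 are all assumptions, and ZoomSpeedTouch would need retuning since the units change.

```csharp
using UnityEngine;

public static class PinchMathSketch {

    // Hypothetical fallback for platforms where Screen.dpi reports 0.
    private const float FallbackDpi = 160f;

    // Returns how far the two fingers moved apart (positive) or together
    // (negative) since the last frame, in inches rather than pixels.
    public static float PinchOffsetInches(Vector2 lastA, Vector2 lastB,
                                          Vector2 newA, Vector2 newB) {
        float oldDistance = Vector2.Distance(lastA, lastB);
        float newDistance = Vector2.Distance(newA, newB);

        // Screen.dpi is 0 when the density cannot be determined.
        float dpi = Screen.dpi > 0f ? Screen.dpi : FallbackDpi;

        return (newDistance - oldDistance) / dpi;
    }
}
```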

Go ahead and run your game on a touch device (Android, iPhone, iPad, etc.) and you should find that you can pan the camera by dragging your finger, or pinch-to-zoom in and out in a very familiar fashion!

Full Source

As promised at the beginning, here is the full source code for the CameraHandler script. Simply bring the script into your Unity3D project, attach it to your camera, and tweak the bounds and speeds to suit your needs!

using UnityEngine;
using System.Collections;

public class CameraHandler : MonoBehaviour {

    private static readonly float PanSpeed = 20f;
    private static readonly float ZoomSpeedTouch = 0.1f;
    private static readonly float ZoomSpeedMouse = 0.5f;
    
    private static readonly float[] BoundsX = new float[]{-10f, 5f};
    private static readonly float[] BoundsZ = new float[]{-18f, -4f};
    private static readonly float[] ZoomBounds = new float[]{10f, 85f};
    
    private Camera cam;
    
    private Vector3 lastPanPosition;
    private int panFingerId; // Touch mode only
    
    private bool wasZoomingLastFrame; // Touch mode only
    private Vector2[] lastZoomPositions; // Touch mode only

    void Awake() {
        cam = GetComponent<Camera>();
    }
    
    void Update() {
        if (Input.touchSupported && Application.platform != RuntimePlatform.WebGLPlayer) {
            HandleTouch();
        } else {
            HandleMouse();
        }
    }
    
    void HandleTouch() {
        switch(Input.touchCount) {
    
        case 1: // Panning
            wasZoomingLastFrame = false;
            
            // If the touch began, capture its position and its finger ID.
            // Otherwise, if the finger ID of the touch doesn't match, skip it.
            Touch touch = Input.GetTouch(0);
            if (touch.phase == TouchPhase.Began) {
                lastPanPosition = touch.position;
                panFingerId = touch.fingerId;
            } else if (touch.fingerId == panFingerId && touch.phase == TouchPhase.Moved) {
                PanCamera(touch.position);
            }
            break;
    
        case 2: // Zooming
            Vector2[] newPositions = new Vector2[]{Input.GetTouch(0).position, Input.GetTouch(1).position};
            if (!wasZoomingLastFrame) {
                lastZoomPositions = newPositions;
                wasZoomingLastFrame = true;
            } else {
                // Zoom based on the distance between the new positions compared to the 
                // distance between the previous positions.
                float newDistance = Vector2.Distance(newPositions[0], newPositions[1]);
                float oldDistance = Vector2.Distance(lastZoomPositions[0], lastZoomPositions[1]);
                float offset = newDistance - oldDistance;
    
                ZoomCamera(offset, ZoomSpeedTouch);
    
                lastZoomPositions = newPositions;
            }
            break;
            
        default: 
            wasZoomingLastFrame = false;
            break;
        }
    }
    
    void HandleMouse() {
        // On mouse down, capture its position.
        // Otherwise, if the mouse is still down, pan the camera.
        if (Input.GetMouseButtonDown(0)) {
            lastPanPosition = Input.mousePosition;
        } else if (Input.GetMouseButton(0)) {
            PanCamera(Input.mousePosition);
        }
    
        // Check for scrolling to zoom the camera
        float scroll = Input.GetAxis("Mouse ScrollWheel");
        ZoomCamera(scroll, ZoomSpeedMouse);
    }
    
    void PanCamera(Vector3 newPanPosition) {
        // Determine how much to move the camera
        Vector3 offset = cam.ScreenToViewportPoint(lastPanPosition - newPanPosition);
        Vector3 move = new Vector3(offset.x * PanSpeed, 0, offset.y * PanSpeed);
        
        // Perform the movement
        transform.Translate(move, Space.World);  
        
        // Ensure the camera remains within bounds.
        Vector3 pos = transform.position;
        pos.x = Mathf.Clamp(transform.position.x, BoundsX[0], BoundsX[1]);
        pos.z = Mathf.Clamp(transform.position.z, BoundsZ[0], BoundsZ[1]);
        transform.position = pos;
    
        // Cache the position
        lastPanPosition = newPanPosition;
    }
    
    void ZoomCamera(float offset, float speed) {
        if (offset == 0) {
            return;
        }
    
        cam.fieldOfView = Mathf.Clamp(cam.fieldOfView - (offset * speed), ZoomBounds[0], ZoomBounds[1]);
    }
}

Let me know if this post was helpful on Twitter @kylewbanks or down below!