Automate Your GCP VM Instance Program

Sadman Kabir Soumik
Published in Geek Culture
6 min read, Jul 3, 2022


If you have a script on a Google Cloud VM instance that needs to run every day, week, or month at a particular time, and it is a very long-running process (e.g., a data pipeline for Machine Learning model training, data crawlers, etc.), this article will guide you through automating the task.

Photo by Alex Knight on Unsplash

Problem Scenario

Your script/program can be in any language, but for simplicity, let’s assume we are trying to automate a Python script.

Let’s say you are working on a Google Compute Engine VM instance. You have a Python script, main.py, that does some specific task (e.g., scraping data from multiple sites, training a Machine Learning model, etc.). You need to run this script every Friday at 12:00 AM. You have a couple of options to automate this process:

  1. Use Cloud Functions with Pub/Sub topics.
  2. Add a startup-script to your VM instance to run the program automatically.

As mentioned above, this article focuses on long-running processes. You can consider the first option if your job takes less than 540 seconds; otherwise, the second approach is the better fit. The first option won’t work for long-running processes because Google Cloud Functions can run for a maximum of 540 seconds¹.

But, what is a ‘startup-script’?

A startup script is a file that performs tasks during the startup process of a virtual machine (VM) instance. Startup scripts can apply to all VMs in a project or to a single VM⁴.

Configure the VM Instance and Environment

If you add a startup-script to your VM instance², the startup-script runs as the root user.

When you connect to your VM instance over SSH from your local machine, you log in as a different user, not as root. You can switch to the root user with the following command:

$ sudo su -

This command switches you to the root user. As root, you will not find the code you worked on while connected to the VM instance over SSH, because it lives in your own user's home directory. So, I suggest keeping your code in a git repo and cloning that repo as the root user. You should also install the program's dependencies as root. For example, if your program runs in an Anaconda virtual environment, install Anaconda for the root user. Make sure you can run your program correctly while logged in as root. Let’s say you can run your program with the following command in your virtual environment:

$ python main.py

There might be several Python interpreters on your VM, so check which one you are using to run your program. If you are working inside a virtual environment, activate it first. Then run the following command in the terminal:

$ which python

This prints the path of your Python interpreter, for example /usr/bin/python3.
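Because the startup-script runs as root, it is safer to use the absolute interpreter path than to rely on whatever PATH happens to contain. A minimal sketch of that check follows; the function name resolve_interpreter is hypothetical and only illustrates the idea.

```shell
#!/bin/bash
# Print the path of an executable found on PATH, or fail with a message.
# resolve_interpreter is a hypothetical helper name, not part of any tool.
resolve_interpreter() {
    name="${1:-python3}"
    if ! command -v "$name"; then
        echo "interpreter not found: $name" >&2
        return 1
    fi
}

# Usage (run it as root, with your virtual environment activated if you use one):
#   PYTHON_BIN="$(resolve_interpreter python3)" || exit 1
#   "$PYTHON_BIN" /root/data_pipeline/src/main.py
```

If the lookup fails, the script can stop early instead of failing later with a less obvious error.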

The Startup Script

To attach a startup-script to your instance, go to the Compute Engine > VM Instances page and click on your target VM instance.

Screenshot 1: VM instance page

Click on Edit. Then scroll down the page until you find a section called ‘Metadata’. There, under the Automation section, you can add your startup-script.

Screenshot 2: Startup-script section

For example, let’s say our driver code is in the /root/data_pipeline/src/main.py file. To run the program automatically when the VM starts, we can write the following startup-script:

#! /bin/bash

/usr/bin/python3 /root/data_pipeline/src/main.py
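If you prefer the command line to the console form, the same script can be attached with gcloud compute instances add-metadata. The sketch below assumes the script is saved locally as startup.sh; the instance name and zone are made-up placeholders, and the run helper is a hypothetical wrapper that only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Hypothetical values; replace them with your own.
INSTANCE_NAME="my-pipeline-vm"
INSTANCE_ZONE="us-central1-a"
STARTUP_FILE="startup.sh"

# Preview by default; set DRY_RUN=0 to really execute the command.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Store the contents of a local file under the startup-script metadata key.
attach_startup_script() {
    run gcloud compute instances add-metadata "$1" \
        --zone "$2" \
        --metadata-from-file "startup-script=$3"
}

attach_startup_script "$INSTANCE_NAME" "$INSTANCE_ZONE" "$STARTUP_FILE"
```

The new script takes effect the next time the instance boots.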

We want the program to keep running without interruption, even if the configuration changes in other VMs or in the main GCP project. To that end, we pass the --shielded-learn-integrity-policy flag³ to gcloud compute instances update, which makes a Shielded VM instance re-learn its integrity policy baseline from the current instance configuration. We add the following command before calling the main.py script:

#! /bin/bash

gcloud compute instances update <instance-name> --zone <instance-zone-name> --shielded-learn-integrity-policy

/usr/bin/python3 /root/data_pipeline/src/main.py

When main.py finishes, we want the VM instance to stop automatically. To do that, we can add the following command at the end of the startup-script:

#! /bin/bash

gcloud compute instances update <your-instance-name> --zone <instance-zone-name> --shielded-learn-integrity-policy

/usr/bin/python3 /root/data_pipeline/src/main.py

gcloud compute instances stop <your-instance-name> --zone <instance-zone-name>
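Instead of hard-coding the instance name and zone, a startup-script can ask the Compute Engine metadata server for them. This is an alternative to the placeholders above, not something the script requires. The helper names instance_metadata and zone_from_path are hypothetical.

```shell
#!/bin/bash
# Query one instance attribute from the metadata server.
# This endpoint is only reachable from inside a Compute Engine VM.
instance_metadata() {
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}

# The zone attribute comes back as "projects/<project-number>/zones/<zone>";
# keep only the part after the last slash.
zone_from_path() {
    echo "${1##*/}"
}

# Usage inside the startup-script:
#   INSTANCE_NAME="$(instance_metadata name)"
#   INSTANCE_ZONE="$(zone_from_path "$(instance_metadata zone)")"
#   gcloud compute instances stop "$INSTANCE_NAME" --zone "$INSTANCE_ZONE"
```

With this approach the same startup-script can be reused on several instances without editing it.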

You can also add other commands to the startup-script as your requirements dictate. For example:

#! /bin/bash

sudo service tor restart

gcloud compute instances update <your-instance-name> --zone <instance-zone-name> --shielded-learn-integrity-policy

ulimit -n 100000

/usr/bin/python3 /root/data_pipeline/src/main.py

gcloud compute instances stop <your-instance-name> --zone <instance-zone-name>

The above script runs our program automatically when the VM instance starts and stops the VM instance after the program finishes.
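Since nobody is watching the terminal when the startup-script runs, it can help to capture the program's output and exit status in a file. The following is a minimal sketch under assumptions: the log path and the run_and_log helper are hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical log location; override it by setting LOG_FILE beforehand.
LOG_FILE="${LOG_FILE:-/var/log/data_pipeline.log}"

# Run a command, append its stdout and stderr to LOG_FILE,
# record the exit status, and return that status to the caller.
run_and_log() {
    status=0
    "$@" >>"$LOG_FILE" 2>&1 || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit=$status cmd=$*" >>"$LOG_FILE"
    return "$status"
}

# Usage inside the startup-script (placeholders as in the script above):
#   run_and_log /usr/bin/python3 /root/data_pipeline/src/main.py
#   gcloud compute instances stop <your-instance-name> --zone <instance-zone-name>
```

Because the stop command sits on its own line after the logged call, the instance is still stopped when main.py exits with an error, and the log file shows what went wrong.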

Now, we need a schedule that starts the VM instance automatically.

Schedule the VM instance

Go to the Navigation menu > Compute Engine > VM Instances page. There is a tab called INSTANCE SCHEDULES.

Screenshot 3: Instance Schedules page on GCP UI

Go to that tab. In the top bar, you will find a button called CREATE SCHEDULE.

Screenshot 4: Create Schedules page on GCP UI

If you click on Create Schedule, you will see a page like the one below:

Screenshot 5: Scheduler form to submit

Give the schedule any name you want. The region must match the VM instance’s region. Define a Start time and Frequency (daily, weekly, monthly, etc.). You don’t need to define a Stop time, because the startup-script already contains a command that stops the instance after main.py finishes.

You can also use a CRON expression to define the start time. Finally, submit the form. This creates a schedule page. Go to that page, and you will find an option to add instances to the schedule.

Screenshot 6: Adding your instance to the scheduler

Just add your target instance to it, and you’re done.

Your instance will now start automatically at the start time defined in the schedule and run main.py from the startup-script. After the program finishes, the VM instance is stopped automatically.
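The schedule can also be created from the command line rather than through the console. The sketch below uses gcloud compute resource-policies create instance-schedule and gcloud compute instances add-resource-policies. All names, the region, the zone, and the time zone are hypothetical placeholders, and the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Hypothetical values; replace them with your own.
SCHEDULE_NAME="weekly-pipeline-start"
INSTANCE_NAME="my-pipeline-vm"
REGION="us-central1"        # must be the region that contains the VM's zone
ZONE="us-central1-a"
TIMEZONE="UTC"
START_CRON="0 0 * * 5"      # minute hour day-of-month month day-of-week: Fridays at 00:00

# Preview by default; set DRY_RUN=0 to really execute the commands.
# The preview prints arguments without their shell quoting.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a schedule with a start time only; the startup-script stops the VM.
create_start_schedule() {
    run gcloud compute resource-policies create instance-schedule "$SCHEDULE_NAME" \
        --region "$REGION" \
        --vm-start-schedule "$START_CRON" \
        --timezone "$TIMEZONE"
}

# Attach the schedule to the target instance.
attach_schedule() {
    run gcloud compute instances add-resource-policies "$INSTANCE_NAME" \
        --zone "$ZONE" \
        --resource-policies "$SCHEDULE_NAME"
}

create_start_schedule
attach_schedule
```

No stop schedule is defined here, for the same reason the Stop time is left empty in the console form.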
