Cloud mode

In this setup, the coalition server is allowed to start and delete instances so that all jobs get done at minimum cost. Here, we install the coalition server on a dedicated cloud instance (instead of localhost). This simplifies the network setup, as we don’t need a VPN or VLAN.

Configuration

Amazon Cloud

First, an initial setup is required on the cloud provider side. We provide here a minimal working setup. It can of course be extended to fit your specific needs and policy.

  1. Amazon account

    To be allowed to manage cloud instances (i.e. starting and terminating them), the coalition server needs authentication credentials.

    • from your Amazon cloud account, visit the section Manage security credentials
    • get an access key (ID and secret key) as a text file
You might prefer to create a dedicated user instead of your global user account.
  1. Virtual Private Cloud (VPC)

    For the workers and the server to communicate securely, we use a common VPC:

    • create a new VPC
  2. Security Groups

    The coalition server and its workers communicate on a port that defaults to 19211. The server should be accessible by the user (to interact with the API and/or web frontend) and by the workers. The server can be hosted in the office or in the cloud, according to your network policy. Here, the server is instantiated in the cloud and belongs to the security group sg-coalition, while the workers belong to the security group sg-worker. The workers should be accessible from the server only. So, the security groups and inbound rules are as follows:

    • create a security group sg-coalition
      • Inbound Rules: TCP 19211 sg-worker
      • Inbound Rules: TCP 19211 office-public-IP
    • create a security group sg-worker
      • Inbound Rules: TCP 19211 sg-coalition
  3. Setup Coalition server as a cloud instance

Now that the cloud provider has been set up, the coalition server has to be configured accordingly.

  • install a coalition server on a cloud instance as explained in the Installation documentation page

  • edit the file coalition.ini in the [Server] section and set:

    servermode = aws
    
  • copy the file _cloud_aws.ini to cloud_aws.ini

  • edit the file cloud_aws.ini

The configuration file cloud_aws.ini is self-explanatory. Set the options with your own Amazon parameters:

# Configuration file for aws cloud

[authentication]
# Aws ssh key pair name
keyname=
# Accesskey
accesskey=
# Secretkey
secretaccesskey=

[storage]
# Storage name
name=
# Mountpoint in the worker
mountpoint=/mnt/bucket
# Location of the guerilla installer in the storage
guerillarenderfilename=srv/guerilla_render_2.0.0a13_linux64.tar.gz
# Location of the coalition installer in the storage
coalitionfilename=srv/coalition.tar.gz

[coalition]
# Coalition server IP
ip =
# Coalition server port
port = 19211
# Maximum number of simultaneous workers
workerinstancemax=3
# Delay in seconds between creation of instances.
# This prevents massive instances creation for big list of short time jobs.
# Default is 30 seconds.
workerinstancestartdelay=30
# Minimum lifetime in seconds before allowing the termination of useless
# worker instances. Since an instance requires several minutes to start,
# this option offers the possibility of keeping instances ready even during
# a short time without jobs.
# Default is 900 seconds = 15 minutes.
workerinstanceminimumlifetime=900

[worker]
# Prefix for the new instance name
nameprefix=cloud-
spot=true
# Instance type
# https://aws.amazon.com/ec2/instance-types/
# http://www.ec2instances.info/
instancetype=m3.medium
# Aws image, for instance debian-stretch-amd64-hvm-2016-09-23-08-48-ebs
imageid=ami-2f40bd40
# Aws subnet
subnetid=
# Aws instance profile
iaminstanceprofile=
# Aws security group
securitygroupid=
availabilityzone=
[spot]
# http://docs.aws.amazon.com/cli/latest/reference/ec2/request-spot-instances.html
# https://aws.amazon.com/ec2/spot/pricing/
spotprice=10
instancecount=1
type=one-time
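
As a rough illustration of how these options can be read, the following Python 3 sketch parses a shortened copy of the file with the standard configparser module. The sample values (such as the IP 10.0.0.5) and the helper name load_cloud_settings are assumptions made for the example; this is not the code the coalition server itself uses.

```python
import configparser

# Shortened sample of cloud_aws.ini; the IP address is a made-up example value.
SAMPLE = """
[coalition]
ip = 10.0.0.5
port = 19211
workerinstancemax=3
workerinstancestartdelay=30
workerinstanceminimumlifetime=900

[worker]
nameprefix=cloud-
spot=true
instancetype=m3.medium
"""


def load_cloud_settings(text):
    """Parse the INI text and return typed values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    coalition = parser["coalition"]
    worker = parser["worker"]
    return {
        "server_url": "http://%s:%d" % (coalition["ip"], coalition.getint("port")),
        "max_workers": coalition.getint("workerinstancemax"),
        # The fallbacks below are the defaults documented in the file comments.
        "start_delay": coalition.getint("workerinstancestartdelay", fallback=30),
        "minimum_lifetime": coalition.getint(
            "workerinstanceminimumlifetime", fallback=900
        ),
        "name_prefix": worker["nameprefix"],
        "spot": worker.getboolean("spot"),
    }


if __name__ == "__main__":
    for key, value in sorted(load_cloud_settings(SAMPLE).items()):
        print(key, "=", value)
```

Options left empty in the file (for example ip =) are read as empty strings, so a real loader would probably want to reject them before starting any instance.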

  1. Bucket

    As workers are instantiated on demand, they need to fetch their startup configuration files from somewhere. Besides, as the workers might produce data files (for example in a renderfarm use case), those files must be saved to shared storage. We create a bucket for that:

    • create a bucket
    • prepare the startup configuration files in the bucket
      • create a directory srv
      • copy the coalition source code into the srv directory:
        • download coalition source code as a zip file (or use the git source you got while installing the server)
        • unzip the file
        • copy _coalition.ini into coalition.ini and edit the [worker] section
        • repack the directory as a gzip-compressed tar file (coalition.tar.gz)
        • copy coalition.tar.gz to the bucket: srv/coalition.tar.gz
      • in this setup, we build a guerilla render cloud renderfarm, so the worker needs the guerilla render binary:
        • copy guerilla_render_2.0.0a13_linux64.tar.gz to srv/guerilla_render_2.0.0a13_linux64.tar.gz
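
The repacking step above can be sketched in Python 3 with the standard tarfile module. The function name pack_coalition and the directory names used in the demo are hypothetical. The archive is given a top-level folder named coalition because the worker startup template moves /tmp/coalition after extraction; uploading the result to the bucket is left out of this sketch.

```python
import os
import tarfile
import tempfile


def pack_coalition(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar (hypothetical helper).

    Whatever source_dir is called on disk, its content is stored under a
    top-level 'coalition' folder inside the archive.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname="coalition")
    return archive_path


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for the unzipped sources.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "coalition-master")
        os.mkdir(source)
        with open(os.path.join(source, "coalition.ini"), "w") as handle:
            handle.write("[worker]\n")
        result = pack_coalition(source, os.path.join(tmp, "coalition.tar.gz"))
        with tarfile.open(result) as archive:
            print(sorted(archive.getnames()))
```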

Google cloud

  1. Google cloud account
  • log in to the Google Cloud console <https://console.cloud.google.com>
  • create a new project, e.g. guerilla-cloud
  • get the JSON key file for the service account (menu IAM & Admin > Service accounts > Options > Create keys)
  1. Networking

We want to be able to visit the coalition server web frontend, so we need to allow remote connections from our office.

  • add a firewall rule allowing office IP on port tcp:19211
  1. Setup Coalition server as a google cloud instance

Now that the cloud provider has been set up, the coalition server has to be configured accordingly.

  • install a coalition server on a cloud compute instance as explained in the Installation documentation page

    • as the server will create and delete cloud instances, set the instance access scope to Allow full access to all Cloud APIs
    • use a dedicated IP instead of an ephemeral one for permanent reachability
    • ssh access for copying coalition files can be done via google credentials:

    ssh -i ~/.ssh/google_compute_engine <coalition_server_ip>

  • edit the file coalition.ini in the [Server] section and set:

    servermode = gcloud
    
  • copy the file _cloud_gcloud.ini to cloud_gcloud.ini

  • edit the file cloud_gcloud.ini

The configuration file cloud_gcloud.ini is self-explanatory. Set the options with your own Google parameters:

# Configuration file for google cloud

[authentication]
# Project name
;project=guerilla-cloud
# Location of the json key file for the service user, obtained from the developer interface
;keyfile=guerilla-cloud-34bf64e0149b.json
# Service account
;serviceaccount=19254862847-compute@developer.gserviceaccount.com
;scopes=default


[storage]
# Storage name
;name=guerilla-cloud-bucket
# Mountpoint in the worker
;mountpoint=/mnt/bucket
# Location of the coalition installer in the storage
;coalitionpackage=srv/coalition.tar.gz

[coalition]
# Coalition server IP
;ip = 10.132.0.2
# Coalition server port
;port = 19211
# Maximum number of simultaneous workers
;workerinstancemax=3
# Delay in seconds between creation of instances.
# This prevents massive instances creation for big list of short time jobs.
# Default is 30 seconds.
;workerinstancestartdelay=30
# Minimum lifetime in seconds before allowing the termination of useless
# worker instances. Since an instance requires several minutes to start,
# this option offers the possibility of keeping instances ready even during
# a short time without jobs.
# Default is 900 seconds = 15 minutes.
;workerinstanceminimumlifetime=900

[main_program]
;package=srv/guerilla_render_2.0.0a13_linux64.tar.gz
;environment=GUERILLA=/usr/local/bin/guerillarender/data/usr/local/guerilla GUERILLA_CLOUD_ROOT=/mnt/bucket

[worker]
# Install dir for coalition and main program
;installdir=/usr/local/bin
# Prefix for the new instance name
;nameprefix=cloud-
;zone=europe-west1-d
;machinetype=f1-micro
;subnet=default
;preemptible=true
# maintenancepolicy must be TERMINATE if preemptible is true
;maintenancepolicy=TERMINATE
;image=debian-8-jessie-v20170308
;imageproject=debian-cloud
;bootdisksize=10
;bootdisktype=pd-standard
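
In this sample file every option is prefixed with ";", which Python's configparser treats as a comment, so an option only takes effect once the prefix is removed. The following Python 3 sketch checks the one constraint stated in the file comments (maintenancepolicy must be TERMINATE when preemptible is true). The function name check_worker_section is hypothetical, and treating a missing preemptible option as false is an assumption, not documented behaviour.

```python
import configparser


def check_worker_section(text):
    """Return a list of problems found in the [worker] section (hypothetical)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    if not parser.has_section("worker"):
        return ["missing [worker] section"]
    worker = parser["worker"]
    # Assumption: an absent 'preemptible' option is treated as false here.
    if worker.getboolean("preemptible", fallback=False):
        policy = worker.get("maintenancepolicy", fallback="")
        if policy.upper() != "TERMINATE":
            problems.append(
                "maintenancepolicy must be TERMINATE when preemptible is true"
            )
    return problems


if __name__ == "__main__":
    sample = "[worker]\npreemptible=true\nmaintenancepolicy=MIGRATE\n"
    print(check_worker_section(sample))
```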

  1. Storage

As workers are instantiated on demand, they need to fetch their startup configuration files from somewhere. Besides, as the workers might produce data files (for example in a renderfarm use case), those files must be saved to shared storage. We create a bucket for that:

  • create a bucket
  • prepare the startup configuration files in the bucket
    • create a directory srv
    • copy the coalition source code into the srv directory:
      • download coalition source code as a zip file (or use the git source you got while installing the server)
      • unzip the file
      • copy the service user json key file into the coalition directory
      • copy _coalition.ini into coalition.ini and edit the [worker] section
      • repack the directory as a gzip-compressed tar file (coalition.tar.gz)
      • copy coalition.tar.gz to the bucket: srv/coalition.tar.gz
    • in this setup, we build a guerilla render cloud renderfarm, so the worker needs the guerilla render binary:
      • copy guerilla_render_2.0.0a13_linux64.tar.gz to srv/guerilla_render_2.0.0a13_linux64.tar.gz

Running coalition

The coalition server is now ready to manage workers in the cloud:

  • start the server
  • visit the web interface http://<server_address>:19211
  • add affinities
  • add some jobs

Workers are then automatically instantiated, fetch jobs, work, and are terminated according to the configuration, until no jobs remain in the waiting state on the server.
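
To make the effect of workerinstancemax, workerinstancestartdelay and workerinstanceminimumlifetime more concrete, here is a minimal Python 3 sketch of the kind of decision these options suggest. It is written only from the option descriptions in the configuration file; the function names and the exact rules are assumptions and may differ from what the coalition server actually does.

```python
def should_start_worker(waiting_jobs, running_workers, seconds_since_last_start,
                        max_workers=3, start_delay=30):
    """Decide whether one more worker instance may be started (assumed rule).

    max_workers and start_delay mirror the sample values of
    workerinstancemax and workerinstancestartdelay.
    """
    if waiting_jobs <= 0:
        return False  # nothing to do
    if running_workers >= max_workers:
        return False  # instance limit reached
    # Throttle creation so a long list of short jobs does not start
    # many instances at once.
    return seconds_since_last_start >= start_delay


def may_terminate_worker(is_idle, uptime_seconds, minimum_lifetime=900):
    """Decide whether an instance may be terminated (assumed rule).

    An idle instance is kept until it has lived at least minimum_lifetime
    seconds (workerinstanceminimumlifetime), since starting a new one
    takes several minutes.
    """
    return is_idle and uptime_seconds >= minimum_lifetime


if __name__ == "__main__":
    print(should_start_worker(waiting_jobs=10, running_workers=1,
                              seconds_since_last_start=45))
    print(may_terminate_worker(is_idle=True, uptime_seconds=300))
```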

Changing coalition server or worker configuration while running

On the server instance, edit the relevant configuration files coalition.ini and cloud_<cloud_provider>.ini, then restart the server.

As the configuration for workers is stored in the bucket, edit the configuration file coalition.ini inside the package and re-upload coalition.tar.gz to the bucket. Newly started instances will immediately use the new configuration. You might want to manually terminate the instances started earlier. The coalition server does not need restarting in this case, since the file names in the bucket are unchanged.

Monitoring the cloud deployment

The coalition server limits the number of simultaneous instances to the value of the configuration parameter workerinstancemax, set in the [coalition] section of cloud_<cloud_provider>.ini. But if there is a configuration problem (for instance in the worker startup scripts located in the bucket), the workers might not be able to reach the coalition server. In this case, the coalition server will keep starting instances. So, as long as the configuration has not been confirmed to work, you are advised to check the actual number of running instances in your cloud provider console. Limits can also be set up directly on the cloud provider side to prevent any excessive cloud usage.

On the web frontend, in the workers tab, clicking the button Terminate destroys the selected instances after confirmation.

Additional documentation for programmers

python cloud module

cloud.common

This module contains functions common to various cloud providers.

cloud.common.createWorkerInstanceName(prefix)

Return a unique name based on prefix and timestamp.
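
As an illustration only, a function with this behaviour could look like the Python 3 sketch below. The snake_case name, the optional now parameter and the millisecond timestamp format are all assumptions; the actual name format produced by cloud.common.createWorkerInstanceName is not specified here.

```python
import time


def create_worker_instance_name(prefix, now=None):
    """Build an instance name from a prefix and a timestamp (hypothetical).

    'now' is a Unix time in seconds; it defaults to the current time and
    exists mainly to make the function easy to test. The name is only as
    unique as the millisecond clock value.
    """
    timestamp = time.time() if now is None else now
    return "%s%d" % (prefix, int(timestamp * 1000))


if __name__ == "__main__":
    # 'cloud-' is the sample nameprefix from the configuration files.
    print(create_worker_instance_name("cloud-"))
```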

cloud.aws

This module provides the functions used for the AWS service.

cloud.aws.startInstance(name, config)

Run the aws command to start a worker instance. Return the created instanceid for a dedicated EC2 instance, or the spotinstancerequestid for a spot instance.

cloud.aws.stopInstance(name, config)

Run the aws command to terminate the instance.

Amazon specific templates

cloud/aws_worker_cloud_init.template

#cloud-config

# This cloud-init template is used for the aws workers' startup configuration.
# http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
# http://cloudinit.readthedocs.io/

fqdn: $hostname

repo_update: true
repo_upgrade: all

packages:
  - python2.7
  - python-pip
  - python-twisted
  - python-twisted-web
  - python-mysqldb
  - curl
  - s3fs

runcmd:
  - pip install awscli
  - AWS_ACCESS_KEY_ID=$access_key AWS_SECRET_ACCESS_KEY=$secret_access_key aws ec2 --region $region create-tags --resources $$(curl http://instance-data/latest/meta-data/instance-id) --tags Key=Name,Value=$hostname
  - mkdir -p $mount_point
  - chmod a+w $mount_point
  - echo $bucket_name:$access_key:$secret_access_key > /etc/passwd-s3fs
  - chmod 0640 /etc/passwd-s3fs
  - s3fs -o url=https://s3.amazonaws.com,enable_content_md5 $bucket_name $mount_point
  - cat $mount_point/$guerilla_render_filename | tar xzf - -C /tmp/
  - mv /tmp/guerillarender/data/usr/local/guerilla /usr/local/bin/
  - rm -rf /tmp/guerillarender
  - cat $mount_point/$coalition_filename | tar xzf - -C /tmp/
  - mv /tmp/coalition /usr/local/bin/
  - GUERILLA=/usr/local/bin/guerilla GUERILLA_CLOUD_ROOT=$mount_point /usr/bin/python2.7 /usr/local/bin/coalition/worker.py http://$coalition_server_ip:$coalition_server_port
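
The $name placeholders and the $$ escape in this template match the syntax of Python's string.Template, so the substitution step can be sketched as follows in Python 3. That the server renders the file this way is an assumption; the excerpt is abbreviated, and the render helper and the sample values are hypothetical.

```python
from string import Template

# Abbreviated excerpt of the template above.
TEMPLATE_EXCERPT = """\
fqdn: $hostname
runcmd:
  - aws ec2 --region $region create-tags --resources $$(curl http://instance-data/latest/meta-data/instance-id) --tags Key=Name,Value=$hostname
  - mkdir -p $mount_point
"""


def render(template_text, values):
    """Fill in the placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    # Sample values, made up for the example.
    print(render(TEMPLATE_EXCERPT, {
        "hostname": "cloud-1",
        "region": "eu-west-1",
        "mount_point": "/mnt/bucket",
    }))
```

With string.Template, $$ is emitted as a single $, so the shell on the worker still sees $(curl ...) and resolves the instance id at boot time rather than at rendering time.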

cloud/aws_worker_spot_launchspecification.json.template

{
  "ImageId": "$image_id",
  "KeyName": "$keyname",
  "SecurityGroupIds": [ "$security_group_id" ],
  "InstanceType": "$instance_type",
  "UserData": "$user_data"
}
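
The sketch below shows one way such a launch specification could be filled in and validated in Python 3. The EC2 spot request API expects UserData to be Base64-encoded; whether coalition encodes it at this point or elsewhere is an assumption, as are the helper name build_launch_specification and the sample values.

```python
import base64
import json
from string import Template

LAUNCH_SPEC_TEMPLATE = """{
  "ImageId": "$image_id",
  "KeyName": "$keyname",
  "SecurityGroupIds": [ "$security_group_id" ],
  "InstanceType": "$instance_type",
  "UserData": "$user_data"
}"""


def build_launch_specification(values, cloud_init_text):
    """Render the template and parse it back to check it is valid JSON."""
    # Base64 output only uses characters that are safe inside a JSON string.
    encoded = base64.b64encode(cloud_init_text.encode("utf-8")).decode("ascii")
    rendered = Template(LAUNCH_SPEC_TEMPLATE).substitute(values, user_data=encoded)
    return json.loads(rendered)


if __name__ == "__main__":
    spec = build_launch_specification(
        {
            "image_id": "ami-2f40bd40",        # sample image from cloud_aws.ini
            "keyname": "my-keypair",           # made-up value
            "security_group_id": "sg-worker",  # made-up value
            "instance_type": "m3.medium",
        },
        "#cloud-config\n",
    )
    print(json.dumps(spec, indent=2))
```

Note that plain substitution does not escape quotes or backslashes, so values containing such characters would produce invalid JSON.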