Automating Cloud Resource Management with Terraform in Practice: Define Infrastructure as Code for More Elegant Cloud Operations



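💡 Preface: Still clicking through the console to create cloud resources? Trying to remember earlier parameters whenever a configuration changes? Starting from scratch for every new environment? Terraform lets you manage all of your infrastructure in code, which is what "Infrastructure as Code" means in practice.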

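1. Why Terraform?

1.1 Pain points of traditional operations

❌ Problems with manual operations:
├── Clicking through the console takes about 10 minutes per cloud server
├── Nobody remembers which parameters were entered when a configuration has to change
├── Test, UAT, and production environments drift out of sync
├── Resource changes leave no audit trail, so nobody knows who changed what
├── Knowledge leaves together with the people who leave
└── A disaster-recovery failover means rebuilding the whole environment, which takes days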

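1.2 What Terraform changes

✅ Terraform advantages:
├── Cloud resources are defined in code and kept under version control
├── An entire infrastructure stack can be created, modified, or destroyed with one command
├── Different environments (dev/staging/prod) share the same templates
├── Changes are previewed (terraform plan) and applied only after confirmation
├── A state record shows exactly which resources currently exist
└── Teams can collaborate, with several people changing the same infrastructure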

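2. Terraform core concepts

2.1 Three core concepts

| Concept | Description | Analogy |
| --- | --- | --- |
| **Provider** | Plugin for a cloud vendor | Database driver |
| **Resource** | Definition of a cloud resource | Database table schema |
| **State** | Record of the current resource state | Database data |

Workflow:
1. Write the HCL configuration (.tf files)
2. terraform init → download the providers
3. terraform plan → preview the changes
4. terraform apply → apply the changes
5. terraform destroy → destroy the resources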
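
The five steps above can be tried without a cloud account. The following is a minimal sketch, assuming Terraform 1.4 or later (for the built-in terraform_data resource); the directory name, the resource name demo, and the input value are placeholders that do not appear in the project below.

```hcl
# demo/main.tf (hypothetical scratch directory)
terraform {
  required_version = ">= 1.4"
}

# terraform_data ships with Terraform itself, so init has no provider to download
resource "terraform_data" "demo" {
  input = "hello"
}

# Echo the stored value back so apply has something to print
output "demo_value" {
  value = terraform_data.demo.output
}
```

Running init, plan, apply, and destroy in that directory walks through the whole loop while touching nothing but a local state file.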

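2.2 Terraform file layout

terraform-project/
├── main.tf               # Main configuration: resource definitions
├── variables.tf          # Variable declarations
├── outputs.tf            # Output definitions
├── terraform.tfvars      # Variable values
├── .terraform.lock.hcl   # Dependency lock file (commit it to version control)
└── terraform.tfstate     # State file (never edit by hand; only present with local state)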

3. Hands-on configuration for Tencent Cloud

3.1 Install Terraform

Windows (with Scoop):

scoop install terraform

Verify the installation:

terraform -version
# Terraform v1.7.0

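3.2 Configure the Tencent Cloud provider

# main.tf

terraform {
  required_version = ">= 1.0"

  required_providers {
    tencentcloud = {
      source  = "tencentcloudstack/tencentcloud"
      version = "~> 1.81"
    }
  }

  # Remote state (essential in production)
  backend "cos" {
    region  = "ap-guangzhou"
    bucket  = "your-terraform-state-bucket" # COS bucket names take the form <name>-<appid>
    prefix  = "prod/terraform-state"
    encrypt = true
  }
}

provider "tencentcloud" {
  region = var.region

  # Prefer environment variables (TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY)
  # or a CAM role over credentials written into the configuration
  # secret_id  = var.secret_id
  # secret_key = var.secret_key
}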

# Variable declarations
variable "region" {
  description = "Tencent Cloud region"
  type        = string
  default     = "ap-guangzhou"
}

variable "environment" {
  description = "Environment identifier"
  type        = string
  default     = "prod"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "instance_type" {
  description = "Cloud server instance type"
  type        = string
  default     = "S5.MEDIUM2"
}

variable "db_instance_class" {
  description = "Database instance class"
  type        = string
  default     = "mysql-sa2-micro-1"
}
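
Because the same variables feed every environment, it can help to reject bad values before plan does any work. The sketch below shows how two of the declarations above might look with validation blocks; the allowed environment names are an assumption taken from the dev/staging/prod list earlier in the article, and these blocks would replace the plain declarations rather than sit next to them.

```hcl
variable "environment" {
  description = "Environment identifier"
  type        = string
  default     = "prod"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on a malformed CIDR string, and can() turns that failure into false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block such as 10.0.0.0/16."
  }
}
```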

3.3 Create the VPC network

# network.tf

# Virtual private cloud
resource "tencentcloud_vpc" "main" {
  name       = "${var.environment}-vpc"
  cidr_block = var.vpc_cidr

  tags = {
    Environment = var.environment
    Managed     = "Terraform"
  }
}

# Subnet in availability zone 1
resource "tencentcloud_subnet" "subnet_az1" {
  name              = "${var.environment}-subnet-az1"
  vpc_id            = tencentcloud_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, 0)
  availability_zone = "${var.region}-1" # use a zone that is actually on sale in your region
  route_table_id    = tencentcloud_route_table.main.id

  tags = {
    AZ = "Zone 1"
  }
}

# Subnet in availability zone 2
resource "tencentcloud_subnet" "subnet_az2" {
  name              = "${var.environment}-subnet-az2"
  vpc_id            = tencentcloud_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, 1)
  availability_zone = "${var.region}-2" # use a zone that is actually on sale in your region
  route_table_id    = tencentcloud_route_table.main.id

  tags = {
    AZ = "Zone 2"
  }
}

# Route table
# Subnets attach to it through their own route_table_id argument;
# subnet_ids on the route table is a read-only attribute, not an input.
resource "tencentcloud_route_table" "main" {
  name   = "${var.environment}-rt"
  vpc_id = tencentcloud_vpc.main.id
}
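
The cidrsubnet calls above are easier to follow with concrete numbers. A small sketch that needs no provider, using the default 10.0.0.0/16 range from variables.tf (the local names are illustrative only):

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # 4 extra prefix bits split the /16 into sixteen /20 networks; take the first two
  subnet_cidrs = [for i in range(2) : cidrsubnet(local.vpc_cidr, 4, i)]
}

output "subnet_cidrs" {
  value = local.subnet_cidrs # ["10.0.0.0/20", "10.0.16.0/20"]
}
```

Each /20 holds 4,096 addresses, so the two subnets use only one eighth of the VPC range and leave room for further subnets.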

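3.4 Create security groups

# security.tf

# In the tencentcloud provider a security group is not bound to a VPC and carries
# no inline rules; rules are attached through a separate rule resource.
resource "tencentcloud_security_group" "web" {
  name        = "${var.environment}-sg-web"
  description = "Web service security group"

  tags = {
    Type = "Web"
  }
}

resource "tencentcloud_security_group_rule_set" "web" {
  security_group_id = tencentcloud_security_group.web.id

  # Inbound rules
  ingress {
    action      = "ACCEPT"
    protocol    = "TCP"
    port        = "80,443"
    cidr_block  = "0.0.0.0/0"
    description = "HTTP/HTTPS"
  }

  ingress {
    action      = "ACCEPT"
    protocol    = "TCP"
    port        = "22"
    cidr_block  = "10.0.0.0/8"
    description = "SSH"
  }

  ingress {
    action      = "ACCEPT"
    protocol    = "ALL"
    cidr_block  = var.vpc_cidr
    description = "Intra-VPC traffic"
  }

  # Outbound rules
  egress {
    action      = "ACCEPT"
    protocol    = "ALL"
    cidr_block  = "0.0.0.0/0"
    description = "Allow all outbound"
  }
}

resource "tencentcloud_security_group" "db" {
  name        = "${var.environment}-sg-db"
  description = "Database security group"

  tags = {
    Type = "Database"
  }
}

resource "tencentcloud_security_group_rule_set" "db" {
  security_group_id = tencentcloud_security_group.db.id

  # Reachable from inside the VPC only
  ingress {
    action      = "ACCEPT"
    protocol    = "TCP"
    port        = "3306"
    cidr_block  = var.vpc_cidr
    description = "MySQL"
  }

  ingress {
    action      = "ACCEPT"
    protocol    = "TCP"
    port        = "6379"
    cidr_block  = var.vpc_cidr
    description = "Redis"
  }
}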

3.5 Create cloud servers

# compute.tf

# Key pair: import an existing public key
resource "tencentcloud_key_pair" "main" {
  key_name   = "${var.environment}_key" # letters, digits, and underscores only
  public_key = file(pathexpand("~/.ssh/id_rsa.pub"))
}

# CVM instances
resource "tencentcloud_instance" "web" {
  count = 2

  instance_name     = "${var.environment}-web-${count.index + 1}"
  availability_zone = "${var.region}-${count.index % 2 + 1}"
  image_id          = "img-xxxxxxxx" # replace with a real image ID
  instance_type     = var.instance_type

  vpc_id = tencentcloud_vpc.main.id
  # Alternate between the two subnets so the subnet always matches the zone above
  subnet_id = count.index % 2 == 0 ? tencentcloud_subnet.subnet_az1.id : tencentcloud_subnet.subnet_az2.id

  # security_groups is deprecated in favor of orderly_security_groups
  orderly_security_groups = [tencentcloud_security_group.web.id]

  allocate_public_ip         = true # required for public_ip to be populated
  internet_max_bandwidth_out = 10
  internet_charge_type       = "TRAFFIC_POSTPAID_BY_HOUR"

  system_disk_type = "CLOUD_SSD"
  system_disk_size = 50

  data_disks {
    data_disk_type = "CLOUD_SSD"
    data_disk_size = 100
  }

  key_ids = [tencentcloud_key_pair.main.id]

  tags = {
    Role  = "Web"
    Index = tostring(count.index + 1)
  }

  # Bootstrap script
  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum install -y nginx
    systemctl start nginx
    systemctl enable nginx
    EOF
  )
}
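
The zone arithmetic in the instance block is easy to get wrong once count grows beyond 2, so here is a provider-free sketch of the same round-robin idea. It assumes the two zone names used in this article; the local and output names are illustrative.

```hcl
locals {
  region = "ap-guangzhou"
  zones  = ["${local.region}-1", "${local.region}-2"]

  # element() wraps around the list, which mirrors the count.index % 2 arithmetic
  placement = {
    for i in range(3) : "web-${i + 1}" => element(local.zones, i)
  }
}

output "placement" {
  value = local.placement
}
```

With three instances, web-1 and web-3 land in the first zone and web-2 in the second, so the subnet chosen for each instance has to follow the same modulo rule.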

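3.6 Create cloud databases

# database.tf

resource "tencentcloud_mysql_instance" "main" {
  instance_name     = "${var.environment}-mysql"
  vpc_id            = tencentcloud_vpc.main.id
  subnet_id         = tencentcloud_subnet.subnet_az1.id
  availability_zone = "${var.region}-1"

  engine_version = "8.0"
  # The provider sizes MySQL by memory and volume rather than by an instance-class name
  mem_size    = 4000 # MB
  volume_size = 200  # GB

  intranet_port   = 3306
  charge_type     = "PREPAID"
  prepaid_period  = 1
  auto_renew_flag = 1 # renew automatically on expiry (prepaid instances only)

  security_groups = [tencentcloud_security_group.db.id]

  tags = {
    Type = "Database"
  }
}

resource "tencentcloud_redis_instance" "main" {
  name              = "${var.environment}-redis"
  availability_zone = "${var.region}-1"
  vpc_id            = tencentcloud_vpc.main.id
  subnet_id         = tencentcloud_subnet.subnet_az1.id

  # Cluster edition; confirm the type IDs on sale with the tencentcloud_redis_zone_config data source
  type_id            = 7
  redis_shard_num    = 3
  redis_replicas_num = 1

  mem_size = 4096 # MB (per shard for cluster editions)
  port     = 6379

  tags = {
    Type = "Cache"
  }
}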

3.7 Outputs

# outputs.tf

output "vpc_id" {
  description = "VPC ID"
  value       = tencentcloud_vpc.main.id
}

output "web_server_ips" {
  description = "Public IPs of the web servers"
  value       = tencentcloud_instance.web[*].public_ip
}

output "web_server_private_ips" {
  description = "Private IPs of the web servers"
  value       = tencentcloud_instance.web[*].private_ip
}

output "mysql_endpoint" {
  description = "MySQL connection address (private IP)"
  value       = tencentcloud_mysql_instance.main.intranet_ip
  sensitive   = true
}

output "redis_address" {
  description = "Redis connection address (private IP)"
  value       = tencentcloud_redis_instance.main.ip
  sensitive   = true
}
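
The splat outputs above return bare lists, which get harder to read as the instance count grows. As an optional addition, a for expression can key each address by instance name; the output name web_servers is an illustrative choice.

```hcl
output "web_servers" {
  description = "Instance name mapped to private IP"
  value = {
    for inst in tencentcloud_instance.web : inst.instance_name => inst.private_ip
  }
}
```

Outputs marked sensitive are masked in the apply summary; terraform output -raw <name> prints a single value when it is really needed.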

3.8 Variable values

# terraform.tfvars

region            = "ap-guangzhou"
environment       = "prod"
vpc_cidr          = "10.0.0.0/16"
instance_type     = "S5.MEDIUM2"
db_instance_class = "mysql-sa2-micro-1"

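4. Workflow in practice

4.1 Initialize

# Initialize the working directory: configure the backend and download providers
terraform init

# Example output (abbreviated):
# Initializing the backend...
# Initializing provider plugins...
# - Finding tencentcloudstack/tencentcloud versions matching "~> 1.81"...
# - Installing tencentcloudstack/tencentcloud v1.x.y...
# Terraform has been successfully initialized!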

4.2 Preview changes

# Preview the resources that will be created
terraform plan

# Example output (simplified; the real output lists every attribute):
# Plan: 15 to add, 0 to change, 0 to destroy.
#
# + tencentcloud_vpc.main
#   + create
#   + name: "prod-vpc"
#   + cidr_block: "10.0.0.0/16"
#
# + tencentcloud_instance.web[0]
#   + create
#   + instance_name: "prod-web-1"
#   + instance_type: "S5.MEDIUM2"

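4.3 Apply the deployment

# Apply once the plan looks right
terraform apply

# Type "yes" to confirm
# Wait for the resources to be created...
# Apply complete! Resources: 15 added, 0 changed, 0 destroyed.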

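4.4 Inspect the state

# Show the current state
terraform show

# List all resources
terraform state list

# Show one resource (resources created with count need an index)
terraform state show 'tencentcloud_instance.web[0]'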

4.5 Modify resources

# After editing the configuration, preview again
terraform plan

# Example: raising the instance count from 2 to 3
# Plan: 1 to add, 0 to change, 0 to destroy.
#
# + tencentcloud_instance.web[2]
#   + create
#   + instance_name: "prod-web-3"

# Apply after confirming
terraform apply

4.6 Destroy resources

# Preview what will be destroyed
terraform plan -destroy

# Destroy after confirming
terraform destroy

# Type "yes" to confirm
# Destroy complete! Resources: 15 destroyed.
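
Since terraform destroy removes everything in the state, resources that must outlive routine teardown can be guarded with a lifecycle rule. A minimal sketch, using the built-in terraform_data resource (Terraform 1.4 or later) as a stand-in for something like a database or a state bucket:

```hcl
resource "terraform_data" "critical" {
  input = "stand-in for a database or state bucket"

  lifecycle {
    # Any plan that would destroy this resource fails with an error instead
    prevent_destroy = true
  }
}
```

The same lifecycle block can be placed inside a real resource; removing the guard then becomes a deliberate, reviewable code change.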

5. Managing multiple environments

5.1 Directory layout per environment

terraform/
├── modules/              # Reusable modules
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── instance/
│       └── ...
│
├── env/
│   ├── dev/
│   │   ├── main.tf       # Calls the modules
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       └── terraform.tfvars

5.2 Calling modules

# env/prod/main.tf

module "vpc" {
  source = "../../modules/vpc"
  
  environment = "prod"
  cidr_block  = "10.0.0.0/16"
  region      = "ap-guangzhou"
}

module "web_cluster" {
  source = "../../modules/instance"
  
  environment = "prod"
  vpc_id      = module.vpc.vpc_id
  subnet_ids  = module.vpc.subnet_ids
  
  instance_count = 3
  instance_type = "S5.MEDIUM2"
}
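
The article does not show what lives inside modules/vpc, so the following is only a sketch of a module that would satisfy the call above. The inputs (environment, cidr_block, region) and outputs (vpc_id, subnet_ids) come from that call; the resource names, the two-subnet layout, and the zone naming are assumptions.

```hcl
# modules/vpc/variables.tf
variable "environment" { type = string }
variable "cidr_block" { type = string }
variable "region" { type = string }

# modules/vpc/main.tf
terraform {
  required_providers {
    # Each module must declare providers that live outside the hashicorp namespace
    tencentcloud = {
      source = "tencentcloudstack/tencentcloud"
    }
  }
}

resource "tencentcloud_vpc" "this" {
  name       = "${var.environment}-vpc"
  cidr_block = var.cidr_block
}

resource "tencentcloud_subnet" "this" {
  count = 2

  name              = "${var.environment}-subnet-az${count.index + 1}"
  vpc_id            = tencentcloud_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 4, count.index)
  availability_zone = "${var.region}-${count.index + 1}" # assumed zone naming
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = tencentcloud_vpc.this.id
}

output "subnet_ids" {
  value = tencentcloud_subnet.this[*].id
}
```

Keeping the module interface this small means dev, staging, and prod differ only in the values they pass in.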

6. Best practices

6.1 State management

# ✅ Recommended: remote state in Cloud Object Storage (COS)
terraform {
  backend "cos" {
    region = "ap-guangzhou"
    bucket = "my-terraform-state"
    prefix = "prod/"
  }
}
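
Remote state also lets one configuration read the outputs of another. A sketch under stated assumptions: a separate configuration wants the vpc_id output defined in outputs.tf, and the bucket, region, and prefix match the backend block above. The data source name network and the output name shared_vpc_id are illustrative.

```hcl
data "terraform_remote_state" "network" {
  backend = "cos"

  config = {
    region = "ap-guangzhou"
    bucket = "my-terraform-state" # must match the backend that wrote the state
    prefix = "prod/"
  }
}

# Only root-module outputs of the other configuration are visible here
output "shared_vpc_id" {
  value = data.terraform_remote_state.network.outputs.vpc_id
}
```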

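6.2 Managing secrets

# ✅ Use environment variables
# (the provider reads TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY)
provider "tencentcloud" {
  # Never hard-code credentials here
  # secret_id = "xxx"  ❌
}

# ✅ Or pass them in as sensitive variables
variable "secret_id" {
  type      = string
  sensitive = true
}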

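6.3 Lock dependencies

# ✅ terraform init records the selected provider versions in .terraform.lock.hcl;
#    commit that file so every machine installs the same versions
terraform init

# -upgrade moves to the newest versions the constraints allow, so run it only
# when an upgrade is intended, then commit the updated lock file
terraform init -upgrade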

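6.4 Workspace isolation

# Create workspaces
terraform workspace new prod
terraform workspace new dev

# Switch workspace
terraform workspace select prod

# All workspaces share one backend configuration; each keeps its own state,
# and the configuration can branch on terraform.workspace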
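
One way a configuration can vary by workspace is through the built-in terraform.workspace value. A minimal sketch; the per-environment instance counts are made-up numbers for illustration.

```hcl
locals {
  environment = terraform.workspace

  web_count_by_env = {
    dev  = 1
    prod = 2
  }

  # Fall back to 1 for workspaces not listed above, including "default"
  web_count = lookup(local.web_count_by_env, terraform.workspace, 1)
}

output "workspace_settings" {
  value = {
    environment = local.environment
    web_count   = local.web_count
  }
}
```

A resource could then use count = local.web_count, so switching workspace changes the size of the deployment without editing any file.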

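7. FAQ

Q1: What should I do about state conflicts?

When several people run terraform apply at the same time, the state file can end up in conflict.

Solution:

# Use remote state with state locking
terraform {
  backend "cos" {
    # The COS backend supports state locking, so concurrent runs are serialized
  }
}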

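Q2: What if a cloud resource was changed by hand?

A manual change in the console leaves the Terraform state out of step with the real resource.

Solution:

# Sync the state with reality (terraform refresh is deprecated in favor of this form)
terraform apply -refresh-only

# Or import an existing resource (CVM instance IDs start with "ins-")
terraform import 'tencentcloud_instance.web[0]' ins-xxxxxxxx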
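
On Terraform 1.5 or later, the same import can be declared in configuration, which makes it show up in plan and go through code review. A sketch with a placeholder instance ID:

```hcl
import {
  to = tencentcloud_instance.web[0]
  id = "ins-xxxxxxxx" # placeholder: the real CVM instance ID
}
```

The next terraform plan lists the resource as "to import", and the block can be deleted once the apply has succeeded.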

Q3: Creating resources at scale is too slow?

Solution:

# -parallelism sets the number of concurrent operations (the default is 10)
terraform apply -parallelism=20

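8. Summary

Terraform makes cloud resource management controllable, traceable, and collaborative:

1. Infrastructure defined in code: every resource is version-controlled in Git

2. Declarative configuration: describe the desired state and Terraform works out the execution path

3. Preview before applying: terraform plan confirms that the changes match expectations

4. Multi-environment support: dev/staging/prod share the same templates

5. Team collaboration: remote state plus state locking lets several people work on the same infrastructure safely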


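About the author

Follows the production use of large language models and hands-on cloud server work, with a focus on applying technology in enterprise settings.

Personal blog: yunduancloud.icu, regularly updated with hands-on cloud computing and AI large-model tutorials. Visits and discussion are welcome.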
