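Terraform in Practice: Automating Cloud Resource Management by Defining Infrastructure in Code
💡 Preface: Still creating cloud resources by clicking through a console? Trying to remember earlier parameters every time a configuration changes? Rebuilding everything from scratch for each new environment? Terraform lets you manage all of that in code, the practice known as Infrastructure as Code (IaC).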
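1. Why Terraform?
1.1 Pain points of traditional operations
❌ Problems with manual operations:
├── Creating one cloud server by clicking through the console takes about 10 minutes
├── When a configuration changes, nobody remembers which parameters were entered before
├── Test, UAT, and production environments drift apart
├── Resource changes cannot be traced: there is no record of who changed what
├── When people leave, their knowledge leaves with them
└── A disaster-recovery switchover means rebuilding the whole environment, which takes days
1.2 What Terraform changes
✅ Terraform advantages:
├── Cloud resources are defined in code and kept under version control
├── A whole infrastructure stack can be created, modified, or destroyed with one command
├── Different environments (dev/staging/prod) share one set of templates
├── Changes are previewed with terraform plan and applied only after confirmation
├── State is recorded, so you know exactly which resources exist
└── Teams can collaborate on the same infrastructure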
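2. Terraform core concepts
2.1 Three core concepts
| Concept | Description | Analogy |
|---|---|---|
| **Provider** | Plugin for a cloud vendor | Database driver |
| **Resource** | Definition of a cloud resource | Database table schema |
| **State** | Record of the resources that currently exist | Database data |
Workflow:
1. Write the HCL configuration (.tf files)
2. terraform init → install providers and initialize the backend
3. terraform plan → preview changes
4. terraform apply → apply changes
5. terraform destroy → destroy resources
2.2 Terraform file layout
terraform-project/
├── main.tf              # main configuration, defines resources
├── variables.tf         # variable declarations
├── outputs.tf           # output declarations
├── terraform.tfvars     # variable values
├── .terraform.lock.hcl  # dependency lock file (commit it to version control)
└── terraform.tfstate    # state file (never edit by hand; with a remote backend it is stored remotely)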
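3. Hands-on configuration for Tencent Cloud
3.1 Installing Terraform
Windows (using Scoop):
scoop install terraform
Verify the installation:
terraform -version
# Prints the installed version, for example: Terraform v1.7.0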
3.2 Configuring the Tencent Cloud provider
# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    tencentcloud = {
      source  = "tencentcloudstack/tencentcloud"
      version = "~> 1.81"
    }
  }

  # Remote state (essential in production)
  backend "cos" {
    region  = "ap-guangzhou"
    bucket  = "your-terraform-state-bucket" # COS bucket names end with the account APPID: <name>-<appid>
    prefix  = "prod/terraform-state"
    encrypt = true
  }
}

provider "tencentcloud" {
  region = var.region
  # Prefer environment variables (TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY) or a CAM role
  # secret_id  = var.secret_id
  # secret_key = var.secret_key
}

# Variable declarations
variable "region" {
  description = "Tencent Cloud region"
  type        = string
  default     = "ap-guangzhou"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "prod"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "instance_type" {
  description = "CVM instance type"
  type        = string
  default     = "S5.MEDIUM2"
}

variable "db_instance_class" {
  description = "Database instance class"
  type        = string
  default     = "mysql-sa2-micro-1"
}
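Variable declarations can also reject bad input before any API call is made. The sketch below is an addition, not part of the original configuration: it shows how the `vpc_cidr` and `environment` declarations above could be replaced by versions that carry `validation` blocks. The allowed environment names are an assumption taken from the dev/staging/prod naming used in this article.

```hcl
variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on a malformed CIDR string; can() turns that failure into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "prod"

  validation {
    # Assumed set of environments; adjust to your own naming.
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}
```

With these in place, a typo such as `environment = "prd"` fails during `terraform plan` instead of producing oddly named resources.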
3.3 Creating the VPC network
# network.tf
# Virtual private cloud
resource "tencentcloud_vpc" "main" {
  name       = "${var.environment}-vpc"
  cidr_block = var.vpc_cidr

  tags = {
    Environment = var.environment
    Managed     = "Terraform"
  }
}

# Route table
# In the tencentcloud provider, subnet_ids on a route table is a read-only attribute,
# so each subnet is associated through its own route_table_id argument instead.
resource "tencentcloud_route_table" "main" {
  name   = "${var.environment}-rt"
  vpc_id = tencentcloud_vpc.main.id
}

# Subnet in availability zone 1
resource "tencentcloud_subnet" "subnet_az1" {
  name              = "${var.environment}-subnet-az1"
  vpc_id            = tencentcloud_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, 0)
  availability_zone = "${var.region}-1"
  route_table_id    = tencentcloud_route_table.main.id

  tags = {
    AZ = "Zone 1"
  }
}

# Subnet in availability zone 2
resource "tencentcloud_subnet" "subnet_az2" {
  name              = "${var.environment}-subnet-az2"
  vpc_id            = tencentcloud_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, 1)
  availability_zone = "${var.region}-2"
  route_table_id    = tencentcloud_route_table.main.id

  tags = {
    AZ = "Zone 2"
  }
}
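The subnet ranges above come from `cidrsubnet(prefix, newbits, netnum)`, which extends the prefix length by `newbits` and selects the `netnum`-th block of that size. As an illustrative sketch, the following computes the same two ranges in one place; the `subnet_cidrs` local and output are hypothetical names and not part of the original files.

```hcl
locals {
  # Hypothetical helper: one /20 block per availability-zone suffix.
  subnet_cidrs = {
    for index, suffix in ["1", "2"] :
    "az${suffix}" => cidrsubnet(var.vpc_cidr, 4, index)
  }
}

output "subnet_cidrs" {
  description = "CIDR block computed for each availability zone"
  value       = local.subnet_cidrs
}
```

With the default `vpc_cidr` of `10.0.0.0/16`, adding 4 bits yields /20 blocks: `az1` is `10.0.0.0/20` and `az2` is `10.0.16.0/20`.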
3.4 Creating security groups
# security.tf
# Note: tencentcloud_security_group takes no vpc_id and no inline ingress/egress blocks.
# Rules are attached here with tencentcloud_security_group_lite_rule, whose entries use the
# format ACTION#SOURCE#PORT#PROTOCOL. Check the provider documentation for your version.
resource "tencentcloud_security_group" "web" {
  name        = "${var.environment}-sg-web"
  description = "Web service security group"

  tags = {
    Type = "Web"
  }
}

resource "tencentcloud_security_group_lite_rule" "web" {
  security_group_id = tencentcloud_security_group.web.id

  # Inbound rules
  ingress = [
    "ACCEPT#0.0.0.0/0#80,443#TCP",    # HTTP/HTTPS
    "ACCEPT#10.0.0.0/8#22#TCP",       # SSH
    "ACCEPT#${var.vpc_cidr}#ALL#ALL", # intranet access
  ]

  # Outbound rules
  egress = [
    "ACCEPT#0.0.0.0/0#ALL#ALL", # allow all outbound traffic
  ]
}

resource "tencentcloud_security_group" "db" {
  name        = "${var.environment}-sg-db"
  description = "Database security group"

  tags = {
    Type = "Database"
  }
}

resource "tencentcloud_security_group_lite_rule" "db" {
  security_group_id = tencentcloud_security_group.db.id

  # Intranet access only
  ingress = [
    "ACCEPT#${var.vpc_cidr}#3306#TCP", # MySQL
    "ACCEPT#${var.vpc_cidr}#6379#TCP", # Redis
  ]
}
3.5 Creating cloud servers
# compute.tf
# Key pair: imports an existing local public key
resource "tencentcloud_key_pair" "main" {
  # Key names may be limited to letters, digits, and underscores; adjust if the API rejects this one
  key_name   = "${var.environment}_key"
  public_key = file("~/.ssh/id_rsa.pub")
}

# CVM instances
resource "tencentcloud_instance" "web" {
  count = 2

  instance_name     = "${var.environment}-web-${count.index + 1}"
  availability_zone = "${var.region}-${count.index % 2 + 1}"
  image_id          = "img-xxxxxxxx" # replace with a real image ID
  instance_type     = var.instance_type

  vpc_id = tencentcloud_vpc.main.id
  # Alternate subnets with the same modulo as availability_zone so the two always match,
  # even when count grows beyond 2
  subnet_id               = count.index % 2 == 0 ? tencentcloud_subnet.subnet_az1.id : tencentcloud_subnet.subnet_az2.id
  orderly_security_groups = [tencentcloud_security_group.web.id]

  allocate_public_ip         = true # required for the public_ip output to be populated
  internet_max_bandwidth_out = 10
  internet_charge_type       = "TRAFFIC_POSTPAID_BY_HOUR"

  system_disk_type = "CLOUD_SSD"
  system_disk_size = 50

  data_disks {
    data_disk_type = "CLOUD_SSD"
    data_disk_size = 100
  }

  key_ids = [tencentcloud_key_pair.main.id]

  tags = {
    Role  = "Web"
    Index = tostring(count.index + 1)
  }

  # Bootstrap script (assumes a yum-based image)
  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum install -y nginx
    systemctl start nginx
    systemctl enable nginx
    EOF
  )
}
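The `img-xxxxxxxx` placeholder has to be replaced by hand. One alternative is to look the image up with the provider's `tencentcloud_images` data source. The sketch below is an assumption-laden illustration: the data source label `linux`, the local `web_image_id`, and the `os_name` search string are all hypothetical, and the available image names differ by region, so check the argument names and results against the provider documentation for your version.

```hcl
# Look up a public image instead of hard-coding its ID.
data "tencentcloud_images" "linux" {
  image_type = ["PUBLIC_IMAGE"]
  os_name    = "TencentOS Server" # assumed search string; adjust to an image offered in your region
}

locals {
  # First match returned by the lookup; fails at plan time if nothing matches.
  web_image_id = data.tencentcloud_images.linux.images[0].image_id
}
```

The instance could then use `image_id = local.web_image_id`. A fixed image ID is more reproducible, while a lookup stays valid when moving to another region, so the choice depends on which property matters more.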
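3.6 Creating cloud databases
# database.tf
# Argument names below follow the tencentcloud provider schema; verify them against the
# documentation for your provider version. Sizes are example values.
variable "redis_password" {
  description = "Redis access password, supplied at run time (for example via TF_VAR_redis_password)"
  type        = string
  sensitive   = true
}

resource "tencentcloud_mysql_instance" "main" {
  instance_name     = "${var.environment}-mysql"
  vpc_id            = tencentcloud_vpc.main.id
  subnet_id         = tencentcloud_subnet.subnet_az1.id
  availability_zone = tencentcloud_subnet.subnet_az1.availability_zone
  engine_version    = "8.0"
  # The provider sizes instances with mem_size and volume_size rather than an instance
  # class, so var.db_instance_class is not used here
  mem_size        = 4000 # MB, example value
  volume_size     = 200  # GB
  intranet_port   = 3306
  auto_renew_flag = 1 # auto-renew on expiry (valid values are 0 and 1; applies to prepaid instances)
  security_groups = [tencentcloud_security_group.db.id]

  tags = {
    Type = "Database"
  }
}

resource "tencentcloud_redis_instance" "main" {
  name               = "${var.environment}-redis"
  availability_zone  = tencentcloud_subnet.subnet_az1.availability_zone
  vpc_id             = tencentcloud_vpc.main.id
  subnet_id          = tencentcloud_subnet.subnet_az1.id
  type_id            = 7 # Redis 4.0 cluster edition according to the provider docs (2 is the standard edition)
  redis_shard_num    = 3
  redis_replicas_num = 1
  mem_size           = 4096 # MB per shard for cluster editions
  port               = 6379
  password           = var.redis_password

  tags = {
    Type = "Cache"
  }
}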
3.7 Output declarations
# outputs.tf
output "vpc_id" {
  description = "VPC ID"
  value       = tencentcloud_vpc.main.id
}

output "web_server_ips" {
  description = "Public IPs of the web servers"
  value       = tencentcloud_instance.web[*].public_ip
}

output "web_server_private_ips" {
  description = "Private IPs of the web servers"
  value       = tencentcloud_instance.web[*].private_ip
}

output "mysql_endpoint" {
  description = "MySQL intranet address"
  value       = tencentcloud_mysql_instance.main.intranet_ip # the provider exports intranet_ip, not a domain name
  sensitive   = true
}

output "redis_address" {
  description = "Redis intranet address"
  value       = tencentcloud_redis_instance.main.ip # the provider exports ip
  sensitive   = true
}
3.8 Variable values
# terraform.tfvars
region            = "ap-guangzhou"
environment       = "prod"
vpc_cidr          = "10.0.0.0/16"
instance_type     = "S5.MEDIUM2"
db_instance_class = "mysql-sa2-micro-1"
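4. The workflow in practice
4.1 Initialize
# Initialize the working directory: configure the backend and install providers
terraform init
# Example output (abridged):
# Initializing the backend...
# Initializing provider plugins...
# - Installing tencentcloudstack/tencentcloud v1.81.x...
# Terraform has been successfully initialized!
4.2 Preview changes
# Preview the resources that will be created
terraform plan
# Example output (abridged; the resource count depends on your configuration):
#   # tencentcloud_vpc.main will be created
#   + resource "tencentcloud_vpc" "main" {
#       + cidr_block = "10.0.0.0/16"
#       + name       = "prod-vpc"
#     }
#
#   # tencentcloud_instance.web[0] will be created
#   + resource "tencentcloud_instance" "web" {
#       + instance_name = "prod-web-1"
#       + instance_type = "S5.MEDIUM2"
#     }
#
# Plan: 15 to add, 0 to change, 0 to destroy.
4.3 Apply
# Apply after reviewing the plan
terraform apply
# Type "yes" to confirm
# Wait for the resources to be created...
# Apply complete! Resources: 15 added, 0 changed, 0 destroyed.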
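4.4 Inspect state
# Show the current state
terraform show
# List all resources
terraform state list
# Show one resource (resources created with count need an index; quote the address for the shell)
terraform state show 'tencentcloud_instance.web[0]'
4.5 Modify resources
# After editing the configuration, preview again
terraform plan
# Example: changing the instance count from 2 to 3
# Plan: 1 to add, 0 to change, 0 to destroy.
#
#   # tencentcloud_instance.web[2] will be created
#   + resource "tencentcloud_instance" "web" {
#       + instance_name = "prod-web-3"
#     }
# Apply after reviewing the plan
terraform apply
4.6 Destroy resources
# Preview what will be destroyed
terraform plan -destroy
# Destroy after reviewing the plan
terraform destroy
# Type "yes" to confirm
# Destroy complete! Resources: 15 destroyed.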
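Because `terraform destroy` removes every resource in the configuration, it can be worth guarding the ones that are expensive to lose. A minimal sketch, assuming the VPC from network.tf is the resource to protect: the same resource declaration with a `lifecycle` block added. Which resources deserve this guard is a judgment call that the original article does not address.

```hcl
resource "tencentcloud_vpc" "main" {
  name       = "${var.environment}-vpc"
  cidr_block = var.vpc_cidr

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    # until this setting is removed from the configuration.
    prevent_destroy = true
  }
}
```

The guard applies to `terraform destroy` as well as to changes that force a replacement, so an accidental edit to an immutable argument is also caught at plan time.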
5. Managing multiple environments
5.1 Directory layout
terraform/
├── modules/                  # reusable modules
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── instance/
│       └── ...
└── env/
    ├── dev/
    │   ├── main.tf           # references the modules
    │   └── terraform.tfvars
    ├── staging/
    │   ├── main.tf
    │   └── terraform.tfvars
    └── prod/
        ├── main.tf
        └── terraform.tfvars
5.2 Calling modules
# env/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  environment = "prod"
  cidr_block  = "10.0.0.0/16"
  region      = "ap-guangzhou"
}

module "web_cluster" {
  source = "../../modules/instance"

  environment    = "prod"
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.subnet_ids
  instance_count = 3
  instance_type  = "S5.MEDIUM2"
}
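For the `module "vpc"` call above to work, the module has to declare the three inputs it receives and export the two outputs that `web_cluster` reads. The article does not show the module body, so the following is only a sketch of what `modules/vpc` might contain; the resource labels (`this`) and the two-subnet layout are assumptions modeled on network.tf.

```hcl
# modules/vpc/main.tf (hypothetical contents)
terraform {
  required_providers {
    # Child modules must declare non-HashiCorp providers too; otherwise
    # Terraform looks for hashicorp/tencentcloud, which does not exist.
    tencentcloud = {
      source = "tencentcloudstack/tencentcloud"
    }
  }
}

resource "tencentcloud_vpc" "this" {
  name       = "${var.environment}-vpc"
  cidr_block = var.cidr_block
}

resource "tencentcloud_subnet" "this" {
  count = 2

  name              = "${var.environment}-subnet-az${count.index + 1}"
  vpc_id            = tencentcloud_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 4, count.index)
  availability_zone = "${var.region}-${count.index + 1}"
}

# modules/vpc/variables.tf
variable "environment" {
  type = string
}

variable "cidr_block" {
  type = string
}

variable "region" {
  type = string
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = tencentcloud_vpc.this.id
}

output "subnet_ids" {
  value = tencentcloud_subnet.this[*].id
}
```

Keeping the module interface this small means each environment's main.tf differs only in the values it passes, which is the point of sharing one set of templates across dev, staging, and prod.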
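6. Best practices
6.1 State management
# ✅ Recommended: remote state in Cloud Object Storage (COS)
terraform {
  backend "cos" {
    region = "ap-guangzhou"
    bucket = "my-terraform-state" # use the full bucket name, including the APPID suffix
    prefix = "prod/"
  }
}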
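When several environments share the same code, the backend settings can be moved out of the `terraform` block with Terraform's partial backend configuration. The sketch below is illustrative: the file name `backend.hcl`, its location, and the bucket name are hypothetical.

```hcl
# env/prod/main.tf: leave the backend block empty
terraform {
  backend "cos" {}
}

# env/prod/backend.hcl (hypothetical file): settings for this environment only
# region = "ap-guangzhou"
# bucket = "my-terraform-state-1250000000" # placeholder bucket name with APPID suffix
# prefix = "prod/"
```

The settings file is then passed at initialization time with `terraform init -backend-config=backend.hcl`. The three settings are shown commented out here only because they belong in a separate file; in `backend.hcl` itself they appear as plain `key = value` lines.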
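6.2 Handling sensitive values
# ✅ Use environment variables
# The provider reads TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY from the environment
provider "tencentcloud" {
  # Do not hard-code credentials here
  # secret_id = "xxx" ❌
}

# ✅ Or use variables marked as sensitive, so their values are redacted in plan and apply output
variable "secret_id" {
  type      = string
  sensitive = true
}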
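6.3 Locking dependencies
# ✅ Commit .terraform.lock.hcl so every machine installs the same provider versions
terraform init
# Record checksums for the other platforms used by teammates or CI
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64
# Upgrade deliberately, within the version constraints, then commit the updated lock file
terraform init -upgrade
6.4 Workspace isolation
# Create workspaces
terraform workspace new prod
terraform workspace new dev
# Switch workspaces
terraform workspace select prod
# Every workspace shares the same backend configuration but keeps its own separate state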
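Inside the configuration, the name of the selected workspace is available as `terraform.workspace`, which makes it possible to vary names and sizes per workspace. A minimal sketch follows; the `web_count_by_workspace` table and its numbers are hypothetical values chosen for illustration.

```hcl
locals {
  # Name of the currently selected workspace, for example "dev" or "prod".
  environment = terraform.workspace

  # Hypothetical sizing table: how many web servers each workspace gets.
  web_count_by_workspace = {
    dev  = 1
    prod = 3
  }

  # Fall back to 1 for any workspace that is not listed above.
  web_count = lookup(local.web_count_by_workspace, terraform.workspace, 1)
}
```

A resource could then use `count = local.web_count` and `"${local.environment}-web"` in its name. Workspaces suit environments that differ only in such values; environments that need different backends or credentials are usually better served by the separate directories from section 5.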
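7. FAQ
Q1: How do I deal with state conflicts?
When several people run terraform apply at the same time, their writes to the state file can conflict.
Solution:
# Use remote state with state locking
terraform {
  backend "cos" {
    region = "ap-guangzhou"
    bucket = "my-terraform-state" # use the full bucket name, including the APPID suffix
    prefix = "prod/"
    # The COS backend supports state locking, so concurrent runs wait or fail instead of overwriting each other
  }
}
# If a crashed run leaves a stale lock behind, release it with: terraform force-unlock LOCK_ID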
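Q2: What if someone changed a cloud resource by hand?
A manual change in the console leaves the Terraform state out of sync with the real infrastructure.
Solution:
# Detect the drift, then update the state to match reality (this replaces the deprecated terraform refresh)
terraform plan -refresh-only
terraform apply -refresh-only
# To revert the manual change instead, run a normal terraform apply so the configuration wins
# Or import an existing resource into the state (quote the indexed address for the shell)
terraform import 'tencentcloud_instance.web[0]' ins-xxxxxxxx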
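Since Terraform 1.5, an import can also be declared in the configuration with an `import` block, which lets the import be reviewed in `terraform plan` like any other change. A minimal sketch, where the instance ID is a placeholder to replace with the ID shown in the console:

```hcl
# Adopt an existing CVM instance as the first element of tencentcloud_instance.web.
import {
  to = tencentcloud_instance.web[0]
  id = "ins-xxxxxxxx" # placeholder instance ID
}
```

After a successful `terraform apply`, the resource is tracked in state and the `import` block can be removed or kept as a record. The resource's arguments in the configuration still have to match the real instance; otherwise the next plan proposes changes to it.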
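Q3: Creating a large number of resources is too slow?
Solution:
# Raise the number of concurrent operations with -parallelism (the default is 10)
terraform apply -parallelism=20
# Higher values can run into the cloud API's rate limits, so increase gradually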
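8. Summary
Terraform makes cloud resource management controllable, traceable, and collaborative:
1. Infrastructure defined in code: every resource is versioned in Git
2. Declarative configuration: you describe the desired state and Terraform works out how to get there
3. Preview before applying: terraform plan shows whether a change matches expectations
4. Multiple environments: dev/staging/prod share one set of templates
5. Team collaboration: remote state plus state locking lets several people work on the same infrastructure safely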
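About the author
The author follows large-model applications and hands-on cloud server work, with a focus on putting technology into practice in enterprise settings.
Personal blog: yunduancloud.icu, regularly updated with hands-on tutorials on cloud computing and large AI models. Visits and discussion are welcome.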
