遗传算法-句子配对问题


2022-01-03 机器学习

# 问题描述

给定一个目标的句子,让机器通过遗传算法经过有限次学习得到我们目标的句子。

# 适应度算法

计算在有多少字母已匹配

def get_fitness(self):                      # 计算种群的适应度,count how many character matches
        match_count = (self.pop == TARGET_ASCII).sum(axis=1)
        return match_count

# 选择

每次迭代后选择适应度大的个体保留下来

def select(self):
        fitness = self.get_fitness() + 1e-4     # 选择,add a small amount to avoid all zero fitness
        idx = np.random.choice(np.arange(self.pop_size), size=self.pop_size, replace=True, p=fitness/fitness.sum())
        return self.pop[idx]

# 交叉

使用单点交叉

def crossover(self, parent, pop):    # 交叉
        if np.random.rand() < self.cross_rate:
            i_ = np.random.randint(0, self.pop_size, size=1)                        # select another individual from pop
            cross_points = np.random.randint(0, 2, self.DNA_size).astype(bool)      # choose crossover points
            parent[cross_points] = pop[i_, cross_points]                            # mating and produce one child
        return parent

# 变异

当我们通过交叉生成了一条新的染色体后,需要在新染色体上随机选择若干个基因,然后随机修改基因的值,从而给现有的染色体引入了新的基因,突破了当前搜索的限制,更有利于算法寻找到全局最优解

def mutate(self, child):      # 变异
        for point in range(self.DNA_size):
            if np.random.rand() < self.mutate_rate:
                child[point] = np.random.randint(*self.DNA_bound)  # choose a random ASCII index
        return child

# 进化

定义一个进化过程,进化 = 选择 + 交叉 + 变异

def evolve(self):    # 进化 = 选择 + 交叉 + 进化
        pop = self.select()
        pop_copy = pop.copy()
        for parent in pop:  # for every parent
            child = self.crossover(parent, pop_copy)
            child = self.mutate(child)
            parent[:] = child
        self.pop = pop

# 完整代码

import numpy as np

TARGET_PHRASE = 'I am Anrunlu'       # target DNA   目标 DNA
POP_SIZE = 300                      # population size   种群规模
CROSS_RATE = 0.4                    # mating probability (DNA crossover) 交叉概率
MUTATION_RATE = 0.01                # mutation probability   变异概率
N_GENERATIONS = 1000                # 迭代代数

DNA_SIZE = len(TARGET_PHRASE)     # 计算 DNA 长度
TARGET_ASCII = np.fromstring(TARGET_PHRASE, dtype=np.uint8)  # 将字符转为数字,按 ASCII 编码方式
ASCII_BOUND = [32, 126]


class GA(object):
    def __init__(self, DNA_size, DNA_bound, cross_rate, mutation_rate, pop_size):
        self.DNA_size = DNA_size
        DNA_bound[1] += 1
        self.DNA_bound = DNA_bound
        self.cross_rate = cross_rate
        self.mutate_rate = mutation_rate
        self.pop_size = pop_size

        self.pop = np.random.randint(*DNA_bound, size=(pop_size, DNA_size)).astype(np.int8)  # int8 for convert to ASCII

    def translateDNA(self, DNA):                 # convert to readable string
        return DNA.tobytes().decode('ascii')

    def get_fitness(self):                      # 计算种群的适应度,count how many character matches
        match_count = (self.pop == TARGET_ASCII).sum(axis=1)
        return match_count

    def select(self):
        fitness = self.get_fitness() + 1e-4     # 选择,add a small amount to avoid all zero fitness
        idx = np.random.choice(np.arange(self.pop_size), size=self.pop_size, replace=True, p=fitness/fitness.sum())
        return self.pop[idx]

    def crossover(self, parent, pop):    # 交叉
        if np.random.rand() < self.cross_rate:
            i_ = np.random.randint(0, self.pop_size, size=1)                        # select another individual from pop
            cross_points = np.random.randint(0, 2, self.DNA_size).astype(bool)      # choose crossover points
            parent[cross_points] = pop[i_, cross_points]                            # mating and produce one child
        return parent

    def mutate(self, child):      # 变异
        for point in range(self.DNA_size):
            if np.random.rand() < self.mutate_rate:
                child[point] = np.random.randint(*self.DNA_bound)  # choose a random ASCII index
        return child

    def evolve(self):    # 进化 = 选择 + 交叉 + 进化
        pop = self.select()
        pop_copy = pop.copy()
        for parent in pop:  # for every parent
            child = self.crossover(parent, pop_copy)
            child = self.mutate(child)
            parent[:] = child
        self.pop = pop

if __name__ == '__main__':
    ga = GA(DNA_size=DNA_SIZE, DNA_bound=ASCII_BOUND, cross_rate=CROSS_RATE,
            mutation_rate=MUTATION_RATE, pop_size=POP_SIZE)

    for generation in range(N_GENERATIONS):
        
        # 计算适应度,打印出每次迭代的结果
        fitness = ga.get_fitness()
        best_DNA = ga.pop[np.argmax(fitness)]
        best_phrase = ga.translateDNA(best_DNA)
        print('Gen', generation, ': ', best_phrase)
        
        # 如果得到结果,退出
        if best_phrase == TARGET_PHRASE:
            break
            
        # 进行进化
        ga.evolve()

# 运行结果

种群经过 35 次进化,最终得到了我们想要的 DNA

# 总结

首先,再简单说一下我理解的遗传算法。达尔文的生物进化论告诉我们,自然界中的生物都是在不断进化的,遵循“优胜劣汰,适者生存”的原则,遗传算法正是通过模拟生物的进化过程而将计算结果不断优化的一种生物智能算法。

“纸上得来终觉浅,绝知此事要躬行。” 这次通过 句子配对 问题,我亲手实践了遗传算法解决问题的整个过程,使得我对遗传算法各个过程各个阶段的作用理解更加深刻,希望在以后的计算问题中能够更好的尝试和运用遗传算法。

最后,我想说我真的很佩服研究计算机算法的科学家们,他们这种跨学科融合做研究的方法和创新的精神很值得我去学习,尤其是在数据和计算问题种类日益多样化的今天,培养跨学科、多元化的思维模型一定会提升我解决问题的能力,加油。

Last Updated: 1/4/2022, 2:22:33 PM