[SQL]字段打乱脚本

PgSQL,字段打乱脚本

需求场景

  • 随着AI场景的广泛应用,或是数据分级分类开发等场景的需求,需要在保证数据安全的同时,又尽可能的保留数据的原有特征以供AI进行数据学习分析或是外部人员进行数据治理的脚本开发。
  • 最简单且常用的方法即是将数据表的各个字段进行随机打乱,再进行重新排序并截取其部分数据。可在保证单个字段保持其自有特征和内部逻辑的情况下,实现数据记录的内容混淆,保证数据内容的不外泄。

代码逻辑

  1. 生成自增序列,序列长度与表记录数一致
  2. 依次将随机数与各个目标字段进行拼接
  3. 依次对随机数进行递增/递减排序,进而打乱目标字段
  4. 截取期望数量的记录数,并对各个字段进行自增序列编号
  5. 利用自增序列关联各个目标字段,形成最终的乱序目标表

代码实现

总体代码

select 
    xzcfxwbh.xzcfxwbh,
    jlwybs.jlwybs,
    xm.xm
from 
    (select row_number() over() as abc from spfjbasepop.b01_hyxxb) abc  
left join 
    (select xzcfxwbh as xzcfxwbh,row_number() over() as rn from
        (select xzcfxwbh,random() as rand from spfjbasepop.b01_xzcfxwxx order by rand desc )
    limit 1000 
    ) xzcfxwbh 
on abc.abc=xzcfxwbh.rn 

left join 
    (select jlwybs as jlwybs,row_number() over() as rn from
        (select jlwybs,random() as rand from spfjbasepop.b01_xzcfxwxx order by rand desc )
    limit 1000 
    ) jlwybs 
on abc.abc=jlwybs.rn 

left join 
    (select xm as xm,row_number() over() as rn from
        (select xm,random() as rand from spfjbasepop.b01_xzcfxwxx order by rand desc )
    limit 1000 
    ) xm 
on abc.abc=xm.rn 

代码分解

  1. 生成自增序列
select 
    row_number() over() as abc 
from spfjbasepop.b01_hyxxb
  1. 拼接随机数并排序
select 
    xzcfxwbh
    ,random() as rand 
from spfjbasepop.b01_xzcfxwxx 
order by rand desc
  1. 为乱序数据编号
 select 
    xzcfxwbh as xzcfxwbh
    ,row_number() over() as rn 
from
        (select xzcfxwbh,random() as rand from spfjbasepop.b01_xzcfxwxx order by rand desc )
limit 1000 

自动化脚本

库名 表名 字段名 限制字段数 最终脚本
spfjbasepop B01_XZCFXWXX XZCFXWBH 1000 =CONCAT("left join (select ",C2," as ",C2,",row_number() over() as rn from(select ",C2,",random() as rand from ",A2,".",B2," order by rand desc )limit ",$D$2," ) ",C2," on abc.abc=",C2,".rn") =LOWER(TEXTJOIN(" ",TRUE,CONCAT("select ",TEXTJOIN(",",TRUE,F:F)," from (select row_number() over() as abc from spfjbasepop.B01_HYXXB) abc "),E:E))