๐Ÿ„

5.2.3. ๋ฐ์ดํ„ฐ ๋ณ‘ํ•ฉ

1. MERGE

์ด๋ฒˆ ์ฑ•ํ„ฐ์—์„œ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์šฐ์„  ๋ฐ์ดํ„ฐ์˜ ์ˆ˜ํ‰์  1๋Œ€ 1๊ฒฐํ•ฉ๋ถ€ํ„ฐ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ์‹คํ–‰ ๊ฒฐ๊ณผ๋ฅผ ๋ฏธ๋ฆฌ ๋ณด์‹œ๊ณ  ์–ด๋–ป๊ฒŒ ๊ฒฐํ•ฉํ• ์ง€ ์ƒ๊ฐํ•ด๋ณด์„ธ์š”.
subject_test1_1
subject_test1_1
mysas.subject_test1_1
data mysas.subject_test1_1; input id name $9. class_ $ subject $ score ; datalines; 101 ์ดํ˜ธ์ค€ 1๋ฐ˜ computer 92 102 ์ดํ˜ธ์ค‘ 1๋ฐ˜ computer 80 201 ์ด๊ธธ๋™ 2๋ฐ˜ computer 90 202 ์ด์ค€ํ˜ธ 2๋ฐ˜ computer 86 ; run; proc print data=mysas.subject_test1_1; run;
 
subject_test2
subject_test2
mysas.subject_test2
data mysas.subject_test2; input id phonenumber $16.; datalines; 101 010-1234-5678 102 010-9876-5432 201 010-4567-1565 202 010-7536-9512 ; run; proc print data=mysas.subject_test2; run;
 
๊ณตํ†ต๋œ ๊ธฐ์ค€๋ณ€์ˆ˜๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
data mysas.join; merge mysas.subject_test1_1 mysas.subject_test2; by id; run; proc print data=mysas.join; run;
 
๋กœ๊ทธ๋ฅผ ๋ณด์‹œ๋ฉด ์—๋Ÿฌ๊ฐ€ ๋‚˜ํƒ€๋‚  ๊ฒƒ์ž…๋‹ˆ๋‹ค. by ๋ณ€์ˆ˜๊ฐ€ ๋ฐ์ดํ„ฐ ์…‹์— ๋Œ€ํ•ด์„œ ์ •๋ ฌ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋ณ‘ํ•ฉํ•˜๊ธฐ ์ „์— ๊ธฐ์ค€๋ณ€์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๊ฐ๊ฐ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์ •๋ ฌ์„ ํ•ด์ค˜์•ผํ•ฉ๋‹ˆ๋‹ค. BY ๋ณ€์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค.
 
proc sort data=๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ช….๋ฐ์ดํ„ฐ์…‹; by ์ •๋ ฌํ•  ๋ณ€์ˆ˜; run;
 
proc sort data=mysas.subject_test1_1; by id; run; proc sort data=mysas.subject_test2; by id; run;
 
๊ฐ๊ฐ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์ •๋ ฌํ•œ ํ›„์— ๋‹ค์‹œ ๊ฒฐํ•ฉ์„ ํ•˜๋ฉด 1:1 ์ˆ˜ํ‰์  ๊ฒฐํ•ฉ์ด ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ๊ฐ€๋กœ๋กœ ๊ฒฐํ•ฉ๋œ๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
 
notion image
 

2. ์กฐ์ธ์˜ ์ข…๋ฅ˜

 
w3school โ†’ ์šฐ๋ฆฌ ์ž๋ฃŒ๋กœ ๋งŒ๋“ค์–ด์•ผ ํ•จ
w3school โ†’ ์šฐ๋ฆฌ ์ž๋ฃŒ๋กœ ๋งŒ๋“ค์–ด์•ผ ํ•จ
 

2.1 full join

full join์€ ์ง‘ํ•ฉ์—์„œ์˜ ํ•ฉ์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค. ์•„๋ž˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ณ  ์–ด๋–ป๊ฒŒ ํ•ฉ์ณ์งˆ์ง€ ์ƒ๊ฐํ•ด๋ณด์‹œ๊ณ  ์‹ค์Šต์„ ํ•ด๋ณด์‹œ๊ณ  ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•ด ๋ณด์‹œ๋ฉด ๋ณด๋‹ค ์ดํ•ดํ•˜๊ธฐ ์‰ฝ์Šต๋‹ˆ๋‹ค.
mysas.subject_test1_1
mysas.subject_test1_1
mysas.phonehw
 
phonehw์ด๋ผ๋Š” ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
data mysas.phonehw; input id name $10. type $ number$16.; datalines; 101 ์ดํ˜ธ์ค€ company 004-050-1200 101 ์ดํ˜ธ์ค€ phone 010-1234-5678 102 ์ดํ˜ธ์ค‘ home 012-134-7894 201 ์ด๊ธธ๋™ phone 010-4567-1565 202 ์ด์ค€ํ˜ธ phone 010-7536-9512 ; run; proc print data=mysas.phonehw; run;
 
์ด์ „์— ๋งŒ๋“ค์—ˆ๋˜ mysas.subject_test1_1๊ณผ mysas.phonehw๋ฅผ ๋ณ‘ํ•ฉํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. mysas.subject_test1_1์ด ์—†์œผ์‹œ๋‹ค๋ฉด ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•ด์ค๋‹ˆ๋‹ค.
 
data mysas.subject_test1_1; input id name $9. class_ $ subject $ score ; datalines; 101 ์ดํ˜ธ์ค€ 1๋ฐ˜ computer 92 102 ์ดํ˜ธ์ค‘ 1๋ฐ˜ computer 80 201 ์ด๊ธธ๋™ 2๋ฐ˜ computer 90 202 ์ด์ค€ํ˜ธ 2๋ฐ˜ computer 86 ; run; proc print data=mysas.subject_test1; run;
 
๋ณ‘ํ•ฉํ•˜๊ธฐ ์ „์— ๊ผญ ์ •๋ ฌ์„ ํ•ด ์ฃผ์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค.
 
proc sort data=mysas.phonehw; by id; run; proc sort data=mysas.subject_test1_1; by id; run;
 
data mysas.fulljoin; merge mysas.subject_test1_1 mysas.phonehw; by id; run; proc print data=mysas.fulljoin; run;
 
notion image
 

2.2 inner join

inner join์€ ์ง‘ํ•ฉ์—์„œ์˜ ๊ต์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค.
mysas.subject_test1_1
mysas.subject_test1_1
 
mysas.phonehw2
mysas.phonehw2
 
mysas.phonehw2 ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
data mysas.phonehw2; input id name $10. type $ number$16.; datalines; 101 ์ดํ˜ธ์ค€ company 004-050-1200 101 ์ดํ˜ธ์ค€ phone 010-1234-5678 102 ์ดํ˜ธ์ค‘ home 012-134-7894 201 ์ด๊ธธ๋™ phone 010-4567-1565 202 ์ด์ค€ํ˜ธ phone 010-7536-9512 203 ์ด์ค€๊ธธ phone 010-1563-4595 ; run; proc print data=mysas.phonehw2; run;
 
๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋ณ‘ํ•ฉํ•ด ์ฃผ๊ธฐ ์ „์— ์ •๋ ฌ์„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
proc sort data=mysas.phonehw2; by id; run; proc sort data=mysas.subject_test1_1; by id; run;
 
data mysas.innerjoin; merge mysas.phonehw2(in=emps) mysas.subject_test1_1(in=cell); by id; if emps=1 and cell=1; run; proc print data=mysas.innerjoin; run;
 
data mysas.innerjoin;
data mysas.innerjoin;
 
๊ฒฐ๊ณผ๋ฅผ ๋ณด์‹œ๋ฉด name ๋ณ€์ˆ˜๊ฐ€ ๋‘ ํ…Œ์ด๋ธ”์—์„œ ์ค‘๋ณต๋˜๋Š” ๊ฐ’๋“ค๋งŒ ์ถœ๋ ฅํ•ด ์ฃผ๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
 

2.3 right join

์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ” ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์ณ์ง‘๋‹ˆ๋‹ค.
data mysas.rightjoin; merge mysas.subject_test1_1(in=emps) mysas.phonehw2(in=cell); by id; if emps=0 and cell=1; run;
 
notion image
 
์•„๋ž˜ ๋ฐ์ดํ„ฐ์™€ ๋น„๊ตํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
 
mysas.subject_test1_1
mysas.subject_test1_1
mysas.phonehw2
mysas.phonehw2
 

2.4 left join

์™ผ์ชฝ ํ…Œ์ด๋ธ” ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์ณ์ง‘๋‹ˆ๋‹ค.
data mysas.data11; merge mysas.data3(in=emps) mysas.data9(in=cell); by name; if emps=1 or cell=0; run;
 
notion image
 

2.5 ๋‘˜๋‹ค 0์ธ ๊ฒฝ์šฐ

๋‘˜๋‹ค 0์ธ ๊ฒฝ์šฐ๋Š” ์™ผ์ชฝ ํ…Œ์ด๋ธ”๊ณผ ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ” ์ค‘ ์„œ๋กœ ๊ฒน์น˜์ง€ ์•Š๋Š” ๋ถ€๋ถ„์„ ์ถœ๋ ฅํ•ด ์ค๋‹ˆ๋‹ค. ๊ต์ง‘ํ•ฉ์˜ ๋ฐ˜๋Œ€์ธ ๋‘ ํ…Œ์ด๋ธ”์˜ ์ฐจ์ง‘ํ•ฉ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
data mysas.data11; merge mysas.data3(in=emps) mysas.data9(in=cell); by name; if emps=0 or cell=0; run;
 
notion image
 

3. SET

๋ฐ์ดํ„ฐ์˜ ์ˆ˜์ง์  1๋Œ€ 1๊ฒฐํ•ฉ (๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๊ฐ€ ๋™์ผํ•  ๋•Œ)ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
data ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.ํ†ตํ•ฉ๋ฐ์ดํ„ฐ์…‹; set ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.๋ฐ์ดํ„ฐ์…‹1 ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.๋ฐ์ดํ„ฐ์…‹2; run;
 
data mysas.set1; set mysas.subject_test1_1 mysas.phonehw2; run; proc print data=mysas.set1; run;
 
notion image
์œ„์˜ ๊ฒฐ๊ณผ๋ฅผ ๋ณด๋ฉด set์€ ์„ธ๋กœ๋กœ ์ถœ๋ ฅ๋œ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
 

3.1 ์นผ๋Ÿผ๋ช…์ด ์ผ์น˜ํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ

1) ๋งŒ์•ฝ ํ•ด๋‹น ์นผ๋Ÿผ์— ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๋‹ค๋ฉด, ์—†๋Š”๋Œ€๋กœ ๋นˆ ์นธ์œผ๋กœ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
2) ๋‘ ํ…Œ์ด๋ธ”์˜ ๋™์ผ ์ด๋ฆ„ ์นผ๋Ÿผ ๊ฐ„ ๋ฐ์ดํ„ฐ ๊ธธ์ด๊ฐ€ ๋‹ค๋ฅธ ๊ฒฝ์šฐ์—๋Š” SET๋ช…๋ น์–ด๋กœ ๋‘ ๊ฐœ ์ด์ƒ์˜ ํ…Œ์ด๋ธ”์„ ์ด์–ด๋ถ™์ผ ๋•Œ ๊ธฐ์ค€์ด ๋˜๋Š” ๊ฒƒ์€ ์ฒซ ๋ฒˆ์งธ๋กœ ์ง€์ •๋œ ํ…Œ์ด๋ธ”์ž…๋‹ˆ๋‹ค.
์•ž์—์„œ ์„ค๋ช…ํ–ˆ๋“ฏ์ด SAS๋Š” ๋ช…๋ น์–ด๋ฅผ ์œ„์—์„œ๋ถ€ํ„ฐ ์ฝ๊ณ  ์™ผ์ชฝ์—์„œ๋ถ€ํ„ฐ ์ฝ์Šต๋‹ˆ๋‹ค. ์ด ๊ฐ™์€ ๊ธฐ์ค€์— ๋”ฐ๋ผ SET๋ช…๋ น์–ด์—์„œ ๊ฐ€์žฅ ์œ„์— ์žˆ๋Š” ํ…Œ์ด๋ธ”์„ ๋จผ์ € ์ฝ์–ด ๋“ค์ด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด์— ๋”ฐ๋ผ ํ†ตํ•ฉ๋œ ํ…Œ์ด๋ธ”์˜ ์นผ๋Ÿผ NAME์€ ๊ธธ์ด๊ฐ€ 10์ธ ์ƒํƒœ๋กœ ์ƒ์„ฑ์ด ๋ฉ๋‹ˆ๋‹ค. ์ด ์ƒํ™ฉ์—์„œ ๋‘ ๋ฒˆ์งธ๋กœ ๋‚˜์˜ค๋Š” ํ…Œ์ด๋ธ” B์˜ ์นผ๋Ÿผ NAME์˜ ๊ธธ์ด๊ฐ€ 14๋ผ๋ฉด, ํ…Œ์ด๋ธ” B์˜ ๋’ค์ชฝ ๊ธธ์ด๊ฐ€ 2๋งŒํผ ์ž˜๋ฆฌ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
๊ทธ๋ ‡๊ธฐ์— ์นผ๋Ÿผ๋“ค์˜ ํ–‰์„ ์ด์–ด์ค„ ๋•Œ๋Š” ๊ธธ์ด๋ฅผ ์œ ์‹ฌํžˆ ์‚ดํ”ผ์…”์•ผ ํ•ฉ๋‹ˆ๋‹ค.
 

3.2 ์นผ๋Ÿผ์˜ ์†์„ฑ์ด ๋‹ค๋ฅธ ๊ฒฝ์šฐ

๋ณ€์ˆ˜ Gender ์ด(๊ฐ€) ๋ชจ๋‘ ๋ฌธ์ž์™€ ์ˆซ์ž๋กœ ์ •์˜๋˜์—ˆ๋‹ค๋ฉด Error๊ฐ€ ๋‚˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ๋‘˜ ์ค‘ ํ•œ ๋ณ€์ˆ˜์˜ ์ด๋ฆ„์„ ๋ฐ”๊พธ๊ฑฐ๋‚˜, ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•  ๋•Œ ๋ณ€์ˆ˜์˜ ์†์„ฑ์„ ๋™์ผํ•˜๊ฒŒ ๋งŒ๋“ค๊ฑฐ๋‚˜, ์ด๋ฏธ ๋งŒ๋“ค์–ด์ง„ ํ…Œ์ด๋ธ”์˜ ์†์„ฑ์„ ๋ณ€๊ฒฝํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.