Recurrent Neural Networks (RNN)

This post looks at how an RNN 'remembers' and processes sequential data, and then implements a simple time-series prediction model.


Introduction

The language we speak, price movements in the stock market, heartbeat recordings: a great deal of real-world data depends heavily on order (sequence). Just as "I eat rice" and "Rice eats me" mean entirely different things, order information is central to understanding what the data actually says.

Models such as MLPs and CNNs do not, by design, take the order of or temporal relationships between inputs into account. The Recurrent Neural Network (RNN) was devised to process this kind of sequential data effectively. This post explains how an RNN 'remembers' past information and connects it to the current input, and then implements a simple time-series prediction model.

1. What Is a Recurrent Neural Network (RNN)?

An RNN is an artificial neural network designed to process ordered data. Its defining feature is a loop inside the model, which lets it remember information from the previous time step and process it together with the current input.

An RNN works recursively: the output of the previous step is fed back in as an input to the current step. At each time step the model receives the current input together with the previous hidden state and uses both to update the hidden state for the current step. This hidden state acts as the RNN's 'memory', a running summary of what it has seen so far.
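The hidden-state update described above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the common tanh formulation h_t = tanh(x_t W_x + h_{t-1} W_h + b); the function name, weight names, sizes, and random initialization are assumptions made for the example and are not taken from the Keras model used later in this post.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over one sequence and return every hidden state.

    inputs: (timesteps, input_dim)
    W_x: (input_dim, hidden_dim), W_h: (hidden_dim, hidden_dim), b: (hidden_dim,)
    """
    h = np.zeros(W_h.shape[0])        # initial hidden state: nothing remembered yet
    states = []
    for x_t in inputs:
        # The new state combines the current input with the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

# Hypothetical weights, chosen only for illustration.
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.5, size=(1, 4))
W_h = rng.normal(scale=0.5, size=(4, 4))
b = np.zeros(4)

seq = np.sin(0.1 * np.arange(10)).reshape(-1, 1)   # 10 time steps, 1 feature
states = rnn_forward(seq, W_x, W_h, b)
print(states.shape)   # (10, 4)
```

Because each state depends on the one before it, feeding the same values in a different order generally produces a different final state, which is how order information enters the model.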

Figure: RNN
Figure: LSTM (Long Short-Term Memory)

2. The Limits of RNNs and the Arrival of LSTM

The basic RNN structure is simple, but it has a serious weakness: the long-term dependency problem. As sequences get longer, gradients tend to shrink toward zero (vanishing gradient) or blow up (exploding gradient) during backpropagation, so information from the distant past has trouble reaching the present.
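A rough numerical picture of why this happens: backpropagation through time multiplies the gradient by a similar factor at every step. The sketch below assumes a deliberately simplified scalar recurrence h_t = w * h_{t-1} (a toy stand-in, not the model built later in this post), for which the gradient of h_T with respect to h_0 is simply w multiplied by itself T times.

```python
def gradient_through_time(w, steps):
    """Gradient d h_T / d h_0 for the toy recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w      # each step back in time multiplies by the same factor
    return grad

for w in (0.9, 1.1):
    print(w, [gradient_through_time(w, t) for t in (10, 50, 100)])
```

With a factor slightly below 1 the gradient decays toward zero as the number of steps grows, and with a factor slightly above 1 it grows without bound. A real RNN multiplies by Jacobian matrices rather than a single scalar, but the same compounding effect applies.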

LSTM (Long Short-Term Memory) was introduced to address this problem. It extends the basic RNN structure with a cell state and three gates, which control what information is remembered and what is forgotten.

  • Forget Gate: decides which parts of the past information to forget.
  • Input Gate: decides which parts of the current information to store in the cell state.
  • Output Gate: decides which parts of the cell state to emit as output.

Thanks to this gate structure, an LSTM can learn dependencies across much longer sequences.
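To make the role of the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard formulation (sigmoid gates, a tanh candidate, c_t = f * c_{t-1} + i * g, h_t = o * tanh(c_t)). The function name, the weight layout, and the ordering of the four blocks are assumptions made for this example; Keras stores its weights in its own layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    The four column blocks are read as forget, input, candidate, output
    (an ordering chosen for this sketch).
    """
    z = x_t @ W + h_prev @ U + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)        # forget gate: how much of the old cell state to keep
    i = sigmoid(i)        # input gate: how much new information to write
    g = np.tanh(g)        # candidate values that could be written
    o = sigmoid(o)        # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Hypothetical sizes and weights, for illustration only.
units, input_dim = 3, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in np.sin(0.1 * np.arange(5)):
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape, c.shape)   # (3,) (3,)
```

Note that the cell state is updated additively (f * c_prev + i * g). When the forget gate stays close to 1, the old cell state passes through almost unchanged, which is what helps information and gradients survive over many steps.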

3. Implementing an RNN (LSTM) in Python

This section uses TensorFlow/Keras to build an LSTM model that predicts a simple sine wave. It is a time-series prediction problem: given past values of the sine wave, predict the value at the next time step.

A. Installing the required libraries

If tensorflow and scikit-learn are already installed, no separate installation is needed. Note that the code below imports only tensorflow, numpy, and matplotlib.

B. Example source code

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# 1. Data generation
# Build sine-wave data as sliding windows
def create_sequence_data(timesteps=50):
    # 1000 data points from 0 up to (but not including) 100, in steps of 0.1
    data = np.sin(np.arange(0, 100, 0.1))
    X, y = [], []
    for i in range(len(data) - timesteps):
        # `timesteps` consecutive values form the input (X);
        # the value right after them is the target (y)
        X.append(data[i:(i + timesteps)])
        y.append(data[i + timesteps])
    return np.array(X), np.array(y)

TIMESTEPS = 50
X, y = create_sequence_data(TIMESTEPS)

# Reshape the data for RNN/LSTM input: (samples, timesteps, features)
X = X.reshape(X.shape[0], X.shape[1], 1)

# Split into training and test data (the last 100 samples are used for testing)
X_train, X_test = X[:-100], X[-100:]
y_train, y_test = y[:-100], y[-100:]

# 2. Build the LSTM model
model = tf.keras.Sequential([
    # Input shape: (TIMESTEPS, 1)
    tf.keras.Input(shape=(TIMESTEPS, 1)),

    # Layer with 50 LSTM units
    tf.keras.layers.LSTM(50),

    # Output layer (predicts a single value)
    tf.keras.layers.Dense(1)
])

# 3. Compile the model
# Loss function: mean_squared_error (for regression problems)
model.compile(optimizer='adam', loss='mean_squared_error')

# Print a summary of the model structure
model.summary()

# 4. Train the model
model.fit(X_train, y_train, epochs=20, batch_size=32)


# Results

After 20 epochs of training, the model reaches a very low MSE (mean squared error) and predicts the sine wave successfully. Below are the model structure, the training log, and a plot comparing the actual and predicted values.

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm (LSTM)                     │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 1)              │            51 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,451 (40.82 KB)
 Trainable params: 10,451 (40.82 KB)
 Non-trainable params: 0 (0.00 B)
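The parameter counts in the summary can be checked by hand. An LSTM layer holds four sets of weights (the three gates plus the candidate cell update), each with an input kernel, a recurrent kernel, and a bias, which gives 4 × (units × (input_dim + units) + units). The helper names below are made up for this check.

```python
def lstm_param_count(units, input_dim):
    # Four weight sets: forget, input, and output gates plus the candidate update.
    # Each has an input kernel, a recurrent kernel, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    # One weight per input-output pair plus one bias per output unit.
    return units * input_dim + units

lstm_params = lstm_param_count(units=50, input_dim=1)
dense_params = dense_param_count(units=1, input_dim=50)
print(lstm_params, dense_params, lstm_params + dense_params)   # 10400 51 10451
```

These match the 10,400, 51, and 10,451 reported by model.summary().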

Epoch 1/20
27/27 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - loss: 0.2361
Epoch 2/20
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - loss: 0.0167
...
Epoch 20/20
27/27 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 2.6237e-07

Test MSE: 0.000000
Test RMSE: 0.000481
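The listing in this post does not show how the test metrics above were computed. One way to obtain them, offered here as an assumption rather than the code actually used, is a small NumPy helper. The function name and the stand-in arrays are hypothetical; with the model from this post the call would be mse_and_rmse(y_test, model.predict(X_test)).

```python
import numpy as np

def mse_and_rmse(y_true, y_pred):
    """Return (MSE, RMSE) for targets and predictions of shape (n,) or (n, 1)."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    # model.predict returns shape (n, 1); flatten it so the subtraction does not
    # broadcast (n,) against (n, 1) into an (n, n) matrix.
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    mse = float(np.mean((y_true - y_pred) ** 2))
    return mse, float(np.sqrt(mse))

# Stand-in arrays for illustration only.
y_true = np.array([0.0, 0.5, 1.0])
y_pred = np.array([[0.1], [0.5], [0.9]])
mse, rmse = mse_and_rmse(y_true, y_pred)
print(f"Test MSE: {mse:.6f}")
print(f"Test RMSE: {rmse:.6f}")
```

An RMSE of 0.000481 corresponds to an MSE of roughly 2.3e-07, which is why the MSE above shows as 0.000000 when printed with six decimal places.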

The training log shows the loss (MSE) dropping sharply as training proceeds.

Figure: RNN learning curve

# 5. Evaluate the model and predict

# Run predictions on the test data
predicted_values = model.predict(X_test)

# 6. Visualize the results
# x-axis positions of the test samples, continuing after the training samples
test_index = np.arange(len(y_train), len(y_train) + len(y_test))

plt.figure(figsize=(12, 6))
plt.plot(test_index, y_test, label='Actual')
plt.plot(test_index, predicted_values, label='Predicted', linestyle='--')
plt.title('Sine Wave Prediction')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()


The graph below compares the actual sine wave with the values predicted by the model. The solid orange line is the actual value and the dashed red line is the model's prediction; the two match almost perfectly.

Figure: RNN prediction results

Conclusion

RNNs and their successors, LSTM and GRU, are powerful tools for working with ordered data. They play a central role in many areas, including natural language processing (machine translation, chatbots, sentiment analysis), time-series forecasting (stock prices, demand), and speech recognition.

More recently the Transformer architecture has been taking the place of RNNs in natural language processing, but the core ideas behind RNNs, 'recurrence' and 'memory', remain an important pillar of deep learning.

