Convolutional Neural Networks (CNN)

How a CNN learns the spatial information in an image, followed by an MNIST handwritten-digit classifier built with TensorFlow/Keras.


Introduction

In the previous post I covered the MLP, the basic model of deep learning. An MLP is a powerful model, but it has a clear limitation when it comes to images: it flattens the image into a 1-D vector before processing it, so the spatial relationships between pixels (spatial information) are lost.

The convolutional neural network (CNN) was introduced to solve this problem. Inspired by the way the human visual system processes images, a CNN is designed to extract and learn the local features of an image effectively. In this post I dig into the core principles of CNNs and build a model that classifies handwritten digit images (MNIST) with TensorFlow/Keras.

1. What is a convolutional neural network (CNN)?

A CNN is a deep learning model specialized for image data.
The model learns the features of an image on its own and uses them to recognize patterns.
Its two key building blocks are the convolution operation, which extracts features from the image with a filter (also called a kernel),
and the pooling operation, which compresses and emphasizes those features.

Figure: Basic structure of a CNN

2. Core components of a CNN

A. Convolutional layer

In a convolutional layer, a filter moves across the image at a fixed interval (the stride) and, at each position, computes the sum of the element-wise products between the filter and the part of the image it covers. This produces a feature map that shows where a particular pattern (a vertical line, a horizontal line, a particular color, and so on) appears in the image.

  • Filter: a small matrix that acts as a feature extractor. The values of the filter (its weights) are what get updated during training.
  • Stride: the number of pixels the filter moves at each step.
  • Padding: filling the border of the input image with a fixed value (usually 0) so that the feature map does not shrink after the convolution and information near the image edges is preserved.
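The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements `Conv2D`: the function name `conv2d`, the 5x5 sample image, and the hand-written vertical-edge filter are all assumptions made for the example (in a real CNN the filter values are learned). Like deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np


def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a 2-D `image` and return the feature map."""
    if padding > 0:
        # Zero padding: surround the image with a border of zeros.
        image = np.pad(image, padding, mode="constant", constant_values=0)

    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))

    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Element-wise product of the patch and the filter, then sum.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical hand-written filter that responds to vertical edges.
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # large values where the dark-to-bright edge is

# With one pixel of zero padding the output keeps the 5x5 input size.
print(conv2d(image, vertical_edge, padding=1).shape)

# With a stride of 2 the filter skips every other position.
print(conv2d(image, vertical_edge, stride=2).shape)
```

Without padding the 5x5 input shrinks to 3x3, which is the shrinking effect that padding is meant to prevent.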

B. Pooling layer

A pooling layer reduces the size of the feature map produced by the convolutional layer (sub-sampling), which lowers the amount of computation and emphasizes the dominant features. Max pooling is the most common choice: it keeps only the largest value (the most strongly activated feature) in each region.

Pooling gives the model a degree of translation invariance: even if an object shifts slightly within the image, it can still be recognized as the same object.
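As a rough sketch of max pooling and of the invariance it provides, the following NumPy example pools non-overlapping 2x2 blocks. The function name `max_pool2d` and the sample arrays are assumptions made for illustration. The invariance shown here holds only for shifts that stay inside one pooling window.

```python
import numpy as np


def max_pool2d(feature_map, pool_size=2):
    """Keep the largest value in each non-overlapping pool_size x pool_size block."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Drop trailing rows/columns that do not fill a complete block.
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    blocks = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    return blocks.max(axis=(1, 3))


# Hypothetical 4x4 feature map.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 4],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)  # 2x2 result: one maximum per 2x2 block

# A single activation, and the same activation shifted by one pixel
# diagonally but still inside the same 2x2 block.
original = np.zeros((4, 4))
original[0, 0] = 1.0
shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0

same_after_pooling = np.array_equal(max_pool2d(original), max_pool2d(shifted))
print(same_after_pooling)
```

The two inputs differ, yet their pooled outputs are identical, which is the small-shift tolerance described above.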

3. Implementing a CNN in Python

Here I use TensorFlow/Keras to implement a simple CNN that classifies the well-known MNIST handwritten digit dataset.

A. Installing the required libraries

If you already installed tensorflow for the MLP post, no additional installation is needed.

B. Example source code

The model is a CNN that classifies 28x28-pixel grayscale handwritten digit images into 10 classes, 0 through 9.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# 1. Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to the range 0-1 and add a channel dimension
# (60000, 28, 28) -> (60000, 28, 28, 1)
x_train = x_train.reshape((-1, 28, 28, 1)).astype("float32") / 255.0
x_test = x_test.reshape((-1, 28, 28, 1)).astype("float32") / 255.0

# 2. Build the CNN model (using Keras)
model = tf.keras.models.Sequential([
    # Input: 28x28 grayscale image with one channel
    tf.keras.Input(shape=(28, 28, 1)),

    # First convolutional layer
    # 32 filters of size 3x3, ReLU activation -> output (26, 26, 32)
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),

    # First pooling layer (max pooling) -> (13, 13, 32)
    tf.keras.layers.MaxPooling2D((2, 2)),

    # Second convolutional layer -> (11, 11, 64)
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    # Second pooling layer -> (5, 5, 64)
    tf.keras.layers.MaxPooling2D((2, 2)),

    # Third convolutional layer -> (3, 3, 64)
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    # Flatten the 3-D feature map into a 1-D vector of length 576
    tf.keras.layers.Flatten(),

    # Fully connected layer (MLP)
    tf.keras.layers.Dense(64, activation='relu'),

    # Output layer (10 classes, softmax activation)
    tf.keras.layers.Dense(10, activation='softmax')
])

# 3. Compile the model
# Loss function: sparse_categorical_crossentropy (for integer labels)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print a summary of the model structure
model.summary()

# 4. Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64)

# 5. Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"\nTest accuracy: {test_acc*100:.2f}%")

# 6. Check a prediction
# Predict the class of the first test image only
predictions = model.predict(x_test[:1])
predicted_label = np.argmax(predictions[0])
actual_label = y_test[0]

plt.imshow(x_test[0].reshape(28, 28), cmap='gray_r')
plt.title(f"Predicted: {predicted_label}, Actual: {actual_label}")
plt.show()
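The output shapes and parameter counts that `model.summary()` reports can be checked by hand. The sketch below repeats that arithmetic in plain Python for the layer stack above, assuming the Keras defaults the code relies on (3x3 filters with stride 1 and no padding, non-overlapping 2x2 pooling). The helper names `conv_output`, `conv_params`, and `dense_params` are made up for this example.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# (layer kind, number of filters) in the order used by the model.
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]

size, channels = 28, 1  # 28x28 grayscale input
total_params = 0
for kind, filters in layers:
    if kind == "conv":
        total_params += conv_params(3, channels, filters)
        size, channels = conv_output(size, 3), filters
    else:
        size = size // 2  # 2x2 max pooling halves each side, rounding down
    print(f"{kind}: ({size}, {size}, {channels})")

flat = size * size * channels  # length of the vector after Flatten
total_params += dense_params(flat, 64) + dense_params(64, 10)

print("flattened length:", flat)
print("total parameters:", total_params)
```

The spatial size shrinks from 28 to 26, 13, 11, 5, and finally 3, so the Flatten layer hands a vector of 3 x 3 x 64 = 576 values to the fully connected layer.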

C. Results

The graph shows how the loss and accuracy change during training.
As the epochs progress, the loss falls and the accuracy approaches 1, which is the ideal pattern.

Figure: Loss and accuracy during training
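The example code above does not draw this graph itself. One way to produce a similar plot is to keep the object returned by `model.fit` and plot its `history` dictionary. The following is a minimal sketch: the helper name `plot_history` and the side-by-side layout are assumptions, and the `"loss"` and `"accuracy"` keys are the ones Keras records when the model is compiled with `metrics=['accuracy']`.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot per-epoch loss and accuracy side by side and return the figure."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history_dict["loss"], marker="o")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("Epoch")

    ax_acc.plot(epochs, history_dict["accuracy"], marker="o")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("Epoch")

    fig.tight_layout()
    return fig


# Usage with the model above (left as comments because it needs a training run):
# history = model.fit(x_train, y_train, epochs=5, batch_size=64)
# plot_history(history.history)
# plt.show()
```

Passing `validation_data` to `model.fit` would also record `val_loss` and `val_accuracy`, which could be plotted the same way to check for overfitting.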

 

The following are the predictions for 10 actual test images.
Each image shows the model's prediction (Pred) next to the true label (Actual), and wrong predictions are marked in red.
In this example, all 10 were predicted correctly.

Figure: Predictions for 10 test images

 

๋งˆ์น˜๋ฉฐ

CNN์€ ํ•ฉ์„ฑ๊ณฑ๊ณผ ํ’€๋ง์ด๋ผ๋Š” ๋…์ฐฝ์ ์ธ ์•„์ด๋””์–ด๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€์˜ ๊ณต๊ฐ„์  ํŠน์ง•์„ ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•œ๋‹ค.
์ด๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ๊ฐ์ฒด ํƒ์ง€(Object Detection), ์ด๋ฏธ์ง€ ๋ถ„ํ• (Image Segmentation) ๋“ฑ ์ปดํ“จํ„ฐ ๋น„์ „ ๋ถ„์•ผ์˜ ํ•ต์‹ฌ์ ์ธ ๊ธฐ์ˆ ๋กœ ์ž๋ฆฌ ์žก์•˜๋‹ค.

