Posts on Axect's Blog (https://axect.github.io/kr/posts/, CC BY-NC 4.0)

🤖 Differentiating with Rust 03: Forward-Mode Automatic Differentiation
Mon, 04 Dec 2023 15:38:04 +0900, https://axect.github.io/kr/posts/007_ad_3/

🔖 Automatic Differentiation Series

  1. 💻 Numerical Differentiation
  2. 🖊️ Symbolic Differentiation
  3. 🤖 Automatic Differentiation

What is the most important ingredient in implementing deep learning? Of course, since deep learning draws on many disciplines, every component matters, but one deserves particular attention. To identify it, let's look at the following PyTorch code.

net = nn.Sequential(
  nn.Linear(2, 1),
  nn.Sigmoid()
)

# x = ...
# y = ...
# criterion = ...
opt = optim.SGD(net.parameters(), lr=0.01)

opt.zero_grad()
loss = criterion(net(x), y)
loss.backward()
opt.step()

Now imagine implementing this without any deep learning framework. It would not be a strictly identical implementation, but Linear and Sigmoid themselves can be built from just a matrix multiplication and a vectorized sigmoid, so constructing net(x) is not hard. Next, although the code does not specify what criterion is, if we use the most basic choice, MSE, that is simple as well. The trouble starts with opt. To implement SGD, we must be able to compute the gradient, that is, the derivative, for any criterion and any network architecture.

One option is the numerical differentiation we covered earlier. In the case above that would not be a big problem, since the network has a single layer and small input and output dimensions, but typical deep learning models are many layers deep with large input and output dimensions, and successful training can require thousands or tens of thousands of iterations. In that regime, the small errors of numerical differentiation can snowball as training repeats.

What about the symbolic differentiation we learned earlier, then? Symbolic differentiation yields the exact form of the derivative, so no error accumulates beyond floating-point error. However, as pointed out in the previous post, symbolic differentiation incurs a large computational cost every time it is evaluated. That cost compounds over thousands of training iterations, which can make it hard to finish training in the desired time or within a limited memory budget.

If both differentiation methods we have covered are unsuitable, how should we implement this? Fortunately, scientists invented the right tool well before deep learning arrived: automatic differentiation. Automatic differentiation comes in two main flavors depending on how the computation proceeds. Propagating derivatives through each function according to changes in the input variables is called forward-mode automatic differentiation; computing derivatives according to changes in the output variables is called reverse-mode automatic differentiation. Each has its own strengths and weaknesses, so the choice depends on the situation; this post focuses on forward-mode automatic differentiation.


1. Truncated Taylor Series

์ž๋™๋ฏธ๋ถ„์ด๋ผ๋Š” ๊ฑฐ์ฐฝํ•œ ์ด๋ฆ„์ด ๋ถ™์—ˆ์ง€๋งŒ, ์ •๋ฐฉํ–ฅ ์ž๋™๋ฏธ๋ถ„์˜ ๊ฐœ๋…์€ ๋Œ€ํ•™๊ต ๋ฏธ์ ๋ถ„ํ•™์‹œ๊ฐ„์— ๋ฐฐ์šฐ๋Š” ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜์—์„œ ์œ ๋ž˜ํ•ฉ๋‹ˆ๋‹ค. ๋ฏธ๋ถ„๊ฐ€๋Šฅํ•œ ํ•จ์ˆ˜ $f$์˜ ํŠน์ • ์ง€์  $x_0$์—์„œ์˜ ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ „๊ฐœ๋ฉ๋‹ˆ๋‹ค.

$$ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots $$

Now add a sufficiently small (infinitesimal) increment $\epsilon$ to $x$ and expand the Taylor series around $x$:

$$ f(x+ \epsilon) = f(x) + f'(x)\epsilon + \mathcal{O}(\epsilon^2) $$

$\epsilon$์ด ์ถฉ๋ถ„ํžˆ ์ž‘๋‹ค๊ณ  ๊ฐ€์ •ํ–ˆ์œผ๋ฏ€๋กœ ($|\epsilon| \ll 1$), $\epsilon^2$ ์ด์ƒ์˜ ํ•ญ๋“ค์€ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ f(x + \epsilon) = f(x) + f'(x)\epsilon $$

A Taylor series kept only up to a certain order like this is called a truncated Taylor series. We do not have to stop at the first-order term, but most deep learning problems only need information up to the first derivative, so here we keep only the first-order term.
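As a quick numerical sanity check of this truncation (an illustrative sketch, not part of the original post): for $f = \sin$ at $x = 1$, the first-order approximation should deviate from the true value only at order $\epsilon^2$.

```rust
// First-order truncated Taylor approximation of sin around x
fn taylor1_sin(x: f64, eps: f64) -> f64 {
    x.sin() + x.cos() * eps
}

fn main() {
    let (x, eps) = (1.0_f64, 1e-6_f64);
    let exact = (x + eps).sin();
    let approx = taylor1_sin(x, eps);
    // The truncation error is O(eps^2), i.e. around 1e-12 here
    println!(
        "exact: {}, first-order: {}, error: {:e}",
        exact,
        approx,
        (exact - approx).abs()
    );
}
```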

Now consider the form where $\epsilon$ carries a coefficient.

$$ f(x + \overline{x} \epsilon) = f(x) + f'(x)\overline{x}\epsilon $$

์œ„ ์‹์„ ๋ณด๋ฉด $\epsilon$์ด ๋งˆ์น˜ ์ผ์ข…์˜ ๊ธฐ์ €์ฒ˜๋Ÿผ ์ž‘์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. $x + \overline{x} \epsilon = x \mathbb{1} + \overline{x} \epsilon$ ์œผ๋กœ ๋ณด๋ฉด $\mathbb{1}$๊ณผ $\epsilon$์ด ๊ธฐ์ €์ธ ๊ณต๊ฐ„์—์„œ ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด๋ฅผ ํ•˜๋‚˜์˜ ์ˆ˜๋กœ ์ทจ๊ธ‰ํ•˜๊ณ  ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ฐ„๋‹จํžˆ ํ‘œํ˜„ํ•ด๋ด…์‹œ๋‹ค.

$$ x \mathrel{\rhd} \overline{x} \equiv x + \overline{x}\epsilon $$

Such numbers are called dual numbers, and they were introduced as far back as 1873 by William Clifford. Dual numbers were not originally conceived for automatic differentiation, but they have since become the representative number system for expressing it. Now let us rewrite the first-order Taylor series using these numbers.

$$ f(x \mathrel{\rhd} \overline{x}) = f(x) \mathrel{\rhd} f'(x)\overline{x} $$

This is simpler, but one problem remains: the definition of the function $f$ is ambiguous. If we take $x$ to be a real number and $x \mathrel{\rhd} \overline{x}$ to be a one-dimensional dual number ($\mathbb{D}\mathbb{R}$), then $f$ on the left-hand side is a function $\mathbb{DR} \rightarrow \mathbb{DR}$, while $f$ on the right-hand side is $\mathbb{R} \rightarrow \mathbb{R}$, so the expression is ill-defined. Fortunately, this is easily resolved by introducing a new operator, $\vec{\mathcal{J}}$.

$$ \vec{\mathcal{J}}f(x \mathrel{\rhd}\overline{x}) = f(x) \mathrel{\rhd} f'(x)\overline{x} $$

The operator $\vec{\mathcal{J}}$ can be interpreted from several viewpoints; a few are listed below.

  • Since it transforms $f$, a field spread over real space, into a field over dual-number space, it can be understood as the push-forward of differential geometry.

  • Since it acts on a function $f$ that takes and returns real values and produces a new function, it can be understood as lifting in functional programming.

  • Since it acts on a function $f$ written over the reals and produces a new function form, it can be realized via source code transformation, operator overloading, or multiple dispatch.

These interpretations all converge on one meaning: this is the equation of "forward-mode automatic differentiation". It may seem puzzling that a Taylor series truncated at first order expresses differentiation, so let's work through an example. We will use a very simple function, $v = \sin u$.

1.1. Example: Automatic Differentiation of the Sine Function

$$ \begin{aligned} &\begin{aligned} v \mathrel{\rhd} \overline{v} &= \vec{\mathcal{J}}\sin (u \mathrel{\rhd} \overline{u}) \\ &= \sin u \mathrel{\rhd} (\cos u) \overline{u} \end{aligned} \\ \therefore ~ &v = \sin u,\quad \overline{v} = (\cos u) \overline{u} \end{aligned} $$

์ด๋กœ์จ $v = \sin u$์˜ ๋„ํ•จ์ˆ˜๋Š” $\overline{v} = (\cos u) \overline{u}$์ด๋ผ๋Š” ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์šฐ๋ฆฌ๊ฐ€ ํ”ํžˆ ์‚ฌ์šฉํ•˜๋Š” ์—ฐ์‡„ ๋ฒ•์น™(Chain rule)๊ณผ ์ผ๋งฅ์ƒํ†ตํ•˜๋ฏ€๋กœ ์˜ฌ๋ฐ”๋ฅธ ๊ณ„์‚ฐ์ž„์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์–ธ๋œป ๋ณด๋ฉด $\sin u$์˜ ๋„ํ•จ์ˆ˜๋ฅผ $\cos u$๋กœ ์•Œ๋ ค ์ค€ ๋‹ค์Œ ๊ณ„์‚ฐ์„ ํ•œ ์…ˆ์ด๋‹ˆ ๊ธฐํ˜ธ๋ฏธ๋ถ„๊ณผ ๋ฌด์—‡์ด ๋‹ค๋ฅธ์ง€ ์˜์•„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์ง์ ‘ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ด๋ณด๋ฉด ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

#[derive(Debug, Copy, Clone)]
struct Dual {
    x: f64,
    dx: f64,
}

trait Sin {
    fn sin_(self) -> Self;
}

impl Sin for f64 {
    fn sin_(self) -> Self {
        self.sin()
    }
}

impl Sin for Dual {
    fn sin_(self) -> Self {
        Dual {
            x: self.x.sin(),
            dx: self.dx * self.x.cos(),
        }
    }
}

fn main() {
    let u = Dual { x: 1.0, dx: 1.0 }; // x at x=1
    let v = u.sin_();
    println!("v: {}, dv: {}", v.x, v.dx);
    // sin(1) = 0.8414..
    // sin'(x) = cos(x) = cos(1) = 0.5403..

    let w = v.sin_();
    println!("w: {}, dw: {}", w.x, w.dx);
    // sin(sin(1)) = 0.7456..
    // (sin(sin(x)))' at x=1 = cos(sin(1)) * cos(1) = 0.3600..
}

Using Rust, we implemented a simple dual-number struct and a trait named Sin that applies to both real numbers and dual numbers, realizing automatic differentiation through method overloading. In the first example in main, $u = 1 \mathrel{\rhd} 1$, meaning value 1 and derivative 1, which represents the function $x$ at $x=1$. Since the dual number's sin_ function itself contains the derivative computation, the single expression let v = u.sin_(); already computes both the function value and the derivative value, stored in v.x and v.dx. The derivative computed this way carries no error beyond floating-point error, and the computational cost is just one multiplication and one cosine evaluation.

The second example hints at the real power of automatic differentiation: we apply sin_ once more to the previously computed v to define w. That is, $w = \sin(\sin(u))$, yet the calling code is identical no matter how complicated the function becomes. Remarkably, the result is exactly the value and derivative of the composite function. So with forward-mode automatic differentiation, merely specifying a simple derivative-propagation rule for each primitive lets us differentiate complicated composite functions at very low cost.


2. Implementing Forward-Mode Automatic Differentiation

As we saw above, unlike numerical differentiation, which worked regardless of a function's form, automatic differentiation requires specifying a derivative-propagation rule for each function. So this time, let's implement automatic differentiation for the basic arithmetic operations and a few representative functions.

2.1. Arithmetic Operations

For $w = u \pm v$, let's carry out automatic differentiation just as before.

$$ \begin{aligned} &\begin{aligned} w \mathrel{\rhd} \overline{w} &= u \mathrel{\rhd} \overline{u} + v \mathrel{\rhd} \overline{v} \\ &= (u + v) \mathrel{\rhd} (\overline{u} + \overline{v}) \end{aligned} \\ \therefore ~ &w = u + v,\quad \overline{w} = \overline{u} + \overline{v} \end{aligned} $$

Likewise, let's apply automatic differentiation to $w = u \times v$.

$$ \begin{aligned} &\begin{aligned} w \mathrel{\rhd} \overline{w} &= (u \mathrel{\rhd} \overline{u}) \times (v \mathrel{\rhd} \overline{v}) \\ &= (u + \overline{u}\epsilon) \times (v + \overline{v}\epsilon) \\ &= uv + (u \overline{v} + \overline{u}v)\epsilon \\ &= (u \times v) \mathrel{\rhd} (u\overline{v} + \overline{u}v) \end{aligned} \\ \therefore ~ &w = u \times v,\quad \overline{w} = u\overline{v} + \overline{u}v \end{aligned} $$

These two results match exactly the linearity of differentiation and the Leibniz rule that we know well. Let's apply them to the Dual struct defined above.

use std::ops::{Add, Sub, Mul};

impl Add for Dual {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        Self {
            x: self.x + rhs.x,
            dx: self.dx + rhs.dx,
        }
    }
}

impl Sub for Dual {
    type Output = Self;

    fn sub(self, rhs: Self) -> Self {
        Self {
            x: self.x - rhs.x,
            dx: self.dx - rhs.dx,
        }
    }
}

impl Mul for Dual {
    type Output = Self;

    fn mul(self, rhs: Self) -> Self {
        Self {
            x: self.x * rhs.x,
            dx: self.x * rhs.dx + self.dx * rhs.x,
        }
    }
}

fn main() {
    let u = Dual { x: 1f64, dx: 1f64 }; // x at x=1
    let v = Dual { x: 2f64, dx: 4f64 }; // 2x^2 at x=1

    let w = u + v;
    println!("w: {}, dw: {}", w.x, w.dx);
    // w: 3, dw: 5

    let w = u * v;
    println!("w: {}, dw: {}", w.x, w.dx);
    // w = 2x^3 at x=1 = 2
    // dw = 6x^2 at x=1 = 6
}


2.2. Exponential, Logarithmic, and Polynomial Functions

๋‹ค์Œ์œผ๋กœ๋Š” ์ง€์ˆ˜ ๋กœ๊ทธ ํ•จ์ˆ˜์˜ ๋Œ€ํ‘œ์ ์ธ ํ•จ์ˆ˜๋“ค์ธ $y=e^x,\,y=\ln x$์™€ ๋‹คํ•ญํ•จ์ˆ˜์ธ $y=x^n$์— ๋Œ€ํ•ด์„œ ์ž๋™๋ฏธ๋ถ„์„ ์ˆ˜ํ–‰ํ•ด๋ด…์‹œ๋‹ค.

  1. Exponential function $$ \begin{aligned} &\begin{aligned} v \mathrel{\rhd} \overline{v} &= \exp(u \mathrel{\rhd} \overline{u}) \\ &= e^u \mathrel{\rhd} e^{u} \overline{u} \end{aligned} \\ \therefore ~ &v = e^u,\quad \overline{v} = e^{u} \overline{u} \end{aligned} $$

  2. Logarithmic function $$ \begin{aligned} &\begin{aligned} v \mathrel{\rhd} \overline{v} &= \ln(u \mathrel{\rhd} \overline{u}) \\ &= \ln u \mathrel{\rhd} \frac{\overline{u}}{u} \end{aligned} \\ \therefore ~ &v = \ln u,\quad \overline{v} = \frac{\overline{u}}{u} \end{aligned} $$

  3. Polynomial function $$ \begin{aligned} &\begin{aligned} v \mathrel{\rhd} \overline{v} &= (u \mathrel{\rhd} \overline{u})^n \\ &= u^n \mathrel{\rhd} n u^{n-1} \overline{u} \end{aligned} \\ \therefore ~ &v = u^n,\quad \overline{v} = n u^{n-1} \overline{u} \end{aligned} $$

Implementing the other trigonometric functions alongside the sin function from before, and gathering everything into a single trait, looks like this.


trait Ops {
    fn exp(self) -> Self;
    fn ln(self) -> Self;
    fn sin(self) -> Self;
    fn cos(self) -> Self;
    fn tan(self) -> Self;
    fn powi(self, n: i32) -> Self;
}

impl Ops for Dual {
    fn exp(self) -> Self {
        Self {
            x: self.x.exp(),
            dx: self.x.exp() * self.dx,
        }
    }

    fn ln(self) -> Self {
        Self {
            x: self.x.ln(),
            dx: self.dx / self.x,
        }
    }

    fn sin(self) -> Self {
        Self {
            x: self.x.sin(),
            dx: self.x.cos() * self.dx,
        }
    }

    fn cos(self) -> Self {
        Self {
            x: self.x.cos(),
            dx: -self.x.sin() * self.dx,
        }
    }

    fn tan(self) -> Self {
        let tan = self.x.tan();
        Self {
            x: tan,
            dx: self.dx * (tan * tan + 1.0),
        }
    }

    fn powi(self, n: i32) -> Self {
        Self {
            x: self.x.powi(n),
            dx: n as f64 * self.x.powi(n - 1) * self.dx,
        }
    }
}
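To see these rules composing, here is a self-contained sketch (it repeats the Dual struct and, for brevity, only the pieces of the operations it needs, written as inherent methods) that differentiates f(x) = x² eˣ at x = 1; by the product rule the answer should be (2x + x²)eˣ = 3e.

```rust
use std::ops::Mul;

#[derive(Debug, Copy, Clone)]
struct Dual {
    x: f64,
    dx: f64,
}

impl Mul for Dual {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        Self {
            x: self.x * rhs.x,
            dx: self.x * rhs.dx + self.dx * rhs.x, // Leibniz rule
        }
    }
}

impl Dual {
    fn exp(self) -> Self {
        Self { x: self.x.exp(), dx: self.x.exp() * self.dx }
    }
    fn powi(self, n: i32) -> Self {
        Self { x: self.x.powi(n), dx: n as f64 * self.x.powi(n - 1) * self.dx }
    }
}

// f(x) = x^2 * e^x, written once over Dual
fn f(u: Dual) -> Dual {
    u.powi(2) * u.exp()
}

fn main() {
    let u = Dual { x: 1.0, dx: 1.0 }; // x at x = 1
    let v = f(u);
    // f(1) = e = 2.7182.., f'(1) = 3e = 8.1548..
    println!("f: {}, df: {}", v.x, v.dx);
}
```

Note that f is written as ordinary arithmetic; the propagation rules do all the work.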


2.3. The Sigmoid Function

By adding a few more operator implementations to the functions above, we can compute the automatic derivative of the sigmoid function without ever writing a propagation rule specific to the sigmoid.

use std::ops::{Neg, Add, Div};

trait Sigmoid: Sized
    + Ops
    + Neg<Output = Self>
    + Add<f64, Output = Self>
where
    f64: Div<Self, Output = Self>,
{
    fn sigmoid(self) -> Self {
        1f64 / ((-self).exp() + 1f64)
    }
}

impl Sigmoid for Dual {}

fn main() {
    let u = Dual { x: 1.0, dx: 1.0 }; // x at x=1
    let z = u.sigmoid();
    println!("z: {}, dz: {}", z.x, z.dx);
    // sigmoid(1) = 0.7310.., sigmoid'(1) = sigmoid(1) * (1 - sigmoid(1)) = 0.1966..
}

์ฃผ์˜: ์œ„์— ์ƒ๊ธฐํ•œ ์ฝ”๋“œ์™ธ์—๋„ f64์™€์˜ ์—ฐ์‚ฐ์ด ์ถ”๊ฐ€์ ์œผ๋กœ ์ •์˜๋˜์–ด์•ผ ์ž‘๋™ํ•˜๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ์ด ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด ๋‹ค์Œ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.

github.com/Axect/dual


Wrapping Up

So far we have explored forward-mode automatic differentiation using dual numbers. To keep the explanation simple we covered only very simple cases; to build an automatic differentiation library that works well in practice, the following must also be considered.

  • Automatic differentiation of higher-order derivatives

  • Automatic differentiation of multivariate functions

  • Automatic differentiation of functions involving matrix and vector variables

The first two items are already well implemented in Peroxide, a Rust numerical computing library, so you can refer to it. The last item will be covered in the next post, together with reverse-mode automatic differentiation.


📊 Piecewise Rejection Sampling
Fri, 18 Nov 2022 17:49:04 +0900, https://axect.github.io/kr/posts/006_prs/

Differential energy spectrum of ALPs from primordial black hole (PBH)${}^{[1]}$

Suppose someone brings you an unnormalized probability density function graph like the one above, and then asks you to generate 10,000 data points following this distribution. What should you do?

The two best-known methods for sampling data from an arbitrary probability density function are the following.

  1. Inverse Transform Sampling
  2. Rejection Sampling

Inverse transform sampling computes the cumulative distribution function of the probability density function, inverts it, and uses the inverse to generate data. It is efficient, but the procedure depends on the form of the density, so it is hard to use when, as here, the exact form of the density is unknown.${}^{[2]}$ Rejection sampling, on the other hand, applies regardless of the density's form, so that is where we will start.


1. Rejection Sampling

The algorithm of rejection sampling (also called the acceptance-rejection method) is quite simple. To explain it, let us call the probability density function of the distribution we want to sample $f(x)$. Since we do not know how to sample directly from $f(x)$, we introduce a density $g(x)$ that we can sample from. Here, $g(x)$ must satisfy $f(x) \leq M \cdot g(x)$ for some positive constant $M$ and for all $x$ in the domain. Pictured, this looks as follows.

A graph often seen in particle physics

Since the easiest distribution to sample from is usually the uniform distribution, here we set $g(x)=\text{Unif}(x|0,10)$. Also, since the maximum of $f(x)$ is 1, we chose $M=10$ so that $M\times g(x)$ is always greater than or equal to $f(x)$. Now let's see how to sample using this setup.

  1. Sample $y$ from $g(x)$. In this case, $y$ is a value sampled from $\text{Unif}(x|0,10)$.

  2. For the drawn $y$, sample $u$ from another uniform distribution, $\text{Unif}(u|0,M\times g(y))$. (Regardless of the form of $g(x)$, this step always uses a uniform distribution.)

  3. If $u \leq f(y)$, regard $y$ as a value sampled from the target distribution $f(x)$. Otherwise, go back to step 1 and sample a new $y$.
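The three steps above can be sketched in a self-contained way. The toy target f(x) = x on [0, 1] (so M·g(x) = 1 with g = Unif(0, 1)) and the hand-rolled xorshift generator are illustrative assumptions, not part of the original post; the normalized target density is 2x, whose mean is 2/3.

```rust
// Minimal xorshift64 generator producing floats in [0, 1)
struct XorShift(u64);
impl XorShift {
    fn next_f64(&mut self) -> f64 {
        let mut s = self.0;
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        self.0 = s;
        (s >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Rejection-sample n values from the unnormalized density f(x) = x on [0, 1]
fn rejection_sample(n: usize, seed: u64) -> Vec<f64> {
    let mut rng = XorShift(seed);
    let mut out = Vec::with_capacity(n);
    while out.len() < n {
        let y = rng.next_f64(); // step 1: y ~ g = Unif(0, 1)
        let u = rng.next_f64(); // step 2: u ~ Unif(0, M*g(y)) = Unif(0, 1)
        if u <= y {             // step 3: accept iff u <= f(y)
            out.push(y);
        }
    }
    out
}

fn main() {
    let samples = rejection_sample(100_000, 0x9E3779B97F4A7C15);
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    println!("sample mean: {} (expected 2/3)", mean);
}
```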

This sampling method is rejection sampling, named after the fact that, as in step 3, $y$ is rejected when $u$ exceeds $f(y)$. This simple algorithm alone effectively approximates the entirely new distribution $f(x)$. To see why, let's examine the cumulative distribution function (CDF).

Let $X$ be the random variable obtained by rejection sampling. Then, for the two other random variables $Y \sim g(y)$ and $U \sim \text{Unif}(u|0, M\cdot g(y))$, the following relation holds. $$ P(X \leq x) = P\left(Y \leq x \,|\, U < f(Y)\right) $$ The conditional probability on the right-hand side is the probability that $Y$ is at most $x$ given that $U$ was not rejected, which is exactly step 3 of the algorithm above. Rewriting it with the definition of conditional probability gives the following. $$ P(X \leq x) = \frac{P (Y \leq x,~U < f(Y))}{P(U < f(Y))} $$ First, let's expand the numerator using the probability density functions. $$ \begin{aligned} P(Y \leq x,~U < f(Y)) &= \int P(Y \leq x,~U < f(Y) \,|\, Y = y) \cdot g(y) \,dy \\ &= \int P(y \leq x,~U < f(y)) \cdot g(y) \,dy \\ &= \int \mathbb{1}_{y \leq x} \cdot P(U < f(y))\cdot g(y) \, dy \\ &= \int_{-\infty}^x P(U < f(y)) \cdot g(y) \, dy \end{aligned} $$ Going from the second line to the third uses the fact that the events $y \leq x$ and $U < f(y)$ are independent. Now, since $U \sim \text{Unif}(u|0,\,M\cdot g(y))$, substituting $\displaystyle P(U < f(y)) = \frac{1}{M\cdot g(y)} \times (f(y) - 0)$ yields the following. $$ \begin{aligned} P(Y \leq x,~U < f(Y)) &= \int_{-\infty}^x \frac{f(y)}{M\cdot g(y)}\cdot g(y) \, dy \\ &= \frac{1}{M} \int_{-\infty}^x f(y) \, dy \end{aligned} $$
Now let's compute the denominator. $$ \begin{aligned} P(U < f(Y)) &= \int P(U < f(y)) \cdot g(y) \, dy \\ &= \int \frac{f(y)}{M\cdot g(y)} \cdot g(y) \, dy \\ &= \frac{1}{M} \int f(y) \, dy \\ &= \frac{1}{M} \end{aligned} $$ Finally, dividing the numerator by the denominator gives the following. $$ P(X \leq x) = \int_{-\infty}^x f(y) \, dy $$ This is exactly the cumulative distribution function of $f(x)$. Therefore, the probability density function of the random variable $X$ obtained by rejection sampling is $f(x)$.

Now that the mathematical proof is done, let's write some code to check that it actually works.

// Rust
use peroxide::fuga::*;

const M: f64 = 10.0;
const N: usize = 100_000;

fn main() {
    // Create g(x)=Unif(x|0,10) & h(y)=Unif(y|0,M)
    let g = Uniform(0.0, 10.0);
    let h = Uniform(0.0, M);

    // Rejection sampling
    let mut x_vec = vec![0f64; N];
    let mut i = 0usize;
    while i < N {
        let x = g.sample(1)[0];
        let y = h.sample(1)[0];

        if y <= f(x) {      // Accept
            x_vec[i] = x;
            i += 1;
        } else {            // Reject
            continue;
        }
    }

    // ...
}

// Test function
fn f(x: f64) -> f64 {
    1f64 / (x+1f64).sqrt() + 0.2 * (-(x-3f64).powi(2) / 0.2).exp()
}

Since the algorithm itself is simple, the code is quite simple as well. Even so, the results are excellent.

Result of rejection sampling

Rejection sampling is indifferent to the shape of the density and easy to implement, but it has a fatal drawback: its computational efficiency can be very poor. To obtain a sample, a draw must survive the rejection condition. That is, the higher $P(U < f(Y))$ is, the faster samples accumulate; the lower it is, the longer it takes to collect enough samples. This quantity is called the acceptance rate, and we already computed it in the proof above.


Acceptance rate
The acceptance rate of rejection sampling is defined as follows. $$ P(U < f(Y)) = \int P(U < f(y)) \cdot g(y) \, dy $$

In the algorithm we used, this equals $1/M$, which is the area under $f(x)$ divided by the total area under $M \cdot g(x)$. Hence, the larger the mismatch between the distributions, the lower the acceptance rate, and the longer it takes to collect samples. The example we used is relatively benign, since $g(x)$ and $f(x)$ do not differ much over most of the domain. But for a distribution like the one shown at the start, which is zero over much of its support, using a uniform distribution for $g(x)$ means most draws are rejected: not only does sampling take a long time, but the tail near zero can become practically impossible to sample. So how can we fix this?


2. Piecewise Rejection Sampling

Researchers have, in fact, already devised several methods for such cases, most notably Adaptive Rejection Sampling (ARS) and Adaptive Rejection Metropolis Sampling (ARMS). The former is efficient but requires the function to be log-concave; the latter removes that restriction but is considerably harder to implement. Of course, well-made R packages and the like exist, so using them is not hard, but here I want to introduce a simpler method: Piecewise Rejection Sampling (PRS). It is a method I devised during my physics research to solve the problem posed at the beginning. Its foundation is the same as rejection sampling, but instead of a plain uniform distribution, $g(x)$ is taken to be a weighted uniform distribution tailored to $f(x)$. Let me explain it step by step.

2.1. Max-Pooling

Anyone interested in deep learning has probably heard of max pooling. Often used in CNNs, max pooling is conceptually very simple. Its mathematical definition is as follows.


Max pooling

Let $f\,:\,[a,b]\to \mathbb{R}$ be a continuous function and consider the equidistant partition of the interval $[a,b]$ $$ a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b $$ The partition size, $(b-a)/n$, is called the stride. Denote by $\displaystyle M_i = \max_{[x_{i-1},x_i]}f(x)$ and consider the simple function

$$ S_n(x) = \sum_{i=1}^n M_i \mathbb{1}_{[x_{i-1},x_i)}(x). $$

The process of approximating the function $f(x)$ by the simple function $S_n(x)$ is called max-pooling.

The simple function used here is the one from measure theory: a function that takes a distinct constant value on each piece of a partition. For a precise definition, see Definition 10 and Property 1 of Precise Machine Learning with Rust.

In short, max-pooling simply means dividing the domain of $f(x)$ into $n$ intervals, finding the maximum of $f(x)$ on each interval, and forming the simple function that takes that maximum as the representative value on each interval. Applying this to the distribution presented at the beginning gives the following.
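To make the definition concrete, here is a minimal Python sketch (not from the post; the per-interval grid size `k` is an assumption used to approximate each interval's maximum numerically):

```python
import numpy as np

def max_pool(f, a, b, n, k=100):
    """Approximate max-pooling: split [a, b] into n equal intervals
    and take the maximum of f on each (sampled on k grid points)."""
    edges = np.linspace(a, b, n + 1)
    M = np.array([
        max(f(x) for x in np.linspace(edges[i], edges[i + 1], k))
        for i in range(n)
    ])
    return edges, M

# Example: max-pool sin(x) on [0, pi] with 4 intervals
edges, M = max_pool(np.sin, 0.0, np.pi, 4)
```

The returned `M` are exactly the representative values $M_i$ of the simple function $S_n(x)$.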

Max-pooling for Test Distribution

If the red solid line in the figure is $f(x)$, the blue dashed line is the result of max-pooling $f(x)$. Some readers may have guessed already: we will use this blue dashed line as $M\cdot g(x)$ in rejection sampling. To do so, we define a probability distribution that is uniform within each interval but takes a different representative value on each interval. This is exactly the weighted uniform distribution mentioned above.

2.2. Weighted Uniform Distribution


Weighted uniform distribution
Let $(S, \mathcal{F}, \mu)$ be a measure space. For a disjoint family $\mathcal{A} = \left\{A_i\right\}_{i=1}^n \subset \mathcal{F}$ of measurable sets with non-zero measure and a family $\mathbf{M} = \{M_i\}_{i=1}^n$ of non-negative real numbers (with $\sum_i M_i > 0$), define the weighted uniform distribution on $S$ by $$ \text{WUnif}(x|\mathbf{M}, \mathcal{A}) = \frac{1}{\sum_{j}M_j \cdot \mu(A_j)}\sum_i M_i \mathbf{1}_{A_i}(x) $$

์ •์˜๋Š” ๋ญ”๊ฐ€ ๋ณต์žกํ•ด ๋ณด์ด์ง€๋งŒ, ์ด๋ฅผ 1์ฐจ์› ๊ตฌ๊ฐ„์— ๋Œ€ํ•ด์„œ ์ •์˜ํ•˜๋ฉด ๊ต‰์žฅํžˆ ๊ฐ„๋‹จํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • $S = [a,b]$
  • $\mathcal{A} = \left\{[x_{i-1},x_i)\right\}_{i=1}^n$ and $\Delta x_i \equiv x_i - x_{i-1}$
  • $\displaystyle \text{WUnif}(x|\mathbf{M}, \mathcal{A}) = \frac{1}{\sum_{j}M_j \cdot \Delta x_j}\sum_i M_i \mathbf{1}_{A_i}(x)$

Since the distribution we will use is one-dimensional anyway, we will carry out the calculations with this definition from now on. First, it is easy to show that this function is a probability density function.

$$ \begin{aligned} \int_a^b \text{WUnif}(x|\mathbf{M}, \mathcal{A})\, dx &= \int_a^b \frac{1}{\sum_{j}M_j \cdot \Delta x_j}\sum_i M_i \mathbf{1}_{A_i}(x)\, dx \\ &= \frac{1}{\sum_{j}M_j \cdot \Delta x_j}\sum_i M_i \int_{a}^{b} \mathbf{1}_{A_i}(x)\, dx \\ &= \frac{1}{\sum_{j}M_j \cdot \Delta x_j}\sum_i M_i \cdot \Delta x_i \\ &= 1 \end{aligned} $$

The weighted uniform distribution is also easy to sample from. The procedure is as follows.

  1. Pick one of the $n$ intervals, where interval $i$ is chosen with probability $\displaystyle \frac{M_i \cdot \Delta x_i}{\sum_{j}M_j \cdot \Delta x_j}$.

  2. Within the chosen interval the distribution is uniform, so draw one sample uniformly from that interval.

๋งŒ์ผ max-pooling์„ ์ ์šฉํ•˜์—ฌ $\mathbf{M}, \mathcal{A}$๋ฅผ ๊ณจ๋ž๋‹ค๋ฉด, ๊ฐ ๊ตฌ๊ฐ„์˜ ๊ธธ์ด๊ฐ€ ๋ชจ๋‘ ๋™์ผํ•˜๋ฏ€๋กœ ํ™•๋ฅ ์—์„œ ๊ตฌ๊ฐ„์˜ ๊ธธ์ด ํ•ญ์ด ์‚ฌ๋ผ์ง‘๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๊ฒฝ์šฐ์— ๊ฐ ๊ตฌ๊ฐ„์˜ ํ™•๋ฅ ์€ $M_i / \sum_{j}M_j$๊ฐ€ ๋˜์–ด ๊ต‰์žฅํžˆ ๊ฐ„๋‹จํ•ด์ง‘๋‹ˆ๋‹ค.

2.3. Piecewise Rejection Sampling

  Since the weighted uniform distribution is a genuine probability distribution with a known sampling procedure, it can serve as $g(x)$ in rejection sampling. However, to satisfy the condition that the envelope always dominates $f(x)$, we will not choose arbitrary $\mathbf{M}, \mathcal{A}$ but obtain them by max-pooling. The procedure is summarized as follows.

  1. Decide how many intervals to split the domain into. Call the number of intervals $n$ and split the domain into intervals of equal length to define $\mathcal{A}$.

  2. Max-pool $f(x)$ over the resulting intervals to obtain $\mathbf{M}$.

  3. Define the weighted uniform distribution from $\mathbf{M}, \mathcal{A}$.

  4. Use this weighted uniform distribution as $g(x)$ in rejection sampling.

Since this amounts to splitting the domain into pieces and sampling piece by piece, I will call it Piecewise Rejection Sampling. Computing its acceptance rate shows that it is higher than that of plain rejection sampling with a uniform proposal.

$$ \begin{aligned} P(U < f(Y)) &= \int_a^b P(U < f(Y) | Y = y) \cdot g(y) \, dy \\ &= \int_a^b P(U < f(y)) \cdot \text{WUnif}(y|\mathbf{M}, \mathcal{A}) \, dy \\ &= \frac{1}{\sum_{j}M_j \cdot \Delta x_j} \sum_i M_i \int_{A_i} P(U < f(y))\, dy \end{aligned} $$ Since $U \sim \text{Unif}(u|0, M_i)$ on $A_i$ and there $P(U < f(y)) = f(y)/M_i$, this simplifies as follows (the last equality on the first line uses $\sum_i \int_{A_i} f(y)\,dy = 1$, since $f$ is normalized). $$ \begin{aligned} P(U < f(Y)) &= \frac{1}{\sum_{j}M_j \cdot \Delta x_j} \sum_i \int_{A_i} f(y)\, dy \\ &= \frac{1}{\sum_{j}M_j \cdot \Delta x_j} \geq \frac{1}{M_{\max} \cdot \sum_{j} \Delta x_j} = \frac{1}{M} \end{aligned} $$

The final inequality uses the fact that the largest of the $M_i$ multiplied by the total length of the domain equals the $M$ used in plain rejection sampling with a uniform proposal. Hence the acceptance rate of piecewise rejection sampling is always at least that of uniform-proposal rejection sampling.
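For illustration, the four steps can be put together in a short Python sketch. This is not the Peroxide implementation, just the algorithm as described, with an assumed per-bin grid size for the max-pooling step and a made-up target density:

```python
import numpy as np

rng = np.random.default_rng(0)

def prs(f, n_samples, domain, n_bins, grid=100):
    """Piecewise rejection sampling with a max-pooled envelope."""
    a, b = domain
    edges = np.linspace(a, b, n_bins + 1)
    # Steps 1-2: max-pool f on each bin to get the envelope heights M_i
    M = np.array([max(f(x) for x in np.linspace(edges[i], edges[i + 1], grid))
                  for i in range(n_bins)])
    p = M / M.sum()                      # equal bin widths -> M_i / sum_j M_j
    out = []
    while len(out) < n_samples:
        # Step 3: draw a candidate from the weighted uniform distribution
        i = rng.choice(n_bins, p=p)
        y = rng.uniform(edges[i], edges[i + 1])
        # Step 4: accept with probability f(y) / M_i
        if rng.uniform(0.0, M[i]) < f(y):
            out.append(y)
    return np.array(out)

# Example: sample from a triangular density on [0, 2]
samples = prs(lambda x: 1.0 - abs(x - 1.0), 5000, (0.0, 2.0), 50)
```

Because the envelope hugs $f$ bin by bin, far fewer candidates are rejected than with a single flat envelope at $M_{\max}$.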

Finally, let us solve the problem posed at the beginning using piecewise rejection sampling. The algorithm is already implemented in Peroxide, a Rust numerical library, so we will use that.

// Before running this code, you need to add peroxide in Cargo.toml
// `cargo add peroxide --features parquet`
use peroxide::fuga::*;

#[allow(non_snake_case)]
fn main() {
    // Read parquet data file
    let df = DataFrame::read_parquet("data/test.parquet").unwrap();
    let E: Vec<f64> = df["E"].to_vec();
    let dNdE: Vec<f64> = df["dNdE"].to_vec();

    // Cubic hermite spline -> Make continuous f(x)
    let cs = cubic_hermite_spline(&E, &dNdE, Quadratic);
    let f = |x: f64| cs.eval(x);

    // Piecewise rejection sampling
    // * # samples = 10000
    // * # bins = 100
    // * tolerance = 1e-6
    let E_sample = prs(f, 10000, (E[0], E[E.len()-1]), 100, 1e-6);

    // Write parquet data file
    let mut df = DataFrame::new(vec![]);
    df.push("E", Series::new(E_sample));
    df.write_parquet(
        "data/prs.parquet", 
        CompressionOptions::Uncompressed
    ).unwrap();
}

Plotting a histogram of the generated data gives the following.

Finally, we get samples!


References

  • Yen-Chi Chen, Lecture 4: Importance Sampling and Rejection Sampling, STAT/Q SCI 403: Introduction to Resampling Methods (2017)

  • Ovidiu Calin, Deep Learning Architectures: A Mathematical Approach, Springer (2020)

  • Tae-Geun Kim (Axect), Precise Machine Learning with Rust (2019)


A. Footnotes

[1]: The figure used here shows the spectrum of axion-like particles (ALPs) emitted at a particular time from a primordial black hole (PBH). For details of this research, see arXiv:2212.11977.

[2]: Even in this case, one could of course approximate the nodes with a cubic spline or similar, obtain the cumulative distribution function via numerical or polynomial integration, fit it again by interpolation or splines, and then invert it. However, this is fairly error-prone and inefficient, so I do not recommend it.

💔 Decorrelation + Deep learning = Generalization https://axect.github.io/kr/posts/005_decov/ Sat, 29 Oct 2022 17:39:54 +0900

arXiv: 1511.06068

  One of the most common problems in deep learning is overfitting. It tends to occur when data are scarce and training runs long; the model then performs well on the training set but poorly on the validation set or real-world data. To address it, people have devised many remedies: statistics adopted regularization methods such as Ridge and LASSO early on, and deep learning likewise regularizes the weights or applies various techniques to the network. Such techniques include the following.

Bejani, M.M., Ghatee, M. A systematic review on overfitting control in shallow and deep neural networks. Artif Intell Rev 54, 6391–6438 (2021)

The most famous of these is Dropout, which removes a random subset of the network's neurons and thereby suppresses redundant activity among them. It proved remarkably effective at reducing overfitting and is now taken for granted as part of network design. Dropout is not effective in every case, however: with little training data or few neurons, removing neurons at random can limit the network's expressive power. Also, quite apart from its effect, Dropout's characteristically simple structure and concept leave its theoretical appeal somewhat diminished. Here I introduce an interesting paper that complements both that appeal and the effect.


1. Why do we need decorrelation?

1.1. Covariance & Correlation

  The paper I will review is, as seen on the cover, Reducing Overfitting in Deep Networks by Decorrelating Representations, published in 2015. It addresses reducing overfitting via a method called decorrelation, and concludes that this can push Dropout's performance a step further. Decorrelation being an unfamiliar topic that does not attract much attention, the paper never drew an explosive response, but it is a good paper that continues to be steadily cited.

To understand decorrelation, we first need to understand correlation. Correlation expresses how two data sets vary together, and there are several ways to measure it; the most representative is covariance. Covariance measures the linear relationship between two random variables and is defined as follows.

$$ \text{Cov}(X,\,Y) \equiv \mathbb{E}\left[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\right] $$

A positive covariance usually indicates that the two variables move together, a negative one that they move oppositely, and a value near 0 that they are unrelated. A more standardized quantity is the Pearson correlation coefficient, defined as follows.

$$ \text{Corr}(X,\,Y) \equiv \frac{\text{Cov}(X,\,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}} $$

Defined this way, the correlation coefficient is restricted to the range -1 to 1: values near 1 indicate a direct linear relationship, values near -1 an inverse one, and values near 0 no correlation. For example, consider the following two data sets.

# Python
import numpy as np

x = np.array([1,2,3,4])
y = np.array([5,6,7,8])

The covariance of these two variables can be computed with the following function.

# Python
def cov(x, y):
    # Sample covariance: mean-centered dot product divided by N - 1
    N = len(x)
    m_x = x.mean()
    m_y = y.mean()
    return np.dot((x - m_x), (y - m_y)) / (N - 1)

Applying this to the x and y above gives 1.6666666666666667. We can tell the correlation is positive, but the magnitude is hard to interpret. For a cleaner interpretation, we define the Pearson correlation coefficient mentioned above as follows.

# Python
def pearson_corr(x, y):
    # ddof=1 for sample variance
    return cov(x, y) / np.sqrt(x.var(ddof=1) * y.var(ddof=1))

Applying it to the data above gives exactly 1, because the two variables are in a perfect direct linear relationship.

With several features, we can arrange all pairwise covariances into a matrix at once. This is called the covariance matrix and is written as follows.

$$ C_{ij} = \text{Cov}(x_i,\,x_j) $$

The covariance matrix is therefore a square matrix with as many rows and columns as there are features.
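For instance, NumPy computes the full covariance matrix directly (by default each row of the input is treated as one variable); the (0, 1) entry below reproduces the 1.6666666666666667 computed earlier, since features 0 and 1 are the same x and y:

```python
import numpy as np

# Three features, each observed four times
X = np.array([
    [1.0, 2.0, 3.0, 4.0],    # feature 0
    [5.0, 6.0, 7.0, 8.0],    # feature 1 (perfectly correlated with 0)
    [1.0, -1.0, 1.0, -1.0],  # feature 2
])
C = np.cov(X)  # 3x3 covariance matrix, C[i, j] = Cov(x_i, x_j)
```

The diagonal entries are each feature's variance, and off-diagonal entries are the pairwise covariances.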


1.2. Overfitting & Decorrelation

  The reason for suddenly discussing correlation in deep learning is that when two or more variables are significantly correlated with each other, it harms the model. A deep neural network learns by updating the weights assigned to each feature; if two features play exactly the same role, changing either weight yields the same result, a kind of degeneracy. This hampers accurate training and biases the network, and the paper regards it as a cause of overfitting. Indeed, defining the degree of overfitting as the gap between training accuracy and validation accuracy, and quantifying the degree of decorrelation in a variable called DeCov, the paper shows the following result.

Correlation between Overfitting and Covariance

The figure shows that as the number of training samples grows and the degree of overfitting shrinks, the cross-covariance shrinks along with it. This suggests that reducing covariance, that is, decorrelating, should reduce overfitting.


2. How to decorrelate?

  The paper's idea is to decorrelate the activated outputs of the hidden layers. Since these are what actually get multiplied by the next layer's weights, this is a reasonable choice. Writing the activations of one hidden layer as $h^n \in \mathbb{R}^d$, their covariance matrix can be defined as follows ($n$ denotes the batch index).

$$ C_{ij} = \frac{1}{N} \sum_n (h_i^n - \mu_i)(h_j^n - \mu_j) $$

Now we only need to reduce the mutual covariances using this quantity. However, since it is a matrix, it must be expressed as a simple scalar to be included in the loss function. The paper expresses it with the following loss.

$$ \mathcal{L}_{\text{DeCov}} = \frac{1}{2}(\lVert C \rVert_F^2 - \lVert\text{diag}(C) \rVert_2^2) $$

$\lVert \cdot \rVert_F$ denotes the Frobenius norm of a matrix and $\lVert \cdot \rVert_2$ the $l^2$ norm. The diagonal is separated out and subtracted because the diagonal of a covariance matrix simply holds each feature's variance, which has nothing to do with decorrelation. We can now implement decorrelation just by adding this loss to our own loss function, like a regularization term.

How can this be implemented in actual deep learning code? At first glance, implementing the covariance matrix and the gradient computations for its norms looks quite involved, but fortunately PyTorch provides automatic differentiation for both the covariance matrix and the norms. It can therefore be implemented with the following short code.

# Python
import torch

def decov(h):
    # torch.cov treats each row of h as one variable (feature)
    C = torch.cov(h)
    C_diag = torch.diag(C, 0)
    # squared Frobenius norm minus the squared l2 norm of the diagonal
    return 0.5 * (torch.norm(C, 'fro')**2 - torch.norm(C_diag, 2)**2)

3. Apply to Regression

  The original paper applied this to various image datasets, but to make the effect easier to appreciate, here I apply it to a simple regression problem. The data posed to the network are as follows.

Nonlinear data [see: Peroxide_Gallery]

We have both a plain network and a network implementing DeCov solve this problem; the DeCov network is structured as follows.

import pytorch_lightning as pl
from torch import nn
import torch.nn.functional as F
# ...

class DeCovMLP(pl.LightningModule):
    def __init__(self, hparams=None):
        # ...
        
        self.fc_init = nn.Sequential(
            nn.Linear(1, self.hidden_nodes),
            nn.ReLU(inplace=True)
        )
        
        self.fc_mid = nn.Sequential(
            nn.Linear(self.hidden_nodes, self.hidden_nodes),
            nn.ReLU(inplace=True),
            nn.Linear(self.hidden_nodes, self.hidden_nodes),
            nn.ReLU(inplace=True),
            nn.Linear(self.hidden_nodes, self.hidden_nodes),
            nn.ReLU(inplace=True),
        )
        
        self.fc_final = nn.Linear(self.hidden_nodes, 1)
        
        # ...
        
    def forward(self, x):
        x = self.fc_init(x)
        x = self.fc_mid(x)
        return self.fc_final(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        
        h0 = self.fc_init(x)
        loss_0 = decov(h0)
        
        h1 = self.fc_mid(h0)
        loss_1 = decov(h1)
        
        y_hat = self.fc_final(h1)
        loss = F.mse_loss(y,y_hat) + loss_0 + loss_1
        
        return loss
    
    # ...

Briefly, loss_0 is defined from the activations after the single fc_init layer, and loss_1 from the activations after the three fc_mid layers; both are applied to the MSE loss like regularization terms. Before looking at the results, the training curves of SimpleMLP and DeCovMLP already show something interesting.

SimpleMLP losses (wandb.ai)

DeCovMLP losses (wandb.ai)

In SimpleMLP, decov_1 increases as training proceeds and then stops decreasing. In DeCovMLP, by contrast, decov_1 decreases steadily. This matches the paper's expectation that overfitting goes hand in hand with growing correlation between features. Now let's look at the results.

Results!

The red line is SimpleMLP's result and the blue line is DeCovMLP's. Where the red line is overfit and oscillates strongly, the blue line does not stray far from the true curve and oscillates little. This alone is quite striking, but extrapolating a little gives an even more interesting result.

Extrapolate!

The red line, overfit to the training set, goes completely astray as soon as it leaves the training domain, while the blue line does not deviate much. The figure makes clear that DeCovMLP generalizes better.


4. Further more..

This paper was published in 2015, which makes it fairly old as of 2022. Research on decorrelation has continued since, with various works such as Decorrelated Batch Normalization published recently. Decorrelation in particular has long been studied in statistics, so its theoretical grounding and analysis are solid, which seems to make neural-network results easier to understand intuitively.

The regression code above is available at the following link.

🦀 Three New Features in the Rust 1.62.0 Update https://axect.github.io/kr/posts/004_rust_1.62.0/ Fri, 01 Jul 2022 11:56:41 +0900

Ferris the crab

  Since version 1.0 shipped in 2015, the Rust language has been updated steadily across three channels: Stable, Beta, and Nightly. Developers who want to try new features early can use Beta or Nightly, but anyone developing and publishing a library, or shipping a real product, has no choice but Stable. An update to the Stable channel is therefore effectively an update to the entire Rust ecosystem, and Rust developers' attention converges on each new release.

An update may include minor bug fixes, stabilization of features previously available on Beta and Nightly, and even updates to build tooling not directly tied to the language itself. Some updates are like the small ripples from a pebble tossed into a wide, calm lake; others bring the relief of a long-awaited floodgate of a great dam finally opening. The 1.62.0 update released on June 30, 2022 was clearly the latter.


1. cargo add

  The feature drawing the most attention in Rust 1.62.0 is without question cargo add. Previously you always had to add crates to Cargo.toml by hand, but now typing cargo add CRATE_NAME in the terminal adds them automatically. To try it ourselves, let's create a project as follows.

Creating a crate

This creates a folder named rust_1_62_0 containing a src/main.rs file and a Cargo.toml. Now enter the folder and open Cargo.toml.

Cargo.toml

Nothing has been added under [dependencies] yet. Previously you had to add other crates here yourself, but now the cargo add command does it simply. Here we will add Peroxide, a Rust numerical computing library.

cargo add peroxide

Looking at Cargo.toml again, we can see it has been added as follows.

Revisit Cargo.toml

Moreover, cargo add can pull in a crate with specific features selected. Let's add Peroxide with the nc feature, which enables I/O for netcdf-format files.

cargo add peroxide --features nc

Looking at Cargo.toml once more, it has changed as follows.

Revisit Cargo.toml again


2. #[default] enum

  In Rust, the #[derive(...)] syntax is a very convenient way to implement traits on a struct without writing code. For example, a two-dimensional vector struct can be implemented as follows.

#[derive(Debug, Clone, Copy, Default)]
struct Vec2D {
    x: f64,
    y: f64
}

With this, the Vec2D struct gains the .clone() and ::default() methods and can be copied without transferring ownership. This works because both x and y are of type f64, which already implements Clone, Copy, and Default. Running the following code shows the actual result of the default method.

fn main() {
    let p = Vec2D::default();
    println!("{:?}", p);
}
// Output: Vec2D { x: 0.0, y: 0.0 }

However, enums were the one place in Rust where #[derive(Default)] could not be used. Implementing Default for an enum therefore always required writing it out explicitly, like this.

#[derive(Debug)]
enum Physicist {
    Newton,
    Einstein,
    Heisenberg,
    Feynman,
    Weinberg
}

impl Default for Physicist {
    fn default() -> Self {
        Physicist::Newton
    }
}

It is only about five extra lines, but writing them by hand every time was clearly a chore. People had been asking for this feature steadily since 2020, and it finally landed in 1.62.0. From version 1.62.0 on, the code above can be shortened as follows.

#[derive(Debug, Default)]
enum Physicist {
    #[default]
    Newton,
    Einstein,
    Heisenberg,
    Feynman,
    Weinberg
}

3. Total Order for Floating point numbers

  In mathematics, order comes in two main flavors: partial order and total order (linear order). A set with a total order must have an order between every pair of elements, while a partially ordered set may leave some pairs incomparable. In Rust, these notions are implemented as the PartialOrd trait and the Ord trait. Types implementing Ord can be compared with the cmp method, while types implementing only PartialOrd must be compared with partial_cmp. Up to this point the mathematical notions and the Rust implementation seem equivalent, but there is one big difference: in mathematics the set of real numbers is totally ordered, whereas in Rust the reals, represented as floating-point numbers, form only a partially ordered set.

This is not just Rust's problem; it comes from IEEE 754, the standard for representing floating-point numbers. In mathematics any two real numbers can be compared, but floating-point types on a computer contain a value that cannot be compared in the usual way: NaN. Because of this, code that works for integers often fails to compile for floating point. Fortunately, the 2008 revision of IEEE 754 specified a total order that includes NaN, and it has now been included in Rust 1.62.0. When comparing floating-point values, total_cmp can now order everything, NaN included. The order is as follows.

  • negative quiet NaN
  • negative signaling NaN
  • negative infinity
  • negative numbers
  • negative subnormal numbers
  • negative zero
  • positive zero
  • positive subnormal numbers
  • positive numbers
  • positive infinity
  • positive signaling NaN
  • positive quiet NaN.

To check this, run the following code. (peroxide is used only for printing.)

use peroxide::fuga::*;

fn main() {
    let mut x = vec![1f64, 0f64, -0f64, f64::NAN, -f64::NAN, f64::INFINITY, f64::NEG_INFINITY];
    x.sort_by(|a, b| a.total_cmp(b));
    x.print();
}
// Output: [NaN, -inf, -0, 0, 1, inf, NaN]

There were other updates as well, but these three were the most memorable, so I summarized them here. If you are curious about the other updates, see the following link.

๐Ÿซ ๊ณ ๋“ฑํ•™๊ต ์ˆ˜ํ•™์œผ๋กœ ์ดํ•ดํ•˜๋Š” ์„ ํ˜•ํšŒ๊ท€ https://axect.github.io/kr/posts/003_highschool_linreg/ Tue, 09 Mar 2021 22:01:39 +0900 https://axect.github.io/kr/posts/003_highschool_linreg/ <figure> <img src="https://axect.github.io/posts/images/breakthrough2016.gif" alt="2016 Breakthrough of the year"/> <figcaption style="text-align:center"> <p><a href="https://www.youtube.com/watch?v=2ncTCM7t79o">2016 Breakthrough of the year</a></p> </figcaption> </figure> <p>โ€ƒโ€ƒ์„ธ๊ณ„์—์„œ ๊ฐ€์žฅ ์œ ๋ช…ํ•˜๊ณ  ๊ถŒ์œ„์žˆ๋Š” ๊ณผํ•™์ €๋„์ธ ์‚ฌ์ด์–ธ์Šค(Science)์—์„œ๋Š” ๋งค๋…„ ๊ทธ ํ•ด์˜ ๊ฐ€์žฅ ์„ฑ๊ณต์ ์ด์—ˆ๋‹ค๊ณ  ์—ฌ๊ฒจ์ง€๋Š” ๊ณผํ•™์„ฑ๊ณผ๋ฅผ ๋ฐœํ‘œํ•ฉ๋‹ˆ๋‹ค. 2016๋…„ 12์›” 22์ผ์—๋„ <span style="background-color: rgba(255, 255, 0, 0.534);"> <b>2016 Breakthrough of the year</b> </span> ๋ฅผ ๋ฐœํ‘œํ•˜๋ฉด์„œ 2016๋…„์— ์žˆ์—ˆ๋˜ ๊ณผํ•™ ์„ฑ๊ณผ ์ค‘ ๊ฐ€์žฅ ๋ˆˆ์—ฌ๊ฒจ๋ด์•ผ ํ•  10๊ฐœ์˜ ๊ณผํ•™์„ฑ๊ณผ๋ฅผ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. ์ˆœ์œ„๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.${}^{[1]}$</p> <center> <div class="animated-border-quote"> <blockquote> <p style="text-align:left"><strong>1. ์ค‘๋ ฅํŒŒ ๋ฐœ๊ฒฌ</strong><br> <strong>2. ์™ธ๊ณ„ํ–‰์„ฑ &lsquo;ํ”„๋ก์‹œ๋งˆb&rsquo; ๋ฐœ๊ฒฌ</strong><br> <strong>3. ์ธ๊ณต์ง€๋Šฅ &lsquo;์•ŒํŒŒ๊ณ &rsquo;์™€ ์ด์„ธ๋Œ 9๋‹จ์˜ ๋Œ€๊ฒฐ</strong><br> <strong>4. ์„ธํฌ ๋…ธํ™” ๋ฐ ํšŒ์ถ˜ ์—ฐ๊ตฌ</strong><br> <strong>5. ์œ ์ธ์›์˜ ๋งˆ์Œ ์ฝ๊ธฐ ๋Šฅ๋ ฅ ์—ฐ๊ตฌ</strong><br> <strong>6. ๋‹จ๋ฐฑ์งˆ ๊ตฌ์กฐ์„ค๊ณ„ ๊ธฐ์ˆ </strong><br> <strong>7. ๋ฐฐ์•„์ค„๊ธฐ์„ธํฌ๋กœ ๋งŒ๋“  ์ธ๊ณต๋‚œ์ž</strong><br> <strong>8. ์ดˆ๊ธฐ ์ธ๋ฅ˜์˜ ํ™•์‚ฐ ๊ฒฝ๋กœ ์—ฐ๊ตฌ</strong><br> <strong>9. ํœด๋Œ€์šฉ DNA ๋ถ„์„๊ธฐ</strong><br> <strong>10. ์ดˆ๋ฐ•๋ง‰ ๋ฉ”ํƒ€๋ Œ์ฆˆ ๊ธฐ์ˆ </strong></p> 2016 Breakthrough of the year

2016 Breakthrough of the year

โ€ƒโ€ƒ์„ธ๊ณ„์—์„œ ๊ฐ€์žฅ ์œ ๋ช…ํ•˜๊ณ  ๊ถŒ์œ„์žˆ๋Š” ๊ณผํ•™์ €๋„์ธ ์‚ฌ์ด์–ธ์Šค(Science)์—์„œ๋Š” ๋งค๋…„ ๊ทธ ํ•ด์˜ ๊ฐ€์žฅ ์„ฑ๊ณต์ ์ด์—ˆ๋‹ค๊ณ  ์—ฌ๊ฒจ์ง€๋Š” ๊ณผํ•™์„ฑ๊ณผ๋ฅผ ๋ฐœํ‘œํ•ฉ๋‹ˆ๋‹ค. 2016๋…„ 12์›” 22์ผ์—๋„ 2016 Breakthrough of the year ๋ฅผ ๋ฐœํ‘œํ•˜๋ฉด์„œ 2016๋…„์— ์žˆ์—ˆ๋˜ ๊ณผํ•™ ์„ฑ๊ณผ ์ค‘ ๊ฐ€์žฅ ๋ˆˆ์—ฌ๊ฒจ๋ด์•ผ ํ•  10๊ฐœ์˜ ๊ณผํ•™์„ฑ๊ณผ๋ฅผ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. ์ˆœ์œ„๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.${}^{[1]}$

1. ์ค‘๋ ฅํŒŒ ๋ฐœ๊ฒฌ
2. ์™ธ๊ณ„ํ–‰์„ฑ ‘ํ”„๋ก์‹œ๋งˆb’ ๋ฐœ๊ฒฌ
3. ์ธ๊ณต์ง€๋Šฅ ‘์•ŒํŒŒ๊ณ ’์™€ ์ด์„ธ๋Œ 9๋‹จ์˜ ๋Œ€๊ฒฐ
4. ์„ธํฌ ๋…ธํ™” ๋ฐ ํšŒ์ถ˜ ์—ฐ๊ตฌ
5. ์œ ์ธ์›์˜ ๋งˆ์Œ ์ฝ๊ธฐ ๋Šฅ๋ ฅ ์—ฐ๊ตฌ
6. ๋‹จ๋ฐฑ์งˆ ๊ตฌ์กฐ์„ค๊ณ„ ๊ธฐ์ˆ 
7. ๋ฐฐ์•„์ค„๊ธฐ์„ธํฌ๋กœ ๋งŒ๋“  ์ธ๊ณต๋‚œ์ž
8. ์ดˆ๊ธฐ ์ธ๋ฅ˜์˜ ํ™•์‚ฐ ๊ฒฝ๋กœ ์—ฐ๊ตฌ
9. ํœด๋Œ€์šฉ DNA ๋ถ„์„๊ธฐ
10. ์ดˆ๋ฐ•๋ง‰ ๋ฉ”ํƒ€๋ Œ์ฆˆ ๊ธฐ์ˆ 

2016๋…„์—๋Š” ์ค‘๋ ฅํŒŒ ๊ด€์ธก์ด๋ผ๋Š” ์—„์ฒญ๋‚œ ์—…์ ์ด ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์•„์ธ์Šˆํƒ€์ธ์ด 1916๋…„ ์ผ๋ฐ˜์ƒ๋Œ€์„ฑ์ด๋ก ์„ ๋ฐœํ‘œํ•œ ์ง€ ์ •ํ™•ํžˆ 100๋…„์ด ๋˜๋Š” ํ•ด์— ์ด๋ฃฌ, ์ธ๋ฅ˜์˜ ๋ฌผ๋ฆฌํ•™ ์—ญ์‚ฌ์— ๊ธธ์ด ๋‚จ์„ ์—…์ ์ด์—ˆ๊ณ , ์‹ค์ œ๋กœ ์ค‘๋ ฅํŒŒ ๊ด€์ธก์— ์ง€๋Œ€ํ•œ ๊ณต์„ ์„ธ์šด ์„ธ ๋ช…์˜ ๊ณผํ•™์ž๋Š” 1๋…„๋งŒ์— 2017๋…„ ๋…ธ๋ฒจ ๋ฌผ๋ฆฌํ•™์ƒ์„ ์ˆ˜์ƒํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ทธ์— ๊ฑธ๋งž๊ฒŒ ์—ฌ๋Ÿฌ ๋งค์ฒด์—์„œ๋„ ๋™์‹œ์— ๋ณด๋„ํ•˜์—ฌ, ๊ณผํ•™์— ๊ด€์‹ฌ์žˆ๋Š” ์‚ฌ๋žŒ์ด๋ผ๋ฉด 2016๋…„์„ ์ค‘๋ ฅํŒŒ์˜ ํ•ด๋กœ ๊ธฐ์–ตํ•˜๊ณ  ์žˆ์„ ๊ฒ๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ์—ฌ๊ธฐ์„œ ๋งํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฒƒ์€ 3๋ฒˆ์ž…๋‹ˆ๋‹ค. ์•„๋งˆ๋„ ๋Œ€์ค‘์—๊ฒŒ๋Š” 1๋ฒˆ๋ณด๋‹ค 3๋ฒˆ์ด ๋” ์œ ๋ช…ํ•œ ์‚ฌ๊ฑด์ด์—ˆ๊ณ , ํŠนํžˆ ํ•œ๊ตญ์ธ๋“ค์—๊ฒŒ๋Š” ๋”๋”์šฑ ์ž˜ ์•Œ๋ ค์ง„ ์‚ฌ๊ฑด์ž…๋‹ˆ๋‹ค. ์ค‘๋ ฅํŒŒ์˜ ๋ฐœ๊ฒฌ์ด ์ดํ›„ ์ฒœ๋ฌธํ•™๊ณผ ๋ฌผ๋ฆฌํ•™์˜ ์ƒˆ๋กœ์šด ์—ฐ๊ตฌ๋ฐฉํ–ฅ์„ ์ œ์‹œํ–ˆ๋‹ค๋ฉด, ์•ŒํŒŒ๊ณ ์˜ ๋ฐ”๋‘‘ ์ •๋ณต์€ ์ดํ›„ ์šฐ๋ฆฌ์˜ ๋ชจ๋“  ์‚ถ์— ๋จธ์‹ ๋Ÿฌ๋‹์„ ์นจํˆฌ์‹œํ‚ค๋Š” ๊ณ„๊ธฐ๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ๋Š” ์–ด๋А ์„œ๋น„์Šค์—์„œ๋‚˜ ๋จธ์‹ ๋Ÿฌ๋‹์„ ์œ„์‹œํ•œ ์ธ๊ณต์ง€๋Šฅ์„ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๋ง์ด ๋“ค๋ ค์˜ค๊ณ , ๋Œ€ํ•™๋“ค์€ ์•ž๋‹คํˆฌ์–ด ๋จธ์‹ ๋Ÿฌ๋‹ ๋ฐ ์ธ๊ณต์ง€๋Šฅ ๊ด€๋ จ ํ•™๊ณผ๋ฅผ ์„ค์น˜ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์— ๋จธ์‹ ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•ด์•ผ๊ฒ ๋‹ค๊ณ  ๋งˆ์Œ๋จน๊ณ  ๊ฒ€์ƒ‰ํ•ด๋ณด๋ฉด, ์—ฌ๋Ÿฌ ๋ธ”๋กœ๊ทธ์—์„œ ํ…์„œํ”Œ๋กœ์šฐ(Tensorflow), ํŒŒ์ดํ† ์น˜(Pytorch), ์ผ€๋ผ์Šค(Keras) ๋“ฑ์œผ๋กœ ์†๊ธ€์”จ ์ธ์‹ํ•˜๊ธฐ ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ์‘์šฉ์— ๋Œ€ํ•ด์„œ ๋‹ค๋ฃจ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์™œ ๊ทธ๋ ‡๊ฒŒ ๋˜๋Š” ๊ฑด์ง€์— ๋Œ€ํ•ด์„œ ์ฐพ๋‹ค๋ณด๋ฉด ์ˆ˜ ๋งŽ์€ ๋…ผ๋ฌธ๋“ค์ด ๋“ฑ์žฅํ•˜๋ฉฐ ์—ฌ๋Ÿฌ ์ „๋ฌธ์šฉ์–ด๋“ค์ด ์Ÿ์•„์ ธ์„œ ์ผ๋ฐ˜์ธ๋“ค์—๊ฒŒ๋Š” ์ ‘๊ทผ ์ž์ฒด๊ฐ€ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๊ฒŒ์‹œ๋ฌผ์—์„œ๋Š” ๋จธ์‹ ๋Ÿฌ๋‹์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ด ๋˜๋Š” MLE(Maximum Likelihood Estimation; ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ •) ์— ๋Œ€ํ•ด ๊ณ ๋“ฑํ•™๊ต ์ˆ˜์ค€์˜ ์ˆ˜ํ•™ ๋งŒ ๊ฐ€์ง€๊ณ  ๋‹ค๋ค„๋ณด๋ ค ํ•ฉ๋‹ˆ๋‹ค.


1. ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ํ™”

์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐœ๋…

  • ๊ณ ๊ต ๊ต๊ณผ๊ณผ์ • - ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„

1.1. ๊ธฐ๋ณธ ํ‘œ๊ธฐ๋ฒ•

โ€ƒโ€ƒMLE๋ฅผ ์‹œ์ž‘ํ•˜๊ธฐ์— ์•ž์„œ ๊ณ ๋“ฑํ•™๊ต๋•Œ ๋ฐฐ์šด ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๊ฒƒ๋ถ€ํ„ฐ ์—ฐ์Šตํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ํ‘œ๊ธฐ ๋ฐฉ์‹์€ ์•„์ฃผ ์œ ๋ช…ํ•œ ๋จธ์‹ ๋Ÿฌ๋‹ ์ฑ…์ธ Bishop์˜ PRML(Pattern Recognition and Machine Learning; ํŒจํ„ด์ธ์‹๊ณผ ๋จธ์‹ ๋Ÿฌ๋‹)์—์„œ ์‚ฌ์šฉํ•˜๋Š” ํ‘œ๊ธฐ๋ฒ• ์„ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„ ๊ณ ๋“ฑํ•™๊ต ํ‘œ๊ธฐ๋ฐฉ์‹๊ณผ ๋™์ผํ•ฉ๋‹ˆ๋‹ค๋งŒ, ์กฐ๊ธˆ์˜ ์ฐจ์ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ €ํฌ๊ฐ€ ์‚ฌ์šฉํ•  ํ‘œ๊ธฐ๋ฒ•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๊ฐœ๋… ํ‘œ๊ธฐ๋ฒ• ๊ณ ๊ต ํ‘œ๊ธฐ
๋‹จ์ผ ์Šค์นผ๋ผ ํ™•๋ฅ ๋ณ€์ˆ˜ $x \in \mathbb{R}$ ์ง‘ํ•ฉ $X$
ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜ $p(x) \geq 0$ $P(X=x) \geq 0$
๊ฒฐํ•ฉํ™•๋ฅ ๋ถ„ํฌ $p(x,y)$
์กฐ๊ฑด๋ถ€ํ™•๋ฅ ๋ถ„ํฌ $p(x | y)$
๊ท ๋“ฑ๋ถ„ํฌ $\text{Unif}(x|a,b)$
์ดํ•ญ๋ถ„ํฌ $\text{Bin}(x|N, \mu)$ $B(n, p)$
์ •๊ทœ๋ถ„ํฌ $\mathcal{N}(x|\mu,\sigma^2)$ $N(m,\sigma^2)$

์œ„์—์„œ ๋ณด๋ฉด ์•Œ๊ฒ ์ง€๋งŒ, ๊ณ ๋“ฑํ•™๊ต์—์„œ ๋‹ค๋ฃฌ ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜๋Š” ์˜ค๋กœ์ง€ ๋‹จ์ผ ํ™•๋ฅ ๋ถ„ํฌ์ผ๋ฟ, ๊ฒฐํ•ฉ์ด๋‚˜ ์กฐ๊ฑด๋ถ€ ๋“ฑ์˜ ๋ณ€์ˆ˜๊ฐ€ 2๊ฐœ ์ด์ƒ์ธ ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜๋Š” ๋‹ค๋ฃจ์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋‹ค๋งŒ, ๊ทธ์— ์ƒ์‘ํ•˜๋Š” ํ™•๋ฅ ์ธ ๊ณฑ์‚ฌ๊ฑด์˜ ํ™•๋ฅ ($P(A\cap B)$์ด๋‚˜ ์กฐ๊ฑด๋ถ€ํ™•๋ฅ ($P(A|B)$) ์ž์ฒด๋Š” ๋‹ค๋ฃจ์—ˆ์œผ๋‹ˆ ๋น„์Šทํ•œ ๋งฅ๋ฝ์œผ๋กœ ์ ‘๊ทผํ•˜๋ฉด ์‰ฝ๊ฒŒ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์กฐ๊ฑด๋ถ€ํ™•๋ฅ ๋ถ„ํฌ์™€ ๊ฒฐํ•ฉํ™•๋ฅ ๋ถ„ํฌ ์‚ฌ์ด์—๋Š” ๋‹ค์Œ์˜ ๊ด€๊ณ„์‹์ด ์„ฑ๋ฆฝํ•ฉ๋‹ˆ๋‹ค.

$$ p(x|y) = \frac{p(x,y)}{p(y)} $$

์ด๋Š” ๊ณ ๋“ฑํ•™๊ต์—์„œ ๋‹ค๋ฃจ์—ˆ๋˜ ์กฐ๊ฑด๋ถ€ํ™•๋ฅ ์˜ ์ •์˜์™€ ๋ถ€ํ•ฉํ•ฉ๋‹ˆ๋‹ค.

$$ P(X | Y) = \frac{P(X\cap Y)}{P(Y)} $$
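The two definitions above agree term by term; as a quick numerical sanity check, the identity can be verified on a small discrete joint distribution (the joint table and function names below are made-up for illustration, not from the original post):

```python
# Made-up discrete joint distribution p(x, y) over x, y in {0, 1}
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def p_y(y):
    # Marginal: p(y) = sum over x of p(x, y)
    return sum(prob for (_, y2), prob in joint.items() if y2 == y)

def p_x_given_y(x, y):
    # Conditional: p(x | y) = p(x, y) / p(y)
    return joint[(x, y)] / p_y(y)

print(round(p_x_given_y(1, 0), 2))  # 0.3 / 0.4 = 0.75
```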

์ด์ฏค์—์„œ ํ—ท๊ฐˆ๋ฆฌ๊ธฐ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. ๋„๋Œ€์ฒด, ํ™•๋ฅ ๋ถ„ํฌ์™€ ํ™•๋ฅ ์˜ ์ฐจ์ด๋Š” ๋ญ˜๊นŒ์š”?


1.2. ํ™•๋ฅ  vs ํ™•๋ฅ ๋ถ„ํฌ

๋‹ค์Œ ๊ณ ๋“ฑํ•™๊ต ๋ฌธ์ œ๋ฅผ ๋ด…์‹œ๋‹ค.

Q. ์ฃผ์‚ฌ์œ„๋ฅผ 720๋ฒˆ ๋˜์ ธ์„œ 1์˜ ๋ˆˆ์ด 140๋ฒˆ ์ด์ƒ ๋‚˜์˜ฌ ํ™•๋ฅ ์€ ์–ผ๋งˆ์ธ๊ฐ€?

์ด ๋ฌธ์ œ๋Š” ํฐ ์ˆ˜์˜ ๋ฒ•์น™์„ ๋‹ค๋ฃจ๋Š” ์ „ํ˜•์ ์ธ ๋ฌธ์ œ๋กœ ์ดํ•ญ๋ถ„ํฌ์—์„œ ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ ๊ตฌํ•œ ํ›„ ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ทผ์‚ฌํ•˜์—ฌ ํ‘ธ๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๋ฐฉ๋ฒ•์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์ฃผ์‚ฌ์œ„์˜ 1์˜ ๋ˆˆ์ด ๋‚˜์˜ค๋Š” ํšŸ์ˆ˜๋ฅผ ํ™•๋ฅ ๋ณ€์ˆ˜ $X$๋ผ ๋‘์ž. ์ด๋•Œ, ์ฃผ์‚ฌ์œ„๋ฅผ ๋˜์ง€๋Š” ๊ฒƒ์€ ๋ชจ๋‘ ๋…๋ฆฝ์‹œํ–‰์ด๋ฏ€๋กœ ์ด ํ™•๋ฅ ๋ณ€์ˆ˜๋Š” ์ดํ•ญ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ฒŒ ๋œ๋‹ค. $$p(x) = \text{Bin}(x | 720,\frac{1}{6})$$ ์ดํ•ญ๋ถ„ํฌ $\text{Bin}(x | n,p)$์˜ ํ‰๊ท ์€ $np$์ด๋ฉฐ ๋ถ„์‚ฐ์€ $npq$์ด๋ฏ€๋กœ, ํ‰๊ท ์€ $120$, ํ‘œ์ค€ํŽธ์ฐจ๋Š” $10$์ด๋‹ค. ์ด๋•Œ, $720$์€ ์ถฉ๋ถ„ํžˆ ํฐ ์ˆ˜์ด๋ฏ€๋กœ ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ๋‹ค์Œ์˜ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค๊ณ  ๊ทผ์‚ฌํ•  ์ˆ˜ ์žˆ๋‹ค. $$p(x) \simeq \mathcal{N}(x | 120, 10^2)$$ ์ด๋•Œ, 140๋ฒˆ ์ด์ƒ ๋‚˜์˜ฌ ํ™•๋ฅ ์„ ๋‚˜ํƒ€๋‚ด๊ณ , ํ‘œ์ค€ํ™”ํ•˜๋Š” ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. $$p(x \geq 300) = p(z \geq \frac{140 - 120}{10}) = p(z \geq 2)$$ ์ด๋ฅผ ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌํ‘œ๋กœ ๊ณ„์‚ฐํ•˜๋ฉด $0.0228$์ด ๋‚˜์˜จ๋‹ค.

์–ธ๋œป๋ณด๋ฉด ์œ„ ๋ฌธ์ œ๋Š” 720๋ฒˆ์˜ ์‹œํ–‰์ด ์ „์ œ๋˜์–ด์žˆ์œผ๋ฏ€๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ž…๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋ฌธ์ œ๋ฅผ ํ’€๋•Œ์—๋Š” ์˜ค๋กœ์ง€ ๋‹จ์ผ ํ™•๋ฅ ๋ณ€์ˆ˜๋งŒ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณ ๊ต์—์„œ๋Š” ์ด๋Ÿฐ ๋ฌธ์ œ๋“ค๋งŒ ๋‹ค๋ฃจ๊ธฐ์— ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š”๋ฐ์— ์—ฌ๋Ÿฌ๊ฐ€์ง€ ํ™•๋ฅ ๋ณ€์ˆ˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์ดํ•ดํ•˜๊ธฐ๊ฐ€ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์œ„ ๋ฌธ์ œ๋Š” ์ „ํ˜€ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃฌ ๋ฌธ์ œ๊ฐ€ ์•„๋‹™๋‹ˆ๋‹ค. ์˜ค๋กœ์ง€ ์ •๊ทœ๋ถ„ํฌ์— ๊ทผ๊ฑฐํ•œ ์ˆ˜ํ•™์  ํ™•๋ฅ ์„ ๋ฌป๋Š” ๋ฌธ์ œ์ผ ๋ฟ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃฌ ์ดํ•ญ๋ถ„ํฌ ๋ฌธ์ œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Q. ์ฃผ์‚ฌ์œ„๋ฅผ 120๋ฒˆ ๋˜์ง€๋Š” ์‹œํ–‰์„ 6๋ฒˆ ๋ฐ˜๋ณตํ•˜์—ฌ 1์˜ ๋ˆˆ์ด ๋‚˜์˜จ ๋ฐ์ดํ„ฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

| ์‹œํ–‰ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| ํšŸ์ˆ˜ | 10 | 23 | 18 | 15 | 26 | 21 |

์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ์ฃผ์‚ฌ์œ„๋ฅผ 720๋ฒˆ ๋˜์กŒ์„ ๋•Œ, 1์˜ ๋ˆˆ์ด 140๋ฒˆ ์ด์ƒ ๋‚˜์˜ฌ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜์‹œ์˜ค.

๋ฌธ์ œ๊ฐ€ ์‚ฌ๋ญ‡ ๋‹ฌ๋ผ์กŒ์Šต๋‹ˆ๋‹ค. ์•ž์„œ์„œ ํ’€์—ˆ๋˜ ๊ณ ๊ต๋ฌธ์ œ๋Š” ์˜ค๋กœ์ง€ ์ˆ˜ํ•™์  ํ™•๋ฅ ์„ ๊ฐ€์ •ํ•˜์—ฌ ์ฃผ์‚ฌ์œ„์˜ 1์˜ ๋ˆˆ์ด ๋‚˜์˜ฌ ํ™•๋ฅ ์€ $1/6$์œผ๋กœ ๊ณ ์ •ํ•˜๊ณ  ์ดํ•ญ๋ถ„ํฌ๋กœ ํ’€์—ˆ์ง€๋งŒ, ์—ฌ๊ธฐ์„œ๋Š” ์‹ค์ œ ๋ฐ์ดํ„ฐ๋กœ ์ดํ•ญ๋ถ„ํฌ๋ฅผ ์ถ”์ •ํ•˜์—ฌ ํ’€์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ถ”์ •๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  1. ๊ฐ ์‹œํ–‰ ํšŸ์ˆ˜๋ฅผ ํ™•๋ฅ ๋ณ€์ˆ˜ $x_1,,x_2,,\cdots,,x_6$์œผ๋กœ ๋‘”๋‹ค.
  2. ๊ฐ ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์€ ๋ชจ๋‘ ๋…๋ฆฝ์ด๋ผ ๊ฐ€์ •ํ•˜๋ฉฐ ๋™์ผํ•œ ์ดํ•ญ๋ถ„ํฌ $\text{Bin}(120,p)$๋ฅผ ๋”ฐ๋ฅธ๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค.
  3. ์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •์ด๋‚˜ ๋ฒ ์ด์ฆˆ ์ถ”๋ก ์„ ์ด์šฉํ•˜์—ฌ $p$๋ฅผ ์ถ”์ •ํ•œ๋‹ค.

์—ฌ๊ธฐ์„œ๋Š” 3๋ฒˆ์—์„œ ์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •์„ ์ด์šฉํ• ํ…๋ฐ, ์•„์ง ์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •์„ ๋ฐฐ์šฐ์ง€ ์•Š์•˜์œผ๋‹ˆ, ๊ณ„์‚ฐ๊ฒฐ๊ณผ๋งŒ ๋ช…์‹œํ•˜๋ฉด $p$๋Š” $113/720$์ด๋ผ๋Š” ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค. ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•˜๋ฉด ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์ด ์ด์ „๊ณผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค. ํ‰๊ท ์€ $113$์œผ๋กœ ์ฃผ์–ด์ง€๊ณ , ํ‘œ์ค€ํŽธ์ฐจ๋Š” ์•ฝ $9.76$์ •๋„๋กœ ์ฃผ์–ด์ง‘๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(x \geq 140) = p(z \geq \frac{140 - 113}{9.76}) \simeq p(z \geq 2.77) \simeq 0.0028 $$
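The estimate $p = 113/720$ and the resulting mean, standard deviation, and tail probability can be reproduced directly from the six observed counts (a sketch, standard library only, same upper-tail formula as before):

```python
from math import sqrt, erfc

# Observed counts of 1s in six runs of 120 throws each (720 throws total)
counts = [10, 23, 18, 15, 26, 21]
n = 720

p_hat = sum(counts) / n                  # MLE: 113/720
mu = n * p_hat                           # 113.0
sigma = sqrt(n * p_hat * (1 - p_hat))    # ~9.76

z = (140 - mu) / sigma
tail = 0.5 * erfc(z / sqrt(2))           # P(X >= 140) under the fitted model
print(round(mu, 1), round(sigma, 2), round(tail, 4))  # 113.0 9.76 0.0028
```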

์•ž์„  ๊ฒฐ๊ณผ์™€ ๊ฑฐ์˜ 10๋ฐฐ์— ํ•ด๋‹นํ•˜๋Š” ์ฐจ์ด๋ฅผ ๋ณด์ž…๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ 6๊ฐœ ๋ฐ–์— ๋˜์ง€ ์•Š์•„ ์ถฉ๋ถ„ํ•˜์ง€ ์•Š์•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ 10000๊ฐœ ์ •๋„๋กœ ๋Š˜๋ฆฐ ํ›„์— ๊ณ„์‚ฐํ•œ ํ™•๋ฅ ์€ $0.0220$์ •๋„๋กœ ์ˆ˜ํ•™์  ํ™•๋ฅ ๊ณผ ์ข€ ๋” ๋น„์Šทํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ž…๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์„ Julia ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

# https://github.com/Axect/Blog_Code
using Distributions

b = Binomial(120, 1/6);     # ์ดํ•ญ๋ถ„ํฌ ์„ ์–ธ
x = rand(b, 10000);         # ๋ฐ์ดํ„ฐ ์ถ”์ถœ
y = x ./ 120;               # ํ™•๋ฅ ๋กœ ๋ณ€ํ™˜

p = mean(y);                # ์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •์œผ๋กœ ๊ตฌํ•œ p

b2 = Binomial(720, p);      # ๊ตฌํ•œ ์ดํ•ญ๋ถ„ํฌ
m = mean(b2);               # ํ‰๊ท 
sigma = std(b2);            # ํ‘œ์ค€ํŽธ์ฐจ
t = (140 - m) / sigma       # 140์˜ ํ‘œ์ค€ํ™”

n = Normal(0,1)             # ํ‘œ์ค€์ •๊ทœ๋ถ„ํฌ
result = 1 - cdf(n, t)      # ๊ฒฐ๊ณผ
@show result                # ๊ฒฐ๊ณผ ์ถœ๋ ฅ

๊ทธ๋Ÿผ ์ด์ œ ๋ณธ๊ฒฉ์ ์œผ๋กœ ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ๋ถ„ํฌ์— ๋Œ€ํ•ด์„œ ๋‹ค๋ค„๋ด…์‹œ๋‹ค.


1.3. ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ๋ถ„ํฌ

์•ž์„œ ๋ดค๋‹ค์‹œํ”ผ, ๋ฐ์ดํ„ฐ๋Š” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์˜ ์ง‘ํ•ฉ์œผ๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ \mathcal{D} = \left\{x_1,\,x_2,\,\cdots,\,x_n\right\} $$

์‹ค์ œ๋กœ ๋‹ค๋ฃฐ๋•Œ์—๋Š” ์ง‘ํ•ฉ๋ณด๋‹ค๋Š” ๋ฒกํ„ฐ๋กœ ๋‹ค๋ฃจ๋Š” ๊ฒƒ์ด ๋” ํšจ์œจ์ ์ด๋ฏ€๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ฒกํ„ฐ๋กœ ํ‘œ๊ธฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (์„ธ๋กœ๋กœ ํ‘œ๊ธฐํ•œ ์ด์œ ๋Š” ๋งŽ์€ ์ˆ˜์น˜ ํ”„๋กœ๊ทธ๋žจ์—์„œ ์—ด๋ฒกํ„ฐ ํ˜•์‹์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์ธ๋ฐ, ๊ทธ๋ƒฅ ๋ฒกํ„ฐ๋ฅผ ์„ธ๋กœ๋กœ ํ‘œ๊ธฐํ–ˆ๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.)

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} $$

์ผ๋‹จ, ์—ฌ๊ธฐ์„œ๋Š” ์ดํ•ด๋ฅผ ์‰ฝ๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ง‘ํ•ฉ์œผ๋กœ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ๋ถ„ํฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์—ฌ๋Ÿฌ ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์˜ ๊ฒฐํ•ฉ๋ถ„ํฌ๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ p(\mathcal{D}) = p(x_1,\,x_2,\,\cdots,\,x_n) $$

์ผ๋ฐ˜์ ์ธ ๊ฒฝ์šฐ์—๋Š” ์—ฌ๋Ÿฌ ์ž„์˜์˜ ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์˜ ๊ฒฐํ•ฉ๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์€ ๋งค์šฐ ์–ด๋ ค์šด ์ผ์ด๋ฏ€๋กœ ์ €ํฌ๋Š” ๋ฐ์ดํ„ฐ์— ์•„์ฃผ ๊ฐ•๋ ฅํ•œ ์ „์ œ์กฐ๊ฑด์„ ๋ถ€์—ฌํ•  ๊ฒ๋‹ˆ๋‹ค. ๋ฐ”๋กœ, i.i.d.(independent and identically distributed; ๋…๋ฆฝํ•ญ๋“ฑ๋ถ„ํฌ) ์ž…๋‹ˆ๋‹ค.

i.i.d๋Š” ๋ชจ๋“  ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์ด ๋…๋ฆฝ์ด๋ฉฐ, ๋™์ผํ•œ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค๋Š” ๊ฐ€์ •์œผ๋กœ ๊ต‰์žฅํžˆ ๊ฐ•๋ ฅํ•œ ๊ฐ€์ •์ด์ง€๋งŒ ์˜์™ธ๋กœ ์ž์—ฐ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋Š” ๋Œ€๋ถ€๋ถ„์˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋Š” ํฐ ๋ฌธ์ œ๊ฐ€ ์—†๋Š” ๊ฐ€์ •์ž…๋‹ˆ๋‹ค. i.i.d๋ฅผ ๊ฐ€์ •ํ•˜๋ฉด ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ์•„์ฃผ ๋งŽ์ด ๊ฐœ์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ p(\mathcal{D}) = p(x_1)\times p(x_2)\times \cdots \times p(x_n) = \prod_{i=1}^n p(x_i) $$

๋งˆ์ง€๋ง‰์— ์žˆ๋Š” ์›์ฃผ์œจ $\pi$์˜ ๋Œ€๋ฌธ์ž์ธ $\Pi$๋Š” $i=1$๋ถ€ํ„ฐ $i=n$๊นŒ์ง€์˜ ๊ณฑ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด์ œ ์ € ํ™•๋ฅ ๋ณ€์ˆ˜๋“ค์ด ์–ด๋–ค ๋ถ„ํฌ๋ฅผ ๊ฐ€์ง€๋Š”์ง€ ์•Œ๋ฉด ๊ทธ์˜ ๊ณฑ์œผ๋กœ ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


2. ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™๊ณผ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ

์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐœ๋…

  • ๊ณ ๊ต ๊ต๊ณผ๊ณผ์ • - ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„

โ€ƒโ€ƒ ์ง€๊ธˆ๊นŒ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ํ™•๋ฅ ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์•˜์œผ๋‹ˆ, ์ด์ œ ์ž„์˜์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์–ด๋–ป๊ฒŒ ์„ ํ˜•์œผ๋กœ ๊ทผ์‚ฌํ•  ์ˆ˜ ์žˆ์„์ง€์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๋ คํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ฐ˜๋“œ์‹œ ์•Œ์•„์•ผํ•  ๋‘ ๊ฐ€์ง€ ์ค‘์š”ํ•œ ํ™•๋ฅ  ๋ฒ•์น™์ด ์žˆ๋Š”๋ฐ, ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™(์ „ํ™•๋ฅ  ์ •๋ฆฌ)๊ณผ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ ์ž…๋‹ˆ๋‹ค.


2.1. ํŒŒํ‹ฐ์…˜ (Partition)

์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™๊ณผ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ์—๋Š” ์ค‘์š”ํ•œ ์ „์ œ์กฐ๊ฑด์ด ์žˆ๋Š”๋ฐ, ๋ฐ”๋กœ ํŒŒํ‹ฐ์…˜(Partition) ์„ ์ฐพ์•„์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํŒŒํ‹ฐ์…˜์ด๋ผ๋Š” ์ด๋ฆ„์€ ๋“ค์–ด๋ณด์‹  ๋ถ„๋„ ์žˆ์„ ํ…๋ฐ, ์•„๋งˆ ์ œ์ผ ์ž˜ ์•Œ๋ ค์ง„ ํŒŒํ‹ฐ์…˜์€ ์•„๋ž˜ ์‚ฌ์ง„์ผ ๊ฒ๋‹ˆ๋‹ค.

ํšŒ์‚ฌ ์นธ๋ง‰์ด (์ถœ์ฒ˜: PIXNIO)

ํšŒ์‚ฌ ์นธ๋ง‰์ด (์ถœ์ฒ˜: PIXNIO)

๊ทธ ์™ธ์—๋„ ๋””์Šคํฌ ํŒŒํ‹ฐ์…˜์ด๋‚˜ ๊ณต๊ฐ„์„ ๋ถ„ํ• ํ•˜๋Š” ์žฅ์‹์žฅ ์—ญํ• ์„ ํ•˜๋Š” ํŒŒํ‹ฐ์…˜ ๋“ฑ ์—ฌ๋Ÿฌ ํŒŒํ‹ฐ์…˜๋“ค์ด ์žˆ๋Š”๋ฐ, ์ด๋“ค์€ ๋ชจ๋‘ ๊ณตํ†ต์ ์ธ ์„ฑ์งˆ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. ๋ฐ”๋กœ, ์–ด๋–ค ๊ฒƒ์„ ๋ถ„๋ฆฌํ•˜์—ฌ ๋‚˜๋ˆˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ†ต๊ณ„์—์„œ์˜ ํŒŒํ‹ฐ์…˜์€ ํ‘œ๋ณธ ๊ณต๊ฐ„์„ ์ „๋ถ€ ๊ฒน์น˜์ง€ ์•Š๊ฒŒ ๋ถ„ํ• ํ•˜๋Š” ์‚ฌ๊ฑด๋“ค์˜ ์ง‘ํ•ฉ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ๋Š” $A_1,\,A_2,\,A_3,\,A_4$๊ฐ€ $S$์˜ ํŒŒํ‹ฐ์…˜์ž…๋‹ˆ๋‹ค.

ํ†ต๊ณ„์—์„œ์˜ ํŒŒํ‹ฐ์…˜

ํ†ต๊ณ„์—์„œ์˜ ํŒŒํ‹ฐ์…˜

์ด๋ฅผ ์ˆ˜ํ•™์ ์œผ๋กœ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.


ํŒŒํ‹ฐ์…˜

ํ‘œ๋ณธ ๊ณต๊ฐ„ $S$ ์— ๋Œ€ํ•˜์—ฌ ๊ทธ ๋ถ€๋ถ„์ง‘ํ•ฉ๋“ค์˜ ์ˆ˜์—ด $\left\{A_i \right\}_{i=1}^n$์ด ๋‹ค์Œ ์„ฑ์งˆ๋“ค์„ ๋งŒ์กฑํ•˜๋ฉด $S$์˜ ํŒŒํ‹ฐ์…˜(Partition)์ด๋ผ ๋ถ€๋ฅธ๋‹ค.

  1. $A_i \cap A_j = \emptyset \quad (i \neq j,\quad i,j=1,2,\cdots,n)$
  2. $A_1 \cup A_2 \cup \cdots \cup A_n = S$
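Both conditions can be checked mechanically; a small sketch with Python sets (the sample space and subsets below are arbitrary examples):

```python
# Sample space S and a candidate partition {A_1, ..., A_4} (arbitrary example)
S = set(range(1, 13))
A = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}, {10, 11, 12}]

# 1. pairwise disjoint: A_i and A_j share no elements for i != j
disjoint = all(A[i].isdisjoint(A[j])
               for i in range(len(A)) for j in range(i + 1, len(A)))
# 2. exhaustive: the union of all A_i equals S
covers = set().union(*A) == S

print(disjoint and covers)  # True
```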


2.2. ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™

์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์€, ์šฉ์–ด๋Š” ์–ด๋ ค์›Œ๋ณด์ด์ง€๋งŒ ์‚ฌ์‹ค ์•„์ฃผ ๊ฐ„๋‹จํ•œ ๋ฒ•์น™์ž…๋‹ˆ๋‹ค. ์œ„ ํŒŒํ‹ฐ์…˜ ๊ทธ๋ฆผ์„ ๋ณด๋ฉด $S$์˜ ๋ถ€๋ถ„์ง‘ํ•ฉ $B$๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ B = (A_1 \cap B) \cup (A_2 \cap B) \cup (A_3 \cap B) \cup (A_4 \cap B) $$

์ด๋•Œ, $A_1,A_2,A_3,A_4$๋Š” ๋ชจ๋‘ ํŒŒํ‹ฐ์…˜์ด๋ฏ€๋กœ ๊ต์ง‘ํ•ฉ์ด ๊ณต์ง‘ํ•ฉ์ด๋ฉฐ ๋”ฐ๋ผ์„œ ์‚ฌ๊ฑด $B$์˜ ํ™•๋ฅ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ p(B) = \sum_{i=1}^4 p(A_i \cap B) $$

์ด๋ฅผ ํ™•๋ฅ ์˜ ๊ณฑ์…ˆ์ •๋ฆฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(B) = \sum_{i=1}^4 p(A_i \cap B) = \sum_{i=1}^4 p(B | A_i)p(A_i) $$

์ด๊ฒƒ์ด ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์ž…๋‹ˆ๋‹ค. ์ฆ‰, ์š”์•ฝํ•˜๋ฉด ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์€ ์–ด๋–ค ํŒŒํ‹ฐ์…˜์ด ์ •์˜๋œ๋‹ค๋ฉด, ์ž„์˜์˜ ์‚ฌ๊ฑด์— ๋Œ€ํ•ด์„œ ๊ทธ ์‚ฌ๊ฑด์ด ์ผ์–ด๋‚  ํ™•๋ฅ ์„ ํŒŒํ‹ฐ์…˜๊ณผ์˜ ๊ฒฐํ•ฉ ํ™•๋ฅ ์˜ ํ•ฉ์œผ๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ˆ˜ํ•™์ ์œผ๋กœ ์ •์˜ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.


์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™

ํ‘œ๋ณธ ๊ณต๊ฐ„ $S$์— ๋Œ€ํ•ด, $\left\{A_i\right\}_{i=1}^n$์ด $S$์˜ ํŒŒํ‹ฐ์…˜์ด๋ผ๋ฉด ์ž„์˜์˜ ์‚ฌ๊ฑด $B\subset S$์˜ ํ™•๋ฅ ์€ ํ•ญ์ƒ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค.

$$ p(B) = \sum_{i=1}^n p(A_i \cap B) = \sum_{i=1}^n p(B | A_i)p(A_i) $$

์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์€ ์‹ค์ œ๋กœ ๊ณ ๋“ฑํ•™๊ต ๋ฌธ์ œ์—๋„ ์ž์ฃผ ์“ฐ์ž…๋‹ˆ๋‹ค. ๋‹ค์Œ ๋ฌธ์ œ๋ฅผ ๋ด…์‹œ๋‹ค.

Q. ๊นƒํ—™๊ณ ๋“ฑํ•™๊ต๋Š” 1ํ•™๋…„ 20%, 2ํ•™๋…„ 40%, 3ํ•™๋…„ 40%๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ๋‹ค. 1ํ•™๋…„ ์ค‘ ๋‚จํ•™์ƒ์˜ ๋น„์œจ์€ 40%, 2ํ•™๋…„ ์ค‘ ๋‚จํ•™์ƒ์˜ ๋น„์œจ์€ 50%, 3ํ•™๋…„ ์ค‘ ๋‚จํ•™์ƒ์˜ ๋น„์œจ์€ 60%๋ผ๋ฉด, ์ „์ฒด ๋‚จํ•™์ƒ์˜ ๋น„์œจ์€ ์–ผ๋งˆ์ธ๊ฐ€?

์กฐ๊ฑด๋ถ€ ํ™•๋ฅ ์˜ ๊ต‰์žฅํžˆ ์ „ํ˜•์ ์ธ ๋ฌธ์ œ๋กœ, ์‰ฝ๊ฒŒ ํ’€๋ฆฌ๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค. ํ’€์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๊ณ 1, ๊ณ 2, ๊ณ 3์€ ์ƒํ˜ธ ๋ฐฐํƒ€์ ์ด๊ณ  ๋ชจ๋‘ ํ•ฉ์น˜๋ฉด ์ „์ฒด๊ฐ€ ๋˜๋ฏ€๋กœ ํŒŒํ‹ฐ์…˜์˜ ์„ฑ์งˆ์„ ๋งŒ์กฑํ•œ๋‹ค. ์ด๋ฅผ $A_1,\,A_2,\,A_3$๋ผ ํ•˜๊ณ , ๋‚จํ•™์ƒ์ผ ์‚ฌ๊ฑด์„ $B$๋ผ๊ณ  ํ•˜์ž. ๊ทธ๋ ‡๋‹ค๋ฉด ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์— ์˜ํ•ด ๋‚จํ•™์ƒ์˜ ๋น„์œจ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. $$ \begin{aligned} p(B) &= p(A_1 \cap B) + p(A_2\cap B) + p(A_3 \cap B) \\ &= p(B|A_1)p(A_1) + p(B|A_2)p(A_2) + p(B|A_3)p(A_3) \\ &= 0.4 \times 0.2 + 0.5 \times 0.4 + 0.6 \times 0.4 = 0.52 \end{aligned} $$
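The worked solution above reduces to a one-line weighted sum in code (a sketch with the same numbers):

```python
# Grade shares (a partition of the school) and boy ratio within each grade
p_A = [0.2, 0.4, 0.4]           # p(A_1), p(A_2), p(A_3)
p_B_given_A = [0.4, 0.5, 0.6]   # p(B | A_i)

# Law of total probability: p(B) = sum_i p(B | A_i) p(A_i)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(round(p_B, 2))  # 0.52
```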


2.3. ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ

๋ฒ ์ด์ฆˆ ์ •๋ฆฌ๋Š” ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์˜ ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„ ๋ฌธ์ œ์—์„œ ๋ดค๋‹ค์‹œํ”ผ, ๋ณดํ†ต ์šฐ๋ฆฌ๋Š” ํŒŒํ‹ฐ์…˜์„ ์ „์ œํ–ˆ์„ ๋•Œ, ๋‹ค๋ฅธ ์‚ฌ๊ฑด์˜ ํ™•๋ฅ ์ด๋‚˜ ๋น„์œจ($p(B|A_i)$)์˜ ์ •๋ณด๋ฅผ ๊ฐ–๊ณ  ๋‹ค๋ฅธ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ „์ฒด ํ™•๋ฅ ์˜ ๋ฒ•์น™์€ ํ•ด๋‹น ์‚ฌ๊ฑด์˜ ํ™•๋ฅ ($p(B)$)์„ ๊ตฌํ•˜๋ ค๋Š” ๊ฒƒ์ด์—ˆ๋‹ค๋ฉด, ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ๋Š” ๋ฐ˜๋Œ€๋กœ ํ•ด๋‹น ์‚ฌ๊ฑด์„ ์ „์ œํ•˜์˜€์„ ๋•Œ์˜ ํŒŒํ‹ฐ์…˜์˜ ํ™•๋ฅ ($p(A_i|B)$)์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ์กฐ๊ฑด๋ถ€ ํ™•๋ฅ ์˜ ์ •์˜์™€ ์ „์ฒดํ™•๋ฅ ์˜ ๋ฒ•์น™์„ ์ด์šฉํ•˜๋ฉด ๊ฐ„๋‹จํžˆ ๊ตฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ p(A_i | B) = \frac{p(A_i \cap B)}{p(B)} = \frac{p(B|A_i)p(A_i)}{\displaystyle \sum_{j=1}^n p(B|A_j)p(A_j)} $$

์ข€ ๋” ์ˆ˜ํ•™์ ์œผ๋กœ ์ •์˜ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.


๋ฒ ์ด์ฆˆ ์ •๋ฆฌ

ํ‘œ๋ณธ ๊ณต๊ฐ„ $S$์— ๋Œ€ํ•ด, $\{A_i\}_{i=1}^n$์ด $S$์˜ ํŒŒํ‹ฐ์…˜์ด๋ผ๋ฉด ์ž„์˜์˜ ์‚ฌ๊ฑด $B \subset S$์— ๋Œ€ํ•ด ๋‹ค์Œ์˜ ๋“ฑ์‹์ด ์„ฑ๋ฆฝํ•œ๋‹ค.

$$ p(A_i|B) = \frac{p(B|A_i) p(A_i)}{\displaystyle \sum_{j=1}^n p(B|A_j)p(A_j)} $$
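Reusing the numbers from the school example, Bayes' theorem answers the reverse question, e.g. the probability that a student is in each grade given that he is a boy (a sketch):

```python
# Same numbers as the school example: p(A_i) and p(B | A_i)
p_A = [0.2, 0.4, 0.4]
p_B_given_A = [0.4, 0.5, 0.6]

# Bayes: p(A_i | B) = p(B | A_i) p(A_i) / sum_j p(B | A_j) p(A_j)
evidence = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))   # p(B) = 0.52
posterior = [pb * pa / evidence for pb, pa in zip(p_B_given_A, p_A)]

print([round(q, 4) for q in posterior])  # [0.1538, 0.3846, 0.4615]
```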

๋ฒ ์ด์ฆˆ ์ •๋ฆฌ์˜ ์ง„๊ฐ€๋Š” ๋ฐ์ดํ„ฐ์™€ ๊ฒฐ๋ถ€๋˜์—ˆ์„ ๋•Œ ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค. $\{C_i\}_{i=1}^n$๊ฐ€ ํŒŒํ‹ฐ์…˜์ด๊ณ , ๋ฐ์ดํ„ฐ๊ฐ€ $\mathcal{D}$๋กœ ์ฃผ์–ด์กŒ์„ ๋•Œ, ์ด์— ๋Œ€ํ•œ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ๋ฅผ ์“ฐ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(C_i | \mathcal{D}) \propto p(\mathcal{D} | C_i)p(C_i) $$

์ด๋•Œ, ๋ถ„๋ชจ๋ฅผ ์ƒ๋žตํ•œ ๊นŒ๋‹ญ์€ ์ขŒ๋ณ€์€ $C_i$์— ๋Œ€ํ•œ ํ™•๋ฅ ์ธ๋ฐ, ๋ถ„๋ชจ๋Š” $p(\mathcal{D})$์ด๋ฏ€๋กœ $C_i$์— ๋Œ€ํ•œ ์˜์กด์„ฑ์ด ์—†์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋‹จ์ˆœ ์ƒ์ˆ˜ ์ทจ๊ธ‰์„ ํ•˜์—ฌ ์œ„ ๋น„๋ก€์‹์„ ์ ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„ ์‹์˜ ํ•ญ๋“ค์€ ๋ณดํ†ต ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ•ด์„๋ฉ๋‹ˆ๋‹ค.

  • $C_i$ : $i$๋ฒˆ์งธ ๋ชจ๋ธ(๋ฒ”์ฃผ)
  • $\mathcal{D}$ : ๋ฐ์ดํ„ฐ
  • $p(C_i)$ : ๋ชจ๋ธ์˜ ์‚ฌ์ „ ํ™•๋ฅ  (Prior probability)
  • $p(\mathcal{D} | C_i)$ : ๊ฐ€๋Šฅ๋„ (Likelihood)
  • $p(C_i | \mathcal{D})$ : ์‚ฌํ›„ ํ™•๋ฅ  (Posterior probability)

๋‚ฏ์„  ์šฉ์–ด๋“ค์ด ๋งŽ์•„ ํ—ท๊ฐˆ๋ฆด ์ˆ˜ ์žˆ๋Š”๋ฐ, ๊ณ ์–‘์ด์™€ ๊ฐœ์˜ ์‚ฌ์ง„์„ ๊ตฌ๋ถ„ํ•˜๋Š” ์ž‘์—…์„ ์˜ˆ๋กœ ๋“ค์–ด๋ด…์‹œ๋‹ค.

  • $C_1$ = ๊ฐœ, $C_2$ = ๊ณ ์–‘์ด
  • $\mathcal{D}$ : ๊ฐœ๋‚˜ ๊ณ ์–‘์ด ํ˜น์€ ๋‹ค๋ฅธ ๊ฒƒ๋“ค์ด ์„ž์ธ ์‚ฌ์ง„๋“ค
  • $p(C_i)$ : ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ๋“ค์˜ ๊ฐœ, ๊ณ ์–‘์ด ์‚ฌ์ง„ ๋น„์œจ์— ๋Œ€ํ•œ ์‚ฌ์ „ ์ง€์‹
  • $p(\mathcal{D} | C_i)$ : ๊ฐœ๋‚˜ ๊ณ ์–‘์ด๋ฅผ ์ „์ œํ–ˆ์„ ๋•Œ์˜ ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ  ๋ถ„ํฌ (๊ฐœ๋‚˜ ๊ณ ์–‘์ด์ผ ๊ฐ€๋Šฅ์„ฑ)
  • $p(C_i | \mathcal{D})$ : ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฐœ๋‚˜ ๊ณ ์–‘์ด์ผ ํ™•๋ฅ 

์œ„ ์˜ˆ๋ฅผ ๋ณด๋ฉด ์•Œ๊ฒ ์ง€๋งŒ, ์šฐ๋ฆฌ์˜ ์ตœ์ข… ๋ชฉํ‘œ๋Š” ์‚ฌํ›„ ํ™•๋ฅ ์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ฆ‰, ์–ด๋–ค ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ณ  ๊ทธ ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋–ค ๋ฒ”์ฃผ์— ์†ํ•  ์ง€ ๋ถ„๋ฅ˜ํ•˜๊ฑฐ๋‚˜ ํ˜น์€ ํ™•๋ฅ ๋ถ„ํฌ์— ํ•„์š”ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด์ฃ . ์œ„์—์„œ๋Š” ๋ถ„๋ฅ˜๋กœ ์˜ˆ๋ฅผ ๋“ค์—ˆ์ง€๋งŒ, ์—ฌ๊ธฐ์„œ ํ•ด๋ณผ ๊ฒƒ์€ ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ์ถ”์ •์ž…๋‹ˆ๋‹ค.


3. ์„ ํ˜•ํšŒ๊ท€

์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐœ๋…

  • ๊ณ ๊ต ๊ต๊ณผ๊ณผ์ • - ๋ฏธ์ ๋ถ„
  • ๊ณ ๊ต ๊ต๊ณผ๊ณผ์ • - ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„

3.1 ๋…ธ์ด์ฆˆ (Noise)

์ด๋ฒˆ์—๋„ ๊ณ ๋“ฑํ•™๊ต ํ™•๋ฅ ๊ณผ ํ†ต๊ณ„ ๋ฌธ์ œ๋กœ ์˜ˆ๋ฅผ ๋“ค๋ฉด์„œ ์‹œ์ž‘ํ•ด๋ด…์‹œ๋‹ค.

Q. ์–ด๋А ๋ฐ˜์˜ ์ˆ˜ํ•™์„ฑ์ ์ด ํ‰๊ท ์ด 60, ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ 20์ธ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค๊ณ  ํ•  ๋•Œ, 1๋“ฑ๊ธ‰์ด ๋‚˜์˜ค๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ตœ์†Œ ๋ช‡ ์  ์ด์ƒ์„ ๋ฐ›์•„์•ผ ํ•˜๋Š”๊ฐ€? (๋‹จ, $p(0 \leq Z \leq 1.75)=0.46$)

ํ’€์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

1๋“ฑ๊ธ‰์ด ๋‚˜์˜ค๊ธฐ ์œ„ํ•œ ์ตœ์†Œ ์ ์ˆ˜๋ฅผ $a$๋ผ ํ•˜์ž. ๊ทธ๋ ‡๋‹ค๋ฉด ๋‹ค์Œ ๋“ฑ์‹์„ ๋งŒ์กฑํ•ด์•ผ ํ•œ๋‹ค. $$ p(X \geq a) = 0.04 $$ ์ด๋ฅผ ํ‘œ์ค€ํ™”ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. $$ p(X \geq a) = p(Z \geq \frac{a - 60}{20}) = 0.04 = p(Z \geq 1.75) $$ ๋”ฐ๋ผ์„œ $a=95$์ด๋‹ค.

๋ฌธ์ œ๋Š” ์‰ฌ์› ์ง€๋งŒ, ์ด๊ฒƒ์ด ์‹ค์ œ๋กœ ๊ฐ€๋Šฅํ•œ ๋ฌธ์ œ์ผ๊นŒ์š”? ๋งŒ์ผ, ๋ณธ์ธ ๋ฐ˜์˜ ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด ๋ณธ์ธ์˜ ๋“ฑ๊ธ‰์„ ์ถ”์ •ํ•  ์ˆ˜ ์žˆ์„๊นŒ์š”? ๊ฒฐ๋ก ๋ถ€ํ„ฐ ๋งํ•˜์ž๋ฉด, ๋ถˆ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์ด์œ ๋Š” ์ˆ˜ํ•™ ์„ฑ์ ์ด ์•„๋ฌด๋ฆฌ ์ •๊ทœ๋ถ„ํฌ์™€ ๋น„์Šทํ•˜๊ฒŒ ๋‚˜์˜ค๋”๋ผ๋„ ์ •ํ™•ํžˆ ์ •๊ทœ๋ถ„ํฌ์ผ ํ™•๋ฅ ์€ ์•„์ฃผ ์ž‘๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

์ •ํ™•ํžˆ ์ •๊ทœ๋ถ„ํฌ์ธ ์„ฑ์ ์€ ์ž˜ ๋‚˜์˜ค์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

์ •ํ™•ํžˆ ์ •๊ทœ๋ถ„ํฌ์ธ ์„ฑ์ ์€ ์ž˜ ๋‚˜์˜ค์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

์‹ค์ œ๋กœ ํ‘œ๋ณธ์ด ์—„์ฒญ๋‚˜๊ฒŒ ๋งŽ์€ ์ˆ˜๋Šฅ ์„ฑ์ ์กฐ์ฐจ ์ •ํ™•ํžˆ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์œ„ ๋ฌธ์ œ์ฒ˜๋Ÿผ ๋“ฑ๊ธ‰์ปท์„ ์ถ”๋ก ํ•ด๋ณด์•„๋„ ์‹ค์ œ ๋“ฑ๊ธ‰์ปท๊ณผ๋Š” ๊ดด๋ฆฌ๋ฅผ ๋ณด์ด์ฃ . 2019๋…„ 11์›” 14์ผ์— ์น˜๋ค„์ง„ 2020๋…„ ์ˆ˜๋Šฅ ๊ตญ์–ด์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•˜์Šต๋‹ˆ๋‹ค.${}^{[2]}$

| ์‹œํ—˜ ๊ณผ๋ชฉ | ํ‰๊ท  | ํ‘œ์ค€ํŽธ์ฐจ | 1๋“ฑ๊ธ‰ ์ปท |
|---|---|---|---|
| 2020 ์ˆ˜๋Šฅ ๊ตญ์–ด | 59.87 | 20.22 | 91 |

์ด๋ฅผ ์ •๊ทœ๋ถ„ํฌ๋กœ ๋ฐ”๊พธ์–ด 91์  ์ด์ƒ์ธ ํ•™์ƒ๋“ค์˜ ๋น„์œจ์„ ๊ณ„์‚ฐํ•˜๋ฉด $0.062$, ์ฆ‰, 6.2%๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค. 95์  ์ด์ƒ์ธ ํ•™์ƒ๋“ค์˜ ๋น„์œจ์„ ๊ณ„์‚ฐํ•ด์•ผ ๋น„๋กœ์†Œ $0.0412$ ์ •๋„๋กœ ๋‚˜์˜ค๋ฏ€๋กœ ๋งŒ์ผ, 2020 ์ˆ˜๋Šฅ ๊ตญ์–ด๊ฐ€ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ž๋‹ค๋ฉด 95์ ์ด 1๋“ฑ๊ธ‰ ์ปท ์ ์ˆ˜์˜€์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์‹ค ์ˆ˜๋Šฅ๊นŒ์ง€๋„ ๊ฐˆ ํ•„์š”๊ฐ€ ์—†๊ณ  ์ฃผ์‚ฌ์œ„๋ฅผ ๋˜์ ธ์„œ 1์˜ ๋ˆˆ์˜ ์ˆ˜๋ฅผ ํ™•์ธํ•œ๋‹ค๊ณ  ํ•ด๋ด๋„ ์ •ํ™•ํžˆ ์ดํ•ญ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด์ง€ ์•Š๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ†ต๊ณ„ํ•™์—์„œ๋Š” ์‹ค์ œ ๊ฐ’๊ณผ ์ด๋ก  ๊ฐ’์€ ๋”ฑํžˆ ๋‹ค๋ฅธ ์š”์ธ์ด ์—†์–ด๋„ ์ฐจ์ด๋ฅผ ๋ณด์ด๊ฒŒ ๋˜๋Š”๋ฐ, ์ด๋•Œ์˜ ์ฐจ์ด๋ฅผ ๋…ธ์ด์ฆˆ(Noise) ๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

๋…ธ์ด์ฆˆ๋Š” ๊ด€์ธก ๊ธฐ๊ธฐ๋‚˜ ์‹คํ—˜์—์„œ ์ผ์–ด๋‚˜๋Š” ์˜ค๋ฅ˜์™€๋Š” ๋‹ค๋ฅด๊ฒŒ, ์ž์—ฐ์— ํ•ญ์ƒ ์กด์žฌํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋” ์ •๋ฐ€ํ•˜๊ฒŒ ์ธก์ •ํ•˜๊ฑฐ๋‚˜ ์‹คํ—˜ ๊ธฐ๋ฒ•์„ ๋ฐ”๊พผ๋‹ค๊ณ  ํ•ด์„œ ์ค„์–ด๋“ค์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ํœด๋Œ€ํฐ ์นด๋ฉ”๋ผ ๋Œ€์‹  ๋” ํ™”์งˆ ์ข‹์€ DSLR์„ ๋“ค๊ณ  ์˜จ๋‹ค ํ•ด๋„ ๊ด€์ธกํ•œ 1์˜ ๋ˆˆ์ด ๋‚˜์˜จ ํšŸ์ˆ˜๋Š” ๋ฐ”๋€Œ์ง€ ์•Š์ฃ .

์ž์—ฐ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋…ธ์ด์ฆˆ์˜ ํ™•๋ฅ ๋ถ„ํฌ๋Š” ๋งŽ์€ ๊ฒฝ์šฐ์— ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ ๋ถ„ํฌ๋“ค์ด ํ‘œ๋ณธ์ด ์ปค์ง€๊ฒŒ ๋˜๋ฉด ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ทผ์‚ฌ๋˜๋Š” ๊ฒƒ๊ณผ ์ •๊ทœ๋ถ„ํฌ ์ž์ฒด๊ฐ€ ์‹คํ—˜์˜ค์ฐจ๋ฅผ ๋ถ„์„ํ•˜๋Š” ๊ฒƒ์—์„œ ์œ ๋ž˜ํ–ˆ๋‹ค๋Š” ๊ฒƒ์„ ์ƒ๊ฐํ•ด๋ณด๋ฉด ์–ด๋А ์ •๋„ ๋‚ฉ๋“์ด ๋  ๊ฒ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๊ฒŒ์‹œ๋ฌผ์—์„œ ๋‹ค๋ฃฐ ๋…ธ์ด์ฆˆ๋“ค์€ ๋ชจ๋‘ ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅธ๋‹ค๊ณ  ๊ฐ€์ •ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.


3.2. ์„ ํ˜• ๋ชจ๋ธ

์šฐ๋ฆฌ๋Š” ์„ ํ˜•ํšŒ๊ท€๋ฅผ ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๊ธฐ์— $(x,y)$ ์ˆœ์„œ์Œ์œผ๋กœ ํ‘œ๊ธฐ๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ด€๊ณ„์‹์„ ๊ธฐ๋Œ€ํ•ฉ๋‹ˆ๋‹ค.

$$ y = ax + b $$

์˜ˆ๋ฅผ ๋“ค์–ด, ๋ฐ์ดํ„ฐ๊ฐ€ $(1,3), (2,5), (3,7)$๋กœ ์ฃผ์–ด์กŒ์œผ๋ฉด, ๊ด€๊ณ„์‹์€ $y=2x+1$์ด ๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์•ž์„œ ๋งํ–ˆ๋“ฏ์ด ๋ชจ๋“  ๋ฐ์ดํ„ฐ์—๋Š” ๋…ธ์ด์ฆˆ๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ๋ณดํ†ต ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์ง„๋‹ค๊ณ  ๋ณด๋ฉด ๋ฉ๋‹ˆ๋‹ค.

$y=2x+1$์— ๋…ธ์ด์ฆˆ๋ฅผ ๋”ํ•œ ๋ฐ์ดํ„ฐ


์ฆ‰, ๋‹ค์‹œ ๊ด€๊ณ„์‹์„ ๋‚˜ํƒ€๋‚ด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ y = ax + b + \epsilon $$

์—ฌ๊ธฐ์„œ ์ค‘์š”ํ•œ ๊ฒƒ์€ $\epsilon$์€ ํ™•๋ฅ ๋ณ€์ˆ˜๋ผ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์•ž์„œ ์–ธ๊ธ‰ํ•œ๋Œ€๋กœ $\epsilon$์˜ ํ™•๋ฅ ๋ถ„ํฌ๋Š” ์ •๊ทœ๋ถ„ํฌ๋กœ ๊ฐ€์ •ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋…ธ์ด์ฆˆ์˜ ํ‰๊ท ์€ ๋‹น์—ฐํ•˜๊ฒŒ๋„ $0$์ผ ๊ฒƒ์ด๋ฏ€๋กœ ์ด๋ฅผ ์„œ์ˆ ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(\epsilon) = \mathcal{N}(\epsilon | 0, \sigma^2) $$

$a,b$๋Š” ์•„์ง ๊ฒฐ์ •๋˜์ง€๋Š” ์•Š์•˜์ง€๋งŒ, ์ƒ์ˆ˜์ผ ๊ฒƒ์ด๊ณ  $x$๋Š” ๋‹จ์ˆœํžˆ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ๊ฐ„์ฃผํ•  ๊ฒƒ์ด๋ฏ€๋กœ ํ™•๋ฅ ๋ณ€์ˆ˜๋กœ ์ทจ๊ธ‰ํ•˜์ง€ ์•Š์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ $y$๋Š” ๋‹ค์Œ์˜ ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

$$ p(y) = \mathcal{N}(y| ax+b, \sigma^2) $$
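Sampling from this model makes the point that $y$ is a random variable concrete; the parameter values below ($a=2$, $b=1$, $\sigma=0.5$) are hypothetical choices, not from the original data:

```python
import random

# Hypothetical parameters: y = a*x + b + eps, with eps ~ N(0, sigma^2)
random.seed(42)
a, b, sigma = 2.0, 1.0, 0.5

def sample_y(x):
    # One draw from p(y) = N(a*x + b, sigma^2)
    return a * x + b + random.gauss(0.0, sigma)

# Many draws at x = 0.5 should average near a*0.5 + b = 2.0
ys = [sample_y(0.5) for _ in range(10000)]
print(round(sum(ys) / len(ys), 1))  # 2.0
```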


3.3 ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ • (MLE)

์•ž์„œ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ ๋‹จ์›์—์„œ ์–ธ๊ธ‰ํ–ˆ๋‹ค์‹œํ”ผ, ์šฐ๋ฆฌ์˜ ๋ชฉ์ ์€ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ๋“ค์˜ ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ์‚ฌํ›„ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆ ์ง‘๋‹ˆ๋‹ค.

  1. ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ • (Maximum Likelihood Estimation)
  2. ๋ฒ ์ด์ฆˆ ์ถ”๋ก  (Bayesian Inference)

์—ฌ๊ธฐ์„œ๋Š” 1๋ฒˆ์˜ ๋ฐฉ๋ฒ•์„ ๋”ฐ๋ผ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์•ž์„œ ๋ณธ ๊ฒƒ์ฒ˜๋Ÿผ, ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋ฉด ์‚ฌํ›„ํ™•๋ฅ ๋ถ„ํฌ์˜ ๋น„๋ก€์‹์„ ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ p(C_i | \mathcal{D}) \propto p(\mathcal{D} | C_i)p(C_i) $$

์ด๋•Œ, $p(C_i)$๋Š” ๋ฐ์ดํ„ฐ์™€ ์ƒ๊ด€์—†๋Š” ์‚ฌ์ „ํ™•๋ฅ ๋ถ„ํฌ์ด๋ฏ€๋กœ $p(\mathcal{D}|C_i)$๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋ฉด $p(C_i|\mathcal{D})$ ์—ญ์‹œ ์ตœ๋Œ€๊ฐ€ ๋˜์ง€ ์•Š๊ฒ ๋ƒ๋Š” ๊ฒƒ์ด ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ •์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๊ฒฝ์šฐ์—” ์‚ฌํ›„ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜๋Š” ๊ตฌํ•  ์ˆ˜ ์—†๊ณ  ๋‹จ์ˆœํžˆ ์‚ฌํ›„ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ์ตœ๋Œ€๊ฐ€ ๋˜๋Š” ์ง€์ ๋งŒ ๊ตฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์— ๋Œ€ํ•ด์„œ๋Š” ์žฅ๋‹จ์ ์ด ์žˆ๋Š”๋ฐ, ๊ฐ„๋‹จํ•œ ์„ ํ˜•ํšŒ๊ท€์—์„œ๋Š” ์ด๊ฒƒ์œผ๋กœ๋„ ์ถฉ๋ถ„ํ•ฉ๋‹ˆ๋‹ค.

์•ž์„œ ์„ธ์šด ์„ ํ˜•๋ชจ๋ธ์„ ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ๋กœ ์ ์–ด๋ณด๋ฉด, ๊ฐ€๋Šฅ๋„(likelihood)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(\mathcal{D}|a,b) = p(\mathbf{y}|\mathbf{x},a,b)p(\mathbf{x}) = \left\{ \prod_{i=1}^n\mathcal{N}(y_i|ax_i+b,\sigma^2)\right\} \times p(\mathbf{x}) $$

์œ„ ์‹์—์„œ $\mathbf{x},~ \mathbf{y}$๋Š” ๊ฐ๊ฐ $(x_1,\cdots,x_n),~(y_1,\cdots,y_n)$์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ด์ œ ์ด๊ฒƒ์„ ์ตœ๋Œ€๋กœ ๋งŒ๋“œ๋Š” $a,~b$๋ฅผ ์ฐพ๊ธฐ๋งŒ ํ•˜๋ฉด ๋˜๋Š”๋ฐ, ์ด๋Š” ๊ณ ๋“ฑํ•™๊ต ๋ฏธ์ ๋ถ„ ๋ฌธ์ œ์ฒ˜๋Ÿผ ์ ‘๊ทผํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. ๊ทน๋Œ€, ๊ทน์†Œ๋ฅผ ๋จผ์ € ์ฐพ๊ณ , ๊ทธ๊ฒƒ์ด ์ตœ๋Œ€์ธ์ง€ ์ตœ์†Œ์ธ์ง€ ๊ตฌ๋ถ„ํ•˜๋ฉด ๋˜๋Š” ๊ฒƒ์ด์ฃ . ๋‹ค๋งŒ, ์œ„ ์‹์ฒ˜๋Ÿผ $n$๊ฐœ์˜ ๊ณฑ์œผ๋กœ ๋˜์–ด์žˆ๋Š” ๊ฒฝ์šฐ์—๋Š” ๋ฏธ๋ถ„ํ•˜๊ธฐ๊ฐ€ ํž˜๋“œ๋ฏ€๋กœ ๋จผ์ € ๋กœ๊ทธ๋ฅผ ์ทจํ•œ ํ›„ ๋ฏธ๋ถ„ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$$ \begin{aligned} \ln p(\mathcal{D}|a,b) &= \sum_{i=1}^n \ln \left\{\mathcal{N}(y_i|ax_i + b, \sigma^2)\right\} + \ln p(\mathbf{x}) \\ &= \sum_{i=1}^n \left\{- \ln(\sqrt{2\pi\sigma^2}) - \frac{(y_i - (ax_i+b))^2}{2\sigma^2} \right\} + \ln p(\mathbf{x}) \end{aligned} $$

๊ณฑ์ด ํ•ฉ์œผ๋กœ ๋ฐ”๋€Œ์—ˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ ์ด๋ฅผ $a,b$๋กœ ๊ฐ๊ฐ ๋ฏธ๋ถ„ํ•˜์—ฌ 0์ด ๋˜๋Š” ๊ฐ’์„ ๊ตฌํ•ด๋ณผ ๊ฒ๋‹ˆ๋‹ค. ์ด๋•Œ, ๋‘ ๋ฒˆ์งธ์ค„์˜ ์ฒซ ํ•ญ๊ณผ ๋งˆ์ง€๋ง‰ ํ•ญ์€ $a,b$์™€ ์ƒ๊ด€์—†์œผ๋‹ˆ ๋ฌด์‹œํ•˜๊ณ  ๊ณ„์‚ฐํ•ฉ์‹œ๋‹ค.

1) $b$๋กœ ๋ฏธ๋ถ„

$$ \begin{aligned} &\frac{\partial}{\partial b} \ln p(\mathcal{D}|a,b) = -\frac{1}{\sigma^2}\sum_{i=1}^n (y_i - ax_i - b) = 0 \\ \Rightarrow~&\therefore b = \overline{y} - a\overline{x} \qquad (\overline{x} \equiv \frac{1}{n}\sum_{i=1}^n x_i,~\overline{y} = \frac{1}{n}\sum_{i=1}^n y_i) \end{aligned} $$

2) $a$๋กœ ๋ฏธ๋ถ„

$$ \begin{aligned} &\frac{\partial}{\partial a} \ln p(\mathcal{D}|a,b) = -\frac{1}{\sigma^2}\sum_{i=1}^n (ax_i + b - y_i) x_i = 0 \\ \Rightarrow~& a \sum_{i=1}^n x_i^2 + b \sum_{i=1}^n x_i - \sum_{i=1}^n x_iy_i = 0 \\ \Rightarrow~& a\overline{x^2} - (a\overline{x} - \overline{y}) \overline{x} - \overline{xy} = 0 \\ \\ \Rightarrow~&\therefore a = \frac{\overline{xy} - \overline{x}\overline{y}}{\overline{x^2} - \overline{x}^2} \end{aligned} $$

์œ„ ๊ฒฐ๊ณผ๋ฅผ ์š”์•ฝํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.


์„ ํ˜•๋ชจ๋ธ์˜ ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ •

๋ฐ์ดํ„ฐ $\mathcal{D} = \left\{(x_1,y_1),\,\cdots,\, (x_n,y_n)\right\}$์œผ๋กœ ์ฃผ์–ด์กŒ์„๋•Œ, ์ด๋ฅผ ์ตœ๋Œ€ ๊ฐ€๋Šฅ๋„ ์ถ”์ •์„ ํ†ตํ•ด ์„ ํ˜•๋ชจ๋ธ $y=ax+b+\epsilon$๋กœ ๊ทผ์‚ฌํ•œ๋‹ค๋ฉด ์ด์— ๋Œ€ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜ $a,b$๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

$$ a = \frac{\overline{xy} - \overline{x}\overline{y}}{\overline{x^2} - \overline{x}^2},\quad b = \overline{y} - a\overline{x} $$
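These closed-form estimators can be sanity-checked on the noiseless toy points $(1,3),\,(2,5),\,(3,7)$ mentioned earlier, which should recover $y=2x+1$ exactly (a sketch):

```python
# Closed-form MLE for the linear model on the noiseless toy points
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
x2_bar = sum(x * x for x in xs) / n
xy_bar = sum(x * y for x, y in zip(xs, ys)) / n

a = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
b = y_bar - a * x_bar
print(round(a, 6), round(b, 6))  # 2.0 1.0
```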

์ด์ œ ์ด๋ฅผ ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ด…์‹œ๋‹ค.


3.4. ์ฝ”๋“œ ๊ตฌํ˜„

์ฝ”๋“œ๋Š” ํŽธ์˜๋ฅผ ์œ„ํ•ด Julia๋ฅผ ์ด์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ์ฝ”๋“œ๋ฅผ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฒƒ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Pre-requisites

  • Julia
    • NCDataFrame
    • Statistics
    • DataFrames
  • Python
    • NetCDF4
    • matplotlib
  • libnetcdf

ํ•„์š”ํ•œ ๋ฐ์ดํ„ฐ๋Š” ์œ„์—์„œ ์„ ํ˜• ๋ชจ๋ธ์„ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•ด ์ถ”์ถœํ•˜์˜€๋˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์ถ”์ถœ ์ฝ”๋“œ๋Š” ๋ถ€๋ก์— ์ˆ˜๋กํ•ด๋†“์•˜์Šต๋‹ˆ๋‹ค.

# Julia
# https://git.io/Jm2gf
using NCDataFrame, Statistics, DataFrames

# ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
df = readnc("linear.nc")

# ํ‘œ๋ณธํ‰๊ท  ๊ตฌํ•˜๊ธฐ
x_bar = mean(df[!,:x])
y_bar = mean(df[!,:y])
x2_bar = mean(df[!,:x] .^ 2)
xy_bar = mean(df[!,:x] .* df[!,:y])

# ์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •
a = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar^2)
b = y_bar - a * x_bar

# a,b ์ถœ๋ ฅ
@show a
@show b

# ๊ทธ๋ฆผ ๊ทธ๋ฆด ์ค€๋น„
x_plot = -1.0:0.01:1.0
y_plot = a .* x_plot .+ b

# ๋ฐ์ดํ„ฐ ์“ฐ๊ธฐ
dg = DataFrame(x=x_plot, y=y_plot, a=a, b=b)  # scalar a, b recycled to columns so the Python plot script can read them
writenc(dg, "linear_plot.nc")

์ด๋ ‡๊ฒŒ ๋‚˜์˜จ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ–๊ณ  ๊ทธ๋ฆผ์„ ๊ทธ๋ฆฌ๋Š” ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

# Python
# https://git.io/Jm2gs
from netCDF4 import Dataset
import matplotlib.pyplot as plt

# Import netCDF file
ncfile = './linear.nc'
data = Dataset(ncfile)
var = data.variables

# Prepare Data to Plot
x = var['x'][:]
y = var['y'][:]

# Import netCDF file
ncfile = './linear_plot.nc'
data = Dataset(ncfile)
var = data.variables

# Prepare Data to Plot
x_reg = var['x'][:]
y_reg = var['y'][:]
a = var['a'][:][0]
b = var['b'][:][0]

# Use latex
plt.rc('text', usetex=True)
plt.rc('font', family='serif')

# Prepare Plot
plt.figure(figsize=(10,6), dpi=300)
plt.title(r"Linear Regression", fontsize=16)
plt.xlabel(r'$x$', fontsize=14)
plt.ylabel(r'$y$', fontsize=14)


# Plot with Legends
plt.scatter(x, y, label=r'$y=2x+1+\epsilon$', alpha=0.7)
plt.plot(x_reg, y_reg, label=r'$y={:.2f}x+{:.2f}$'.format(a, b))

# Other options
plt.legend(fontsize=12)
plt.grid()
plt.savefig("linear_reg.png", dpi=300)

์ด๋ ‡๊ฒŒ ๋‚˜์˜จ ๊ทธ๋ฆผ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

๋“œ๋””์–ด ์„ ํ˜• ํšŒ๊ท€!


4. ๋งˆ์น˜๋ฉฐ

์ตœ๋Œ€ํ•œ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ์ ์œผ๋ ค ํ–ˆ๋Š”๋ฐ, ๋‚ด์šฉ์ด ๋‚ด์šฉ์ด๋‹ค๋ณด๋‹ˆ ๋ง์ด ๋งŽ์ด ๊ธธ์–ด์กŒ๋„ค์š”. ๊ณ ๋“ฑํ•™๊ต ๊ณผ์ •์— ๊ตญํ•œํ•ด์„œ ์ ๋‹ค๋ณด๋‹ˆ ๋น ์ง„ ๋‚ด์šฉ๋“ค๋„ ๊ฝค ๋งŽ์€๋ฐ, ํ˜น์‹œ๋‚˜ ์ข€ ๋” ๊ณต๋ถ€ํ•˜๊ณ  ์‹ถ์€ ๋ถ„๋“ค์€ Bishop์˜ PRML์„ ๋ณด์‹œ๋Š” ๊ฒƒ์„ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.


๋ถ€๋ก

1. ์ˆ˜ํ•™ ์„ฑ์  ํžˆ์Šคํ† ๊ทธ๋žจ ์ฝ”๋“œ

// 1. Rust๋กœ Data ์ƒ์„ฑํ•˜๊ธฐ
// https://git.io/JqXQb
extern crate peroxide;
use peroxide::fuga::*;

fn main() {
    let n = Normal(60f64, 20f64); // ์ •๊ทœ๋ถ„ํฌ ์ƒ์„ฑ
    let y = n.sample(40)        // 40๊ฐœ์˜ ์ƒ˜ํ”Œ ์ƒ์„ฑ
      .iter()
      .map(|t| t.round())       // ๋ฐ˜์˜ฌ๋ฆผ (์ ์ˆ˜๋Š” ์ •์ˆ˜)
      .filter(|x| *x <= 100f64) // 100์  ์ดํ•˜๋งŒ ์ฑ„ํƒ
      .collect();
    
    let mut df = DataFrame::new(vec![]);  // ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ์ƒ์„ฑ
    df.push("y", Series::new(y));         // y ์ž…๋ ฅ

    df.print();

    df.write_nc("data.nc")      // netcdf ํŒŒ์ผํฌ๋งท์œผ๋กœ ์ €์žฅ
      .expect("Can't write nc");
}
# 2. Python์œผ๋กœ ํžˆ์Šคํ† ๊ทธ๋žจ ๊ทธ๋ฆฌ๊ธฐ
# https://git.io/JqX7Z
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import seaborn as sns

# Import netCDF file
ncfile = './data.nc'
data = Dataset(ncfile)
var = data.variables

# Use latex
plt.rc('text', usetex=True)
plt.rc('font', family='serif')

# Prepare Histogram
plt.figure(figsize=(10,6), dpi=300)
plt.title(r"Math Score", fontsize=16)
plt.xlabel(r'Score', fontsize=14)
plt.ylabel(r'Density', fontsize=14)

# Prepare Data to Plot
y = var['y'][:]  

# Draw Histogram
sns.distplot(y, label=r"Score", bins=10)

# Other options
plt.legend(fontsize=12)
plt.grid()

# Save
plt.savefig("hist.png", dpi=300)


2. 2020 ์ˆ˜๋Šฅ ๊ตญ์–ด ๋“ฑ๊ธ‰์ปท ๊ณ„์‚ฐ ์ฝ”๋“œ

// Rust
// https://git.io/JqXHi
extern crate peroxide;
use peroxide::fuga::*;

fn main() {
    let n = Normal(59.87, 20.22); // ์ •๊ทœ๋ถ„ํฌ ์ƒ์„ฑ
    (1f64 - n.cdf(91f64)).print();   // p(X >= 91) ๊ณ„์‚ฐ
    (1f64 - n.cdf(95f64)).print();   // p(X >= 95) ๊ณ„์‚ฐ
}


3. ์„ ํ˜• ๋ชจ๋ธ ๋ฐ์ดํ„ฐ ์ฝ”๋“œ

# Julia
using NCDataFrame, DataFrames;

function f(x::S) where {T <: Number, S <: AbstractVector{T}}
	2x .+ 1
end

x = -1.0:0.01:1.0;
noise = randn(length(x));
y = f(x) + noise;

df = DataFrame(x=x, y=y);
writenc(df, "linear.nc")

์ถœ์ฒ˜

[1] : ์„œ์šธ์‹ ๋ฌธ - ์˜ฌํ•ด์˜ ๊ณผํ•™ ์„ฑ๊ณผ 1์œ„๋Š” ‘์ค‘๋ ฅํŒŒ’ ํƒ์ง€

[2] : ๋ฉ”๊ฐ€์Šคํ„ฐ๋”” - ์—ญ๋Œ€ ๋“ฑ๊ธ‰์ปท ๊ณต๊ฐœ

  • C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag, 2006
]]>
๐Ÿ–Š๏ธ Rust์™€ ๋ฏธ๋ถ„ํ•˜๊ธฐ 02: ๊ธฐํ˜ธ ๋ฏธ๋ถ„ https://axect.github.io/kr/posts/002_ad_2/ Sat, 03 Oct 2020 03:36:49 +0900 https://axect.github.io/kr/posts/002_ad_2/ <blockquote> <p><strong>๐Ÿ”– Automatic Differentiation Series</strong></p> <ol> <li><a href="../002_ad_1">๐Ÿ’ป Numerical Differentiation</a></li> <li><a href="../002_ad_2">๐Ÿ–Š๏ธ Symbolic Differentiation</a></li> <li><a href="../007_ad_3">๐Ÿค– Automatic Differentiation</a></li> </ol> </blockquote> <h2 id="-์ˆ˜์น˜์ -๋ฏธ๋ถ„์˜-ํ•œ๊ณ„">๐Ÿ“‰ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์˜ ํ•œ๊ณ„</h2> <p>์ €๋ฒˆ ํฌ์ŠคํŠธ์—์„œ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์„ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์„ ๋‹ค๋ค„๋ณด์•˜๋Š”๋ฐ, ์–ด๋– ์…จ๋‚˜์š”? ์•„๋งˆ, ์ฝ”๋”ฉ์— ๋Œ€ํ•œ ์กฐ๊ธˆ์˜ ์ง€์‹๋งŒ ์žˆ์œผ๋ฉด ์˜คํžˆ๋ ค ๊ณ ๋“ฑํ•™๊ต๋•Œ์˜ ๋ฏธ๋ถ„๋ณด๋‹ค ํ›จ์”ฌ ์‰ฝ๊ฒŒ ๋А๊ปด์ง€์…จ์„ ๊ฒ๋‹ˆ๋‹ค. ์ €ํฌ๊ฐ€ ์‚ฌ์šฉํ•œ ๊ฒƒ์ด๋ผ๊ณ ๋Š” ๊ทธ์ € ๋„ํ•จ์ˆ˜์˜ ์ •์˜์— ๋”ฐ๋ผ ํ•จ์ˆ˜์— ๊ฐ ๊ตฌ๊ฐ„ ๊ฐ’์„ ๋Œ€์ž…ํ•œ ๊ฒƒ์ด ์ „๋ถ€์˜€๋Š”๋ฐ, ์ด๋ฅผ ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๊ฒฐ๊ตญ ๋‹ค์Œ์˜ ์ฝ”๋“œ์— ์ง€๋‚˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.</p> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Python</span> </span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">differentiation</span>(f, x, h<span style="color:#f92672">=</span><span style="color:#ae81ff">1e-06</span>): </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> (f(x <span style="color:#f92672">+</span> h) <span style="color:#f92672">-</span> f(x)) <span style="color:#f92672">/</span> h </span></span></code></pre></div><p>๋‚˜๋จธ์ง€๋Š” ์ด๋ฅผ ๊ฐ์ฒด์ง€ํ–ฅ์ ์œผ๋กœ ๊ตฌํ˜„ํ•˜๊ฑฐ๋‚˜, ํ•จ์ˆ˜ํ˜• ํ”„๋กœ๊ทธ๋ž˜๋ฐ์œผ๋กœ ๊ตฌํ˜„ํ•˜๊ฑฐ๋‚˜ ์ œ๋„ˆ๋ฆญ ํ”„๋กœ๊ทธ๋ž˜๋ฐ์„ ๋„์ž…ํ•˜๋Š” ๋“ฑ์˜ ๊ตฌํ˜„๋ฐฉ๋ฒ•์˜ ์ฐจ์ด์ผ ๋ฟ์ด์—ˆ์Šต๋‹ˆ๋‹ค. 
์ด๋ ‡๊ฒŒ ์ˆ˜์น˜์  ๋ฏธ๋ถ„ ๋ฐฉ๋ฒ•์€ ๊ต‰์žฅํžˆ ๊ฐ„๋‹จํ•œ ๊ตฌํ˜„๊ณผ ์—„์ฒญ ๋น ๋ฅธ ๊ณ„์‚ฐ์†๋„๋ฅผ ๊ฐ€์ ธ์„œ ๋ˆ„๊ตฌ๋‚˜ ์‰ฝ๊ฒŒ ๋ฏธ๋ถ„์„ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค๋งŒ, ์˜ค์ฐจ๊ฐ€ ํ•„์—ฐ์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๊ฒŒ ๋˜๋Š” ๋‹จ์ ์ด ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์˜ค์ฐจ์— ํฌ๊ฒŒ ๋ฏผ๊ฐํ•˜์ง€ ์•Š์€ ๋ฌธ์ œ๋‚˜, Step ์ˆ˜๊ฐ€ ์ ์–ด์„œ ์˜ค์ฐจ๊ฐ€ ํฌ๊ฒŒ ์Œ“์ด์ง€ ์•Š๋Š” ๋ฏธ๋ถ„๋ฐฉ์ •์‹์„ ํ‘ธ๋Š” ๊ฒฝ์šฐ์—” ์ถฉ๋ถ„ํ•˜์ง€๋งŒ, ์˜ค์ฐจ์— ๋ฏผ๊ฐํ•˜๊ฑฐ๋‚˜ Step ์ˆ˜๊ฐ€ ๋งŽ์•„์„œ ์˜ค์ฐจ๊ฐ€ ์Œ“์—ฌ ์œ ์˜๋ฏธํ•œ ์ฐจ์ด๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ๋ฏธ๋ถ„๋ฐฉ์ •์‹์˜ ๊ฒฝ์šฐ์—” ํฐ ๋ฌธ์ œ๋ฅผ ์•ผ๊ธฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋กœ &ldquo;๋กœ๋ Œ์ฆˆ์˜ ๋‚˜๋น„&quot;๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.</p>

๐Ÿ”– Automatic Differentiation Series

  1. ๐Ÿ’ป Numerical Differentiation
  2. ๐Ÿ–Š๏ธ Symbolic Differentiation
  3. ๐Ÿค– Automatic Differentiation

๐Ÿ“‰ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์˜ ํ•œ๊ณ„

์ €๋ฒˆ ํฌ์ŠคํŠธ์—์„œ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์„ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์„ ๋‹ค๋ค„๋ณด์•˜๋Š”๋ฐ, ์–ด๋– ์…จ๋‚˜์š”? ์•„๋งˆ, ์ฝ”๋”ฉ์— ๋Œ€ํ•œ ์กฐ๊ธˆ์˜ ์ง€์‹๋งŒ ์žˆ์œผ๋ฉด ์˜คํžˆ๋ ค ๊ณ ๋“ฑํ•™๊ต๋•Œ์˜ ๋ฏธ๋ถ„๋ณด๋‹ค ํ›จ์”ฌ ์‰ฝ๊ฒŒ ๋А๊ปด์ง€์…จ์„ ๊ฒ๋‹ˆ๋‹ค. ์ €ํฌ๊ฐ€ ์‚ฌ์šฉํ•œ ๊ฒƒ์ด๋ผ๊ณ ๋Š” ๊ทธ์ € ๋„ํ•จ์ˆ˜์˜ ์ •์˜์— ๋”ฐ๋ผ ํ•จ์ˆ˜์— ๊ฐ ๊ตฌ๊ฐ„ ๊ฐ’์„ ๋Œ€์ž…ํ•œ ๊ฒƒ์ด ์ „๋ถ€์˜€๋Š”๋ฐ, ์ด๋ฅผ ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ๊ฒฐ๊ตญ ๋‹ค์Œ์˜ ์ฝ”๋“œ์— ์ง€๋‚˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

# Python
def differentiation(f, x, h=1e-06):
  return (f(x + h) - f(x)) / h

๋‚˜๋จธ์ง€๋Š” ์ด๋ฅผ ๊ฐ์ฒด์ง€ํ–ฅ์ ์œผ๋กœ ๊ตฌํ˜„ํ•˜๊ฑฐ๋‚˜, ํ•จ์ˆ˜ํ˜• ํ”„๋กœ๊ทธ๋ž˜๋ฐ์œผ๋กœ ๊ตฌํ˜„ํ•˜๊ฑฐ๋‚˜ ์ œ๋„ˆ๋ฆญ ํ”„๋กœ๊ทธ๋ž˜๋ฐ์„ ๋„์ž…ํ•˜๋Š” ๋“ฑ์˜ ๊ตฌํ˜„๋ฐฉ๋ฒ•์˜ ์ฐจ์ด์ผ ๋ฟ์ด์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์ˆ˜์น˜์  ๋ฏธ๋ถ„ ๋ฐฉ๋ฒ•์€ ๊ต‰์žฅํžˆ ๊ฐ„๋‹จํ•œ ๊ตฌํ˜„๊ณผ ์—„์ฒญ ๋น ๋ฅธ ๊ณ„์‚ฐ์†๋„๋ฅผ ๊ฐ€์ ธ์„œ ๋ˆ„๊ตฌ๋‚˜ ์‰ฝ๊ฒŒ ๋ฏธ๋ถ„์„ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค๋งŒ, ์˜ค์ฐจ๊ฐ€ ํ•„์—ฐ์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๊ฒŒ ๋˜๋Š” ๋‹จ์ ์ด ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์˜ค์ฐจ์— ํฌ๊ฒŒ ๋ฏผ๊ฐํ•˜์ง€ ์•Š์€ ๋ฌธ์ œ๋‚˜, Step ์ˆ˜๊ฐ€ ์ ์–ด์„œ ์˜ค์ฐจ๊ฐ€ ํฌ๊ฒŒ ์Œ“์ด์ง€ ์•Š๋Š” ๋ฏธ๋ถ„๋ฐฉ์ •์‹์„ ํ‘ธ๋Š” ๊ฒฝ์šฐ์—” ์ถฉ๋ถ„ํ•˜์ง€๋งŒ, ์˜ค์ฐจ์— ๋ฏผ๊ฐํ•˜๊ฑฐ๋‚˜ Step ์ˆ˜๊ฐ€ ๋งŽ์•„์„œ ์˜ค์ฐจ๊ฐ€ ์Œ“์—ฌ ์œ ์˜๋ฏธํ•œ ์ฐจ์ด๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ๋ฏธ๋ถ„๋ฐฉ์ •์‹์˜ ๊ฒฝ์šฐ์—” ํฐ ๋ฌธ์ œ๋ฅผ ์•ผ๊ธฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋กœ “๋กœ๋ Œ์ฆˆ์˜ ๋‚˜๋น„"๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿฆ‹ ๋กœ๋ Œ์ฆˆ์˜ ๋‚˜๋น„

Lorenz Butterfly

์—๋“œ์›Œ๋“œ ๋กœ๋ Œ์ฆˆ๋Š” ๊ฑธ์ถœํ•œ ์ˆ˜ํ•™์ž๋กœ, ํŠนํžˆ ์นด์˜ค์Šค ์ด๋ก ์˜ ์„ ๊ตฌ์ž๋กœ ์œ ๋ช…ํ•˜์‹  ๋ถ„์ž…๋‹ˆ๋‹ค. ๊ทธ๋Š” 1963๋…„์— ๋Œ€๊ธฐ ๋Œ€๋ฅ˜์˜ ๊ฐ„๋‹จํ•œ ์ˆ˜ํ•™์  ๋ชจํ˜•์„ ๋งŒ๋“ค์—ˆ๋Š”๋ฐ, ์ด ๋ชจ๋ธ์€ ๋‹ค์Œ์˜ 3๊ฐœ์˜ ์ƒ๋ฏธ๋ถ„๋ฐฉ์ •์‹์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ \begin{align} \frac{dx}{dt} &= \sigma(y-x) \\ \frac{dy}{dt} &= x (\rho - z) - y \\ \frac{dz}{dt} &= xy - \beta z \end{align} $$

๋ถ„๋ช… ์•„์ฃผ ๊ฐ„๋‹จํ•œ ๋ฏธ๋ถ„๋ฐฉ์ •์‹์ธ๋ฐ, ๋†€๋ž๊ฒŒ๋„ ์•„์ฃผ ๋ณต์žกํ•œ ํ˜•ํƒœ์˜ ํ•ด๊ฐ€ ๋„์ถœ๋ฉ๋‹ˆ๋‹ค. ์ด๋•Œ์˜ ๋Œ€ํ‘œ์ ์ธ ํ•ด์˜ ํ˜•ํƒœ๊ฐ€ ์œ„์—์„œ ์ฒจ๋ถ€ํ•œ ๊ทธ๋ฆผ์ž…๋‹ˆ๋‹ค. ์ด ์‹œ์Šคํ…œ์€ ๊ต‰์žฅํžˆ ์˜ˆ๋ฏผํ•œ๋ฐ, ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ๊ฐ’์„ ์กฐ๊ธˆ ๋ฐ”๊พธ๊ฑฐ๋‚˜ ํ˜น์€ Step size๋ฅผ ์กฐ๊ธˆ๋งŒ ๋ฐ”๊ฟ”๋„ ํ•ด์˜ ํ˜•ํƒœ๋Š” ์˜ˆ์ธกํ•  ์ˆ˜ ์—†๋Š” ํ˜•ํƒœ๋กœ, ๊ทธ๊ฒƒ๋„ ๊ต‰์žฅํžˆ ํŒŒ๊ฒฉ์ ์œผ๋กœ ๋ณ€ํ˜•๋ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์œ„์˜ ๊ทธ๋ฆผ์€ ์˜ค์ผ๋Ÿฌ(Euler) ๋ฐฉ๋ฒ•์ด๋ผ๋Š” ์ˆ˜์น˜์  ๋ฏธ๋ถ„๋ฐฉ์ •์‹ ํ•ด๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ ํ’€์—ˆ๋Š”๋ฐ, ์œ„์™€ ๋ชจ๋“  ์กฐ๊ฑด์„ ๋™์ผํ•˜๊ฒŒ ๋†“๊ณ  ๋ฐฉ๋ฒ•๋งŒ ๋ฃฝ๊ฒŒ-์ฟ ํƒ€(Runge-Kutta 4th order) ๋ฐฉ๋ฒ•์œผ๋กœ ๋ฐ”๊พธ๋ฉด ๋‹ค์Œ์˜ ๊ทธ๋ฆผ์ด ๋‚˜์˜ต๋‹ˆ๋‹ค.

Lorenz Butterfly (RK4)

์˜ค๋กœ์ง€ ๋ฐฉ๋ฒ•๋งŒ ๋ฐ”๊พธ์—ˆ์„ ๋ฟ์ธ๋ฐ ๊ฒฐ๊ณผ๊ฐ€ ์ƒ๋‹นํžˆ ๋งŽ์ด ๋‹ค๋ฅธ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฌผ๋ก  ์ด ๊ฒฝ์šฐ์—๋Š” ์ „์ฒด์ ์ธ ํ˜•ํƒœ๋Š” ๋ฐ”๋€Œ์ง€ ์•Š์ง€๋งŒ, ํŠน์ • ๋งค๊ฐœ๋ณ€์ˆ˜ ์ฃผ๋ณ€์—์„œ๋Š” ์•„์˜ˆ ํ˜•ํƒœ ์ „์ฒด๊ฐ€ ๊ธ‰๊ฒฉํ•˜๊ฒŒ ๋ณ€ํ˜•๋˜๋Š” ๊ฒฝ์šฐ๋„ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ์—๋Š” ์˜ค์ฐจ๊ฐ€ ํ•„์—ฐ์ ์œผ๋กœ ๋ฐœ์ƒํ•˜๋Š” ์ˆ˜์น˜์  ๋ฏธ๋ถ„ ๋ฐฉ๋ฒ•์ด ์ ํ•ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์ด๋Ÿฐ ๊ฒฝ์šฐ์—๋Š” ์–ด๋–ป๊ฒŒ ํ’€์–ด์•ผ ํ• ๊นŒ์š”?


๐Ÿ‡ฌ๐Ÿ‡ท ๊ธฐํ˜ธ์  ๋ฏธ๋ถ„ (Symbolic Differentiation)

์ธ๊ฐ„์€ ์ ์ ˆํ•œ ๊ต์œก๋งŒ ๋ฐ›๋Š”๋‹ค๋ฉด ๋ฏธ๋ถ„์„ ์•„๋ฌด๋Ÿฐ ์˜ค์ฐจ์—†์ด ๊ณ„์‚ฐํ•ด๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (๋ฌผ๋ก , ๊ณ„์‚ฐ์‹ค์ˆ˜๋กœ ์ธํ•œ ์˜ค์ฐจ๋Š” ์ข…์ข… ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.) ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ ํ•จ์ˆ˜์˜ ๋„ํ•จ์ˆ˜๋ฅผ ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค.

$$ y = x^2 $$

์ด์ „ ๊ธ€์—์„œ ๋‹ค๋ฃจ์—ˆ๋‹ค์‹œํ”ผ ์ˆ˜์น˜์  ๋ฏธ๋ถ„ ๊ตฌํ˜„์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋„ˆ๋ฌด ๋˜‘๊ฐ™์œผ๋ฉด ์‹ฌ์‹ฌํ•˜๋‹ˆ ์š”์ฆ˜ ๊ฐ๊ด‘๋ฐ›๋Š” ์ˆ˜์น˜ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด์ธ Julia๋กœ ํ‘œํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

# Julia
function df(x, f, h=1e-06)
  return (f(x+h) - f(x)) / h
end

# Derivative
dx2(x) = df(x, x -> x^2)

# Print
println(dx2(1)) # 2.0000009999243673

์ด๋ฒˆ์—” ๊ณ ๋“ฑํ•™์ƒ์ด ํ‘ธ๋Š” ๋ฐฉ๋ฒ•์„ ์‚ดํŽด๋ด…์‹œ๋‹ค. (๋Œ€ํ•œ๋ฏผ๊ตญ ๊ณ ๋“ฑํ•™๊ต 2ํ•™๋…„ ์ˆ˜ํ•™2 ๊ณผ์ •์„ ์ด์ˆ˜ํ•œ ํ•™์ƒ์ด๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.)

$$ \begin{align} \frac{d}{dx}(x^2) &= \lim_{h \rightarrow 0} \frac{(x+h)^2 - x^2}{h} \\ &= \lim_{h\rightarrow 0} \frac{2hx + h^2}{h} \\ &= 2x \end{align} $$

์—ฌ๊ธฐ์— 1์„ ๋Œ€์ž…ํ•˜๋ฉด ์ •ํ™•ํžˆ 2๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค. ์œ„์—์„œ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์˜ ๊ฒฐ๊ณผ์™€ ๋‹ฌ๋ฆฌ ์˜ค์ฐจ๋Š” ํฌํ•จ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋ฏธ๋ถ„์„ ๋ฐฐ์šด ์‚ฌ๋žŒ์ผ ๊ฒฝ์šฐ, ์œ„ ํ’€์ด๋Š” ์ „ํ˜€ ์–ด๋ ค์šด ํ’€์ด๊ฐ€ ์•„๋‹™๋‹ˆ๋‹ค. ๊ทœ์น™๋งŒ ์ž˜ ์ง€ํ‚จ๋‹ค๋ฉด ๋‹ค๋ฅธ ํ•จ์ˆ˜๋“ค์„ ๋ฏธ๋ถ„ํ•  ๋•Œ์—๋„ ํฐ ์–ด๋ ค์›€์€ ์—†์„ ๊ฒ๋‹ˆ๋‹ค.

๋ฌผ๋ก  ๊ทœ์น™์ด ์กฐ๊ธˆ ๋งŽ๊ธด ํ•ฉ๋‹ˆ๋‹ค ใ…Žใ…Ž..

๋ฌผ๋ก  ๊ทœ์น™์ด ์กฐ๊ธˆ ๋งŽ๊ธด ํ•ฉ๋‹ˆ๋‹ค ใ…Žใ…Ž..

๊ทธ๋ ‡๋‹ค๋ฉด ์ปดํ“จํ„ฐ์—๊ฒŒ ๊ทœ์น™์„ ๊ฐ€๋ฅด์น˜๋ฉด ์–ด๋–จ๊นŒ์š”? ์–ด๋–ป๊ฒŒ ๊ฐ€๋ฅด์น˜๋ƒ๊ฐ€ ๊ด€๊ฑด์ด๊ฒ ์ง€๋งŒ ์ผ๋‹จ ๊ฐ€๋ฅด์น  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์˜ค์ฐจ์—†๋Š” ์™„๋ฒฝํ•œ ๋ฏธ๋ถ„์„ ์ปดํ“จํ„ฐ๋กœ ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋‹คํ–‰ํžˆ๋„ ์‚ฌ๋žŒ๋“ค์€ ์ด๋ฏธ ๊ทธ๊ฒƒ์„ ๊ตฌํ˜„ํ•˜์˜€๊ณ  ์ด๋ฅผ CAS(Computer Algebra System)๋ผ ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

๋Œ€ํ‘œ์ ์ธ CAS๋กœ๋Š” Mathematica, Matlab, Maple ๋“ฑ์˜ ์ƒ์—…์šฉ ํ”„๋กœ๊ทธ๋žจ๋“ค๊ณผ Python์œผ๋กœ ๊ตฌํ˜„๋œ Sympy, Sagemath ๋“ฑ์˜ ๋ฌด๋ฃŒ ํ”„๋กœ๊ทธ๋žจ ํ˜น์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. CAS๋Š” ์‹ค์ œ๋กœ ์ธ๊ฐ„์ด ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ฏธ๋ถ„, ์ ๋ถ„, ๋Œ€์ˆ˜ ๋ฟ ์•„๋‹ˆ๋ผ ์‹ฌ์ง€์–ด ๋ฏธ๋ถ„๊ธฐํ•˜ ๋“ฑ์˜ ๊ณ ๊ธ‰ ์ˆ˜ํ•™ ๋ฌธ์ œ๊นŒ์ง€๋„ ํ’€์–ด๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ฌด๋ ค ์ด๋ฆ„๋„ SageManifolds ์ž…๋‹ˆ๋‹ค.

๋ฌด๋ ค ์ด๋ฆ„๋„ SageManifolds ์ž…๋‹ˆ๋‹ค.

์•„๋ž˜๋Š” sagemath๋ฅผ ์ด์šฉํ•œ ๊ฐ„๋‹จํ•œ ๋„ํ•จ์ˆ˜ ๊ตฌํ˜„์ž…๋‹ˆ๋‹ค.

var('x')        # declare the variable
f(x) = x^2      # declare the function
df = diff(f, x) # compute the derivative
print(df(1))    # 2

์ •ํ™•ํ•  ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ฐ„๋‹จํ•˜๊ธฐ๊นŒ์ง€ ํ•˜๋‹ˆ ๋” ์ด์ƒ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์„ ๊ณ ์ง‘ํ•  ์ด์œ ๋Š” ์—†์–ด๋ณด์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋ ‡๊ฒŒ ์—„์ฒญ๋‚œ CAS์—๋„ ์น˜๋ช…์ ์ธ ๋‹จ์ ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ๋ฐ”๋กœ ์†๋„์ž…๋‹ˆ๋‹ค. ๊ธฐํ˜ธ์  ๋ฏธ๋ถ„ ์ž์ฒด๋Š” ๊ณ„์‚ฐ ์†๋„๊ฐ€ ๋น ๋ฅผ ์ˆ˜ ์žˆ์ง€๋งŒ ๊ทธ๊ฒƒ์— ์ˆ˜์น˜ ๊ฐ’๋“ค์„ ๋Œ€์ž…ํ•  ๋•Œ ํ˜„์ €ํ•˜๊ฒŒ ์†๋„ ์ €ํ•˜๊ฐ€ ์ผ์–ด๋‚ฉ๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ๊ฐ„๋‹จํ•œ ๋ฏธ๋ถ„ ๊ณ„์‚ฐ์— ํฌ๊ธฐ๊ฐ€ ํฐ ๋ฐฐ์—ด ๊ฐ’์„ ๋Œ€์ž…ํ•˜์—ฌ ์„ฑ๋Šฅ์„ ์ธก์ •ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. Peroxide๋Š” Rust์˜ ์ˆ˜์น˜๊ณ„์‚ฐ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์ด๋ฆ„์ด๋ฉฐ ํ›„์— ๋‹ค๋ฃฐ ์ž๋™๋ฏธ๋ถ„ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ ๊ณ„์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜์˜€๊ณ , numpy๋Š” Python์˜ ์œ ๋ช…ํ•œ ์ˆ˜์น˜๊ณ„์‚ฐ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์œผ๋กœ ๊ณ„์‚ฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ Sagemath๋Š” ๊ธฐํ˜ธ์  ๋ฏธ๋ถ„์œผ๋กœ ๊ณ„์‚ฐ ํ›„ ์ˆ˜์น˜ ๊ฐ’์„ ๋Œ€์ž…ํ•˜์—ฌ ๊ฒฐ๊ณผ๋ฅผ ๊ตฌํ–ˆ์Šต๋‹ˆ๋‹ค.

Linear scale ๊ทธ๋ž˜ํ”„์ž…๋‹ˆ๋‹ค.


Log scale ๊ทธ๋ž˜ํ”„์ž…๋‹ˆ๋‹ค.


๋ฌผ๋ก  ์–ด๋–ค ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ–ˆ๋Š”์ง€์— ๋”ฐ๋ผ ์‹ค์ œ ์ˆ˜์น˜ ๊ณ„์‚ฐ์—์„œ์˜ ๊ฒฐ๊ณผ๋Š” ์กฐ๊ธˆ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์€ Julia ์–ธ์–ด ํŒ€์—์„œ ์‹ค์‹œํ•œ Benchmark ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

Log scale์ž„์„ ์ฐธ๊ณ ํ•˜์—ฌ ๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค. (์ถœ์ฒ˜: https://julialang.org/benchmarks/)

Log scale์ž„์„ ์ฐธ๊ณ ํ•˜์—ฌ ๋ณด์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค. (์ถœ์ฒ˜: https://julialang.org/benchmarks/)

๊ทธ๋ฆผ์„ ๋ณด๋ฉด Matlab์˜ ์˜คํ”ˆ์†Œ์Šค ๊ฒฉ์ธ Octave๋Š” ์˜ˆ์™ธ๋กœ ์น˜๋”๋ผ๋„ Mathematica๊ฐ€ ์ƒ๊ฐ๋ณด๋‹จ ๋А๋ฆฌ์ง€ ์•Š์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (๊ทธ๋ž˜๋„ C๋ณด๋‹ค ๊ฑฐ์˜ 10~100๋ฐฐ ๋А๋ฆฌ๊ธด ํ•˜์ง€๋งŒ์š”.) Mathematica๋„ ํ–‰๋ ฌ ๊ณ„์‚ฐ์€ BLAS๋ฅผ ์ด์šฉํ•˜๊ณ  ๊ฐ–๊ฐ€์ง€ ํƒ์›”ํ•œ ์ˆ˜์น˜ ๊ณ„์‚ฐ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜๊ธฐ์— ํŠน์ • ๊ณ„์‚ฐ๋“ค์€ ์‹ฌ์ง€์–ด numpy๋ฅผ ์ด์šฉํ•œ Python๋ณด๋‹ค ๋น ๋ฅด๊ธฐ๊นŒ์ง€ ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ, Mathematica์—์„œ๋„ ๊ธฐํ˜ธ์  ๋ฏธ๋ถ„๊ณผ ์ˆ˜์น˜์ ์ธ ์—ฐ์‚ฐ์„ ์„œ๋กœ ์˜ค๊ฐˆ๋•Œ์—๋Š” ์—ญ์‹œ๋‚˜ ํฐ ์†๋„์ €ํ•˜๊ฐ€ ํ•„์—ฐ์ ์œผ๋กœ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋ ‡๋‹ค๋ฉด ๊ทœ๋ชจ๊ฐ€ ํฐ ๋ฏธ๋ถ„ ๊ณ„์‚ฐ์— ๋Œ€ํ•ด์„œ๋Š” ์–ด๋–ป๊ฒŒ ์ ‘๊ทผํ•ด์•ผํ• ๊นŒ์š”? ์†๋„ ์ €ํ•˜๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์œผ๋กœ ๊ตฌํ˜„ํ•˜์ž๋‹ˆ ๊ทœ๋ชจ๊ฐ€ ์ปค์„œ ์˜ค์ฐจ๋„ ๊ทธ๋งŒํผ ๋งŽ์ด ์Œ“์ผํ…Œ๊ณ , ์ •ํ™•๋„๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ๊ธฐํ˜ธ์  ๋ฏธ๋ถ„์„ ๊ณ ๋ คํ•˜์ž๋‹ˆ ๊ต‰์žฅํžˆ ์˜ค๋žœ ์‹œ์ผ์ด ๊ฑธ๋ฆด ๊ฒƒ์€ ๋ป”ํ•ฉ๋‹ˆ๋‹ค. ์‹ฌ์ง€์–ด ๋ฉ”๋ชจ๋ฆฌ ๋ฌธ์ œ๋กœ ๊ฒŒ์‚ฐ ๋„์ค‘์— ๋‹ค์šด๋  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹คํ–‰ํžˆ๋„ ๋ฏธ๋ถ„์— ํ•œํ•ด์„œ๋Š” ๊ฑฐ์˜ ์™„๋ฒฝํ•œ ํ•ด๋‹ต์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ด์— ๋Œ€ํ•ด์„œ๋Š” ๋‹ค์Œ ํฌ์ŠคํŠธ์—์„œ ๋‹ค๋ฃจ๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


๐Ÿ”– ๋ถ€๋ก

A. ๋กœ๋ Œ์ฆˆ ๋‚˜๋น„ ์ฝ”๋“œ

์œ„์—์„œ ์ฒจ๋ถ€ํ•œ ๋กœ๋ Œ์ฆˆ ๋‚˜๋น„ ๊ทธ๋ฆผ๋“ค์€ Rust์˜ ์ˆ˜์น˜ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ธ Peroxide๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์†Œ์Šค์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

extern crate peroxide;
use peroxide::fuga::*;

fn main() -> Result<(), Box<dyn Error>> {
    // =========================================
    //  Declare ODE
    // =========================================
    let mut ex_test = ExplicitODE::new(butterfly);

    let init_state: State<f64> = State::new(
        0.0,
        vec![10.0, 1.0, 1.0],
        vec![0.0, 0.0, 0.0],
    );

    ex_test
        .set_initial_condition(init_state)
        .set_method(ExMethod::Euler)
        .set_step_size(0.01f64)
        .set_times(10000);

    let mut ex_test2 = ex_test.clone();
    ex_test2.set_method(ExMethod::RK4);

    // =========================================
    //  Save results
    // =========================================
    let results = ex_test.integrate();
    let results2 = ex_test2.integrate();

    let mut df_euler = DataFrame::from_matrix(results);
    df_euler.set_header(vec!["t", "x", "y", "z"]);
    df_euler.print();

    let mut df_rk4 = DataFrame::from_matrix(results2);
    df_rk4.set_header(vec!["t", "x", "y", "z"]);
    df_rk4.print();

    df_euler.write_nc("data/euler.nc")?;
    df_rk4.write_nc("data/rk4.nc")?;

    Ok(())
}

fn butterfly(st: &mut State<f64>, _: &NoEnv) {
    let x = &st.value;
    let dx = &mut st.deriv;
    dx[0] = 10f64 * (x[1] - x[0]);
    dx[1] = 28f64 * x[0] - x[1] - x[0] * x[2];
    dx[2] = -8f64/3f64 * x[2] + x[0] * x[1];
}

์ดํ›„์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถˆ๋Ÿฌ์™€์„œ ๊ทธ๋ฆผ์„ ๊ทธ๋ฆฌ๋Š” ๊ฒƒ์€ Python์œผ๋กœ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

from netCDF4 import Dataset
import matplotlib.pyplot as plt

# Import netCDF file
ncfile1 = './data/euler.nc'
data1 = Dataset(ncfile1)
var1 = data1.variables
ncfile2 = './data/rk4.nc'
data2 = Dataset(ncfile2)
var2 = data2.variables

# Use latex
plt.rc('text', usetex=True)
plt.rc('font', family='serif')

# Prepare Plot
plt.figure(figsize=(10,6), dpi=300)
plt.title(r"Lorenz Butterfly (Euler)", fontsize=16)
plt.xlabel(r'$x$', fontsize=14)
plt.ylabel(r'$z$', fontsize=14)

# Prepare Data to Plot
x1 = var1['x'][:]
z1 = var1['z'][:]  

# Plot with Legends
plt.plot(x1, z1, label=r'Lorenz (Euler)')

# Other options
plt.legend(fontsize=12)
plt.grid()
plt.savefig("euler.png", dpi=300)

# Prepare Plot
plt.figure(figsize=(10,6), dpi=300)
plt.title(r"Lorenz Butterfly (RK4)", fontsize=16)
plt.xlabel(r'$x$', fontsize=14)
plt.ylabel(r'$z$', fontsize=14)

# Prepare Data to Plot
x2 = var2['x'][:]
z2 = var2['z'][:]  

# Plot with Legends
plt.plot(x2, z2, label=r'Lorenz (RK4)')

# Other options
plt.legend(fontsize=12)
plt.grid()
plt.savefig("rk4.png", dpi=300)

์ด์™ธ์— ์ž์„ธํ•œ ์‚ฌํ•ญ์€ Peroxide Gallery์— ๋‚˜์™€์žˆ์œผ๋‹ˆ ์ฐธ๊ณ ํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

]]>
๐Ÿง™ Rust์™€ ๋ฏธ๋ถ„ํ•˜๊ธฐ 01: ์ˆ˜์น˜์  ๋ฏธ๋ถ„ https://axect.github.io/kr/posts/002_ad_1/ Sun, 24 May 2020 02:44:11 +0900 https://axect.github.io/kr/posts/002_ad_1/ <blockquote> <p><strong>๐Ÿ”– Automatic Differentiation Series</strong></p> <ol> <li><a href="../002_ad_1">๐Ÿ’ป Numerical Differentiation</a></li> <li><a href="../002_ad_2">๐Ÿ–Š๏ธ Symbolic Differentiation</a></li> <li><a href="../007_ad_3">๐Ÿค– Automatic Differentiation</a></li> </ol> </blockquote> <p>๋ฏธ๋ถ„์€ ํฌ๋Œ€์˜ ์ฒœ์žฌ์˜€๋˜ ์•„์ด์ž‘ ๋‰ดํ„ด์ด๋ž˜๋กœ ์—†์–ด์„œ๋Š” ์•ˆ ๋  ์ค‘์š”ํ•œ ๊ฐœ๋…์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฌธ๊ณผ๋‚˜ ์ด๊ณผ ๋ชจ๋‘ ๊ตฌ๋ถ„์—†์ด ๊ณ ๋“ฑํ•™๊ต๋•Œ ์ ์–ด๋„ ๋‹คํ•ญํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„๋ฒ•์€ ๋ฐฐ์šฐ๋ฉฐ ์ด๊ณต๊ณ„๋Š” ๊ฑฐ์˜ ๋ชจ๋“  ํ•™๊ณผ์—์„œ ๋ฏธ๋ถ„๋ฐฉ์ •์‹์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ๋ฌผ๋ฆฌํ•™๊ณผ์˜ ๊ฒฝ์šฐ๋Š” ์ข€ ๋” ๋ฏธ๋ถ„ ์˜์กด๋„๊ฐ€ ์‹ฌํ•œ๋ฐ, ๋‹น์žฅ ๋ฌผ๋ฆฌ์˜ ์‹œ์ž‘์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋Š” ๊ณ ์ „์—ญํ•™๋ถ€ํ„ฐ ์˜ค์ผ๋Ÿฌ-๋ผ๊ทธ๋ž‘์ฃผ ๋ฐฉ์ •์‹(Euler-Lagrange equation)์— ์˜์กดํ•˜๋ฉฐ ๋ฌผ๋ฆฌํ•™๊ณผ์˜ ํ•ต์‹ฌ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋Š” ์ „์ž๊ธฐํ•™, ์–‘์ž์—ญํ•™์€ ๊ฑฐ์˜ ๋ชจ๋“  ์ˆ˜์‹์— ๋ฏธ๋ถ„์ด ๋น ์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค.</p>

๐Ÿ”– Automatic Differentiation Series

  1. ๐Ÿ’ป Numerical Differentiation
  2. ๐Ÿ–Š๏ธ Symbolic Differentiation
  3. ๐Ÿค– Automatic Differentiation

๋ฏธ๋ถ„์€ ํฌ๋Œ€์˜ ์ฒœ์žฌ์˜€๋˜ ์•„์ด์ž‘ ๋‰ดํ„ด์ด๋ž˜๋กœ ์—†์–ด์„œ๋Š” ์•ˆ ๋  ์ค‘์š”ํ•œ ๊ฐœ๋…์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฌธ๊ณผ๋‚˜ ์ด๊ณผ ๋ชจ๋‘ ๊ตฌ๋ถ„์—†์ด ๊ณ ๋“ฑํ•™๊ต๋•Œ ์ ์–ด๋„ ๋‹คํ•ญํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„๋ฒ•์€ ๋ฐฐ์šฐ๋ฉฐ ์ด๊ณต๊ณ„๋Š” ๊ฑฐ์˜ ๋ชจ๋“  ํ•™๊ณผ์—์„œ ๋ฏธ๋ถ„๋ฐฉ์ •์‹์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ๋ฌผ๋ฆฌํ•™๊ณผ์˜ ๊ฒฝ์šฐ๋Š” ์ข€ ๋” ๋ฏธ๋ถ„ ์˜์กด๋„๊ฐ€ ์‹ฌํ•œ๋ฐ, ๋‹น์žฅ ๋ฌผ๋ฆฌ์˜ ์‹œ์ž‘์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋Š” ๊ณ ์ „์—ญํ•™๋ถ€ํ„ฐ ์˜ค์ผ๋Ÿฌ-๋ผ๊ทธ๋ž‘์ฃผ ๋ฐฉ์ •์‹(Euler-Lagrange equation)์— ์˜์กดํ•˜๋ฉฐ ๋ฌผ๋ฆฌํ•™๊ณผ์˜ ํ•ต์‹ฌ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋Š” ์ „์ž๊ธฐํ•™, ์–‘์ž์—ญํ•™์€ ๊ฑฐ์˜ ๋ชจ๋“  ์ˆ˜์‹์— ๋ฏธ๋ถ„์ด ๋น ์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

๋‹น์—ฐํ•˜๊ฒŒ๋„ ์ˆ˜์น˜ ๊ณ„์‚ฐ ๋ถ„์•ผ์—์„œ๋„ ๋ฏธ๋ถ„์€ ํ•ญ์ƒ ๋“ฑ์žฅํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ, ์ธ๊ฐ„์ด ๋ฏธ๋ถ„์„ ์ดํ•ดํ•˜๋Š” ๋ฐฉ์‹๊ณผ ์ปดํ“จํ„ฐ๊ฐ€ ์ดํ•ดํ•˜๋Š” ๋ฐฉ์‹์€ ์ฐจ์ด๊ฐ€ ์žˆ๊ธฐ์— ๋ฏธ๋ถ„์„ ๋ฐ›์•„๋“ค์ด๋Š” ๋ฐฉ๋ฒ• ์—ญ์‹œ ์กฐ๊ธˆ ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ์ผ๋‹จ ๋ฏธ์ ๋ถ„ํ•™์—์„œ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋ฐฐ์šฐ๋Š” ๋„ํ•จ์ˆ˜์˜ ์ •์˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} $$

์˜ˆ๋ฅผ ๋“ค์–ด $f(x) = x^2$์„ ๋ฏธ๋ถ„ํ•œ๋‹ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ \lim_{h \rightarrow 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \rightarrow 0}\frac{2hx + h^2}{h} = 2x $$

ํ•˜์ง€๋งŒ ์ปดํ“จํ„ฐ๊ฐ€ ์ด ๋ฌธ์ œ๋ฅผ ์ ‘ํ•˜๊ฒŒ ๋œ๋‹ค๋ฉด ์ƒ๋‹นํžˆ ๋‚œ๊ฐํ•œ ์ƒํ™ฉ์— ๋†“์ž…๋‹ˆ๋‹ค. ๊ทนํ•œ์ด๋ผ๋Š” ๊ฐœ๋…์ด ์ปดํ“จํ„ฐ์˜ ๊ตฌ์กฐ์™€ ๋Œ€์น˜๋˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. $h$๊ฐ€ $0$์œผ๋กœ ๊ฐ€๋Š” ๊ทนํ•œ์ด๋ผ๋Š” ๊ฒƒ์€ 0์— ํ•œ์—†์ด ๊ฐ€๊นŒ์ด ์ ‘๊ทผํ•œ๋‹ค๋Š” ์˜๋ฏธ๋กœ $h$์™€ $0$์˜ ์ฐจ์ด๊ฐ€ ๊ทธ ์–ด๋–ค ์ˆซ์ž๋ณด๋‹ค ์ž‘๊ฒŒ ๋˜์–ด์•ผ ํ•œ๋‹ค๋Š” ๋œป์ธ๋ฐ, ์ปดํ“จํ„ฐ๋Š” ๊ตฌ์กฐ ์ƒ ํ•œ์—†์ด ๊ฐ€๊นŒ์ด ๊ฐ€๋Š” ๊ฒƒ์ด ๋ถˆ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ํ˜„์žฌ ๋Œ€๋ถ€๋ถ„์„ ์ฐจ์ง€ํ•˜๊ณ  ์žˆ๋Š” 64bit ์ปดํ“จํ„ฐ๋Š” $2^{-53}$ ์ดํ•˜, ์ฆ‰, ๋Œ€๋žต $10^{-16}$์ดํ•˜์˜ ์ฐจ์ด๋Š” $0$๊ณผ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์‚ฌ๋žŒ๋“ค์€ ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ ๋ฐฉ์‹์œผ๋กœ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜์˜€์Šต๋‹ˆ๋‹ค.



๐Ÿ’ป ์ˆ˜์น˜์  ๋ฏธ๋ถ„ (Numerical Differentiation)

์ปดํ“จํ„ฐ๋Š” ๊ทนํ•œ์„ ๋ณธ์งˆ์ ์œผ๋กœ ๋‹ค๋ฃฐ ์ˆ˜ ์—†์ง€๋งŒ, ๋Œ€๋ถ€๋ถ„์˜ ๊ณ„์‚ฐ์—์„œ๋Š” $10^{-16}$ ์ •๋„๋ฉด ์•„์ฃผ ์ถฉ๋ถ„ํ•œ ์ •๋ฐ€๋„์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ˜น์€ ๋‹จ์œ„๋ฅผ ์กฐ์ •ํ•˜๋ฉด์„œ ์ถฉ๋ถ„ํ•œ ์ •๋ฐ€๋„๊ฐ€ ๋˜๋„๋ก ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•๋„ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ทนํ•œ์„ ๋‹ค๋ฃจ๋Š” ๋Œ€์‹  ์•„์ฃผ ์ž‘์€ $h$๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ทนํ•œ์˜ ๊ทผ์‚ฟ๊ฐ’์„ ๊ตฌํ•˜์—ฌ ๊ณ„์‚ฐ์— ์ด์šฉํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์„ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์ด๋ผ ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋‹จ ์•„์ฃผ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์„ ๊ตฌํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

# Python
def diff(f, x, h):
    return (f(x+h) - f(x)) / h

์ˆ˜์น˜์  ๋ฏธ๋ถ„์˜ Python ๊ตฌํ˜„์€ ๋†€๋ผ์šธ ์ •๋„๋กœ ์•„์ฃผ ๊ฐ„๋‹จํ•ฉ๋‹ˆ๋‹ค. ํ•จ์ˆ˜์™€ ๋ณ€์ˆ˜ ๊ทธ๋ฆฌ๊ณ  ์ •๋ฐ€๋„๋ฅผ ๋„ฃ์–ด์ฃผ๋ฉด ๋ฐ”๋กœ ๋ฏธ๋ถ„๊ฐ’์ด ๋‚˜์˜ต๋‹ˆ๋‹ค.

// Rust
fn diff<F: Fn(f64) -> f64>(f: F, x: f64, h: f64) -> f64 {
    (f(x+h) - f(x)) / h
}

Rust ๊ตฌํ˜„๋„ ๋น„๊ต์  ๊ฐ„๋‹จํ•œ ํŽธ์ด์ง€๋งŒ, ํƒ€์ž…์„ ๋ช…์‹œํ•ด์•ผ ๋˜๋Š” ์ ์ด Python๊ณผ์˜ ์ฐจ์ด๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. Rust์—์„œ ํ•จ์ˆ˜๋ฅผ ์ธ์ˆ˜๋กœ ๋ฐ›์„ ๋•Œ๋Š” ์œ„์™€ ๊ฐ™์ด ์ œ๋„ˆ๋ฆญ ํƒ€์ž…(Generic Type)์œผ๋กœ ๋ฐ›๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์•ผ ๋ช…์‹œ์  ํ•จ์ˆ˜๋‚˜ ํด๋กœ์ €(Closure) ๊ตฌ๋ถ„ ์—†์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์ฝ”๋“œ๋“ค์„ ์ด์šฉํ•˜์—ฌ $f(x) = x^2$์˜ $x=1$์—์„œ์˜ ๋ฏธ๋ถ„ ๊ณ„์ˆ˜๋ฅผ ๊ตฌํ•ด๋ด…์‹œ๋‹ค.

// Rust
fn main() {
    println!("{}", diff(f, 1f64, 1e-6));
}

fn diff<F: Fn(f64) -> f64>(f: F, x: f64, h: f64) -> f64 {
    (f(x+h)-f(x)) / h
}

fn f(x: f64) -> f64 {
    x.powi(2)
}

์ฝ”๋“œ์—์„œ ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด ์ •๋ฐ€๋„๋Š” $h=10^{-6}$์„ ๋Œ€์ž…ํ•˜์—ฌ ๊ณ„์‚ฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ๋Š” $2.0000009999243673$์œผ๋กœ ์†Œ์ˆซ์  6๋ฒˆ์งธ ์ž๋ฆฌ๊นŒ์ง€๋Š” ์ด๋ก  ๊ฐ’์ธ $2$์™€ ์ผ์น˜ํ•จ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ์ด ์ˆ˜์น˜์  ๋ฏธ๋ถ„์ฝ”๋“œ๋Š” ๊ฐ„๋‹จํ•˜๊ณ  ๋น ๋ฅด๊ฒŒ ๋ฏธ๋ถ„ ๊ฐ’์„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ์ง€๋งŒ, ๋„ํ•จ์ˆ˜๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ฐ˜๋ณต์ ์œผ๋กœ ํ•จ์ˆ˜๋ฅผ ๋Œ€์ž…ํ•ด์•ผ ๋œ๋‹ค๋Š” ์ ์—์„œ ๋ถˆํŽธํ•จ์„ ์•ผ๊ธฐํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋„ํ•จ์ˆ˜๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์กฐ๊ธˆ ๋” ์ฝ”๋“œ๋ฅผ ๋Š˜๋ ค์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋จผ์ € ๊ตฌ์กฐ์ฒด๋ฅผ ์ด์šฉํ•˜๋Š” ๊ฐ์ฒด์ง€ํ–ฅ์  ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ตฌํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

// Rust
struct Derivative<F: Fn(f64) -> f64> {
    pub f: F,
    pub h: f64,
}

impl<F: Fn(f64) -> f64> Derivative<F> {
    fn f(&self, x: f64) -> f64 {
        (self.f)(x)
    }

    fn calc(&self, x: f64) -> f64 {
        (self.f(x+self.h) - self.f(x)) / self.h
    }
}

์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ•จ์ˆ˜์™€ ์ •๋ฐ€๋„๋Š” ์ดˆ๊ธฐ ์„ ์–ธ์‹œ์—๋งŒ ์ž…๋ ฅํ•˜๋ฉด ๋˜๊ณ , calc ๋ฉ”์†Œ๋“œ๋ฅผ ์ด์šฉํ•˜์—ฌ ์—ฌ๋Ÿฌ $x$ ๊ฐ’์—์„œ ๊ณ„์‚ฐ์ด ๊ฐ€๋Šฅํ•ด์ง‘๋‹ˆ๋‹ค. f ๋ฉ”์†Œ๋“œ๋Š” ๋ณด๋‹ค ํŽธํ•˜๊ฒŒ self.f(x)๋ฅผ ์ด์šฉํ•˜๊ธฐ ์œ„ํ•ด ์„ ์–ธํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋งŒ์ผ ์ด๋Ÿฌํ•œ ๋ฉ”์†Œ๋“œ๊ฐ€ ์—†๋‹ค๋ฉด (self.f)(x) ๊ผด๋กœ ์ž…๋ ฅํ•ด์•ผ๋งŒ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ ์ด์ œ ์ด ์ฝ”๋“œ๋ฅผ ์ด์šฉํ•˜์—ฌ ์•ž์—์„œ์˜ ์˜ˆ์‹œ๋ฅผ ๊ตฌํ˜„ํ•ด๋ด…์‹œ๋‹ค.

// Rust
fn main() {
    let df = Derivative {
        f,
        h: 1e-6,
    };
    println!("{}", df.calc(1f64));
}

fn f(x: f64) -> f64 {
    x.powi(2)
}

struct Derivative<F: Fn(f64) -> f64> {
    pub f: F,
    pub h: f64,
}

impl<F: Fn(f64) -> f64> Derivative<F> {
    fn f(&self, x: f64) -> f64 {
        (self.f)(x)
    }

    fn calc(&self, x: f64) -> f64 {
        (self.f(x+self.h) - self.f(x)) / self.h
    }
}

๋‹น์—ฐํ•˜๊ฒŒ๋„ ๋‹ต์€ ์•„๊นŒ์˜ ๊ฒฝ์šฐ์™€ ๊ฐ™๊ฒŒ ๋‚˜์˜ต๋‹ˆ๋‹ค. ์ด๋ฒˆ์—๋Š” ์ง„์งœ “๋„ํ•จ์ˆ˜"๋ฅผ ๋งŒ๋“œ๋Š” ํ•จ์ˆ˜ํ˜• ํ”„๋กœ๊ทธ๋ž˜๋ฐ์˜ ๊ณ ๊ณ„ ํ•จ์ˆ˜(Higher order function) ๊ฐœ๋…์„ ์ด์šฉํ•˜์—ฌ ๊ตฌํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

// Rust
fn derivative<F: Fn(f64) -> f64>(f: F, h: f64) -> impl Fn(f64) -> f64 {
    move |x: f64| (f(x+h) - f(x)) / h
}

์ผ๋‹จ F๋กœ f64 -> f64 ํ•จ์ˆ˜ ์—ญํ• ์„ ํ•˜๋Š” ๋ชจ๋“  ํƒ€์ž…์„ ๋ฐ›์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์€ ์•ž์—์„œ์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋‹ค๋งŒ ๋ฐ˜ํ™˜ ํƒ€์ž… ๋ถ€๋ถ„์— ๋‚ฏ์„  ํ‚ค์›Œ๋“œ๋“ค์ด ์žˆ์Šต๋‹ˆ๋‹ค. Rust์˜ Generic์—๋Š” ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ ๋ฐฉ์‹์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋Š” ์•ž์„œ ๋ดค๋˜ F์™€ ๊ฐ™์ด Type placeholder๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์ด๊ณ , ๋‘ ๋ฒˆ์งธ๋Š” impl Trait์ฒ˜๋Ÿผ implํ‚ค์›Œ๋“œ๋ฅผ ์ด์šฉํ•˜๋Š” ๋ฐฉ์‹์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋‘ ๋ฐฉ์‹ ๋ชจ๋‘ ํฐ ์ฐจ์ด๋Š” ์—†์ง€๋งŒ, ์—ฌ๋Ÿฌ ๊ฐœ์˜ ํƒ€์ž…์ด ๊ฐ™์ด ์“ฐ์ผ ๋•Œ์— ๊ฐ ํƒ€์ž…๋“ค์ด ๊ฐ™์€ ํƒ€์ž…์ธ์ง€, ๋‹ค๋ฅธ ํƒ€์ž…์ธ์ง€ ๋ช…ํ™•ํžˆ ํ•  ๋•Œ์—๋Š” ์ „์ž์˜ ๋ฐฉ์‹์„ ์“ฐ๊ณ , ํ•œ ๊ฐ€์ง€ ํƒ€์ž…๋งŒ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ํƒ€์ž… ์ข…๋ฅ˜๋ณด๋‹ค๋Š” ์—ญํ• ์ด ์ค‘์š”ํ•  ๋•Œ์—๋Š” ํ›„์ž์˜ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์œ„ ์ฝ”๋“œ๋ฅผ Type placeholder๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ตฌํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

// Rust
fn derivative<F, G>(f: F, h: f64) -> G 
where
    F: Fn(f64) -> f64,
    G: Fn(f64) -> f64,
{
    move |x: f64| (f(x+h) - f(x)) / h
}

์•„๋ž˜์˜ ๊ตฌํ˜„์ด ๊ฐ€๋…์„ฑ ๋ฉด์—์„œ๋‚˜ ์˜๋ฏธ ๋ฉด์—์„œ ์ข€ ๋” ์ข‹์€ ๊ตฌํ˜„์ด์ง€๋งŒ, ์ด ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ ์ œ์•ฝ์ด ์‹ฌํ•œ ํŽธ์ž…๋‹ˆ๋‹ค. impl Trait ๊ผด๋กœ ๋ฐ˜ํ™˜ํ•˜๋ฉด Rust๋Š” ๊ทธ ํƒ€์ž…์„ ํ•จ์ˆ˜ ๋ณธ๋ฌธ์—์„œ ๋ฐ˜ํ™˜ํ•˜๋Š” ๊ฐ’์œผ๋กœ ์ž๋™ ์ถ”๋ก ํ•˜์—ฌ ์‚ฌ์šฉํ•˜์ง€๋งŒ, Type placeholder๋กœ ๋ฐ˜ํ™˜ํ•˜๋ฉด ๊ทธ ํƒ€์ž…์„ ๋ช…ํ™•ํžˆ ํ•˜๊ธฐ ์ „์—๋Š” ์ปดํŒŒ์ผ ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

๋˜ํ•œ ์œ„์™€ ๊ฐ™์€ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•  ๋•Œ, ํด๋กœ์ €์˜ ์„ฑ์งˆ์— ์œ ์˜ํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ํด๋กœ์ €๋Š” ์ธ์ˆ˜๋กœ ๋“ค์–ด์˜จ ๊ฐ’์ด ์•„๋‹Œ ์ฃผ๋ณ€ ํ™˜๊ฒฝ๋„ ๊ฐ™์ด ์บก์ณ๋ฅผ ํ•˜๋Š” ์„ฑ์งˆ์ด ์žˆ๋Š”๋ฐ, ์ด๋•Œ, ์ฃผ๋ณ€ ๋ณ€์ˆ˜๋“ค์ด ํด๋กœ์ € ๋ฐ–์—์„œ๋„ ์ƒ์กดํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์ปดํŒŒ์ผ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์œ„ ์ฝ”๋“œ์—์„œ๋„ f์™€ h๋Š” ํ•จ์ˆ˜์˜ ์ธ์ˆ˜๋กœ ๋ฐ›์•˜๊ธฐ์— ํ•จ์ˆ˜์˜ ์„ ์–ธ์ด ๋๋‚˜๋Š” ์‹œ์ ์— ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•ด์ œ๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๋ฐ˜ํ™˜๋˜๋Š” ํด๋กœ์ €๋Š” f์™€ h์˜ ๊ฐ’์„ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ move ํ‚ค์›Œ๋“œ๋ฅผ ์ด์šฉํ•˜์—ฌ f์™€ h์˜ ์†Œ์œ ๊ถŒ์„ ํด๋กœ์ €์— ๋„˜๊ฒจ์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์ด์ œ ์„ค๋ช…์ด ๋๋‚ฌ์œผ๋‹ˆ ์ด ๊ณ ๊ณ„ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ ๋„ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค์–ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

// Rust
fn main() {
    let df = derivative(f, 1e-6);
    println!("{}", df(1f64));
}

fn f(x: f64) -> f64 {
    x.powi(2)
}

fn derivative<F: Fn(f64) -> f64>(f: F, h: f64) -> impl Fn(f64) -> f64 {
    move |x: f64| (f(x+h) - f(x)) / h
}

๋‹ต์€ ์œ„์˜ ๋‘ ๊ฒฝ์šฐ์™€ ์ •ํ™•ํžˆ ์ผ์น˜ํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋Ÿผ ์ด์ œ ์ˆ˜์น˜์  ๋ฏธ๋ถ„ ๋ฐฉ์‹์˜ ์žฅ์ ๊ณผ ๋‹จ์ ์„ ์š”์•ฝํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์ˆ˜์น˜์  ๋ฏธ๋ถ„์˜ ์žฅ๋‹จ์ 

  • ์žฅ์ 
    • ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์ด ๊ต‰์žฅํžˆ ์‰ฝ๋‹ค.
    • ์•„์ฃผ ๋น ๋ฅด๊ฒŒ ๋ฏธ๋ถ„ ๊ณ„์‚ฐ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋‹จ์ 
    • ์˜ค์ฐจ๊ฐ€ ์Œ“์ด๋ฉด์„œ ์‹ค์ œ ๊ฐ’๊ณผ ๋งŽ์ด ๋‹ค๋ฅธ ๊ฐ’์ด ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋‹ค.

๊ณ„์‚ฐ ์†๋„์™€ ํŽธ์˜ ์ƒ์˜ ํฐ ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์ง€๋งŒ ์˜ค์ฐจ๊ฐ€ ๊ณ„์† ์Œ“์ผ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ Step size๋Š” ์ž‘์ง€๋งŒ ๊ตฌ๊ฐ„์€ ๊ธด ์ˆ˜์น˜๋ฏธ๋ถ„๋ฐฉ์ •์‹ ๋“ฑ์€ ์ˆ˜์น˜์  ๋ฏธ๋ถ„์„ ์ ์šฉํ•˜๊ธฐ์— ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹คํ–‰ํžˆ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ด์— ๋Œ€ํ•ด์„œ๋Š” ๋‹ค์Œ์— ๋‹ค๋ค„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

]]>
๐Ÿช ๊ฐ€์šฐ์‹œ์•ˆ ์ •๋ณตํ•˜๊ธฐ 01: ๋‹จ์ผ๋ณ€์ˆ˜ https://axect.github.io/kr/posts/001_gaussian/ Fri, 22 May 2020 17:00:31 +0900 https://axect.github.io/kr/posts/001_gaussian/ <p>๋ฌผ๋ฆฌํ•™์ด๋‚˜ ํ†ต๊ณ„ํ•™ ๋“ฑ์„ ํ•˜๋‹ค๋ณด๋ฉด ํ•ญ์ƒ ๋งˆ์ฃผ์น˜๋Š” ์›์ˆ˜ ๊ฐ™์€ ์กด์žฌ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณ„๋กœ ์–ด๋ ต์ง€๋Š” ์•Š์€๋ฐ ๋งˆ์ฃผ์น  ๋•Œ๋งˆ๋‹ค ํ—ท๊ฐˆ๋ฆฌ๋Š” ๊ทธ ์กด์žฌ๋Š” ๋ฐ”๋กœ <strong>๊ฐ€์šฐ์Šค ์ ๋ถ„</strong>(Gaussian Integral)์ž…๋‹ˆ๋‹ค.</p> <p>$$\int_{-\infty}^\infty e^{-\alpha x^2} dx$$</p> <p>์ด๊ณต๊ณ„ ๋Œ€ํ•™์ƒ์ด๋ผ๋ฉด 1ํ•™๋…„ ๋ฏธ์ ๋ถ„ํ•™ ์‹œ๊ฐ„์— ๊ทน์ขŒํ‘œ๊ณ„(Polar coordinate)๋ฅผ ์ด์šฉํ•œ ์ด์ค‘์ ๋ถ„์„ ๋‹ค๋ฃฐ ๋•Œ ๋‚˜์˜ค๋Š” ๊ฐ€์žฅ ๊ธฐ๋ณธ๋ฌธ์ œ๋กœ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ๊ธฐ์–ตํ• ๊ฒ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ•ญ์ƒ ๊ฑฐ์˜ ๋ชจ๋‘๊ฐ€ ๊ทธ๋ ‡๋“ฏ์ด ์‹œ๊ฐ„์ด ์ง€๋‚˜๋ฉด ์ง€๋‚  ์ˆ˜๋ก ๊ธฐ์–ต์€ ํ’ํ™”๋˜๊ณ  ๊ฑฐ์˜ ๋ง๊ฐ์˜ ๋‹จ๊ณ„์— ์ด๋ฅด๋ €์„ ๋•Œ์— ๊ฐ‘์ž๊ธฐ ํŠ€์–ด๋‚˜์˜ค๋Š” ๋‚ฏ์„  ํ˜•ํƒœ์˜ ๊ฐ€์šฐ์Šค ์ ๋ถ„๋“ค์€ ๋Œ€์ฒ˜ํ•˜๊ธฐ๊ฐ€ ๋‚œ๊ฐํ•ฉ๋‹ˆ๋‹ค.</p> <p>๋”ฐ๋ผ์„œ ์—ฌ๊ธฐ์„œ๋Š” ๊ฐ€์šฐ์Šค ์ ๋ถ„๊ณผ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์— ๋Œ€ํ•œ ์•„์ฃผ ๊ธฐ๋ณธ์ ์ธ ์„ฑ์งˆ๋“ค์„ ๋‹ค์‹œ ์ƒ๊ธฐ์‹œํ‚ค๊ณ  ์ด๋ฅผ ๋ฐœํŒ์‚ผ์•„ ๋‹ค๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ(Multivariate Gaussian)๊ณผ ์—ฌ๋Ÿฌ ํ™œ์šฉ๋“ค์„ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.</p> ๋ฌผ๋ฆฌํ•™์ด๋‚˜ ํ†ต๊ณ„ํ•™ ๋“ฑ์„ ํ•˜๋‹ค๋ณด๋ฉด ํ•ญ์ƒ ๋งˆ์ฃผ์น˜๋Š” ์›์ˆ˜ ๊ฐ™์€ ์กด์žฌ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ณ„๋กœ ์–ด๋ ต์ง€๋Š” ์•Š์€๋ฐ ๋งˆ์ฃผ์น  ๋•Œ๋งˆ๋‹ค ํ—ท๊ฐˆ๋ฆฌ๋Š” ๊ทธ ์กด์žฌ๋Š” ๋ฐ”๋กœ ๊ฐ€์šฐ์Šค ์ ๋ถ„(Gaussian Integral)์ž…๋‹ˆ๋‹ค.

$$\int_{-\infty}^\infty e^{-\alpha x^2} dx$$

์ด๊ณต๊ณ„ ๋Œ€ํ•™์ƒ์ด๋ผ๋ฉด 1ํ•™๋…„ ๋ฏธ์ ๋ถ„ํ•™ ์‹œ๊ฐ„์— ๊ทน์ขŒํ‘œ๊ณ„(Polar coordinate)๋ฅผ ์ด์šฉํ•œ ์ด์ค‘์ ๋ถ„์„ ๋‹ค๋ฃฐ ๋•Œ ๋‚˜์˜ค๋Š” ๊ฐ€์žฅ ๊ธฐ๋ณธ๋ฌธ์ œ๋กœ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ๊ธฐ์–ตํ• ๊ฒ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ•ญ์ƒ ๊ฑฐ์˜ ๋ชจ๋‘๊ฐ€ ๊ทธ๋ ‡๋“ฏ์ด ์‹œ๊ฐ„์ด ์ง€๋‚˜๋ฉด ์ง€๋‚  ์ˆ˜๋ก ๊ธฐ์–ต์€ ํ’ํ™”๋˜๊ณ  ๊ฑฐ์˜ ๋ง๊ฐ์˜ ๋‹จ๊ณ„์— ์ด๋ฅด๋ €์„ ๋•Œ์— ๊ฐ‘์ž๊ธฐ ํŠ€์–ด๋‚˜์˜ค๋Š” ๋‚ฏ์„  ํ˜•ํƒœ์˜ ๊ฐ€์šฐ์Šค ์ ๋ถ„๋“ค์€ ๋Œ€์ฒ˜ํ•˜๊ธฐ๊ฐ€ ๋‚œ๊ฐํ•ฉ๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ์—ฌ๊ธฐ์„œ๋Š” ๊ฐ€์šฐ์Šค ์ ๋ถ„๊ณผ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์— ๋Œ€ํ•œ ์•„์ฃผ ๊ธฐ๋ณธ์ ์ธ ์„ฑ์งˆ๋“ค์„ ๋‹ค์‹œ ์ƒ๊ธฐ์‹œํ‚ค๊ณ  ์ด๋ฅผ ๋ฐœํŒ์‚ผ์•„ ๋‹ค๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ(Multivariate Gaussian)๊ณผ ์—ฌ๋Ÿฌ ํ™œ์šฉ๋“ค์„ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


๐Ÿ”ข ๊ธฐ๋ณธ์ ์ธ ๊ฐ€์šฐ์Šค ์ ๋ถ„

์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐœ๋…

  • ๊ณ ๊ต ์ˆ˜์ค€์˜ ์ ๋ถ„ (์น˜ํ™˜ ์ ๋ถ„, ์ง€์ˆ˜ํ•จ์ˆ˜์˜ ์ ๋ถ„)
  • ๊ทน์ขŒํ‘œ๊ณ„
  • ์ด์ค‘์ ๋ถ„

๊ฐ€์šฐ์Šค ์ ๋ถ„์€ ์ „ํ˜•์ ์ธ ์ฒ˜์Œ ํ•˜๊ธฐ๋Š” ํž˜๋“ค์ง€๋งŒ ํ•œ ๋ฒˆ ๋ณด๋ฉด ๋ˆ„๊ตฌ๋‚˜ ํ•  ์ˆ˜ ์žˆ๋Š” ํ˜•ํƒœ์˜ ์Šคํ‚ฌ์ž…๋‹ˆ๋‹ค. ์ด๊ฒƒ์„ ๋ณด๊ณ  ๊ทน์ขŒํ‘œ๊ณ„์—์„œ์˜ ์ด์ค‘์ ๋ถ„์„ ๋– ์˜ฌ๋ฆฌ๊ธฐ๋Š” ํž˜๋“ค์ง€๋งŒ ๊ทธ๊ฒƒ์„ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๊นจ๋‹ซ๋Š” ์ˆœ๊ฐ„ ๋ฌธ์ œ๋Š” ์•„์ฃผ ๊ธฐ์ดˆ์ ์ธ ๋ฏธ์ ๋ถ„ ๋ฌธ์ œ๋กœ ๊ฒฉํ•˜๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ ๋จผ์ € ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ๋ด…์‹œ๋‹ค.


์ผ์ฐจ์› ๊ฐ€์šฐ์Šค ์ ๋ถ„

$$ \int_{-\infty}^\infty e^{-\alpha x^2} dx = \sqrt{\frac{\pi}{\alpha}} $$

์ด๊ฒƒ์„ ๋ฐ”๋กœ ์ฆ๋ช…ํ•˜๊ธฐ์—๋Š” ์–ด๋ ค์›€์ด ์žˆ์œผ๋ฏ€๋กœ ์ œ๊ณฑ ๊ผด์„ ๊ณ ๋ คํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. $$\left(\int_{-\infty}^\infty e^{-\alpha x^2}\right)^2 = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-\alpha(x^2 + y^2)} dx dy$$ ์ด๊ฒƒ์„ $x^2 + y^2 = r^2,~dxdy = rdrd\theta$์ž„์„ ์ด์šฉํ•˜์—ฌ ๊ทน์ขŒํ‘œ๊ณ„ ์ ๋ถ„์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์•„์ฃผ ๊ฐ„๋‹จํ•œ ์น˜ํ™˜ ์ ๋ถ„์œผ๋กœ ์ ๋ถ„์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. $$\int_{0}^\infty \int_{0}^{2\pi} re^{-\alpha r^2} dr d\theta = \frac{\pi}{\alpha}$$ ๊ฐ€์šฐ์Šค ์ ๋ถ„์˜ ์ œ๊ณฑ์ด ์œ„์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋˜์—ˆ๊ณ , ๊ฐ€์šฐ์‹œ์•ˆ ํ•จ์ˆ˜($e^{-\alpha x^2}$)๋Š” ํ•ญ์ƒ ์–‘์ˆ˜์ด๋ฏ€๋กœ ์œ„ ์‹์ด ์„ฑ๋ฆฝํ•จ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ณดํ†ต ๋ฌผ๋ฆฌํ•™์ด๋‚˜ ํ†ต๊ณ„ํ•™์—์„œ ์ž์ฃผ ์‚ฌ์šฉ๋˜๋Š” ๊ฐ€์šฐ์Šค ์ ๋ถ„์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ˜•ํƒœ์ž…๋‹ˆ๋‹ค.

$$ \int_{-\infty}^\infty e^{-\frac{a}{2}y^2} dy = (2\pi)^{\frac{1}{2}} a^{-\frac{1}{2}} $$


์ผ์ฐจ์› ๊ฐ€์šฐ์Šค ์ ๋ถ„ ์ผ๋ฐ˜ํ™”

$$ (2\pi)^{-\frac{1}{2}}\int_{-\infty}^\infty e^{-\frac{1}{2}a x^2 + Jx}\, dx = a^{-\frac{1}{2}} e^{\frac{J^2}{2a}} $$

๋‹จ์ˆœํžˆ ์™„์ „์ œ๊ณฑ์‹์„ ์ด์šฉํ•˜์—ฌ ์ •๋ฆฌํ•˜๋ฉด ํ•ด๊ฒฐ๋˜๋Š” ์ ๋ถ„์ž…๋‹ˆ๋‹ค. $$\int_{-\infty}^\infty e^{-\frac{a}{2}(y - \frac{J}{a})^2 + \frac{J^2}{2a}} = \int_{-\infty}^\infty e^{-\frac{a}{2}t^2} dt \cdot e^{\frac{J^2}{2a}}$$

์ด๋Ÿฐ ํ˜•์‹์˜ ์ ๋ถ„์€ ํ‰๊ท ์ด 0์ด ์•„๋‹Œ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ๋‚˜ ์–‘์ž ๋ฌผ๋ฆฌํ•™์˜ ๊ฒฝ๋กœ ์ ๋ถ„(Path Integral)์—์„œ ์™ธ๋ถ€ ํž˜(External force)์ด ์ž‘์šฉํ•  ๋•Œ์˜ ๊ณ„์‚ฐ์—์„œ ๋งŽ์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.


ํŒŒ์ธ๋งŒ ํŠธ๋ฆญ์„ ์ด์šฉํ•œ ๊ฐ€์šฐ์Šค ์ ๋ถ„

$$ \int_{-\infty}^\infty x^2 e^{-a x^2}dx = \frac{1}{2}\pi^{\frac{1}{2}} a^{-\frac{3}{2}} $$

๋จผ์ € ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ $a$์— ๋Œ€ํ•œ ํ•จ์ˆ˜๋กœ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. $$I(a) \equiv \int_{-\infty}^\infty e^{-ax^2} dx$$ ์ด๋ฅผ $a$์— ๋Œ€ํ•ด ํŽธ๋ฏธ๋ถ„ ํ•ฉ๋‹ˆ๋‹ค. $$\frac{\partial I(a)}{\partial a} = \int_{-\infty}^\infty -x^2 e^{-ax^2}dx$$ ์šฐ๋ฆฌ๋Š” ์ด๋ฏธ $I(a)$์˜ ๊ฐ’์ด $\sqrt{\pi}{a}$์ž„์„ ์•Œ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์œ„ ์‹์—์„œ ์ขŒ๋ณ€์˜ ๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. $$\frac{\partial I(a)}{\partial a} = -\frac{1}{2}\pi^{\frac{1}{2}} a^{-\frac{3}{2}}$$ ์ด์ œ ์œ„ ์‹ 2๊ฐœ๊ฐ€ ๊ฐ™์Œ์„ ์ด์šฉํ•˜๋ฉด ์ฆ๋ช…์€ ๋๋‚ฉ๋‹ˆ๋‹ค.

ํ†ต์ƒ์ ์œผ๋กœ ์ด๋Ÿฐ ์ ๋ถ„์€ ๋ถ€๋ถ„์ ๋ถ„(Partial integration)์„ ์ด์šฉํ•˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์ด์ง€๋งŒ ํŒŒ์ธ๋งŒ์˜ ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•˜๋ฉด ์ด๋ ‡๊ฒŒ ๊ต‰์žฅํžˆ ์‰ฝ๊ฒŒ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฐ ํŠธ๋ฆญ์€ ๋ฌผ๋ฆฌํ•™์—์„œ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋ฉฐ ํ†ต๊ณ„ํ•™์—์„œ๋„ ์•„์ฃผ ์œ ์šฉํ•˜๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์•Œ์•„๋‘๋ฉด ํฐ ๋„์›€์ด ๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค.


๐Ÿช ๋‹จ๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ (Single variate Gaussian)

Single variate Gaussian

ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜ (Probability Density Function)

์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฐœ๋…

  • ๊ณ ๊ต ์ˆ˜์ค€์˜ ํ†ต๊ณ„ํ•™

๊ฐ€์šฐ์Šค ์ ๋ถ„์˜ ๊ฝƒ์€ ์—ญ์‹œ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ†ต๊ณ„ํ•™์—์„œ ์ •๊ทœ๋ถ„ํฌ๋ผ๊ณ ๋„ ์ผ์ปซ๋Š” ์ด ํ™•๋ฅ  ๋ถ„ํฌ๋Š” ํ™•๋ฅ ์„ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ฐ˜๋“œ์‹œ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ํ•„์š”๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๋‹จ๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ ํ˜น์€ 1์ฐจ์› ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์˜ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜(Probability density function)์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ p(x) = \mathcal{N}(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp{\left(-\frac{1}{2\sigma^2}(x-\mu)^2 \right)} $$

์œ„ ์‹์—์„œ ๋ณด๋‹ค์‹œํ”ผ ๋‹จ๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ๋ฅผ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋‘ ๋งค๊ฐœ๋ณ€์ˆ˜(Parameter) $\mu$์™€ $\sigma$๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ ์ด ๋งค๊ฐœ๋ณ€์ˆ˜๋“ค์˜ ์˜๋ฏธ๋ฅผ ์•Œ๊ณ  ์žˆ๋Š” ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์„ ๊ฒƒ ๊ฐ™์ง€๋งŒ ์ผ๋‹จ์€ ๋งค๊ฐœ๋ณ€์ˆ˜ ์ด์ƒ์˜ ์˜๋ฏธ๋ฅผ ๋‘์ง€ ์•Š๊ณ  ์„ฑ์งˆ์„ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์œ„์—์„œ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๋ผ๊ณ  ๋ฏธ๋ฆฌ ์–ธ๊ธ‰ํ–ˆ์ง€๋งŒ, ์ง•๊ฒ€๋‹ค๋ฆฌ๋„ ๋‘๋“œ๋ ค๋ณด๊ณ  ๊ฑด๋„ˆ ๋“ฏ์ด ์ € ์ด์ƒํ•œ ํ•จ์ˆ˜๊ฐ€ ์ง„์งœ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜์ธ์ง€ ํ™•์ธ์„ ํ•ด๋ด…์‹œ๋‹ค. ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๋ฅผ ์œ„์‹œํ•œ ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜๋Š” ๋ฐ˜๋“œ์‹œ 2๊ฐ€์ง€์˜ ์กฐ๊ฑด์„ ์ถฉ์กฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

  1. ์ •๊ทœํ™”(Normalization) $$\int_{-\infty}^{\infty} p(x) dx = 1$$

  2. ์Œ์ด ์•„๋‹Œ ์ •๋ถ€ํ˜ธ(Nonnegative definite) $$p(x) \geq 0$$

์œ„์—์„œ ์ •์˜ํ•œ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์˜ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๋Š” ์ง€์ˆ˜ํ•จ์ˆ˜ ํ˜•ํƒœ์ด๋ฏ€๋กœ ํ•ญ์ƒ 0๋ณด๋‹ค ํฐ ๊ฒƒ์€ ์ž๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ •๊ทœํ™” ์กฐ๊ฑด๋งŒ ํ™•์ธํ•˜๋ฉด ์ง•๊ฒ€๋‹ค๋ฆฌ๊ฐ€ ์•ˆ์ „ํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฏธ ๋ˆˆ์น˜์ฑ„์…จ์„ ์ˆ˜๋„ ์žˆ๊ฒ ์ง€๋งŒ ์ด ํ™•๋ฅ ๋ถ„ํฌํ•จ์ˆ˜์˜ ์ •๊ทœํ™” ์กฐ๊ฑด์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ์€ ๊ฐ€์šฐ์Šค ์ ๋ถ„์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$$ \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) dx = \frac{1}{\sqrt{2\pi\sigma^2}} \sqrt{2\pi \sigma^2} = 1 $$

๋ชจ๋“  ์กฐ๊ฑด์„ ํ™•์ธํ–ˆ์œผ๋‹ˆ ์šฐ๋ฆฌ๊ฐ€ ์ •์˜ํ•œ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์˜ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๊ฐ€ ์ง„์งœ๋ผ๋Š” ๊ฒƒ์„ ๋‚ฉ๋“ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ ์ด ๋ถ„ํฌ์˜ ์„ฑ์งˆ์„ ๋ณด๊ธฐ ์œ„ํ•ด ๊ธฐ๋Œ“๊ฐ’(ํ‰๊ท )๊ณผ ํ‘œ์ค€ํŽธ์ฐจ๋ฅผ ๊ตฌํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


๊ธฐ๋Œ“๊ฐ’ (Expectation value)

ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ๊ธฐ๋Œ“๊ฐ’์˜ ์ •์˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ \mathbb{E}[X] = \int_{-\infty}^\infty x p(x) dx $$

์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‹จ๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์˜ ๊ธฐ๋Œ“๊ฐ’์„ ๊ตฌํ•ด๋ด…์‹œ๋‹ค. ์‹์ด ์กฐ๊ธˆ ๋ณต์žกํ•  ์ˆ˜ ์žˆ์œผ๋‹ˆ ์ƒ์ˆ˜๋Š” ์ œ์™ธํ•˜๊ณ  ์ ๋ถ„์„ ๋จผ์ € ๊ณ„์‚ฐํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$$ \begin{align*} \int_{-\infty}^\infty xe^{-\frac{(x - \mu)^2}{2\sigma^2}} dx &= \int_{-\infty}^\infty (t + \mu) e^{-\frac{t^2}{2\sigma^2}} dt \\ &= \mu \sqrt{2\pi \sigma^2} \end{align*} $$

๊ณ„์‚ฐ์„ ์กฐ๊ธˆ ์„ค๋ช…ํ•˜๋ฉด, ์ฒซ ๋ฒˆ์งธ ์ค„์—์„œ๋Š” ๋‹จ์ˆœํžˆ $t=(x-\mu)$๋กœ ์น˜ํ™˜ํ•˜์—ฌ ์ „๊ฐœํ•˜์˜€๊ณ  ์ดํ›„, $t e^{-\frac{t^2}{2\sigma^2}}$์ด ๊ธฐํ•จ์ˆ˜(Odd function)์ž„์„ ์ด์šฉ, ์ ๋ถ„ ๊ฐ’์ด 0์ด ๋˜๋ฏ€๋กœ ๋‘๋ฒˆ์งธ ํ•ญ๋งŒ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด์ œ ์—ฌ๊ธฐ์— ์•„๊นŒ ์ž ๊น ๋ฏธ๋ค„๋†“์•˜๋˜ ์ƒ์ˆ˜๋ฅผ ๊ณฑํ•ด์ฃผ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

$$ \mathbb{E}[X] = \mu $$

๋†€๋ž๊ฒŒ๋„ ๊ทธ์ € ๋งค๊ฐœ๋ณ€์ˆ˜ ์ค‘ ํ•˜๋‚˜์ธ ์ค„ ์•Œ์•˜๋˜ $\mu$๊ฐ€ ์‚ฌ์‹ค์€ ๋ถ„ํฌ์˜ ํ‰๊ท ์„ ๋‹ด๋‹นํ•˜๋Š” ์ค‘์š”ํ•œ ๋ณ€์ˆ˜์˜€์Šต๋‹ˆ๋‹ค! ์ด์ œ ์—ฌ์„ธ๋ฅผ ๋ชฐ์•„ ํ‘œ์ค€ํŽธ์ฐจ๋„ ๊ตฌํ•ด๋ด…์‹œ๋‹ค.


ํ‘œ์ค€ํŽธ์ฐจ (Standard deviation)

ํ‘œ์ค€ํŽธ์ฐจ๋Š” ๋ถ„์‚ฐ (Variance) ์„ ๊ตฌํ•˜๋ฉด ์ž๋™์œผ๋กœ ๋„์ถœ๋˜๋Š” ๊ฐ’์ž…๋‹ˆ๋‹ค. ๋ถ„์‚ฐ์˜ ์ •์˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

$$ Var[X] = \mathbb{E}\left[(X - \mu)^2 \right] $$

๊ทธ๋Ÿผ ๋ฐ”๋กœ ๊ณ„์‚ฐ์„ ์‹œ์ž‘ํ•ด๋ด…์‹œ๋‹ค.

$$ \begin{align*} \int_{-\infty}^\infty (x - \mu)^2 e^{-\frac{(x - \mu)^2}{2\sigma^2}} dx &= \int_{-\infty}^\infty t^2 e^{-\frac{t^2}{2\sigma^2}} dt \\ &= \frac{1}{2} \pi^{\frac{1}{2}} \left(\frac{1}{2\sigma^2}\right)^{-\frac{3}{2}} \\ &= \sigma^2 \sqrt{2\pi\sigma^2} \end{align*} $$

์ด๋ฒˆ์—๋Š” ์•ž์„œ ๋‹ค๋ค˜๋˜ ํŒŒ์ธ๋งŒ ํŠธ๋ฆญ์„ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ธฐ๋Œ“๊ฐ’์„ ๊ตฌํ•  ๋•Œ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ƒ์ˆ˜๋งŒ ๋‹ค์‹œ ๋ถ™์—ฌ์ฃผ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค.

$$ Var[X] = \sigma^2 $$

์—ฌ๊ธฐ์— ๋” ๋‚˜์•„๊ฐ€, ํ‘œ์ค€ํŽธ์ฐจ๋Š” ๋ถ„์‚ฐ์˜ ์–‘์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ์ •์˜๋˜๋ฏ€๋กœ $\sigma$๊ฐ€ ๋ฐ”๋กœ ํ‘œ์ค€ํŽธ์ฐจ๋ผ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.



๋งˆ์น˜๋ฉฐ

์ด๋ฒˆ ๊ธ€์—์„œ๋Š” 1์ฐจ์› ๋‹จ์ผ ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ๊ฐ„๋‹จํ•œ ๊ฐ€์šฐ์Šค ์ ๋ถ„์„ ์•Œ์•„๋ณด๊ณ  ์ด์— ๋Œ€ํ•œ ํ™œ์šฉ์œผ๋กœ ๋‹จ๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์˜ ๊ธฐ๋ณธ์ ์ธ ์„ฑ์งˆ์„ ์•Œ์•„๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์—๋Š” ์ด๋ฅผ ๋‹ค์ฐจ์›์œผ๋กœ ํ™•์žฅํ•˜์—ฌ ํ–‰๋ ฌ ๊ผด๋กœ ํ‘œํ˜„๋˜๋Š” ๊ฐ€์šฐ์Šค ์ ๋ถ„๊ณผ ๋‹ค๋ณ€์ˆ˜ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.



์ฐธ๊ณ ๋ฌธํ—Œ

  • Russell L. Herman, An introduction to Mathematical physics via oscillations, 2012
  • Massimiliano Bonamente, Statistics and Analysis of Scientific Data, Springer, 2017
]]>