<!doctype html>
<html lang="en-GB">
<head>
<meta charset="utf-8" />
<title>Coding theory demo</title>
<meta name="author" content="David Robertson">
<meta name="description" content="A toy application which visualises the effect of an error-correcting code.">
<meta name="viewport" content="width=device-width,initial-scale=1">
<script defer src="demo.js"></script>
<link rel="stylesheet" type="text/css" href="demo.css" />
<link rel="stylesheet" type="text/css" href="mobile_portrait.css" media="(max-width: 600px)" />
<link rel="stylesheet" type="text/css" href="mobile_landscape.css" media="(min-width: 601px) and (max-height: 500px)" />
<link rel="stylesheet" type="text/css" href="tablet_portrait.css" media="(min-width: 601px) and (max-width: 850px) and (min-height: 501px)" />
<link rel="stylesheet" type="text/css" href="desktop.css" media="(min-width: 851px) and (min-height: 501px)" />
<!-- Wish the browsers would agree on this -->
<link rel="manifest" href="manifest.json">
<meta name="theme-color" content="#5f889b">
<link rel="icon" href="icons/64.png" type="image/png">
<link rel="apple-touch-icon" href="icons/256.png" type="image/png">
<link rel="apple-touch-icon-startup" href="icons/256.png" type="image/png">
<meta name="apple-mobile-web-app-capable" content="yes">
</head>
<body>
<main id="canvases">
<label id="Alice" for="get_image">
<canvas id="sent" width="0" height="0"></canvas>
<input id="get_image" type="file" accept="image/*">
<p><strong>Click <span class="desktop">or drag</span> to choose a file</strong><span class="mobile">, then scroll</span>.</p>
</label>
<section id="Bob">
<aside>
<label for="accuracy_without_code" title="If we just transmitted each pixel without using a code, how many would be correctly transmitted?
Each bit independently incurs an error with probability $p$, so the total number of errors $X$ in a 24-bit pixel string has a Bin(24, $p$) distribution. We get a pixel error whenever $X$ > 0, which occurs with probability 1 - Pr($X$ = 0) = 1 - (1-$p$)^(24).">
Uncoded accuracy:</label> <output id="accuracy_without_code"></output>
</aside>
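<!-- Illustrative sketch, not part of demo.js (function name is hypothetical):
     the uncoded accuracy above follows the tooltip's formula. A 24-bit pixel
     arrives intact only if all 24 independent bits survive the channel.

       // Probability that an uncoded 24-bit pixel is received correctly.
       function uncodedPixelAccuracy(p) {
         return Math.pow(1 - p, 24); // Pr(X = 0) for X ~ Bin(24, p)
       }
       // The pixel error probability is the complement: 1 - (1 - p)^24.
-->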
<canvas id="naive_transmission"></canvas>
<canvas id="naive_transmission_diff" class="diff"></canvas>
<p id="Bob_caption" title="If Bob doesn't correct his recieved messages, he may as well just ask Alice to send her image without encoding. The simulated result of doing so is displayed here. Hover over the canvas to see the difference between Alice's source image and Bob's (uncoded) received image.">Transmission <strong>without</strong> a code</p>
</section>
<section id="Bob_decoded">
<aside>
<label for="accuracy_with_code" title="Using the code and including Bob's error correction attempt, how many pixels were correctly transmitted?">
Coded accuracy:</label> <output id="accuracy_with_code"></output><br />
</aside>
<canvas id="decoded"></canvas>
<canvas id="decoded_diff" class="diff"></canvas>
<p id="Bob_decoded_caption" title="The simulated result of Bob decoding Alice's image is displayed here. Hover over the canvas to see the difference between Alice's source image and Bob's (coded) received image.">After correction <strong>using</strong> a code</p>
</section>
</main>
<footer>
<fieldset id="channel">
<h3><label for="error_probability" title="A binary symmetric channel (BSC) is a simple model of a communications channel. The BSC can transmit only two symbols, conventionally chosen to be 0 and 1. Whenever the sender (Alice) sends $n$ symbols in a row, the receiver (Bob) gets $n$ symbols back.
Unfortunately, this channel is not perfect. Each symbol Alice sends may incur a random error and be received by Bob as the other symbol. The probability of an error occurring for a single symbol is some constant 0 ≤ $p$ ≤ 1, though we usually assume $p$ < 1/2. We further assume that error probabilities for different symbols are independent.
This slider controls the value of $p$ in the definition of the binary symmetric channel.">Bit error rate</label>: <var>p</var> = <output for="error_probability"></output></h3>
<input id="error_probability" type="range" list="tickmarks" min="0" max="1" step="any" value="0.4641588833612779">
<datalist id="tickmarks">
<option data-target-value="0">0%</option>
<option data-target-value="0.01">1%</option>
<option data-target-value="0.1">10%</option>
<option data-target-value="0.25">25%</option>
<option data-target-value="0.5">50%</option>
<option data-target-value="0.75">75%</option>
<option data-target-value="1">100%</option>
</datalist>
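<!-- Illustrative sketch, not part of demo.js (function name is hypothetical):
     simulating the binary symmetric channel described above. Each bit is
     flipped independently with probability p.

       // Transmit an array of bits (0s and 1s) over a BSC with error rate p.
       function transmitOverBSC(bits, p) {
         return bits.map(bit => (Math.random() < p ? 1 - bit : bit));
       }
-->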
<output id="uncoded_bit_error_distribution">
<header>
<label for="uncoded_bit_error_distribution" title="Without using a code, pixels are represented with a 24-bit binary string. Maybe no bits incur errors; maybe only two do; maybe 23 of our 24 total bits incur errors! It all depends on the channel error rate. The distribution of bit errors is shown below.">Bit errors per pixel (uncoded)</label>
<br />
mean: <output id="uncoded_bit_error_distribution_mean"></output>,
mode: <output id="uncoded_bit_error_distribution_mode"></output><br />
Probability of ≥ 1 error: <output id="uncoded_bit_error_distribution_error_probability"></output>
</header>
</output>
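<!-- Illustrative sketch, not part of demo.js (function names are hypothetical):
     summary statistics of the Bin(24, p) bit-error distribution reported
     above. The mean of Bin(n, p) is np; ⌊(n + 1)p⌋ is a mode (with a tie
     when (n + 1)p is an integer).

       const binomialMean = (n, p) => n * p;
       const binomialMode = (n, p) => Math.floor((n + 1) * p);
-->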
</fieldset>
<fieldset id="code_choice">
<h3><label for="code" title="Let $A$ be a finite set called the alphabet. A code $C$ is a collection of strings over $A$, whose elements are called codewords.
If Alice wants to send a message to Bob over an error-prone channel, she can improve her chances of communicating successfully by only agreeing to send pre-chosen messages: codewords belonging to a certain code. If Bob receives something that isn't a codeword, he knows there's been an error. If the code is carefully constructed, Bob can try to recover Alice's original message from a corrupted one.
Usually Alice has a collection of small strings—message units—that she turns into a longer string by a process called encoding. Bob is responsible for correcting any suspected errors, then decoding to retrieve his best guess at Alice's original message.">Code</label>:
<select id="code">
<option value="rep2x" >twofold repetition</option>
<option value="rep3x" >threefold repetition</option>
<option value="rep4x" >fourfold repetition</option>
<option value="Ham(3)" >Hamming [7, 4]</option>
<option value="Ham+(3)" >Extended Hamming [7, 4]</option>
<option value="Golay" >binary Golay</option>
<!--
<option label="extended binary Golay" value="Golay+2x" >
<option label="parity-check extension" value="check1" >
-->
</select>
</h3>
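<!-- Illustrative sketch, not part of demo.js (function names are hypothetical):
     the threefold repetition code from the list above. Encoding repeats each
     bit three times; decoding takes a majority vote per triple, correcting
     any single error within it.

       function encodeRep3(bits) {
         return bits.flatMap(bit => [bit, bit, bit]);
       }
       function decodeRep3(received) { // assumes received.length % 3 === 0
         const decoded = [];
         for (let i = 0; i < received.length; i += 3) {
           const ones = received[i] + received[i + 1] + received[i + 2];
           decoded.push(ones >= 2 ? 1 : 0); // majority vote
         }
         return decoded;
       }
-->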
<label for="dimension" title="If our code's alphabet is equipped with addition (e.g. the integers modulo a prime <var>p</var>), then we can add two words by adding the first digits together, the second digits together, and so on. Then we use arrange the digit sums in a row to form a new word.
If our code's alphabet is equipped with (invertible) multiplication, then we can multiply a word by a digit $λ$ by multiplying the first digit by $λ$, the second digit by $λ$, and so on. Then we use arrange the digit products in a row to form a new word, which is called a scalar multiple of the original.
A code is called linear if any two codewords' sum is a codeword, and if any scalar multiple of a codeword is a codeword. Such a code can be seen as finite-dimensional vector space over a finite field, allowing us to take advantage of the theory of linear algebra. This means we can choose a basis for our code, and measure the codes' size by finding its dimension (the size of any chosen basis). All of the codes we demonstrate here are linear.">Dimension</label>: <var>k</var> = <output id="dimension"></output><br />
<label for="units_per_pixel" title="In many image formats, each pixel is represented as a triple of numbers ($r$, $g$, $b$), whose entries stand for red, green and blue respectively. Higher values correspond to more intense colours. Each entry is typically an integer between 0 and 255, which we may format as an eight-digit binary string (since 256 is 2 to the power 8). Combining the three colour channels together, our pixel is described a 24-digit binary string. This suggests we want to use a 24-dimensional code to transmit our image, one pixel at a time. But we might not want to use such a relatively high dimension!
To get around this, let $k$ be a divisor of 24. We chop a 24-bit pixel string into $r$ = 24/$k$ pieces of length $k$; then we transmit each piece using a suitable [$n$, $k$]-code. Bob receives $r$ strings of length $n$, which decode to $r$ strings of length $k$. Assembling these last strings together, Bob receives a length $r$$k$ = 24 string.
This field shows the of value $r$.">Message units per pixel</label>: <output id="units_per_pixel"></output><br />
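<!-- Illustrative sketch, not part of demo.js (function name is hypothetical):
     chopping a 24-bit pixel string into r = 24/k message units of length k,
     as the tooltip above describes.

       function splitIntoUnits(pixelBits, k) { // assumes k divides pixelBits.length
         const units = [];
         for (let i = 0; i < pixelBits.length; i += k) {
           units.push(pixelBits.slice(i, i + k));
         }
         return units;
       }
-->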
<label for="word_length" title="We only consider a code C whose codewords all have the same length n. The literature calls these block codes, but we just called them 'codes'. Not all codes are block codes: Morse code is an example of a famous non-block code.">Encoded unit word length</label>: <var>n</var> = <output id="word_length"></output><br />
<label for="information_rate" title="The information rate measures how efficient it is to transmit a message using a given code. If we're thinking of a linear [$n$, $k$] code, this is the ratio $k$/$n$.
The idea is that for every $k$-bit string of information Alice wants to send, she must also transmit $n$ - $k$ extra bits to facilitate error detection and correction. So only $k$ out of every $n$ bits transmitted are new information.">Information rate</label>: <output id="information_rate"></output><br />
<label for="minimum_distance" title="The Hamming distance between two strings of length $n$ is the number of positions where the strings have different digits. For instance, the strings 0110 and 1100 differ in their first and third columns: thus their Hamming distance is 2.
The minimum distance $d$ of a code is the smallest Hamming distance between any two distinct codewords. In other words, any different words must be different in at least $d$ places. The value of $d$ is cruicial for error correction. This is because the larger the value of $d$, the harder it is for random noise to turn one codeword into another.">Minimum distance</label>: <var>d</var> = <output id="minimum_distance"></output><br />
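<!-- Illustrative sketch, not part of demo.js (function name is hypothetical):
     the Hamming distance from the tooltip above, counting positions where
     two equal-length words differ. E.g. hammingDistance([0,1,1,0], [1,1,0,0])
     returns 2.

       function hammingDistance(a, b) {
         let distance = 0;
         for (let i = 0; i < a.length; i++) {
           if (a[i] !== b[i]) distance++;
         }
         return distance;
       }
-->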
<label for="detectable_errors" title="Suppose that Alice transmits an encoded message unit $a$ which Bob recieves as $b$. If $b$ ≠ $a$ then an error has occurred during transmission. We say that the size of this error is the number of columns where $a$ and $b$ have different digits (i.e. the Hamming distance between $a$ and $b$). Bob will only be able to detect this error if $b$ is not a codeword.
If it takes at least a size $d$ error to turn Alice's codeword $a$ into another codeword $b$, then Bob is guaranteed to spot any error of size $s$ = $d$ - 1 or less. Depending on the circumstances, he might be able to spot some larger errors too, depending on the code and messages sent and received.">Detectable errors</label>: <var>s</var> = <output id="detectable_errors"></output><br />
<label for="correctable_errors" title="Correction is a little harder for Bob. He needs a strategy to infer the most likely codeword $a$ that Alice sent, upon receiving the word $b$. To keep things simple, it'd be very helpful if there wasn't a tie between two equally likely messages $a$₁ and $a$₂ from Alice; otherwise Bob's correction wouldn't be unique.
If the closest codewords are $d$ units (errors) apart, Bob is guaranteed to be able to uniquely correct any error of size $t$ = floor([$d$ - 1]/2) or less. He may be able to correct certain larger errors too, depending on the code and the messages being exchanged.">Correctable errors</label>: <var>t</var> = <output id="correctable_errors"></output>
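<!-- Illustrative sketch, not part of demo.js (names are hypothetical): the
     guarantees in the two tooltips above, both derived from the minimum
     distance d.

       const detectableErrors  = d => d - 1;                    // s = d - 1
       const correctableErrors = d => Math.floor((d - 1) / 2);  // t = ⌊(d - 1)/2⌋
       // E.g. the [7, 4] Hamming code has d = 3, so s = 2 and t = 1.
-->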
</fieldset>
<fieldset id="simulation_stats">
<h3>Transmission statistics</h3>
<label for="encoded_pixel_accuracy" title="We break a pixel into message units (see middle column), then encode each unit to a longer string and transmit. After doing so, if any of these units occur an error then the pixel incurs an error.
What proportion of pixels are accurately transmitted? The total number of bits transmitted is 24$r$, where $r$ is the number of message units per pixel. No bits incur an error with probability (1-$p$)^($r$$n$). Since $r$ = 1 only if we use the full code (with no error correction), we actually get more pixel errors by encoding!">
Encoded pixels accuracy</label>: <output id="encoded_pixel_accuracy"></output><br />
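<!-- Illustrative sketch, not part of demo.js (function name is hypothetical):
     the expected encoded pixel accuracy per the tooltip above. A pixel's r
     message units encode to r*n transmitted bits, all of which must survive.

       function encodedPixelAccuracy(p, r, n) {
         return Math.pow(1 - p, r * n);
       }
-->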
<label for="error_detection_rate" title="If Alice's message unit encodes as $a$ and is received in error as $b \neq a$, we detect the error if and only if $b$ is not a codeword. A pixel incurs an error when one of its message units incurs an error.
Out of the errors that occur to encoded pixels, how many of them did the code detect?">
Errors detected</label>: <output id="error_detection_rate"></output><br />
<label for="correct_correction_rate" title="Of the errors we detected, how many of our attempts to correct recovered the original codeword?">
Correct corrections</label>: <output id="correct_correction_rate"></output><br />
<h3 id="options">Options</h3>
<input id="smooth_scaling" type="checkbox"> <label for="smooth_scaling">Smooth image scaling </label> <br />
<span id="about_anchor" title="The idea came to me from a figure in MacKay's book on information theory. Click this link to show a bibliographic reference and more details on the application itself.">About this application</span>
</fieldset>
</footer>
<aside id="info" class="hidden">
<main>
<aside class="info_icon emoji">ℹ️</aside>
</main>
<section id="about">
<aside class="info_icon emoji">ℹ️</aside>
<p>This is a small JavaScript toy to illustrate the idea behind (<a href="https://en.wikipedia.org/wiki/Error_detection_and_correction">channel</a>) <a href="https://en.wikipedia.org/wiki/Coding_theory">coding theory</a>. Users can submit an image and see the effect of varying the channel error rate <var>p</var> on the transmitted image. Everything is computed client side. I've only had the time to test this in Chrome 63, but I think recent versions of Firefox and Safari should be okay too. The source code is <a href="https://github.com/DMRobertson/coding-demo">on GitHub</a>.</p>
<p>I took the idea from MacKay's book</p>
<blockquote>
MacKay, David J.C. <em><a href="http://www.cambridge.org/gb/academic/subjects/computer-science/pattern-recognition-and-machine-learning/information-theory-inference-and-learning-algorithms?format=HB&isbn=9780521642989#u22fCX0eDcCDOZTm.97">Information Theory, Inference, and Learning Algorithms</a>.</em> Cambridge University Press, <time datetime="2003">2003</time>. xii+628 pp. ISBN: <a href="https://isbnsearch.org/isbn/9780521642989">978-0-521-64298-9</a>. MR: <a href="https://mathscinet.ams.org/mathscinet-getitem?mr=2012999">2012999</a>. <a href="http://www.inference.org.uk/itila/">Available online</a>.
</blockquote>
<p>in which (e.g. Figures 1.5, 47.5) he visualises the effect of a noisy channel by randomly flipping bits in a 100×100 1-bit image of a panel from a Dilbert comic.</p>
</section>
<button id="close" class="emoji">❌</button>
</aside>
</body>
</html>