diff --git a/_data/navigation.yml b/_data/navigation.yml index a380856f..7abe1b7f 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -154,6 +154,10 @@ wiki: url: /wiki/machine-learning/python-libraries-for-reinforcement-learning/ - title: Reinforcement Learning url: /wiki/machine-learning/intro-to-rl + - title: Generative modeling + url: /wiki/machine-learning/generative-modeling + - title: Offline reinforcement learning + url: /wiki/machine-learning/offline-rl - title: YOLO Integration with ROS and Running with CUDA GPU url: /wiki/sensing/ros-yolo-gpu/ - title: YOLOv5 Training and Deployment on NVIDIA Jetson Platforms diff --git a/wiki/machine-learning/assets/offline_rl_image-1.png b/wiki/machine-learning/assets/offline_rl_image-1.png new file mode 100644 index 00000000..3d60f0a1 Binary files /dev/null and b/wiki/machine-learning/assets/offline_rl_image-1.png differ diff --git a/wiki/machine-learning/assets/offline_rl_image-2.png b/wiki/machine-learning/assets/offline_rl_image-2.png new file mode 100644 index 00000000..e12bc0be Binary files /dev/null and b/wiki/machine-learning/assets/offline_rl_image-2.png differ diff --git a/wiki/machine-learning/assets/offline_rl_image.png b/wiki/machine-learning/assets/offline_rl_image.png new file mode 100644 index 00000000..3dfda73a Binary files /dev/null and b/wiki/machine-learning/assets/offline_rl_image.png differ diff --git a/wiki/machine-learning/generative-modeling.md b/wiki/machine-learning/generative-modeling.md new file mode 100644 index 00000000..29de1bf3 --- /dev/null +++ b/wiki/machine-learning/generative-modeling.md @@ -0,0 +1,123 @@ +--- +# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be +# overwritten except in special circumstances. +# You should set the date the article was last updated like this: +date: 2024-12-01 # YYYY-MM-DD +# This will be displayed at the bottom of the article +# You should set the article's title: +title: Generative modeling +# The 'title' is automatically displayed at the top of the page +# and used in other parts of the site. + +--- +This blog is supposed to be a junction that connects some of the important concepts in generative modeling. It provides high-level information about generative AI and its importance, popular methods, and key equations. Readers can find more detailed information in the references provided. + +## Introduction +In recent years, generative models have taken the machine learning world by storm, revolutionizing our ability to create and manipulate data across various domains. This blog post will explore the fascinating world of generative modeling, from its fundamental concepts to cutting-edge applications. +### Introduction to Generative Modeling +Generative modeling is a subfield of machine learning focused on creating new data samples that mimic the characteristics of a given dataset. Unlike discriminative models, which predict labels or outcomes (e.g., p(y|x)), generative models learn the underlying distribution p(x) or joint distribution p(x,y). This enables them to generate novel samples that resemble real data. + +The goal of generative models is to approximate these complex, high-dimensional data distributions. For instance, if we represent the data distribution as p(x), a generative model attempts to learn this function such that it can generate $\bar{x} \sim p(x)$, where $\bar{x}$ is a new, generated sample. 
Recent advances in deep learning have significantly improved the ability of these models to generate realistic images, coherent text, and more.
+
+#### Fundamental Methods in Generative Modeling
+##### Variational Autoencoders (VAEs)
+VAEs are probabilistic models that encode data into a latent space $z$ and then decode it back to reconstruct the original data. The generative process assumes:
+
+$$\begin{aligned}
+p(x) &= \int p(x | z) p(z) \, dz
+\end{aligned}$$
+
+where $p(z)$ is the prior distribution (often a Gaussian), and $p(x|z)$ is the likelihood. The VAE optimizes a lower bound on the data log-likelihood, known as the Evidence Lower Bound (ELBO):
+
+$$\begin{aligned}
+\mathcal{L}_{\text{ELBO}} &= \mathbb{E}_{q_\phi(z | x)}[\log p_\theta(x | z)] - D_{\text{KL}}(q_\phi(z | x) \| p(z))
+\end{aligned}$$
+
+The first term encourages the decoder $p_\theta(x|z)$ to reconstruct $x$ from $z$, while the second term ensures that the approximate posterior $q_\phi(z|x)$ remains close to the prior $p(z)$. This probabilistic framework allows VAEs to generate new data by sampling $z \sim p(z)$ and passing it through the decoder.
+
+##### Generative Adversarial Networks (GANs)
+GANs consist of two neural networks: a generator $G(z)$ and a discriminator $D(x)$. The generator learns to map random noise $z \sim p_z$ to data $x$, while the discriminator attempts to distinguish between real data $x \sim p_{\text{data}}$ and generated data $\tilde{x} = G(z)$. The two networks are trained on the adversarial objective:
+
+$$\begin{aligned}
+\mathcal{L}_{\text{GAN}} = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))].
+\end{aligned}$$
+The generator aims to minimize $\mathcal{L}_{\text{GAN}}$, while the discriminator maximizes it:
+$$\begin{aligned}
+G^* = \arg \min_G \max_D \mathcal{L}_{\text{GAN}}.
+\end{aligned}$$
+This adversarial process pushes the generator to produce increasingly realistic samples, making GANs highly effective for generating images and videos.
+
+##### Flow-based Models
+Flow-based models transform data $x$ into a simpler latent representation $z$ through an invertible transformation $f$. This allows the likelihood $p(x)$ to be computed exactly via the change-of-variables formula:
+$$\begin{aligned}
+z \sim \pi(z), \quad x = f(z), \quad z = f^{-1}(x) \\
+p(x) = \pi(z) \left| \det \frac{\partial z}{\partial x} \right| = \pi(f^{-1}(x)) \left| \det \frac{\partial f^{-1}}{\partial x} \right|
+\end{aligned}$$
+where $z = f^{-1}(x)$.
+
+By designing $f$ as a series of bijective layers, flow-based models enable efficient sampling and exact density estimation. Architectures like RealNVP and Glow use this property to generate high-quality images.
+
+##### Diffusion Models
+Diffusion models generate data by learning to reverse a process that gradually adds noise to data. Starting from noise $x_T$, they iteratively denoise to produce a coherent sample $x_0$.
+
+The forward process is defined as:
+$$\begin{aligned}
+q(x_t | x_{t-1}) = \mathcal{N}(x_t \mid \sqrt{\alpha_t}\, x_{t-1}, (1 - \alpha_t) I),
+\end{aligned}$$
+where $\alpha_t \in (0, 1)$ controls the noise schedule.
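+To make the forward process concrete, here is a minimal NumPy sketch (an illustration added for this page, not taken from any particular implementation) that repeatedly applies the Gaussian kernel above to corrupt a clean sample; the linear `beta` schedule and all variable names are assumptions chosen for readability:
+
+```python
+import numpy as np
+
+def forward_diffuse(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02):
+    """Corrupt a clean sample x0 by iterating q(x_t | x_{t-1})."""
+    betas = np.linspace(beta_start, beta_end, num_steps)  # assumed linear noise schedule
+    alphas = 1.0 - betas                                  # alpha_t = 1 - beta_t
+    x = np.asarray(x0, dtype=np.float64)
+    trajectory = [x]
+    for alpha_t in alphas:
+        noise = np.random.randn(*x.shape)
+        # Sample x_t ~ N(sqrt(alpha_t) * x_{t-1}, (1 - alpha_t) * I)
+        x = np.sqrt(alpha_t) * x + np.sqrt(1.0 - alpha_t) * noise
+        trajectory.append(x)
+    return trajectory  # trajectory[-1] is approximately standard Gaussian noise
+```
+
+Learning to undo this corruption one step at a time is exactly what the reverse-process model described next is trained to do.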
+The reverse process is modeled as:
+$$\begin{aligned}
+p(x_{0:T}) = p(x_T)\prod_{t=1}^T p(x_{t-1} | x_t) \\
+p(x_{t-1} | x_t) = \mathcal{N}(x_{t-1} | \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))
+\end{aligned}$$
+In practice, diffusion models are usually trained with the simplified denoising objective of Ho et al., which predicts the noise $\epsilon$ used to corrupt $x_0$ into $x_t$:
+$$\begin{aligned}
+\mathcal{L}_{\text{diffusion}} = \mathbb{E}_{x_0, t, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right], \quad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
+\end{aligned}$$
+where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\epsilon \sim \mathcal{N}(0, I)$.
+These models have shown remarkable success in tasks like image generation (e.g., DALL-E 2 and Stable Diffusion).
+
+##### Autoregressive Generation
+Autoregressive models generate data sequentially, predicting each element conditioned on previous ones. For a sequence $x = (x_1, x_2, \dots, x_T)$, the joint probability is factorized as:
+$$\begin{aligned}
+p(x) = \prod_{t=1}^T p(x_t | x_1, \dots, x_{t-1}).
+\end{aligned}$$
+###### Examples:
+- **PixelCNN** for images, where each pixel is generated conditioned on previously generated pixels.
+- **GPT** for text, where each token is generated based on preceding tokens.
+
+These models excel in generating coherent sequences, particularly in text and music.
+
+#### Applications of Generative Models
+Generative models have found applications across various domains:
+- **Text Generation**: From language translation to chatbots, generative models are powering advanced natural language processing systems.
+- **Image Generation and Manipulation**: Models like DALL-E and Midjourney can create photorealistic images from text descriptions, while others enable sophisticated image editing and style transfer.
+- **Video Generation**: Recent advancements allow for the creation of short video clips from text prompts or the manipulation of existing videos.
+- **Audio Generation**: From text-to-speech systems to music composition, generative models are pushing the boundaries of audio synthesis.
+
+#### Foundation Models
+Foundation models represent a paradigm shift in AI, where large-scale models trained on vast amounts of data serve as a basis for a wide range of downstream tasks. These models, such as BERT and GPT-3, have demonstrated remarkable zero-shot and few-shot learning capabilities.
+In robotics, foundation models are being explored for tasks such as visual understanding, task planning, and natural language instruction following. They hold the potential to bridge the gap between perception and action in embodied AI systems.
+
+#### Conclusion
+Generative models have come a long way in recent years, pushing the boundaries of what's possible in artificial intelligence. As research continues to advance, we can expect even more impressive applications and capabilities to emerge, reshaping industries and opening new frontiers in AI-driven creativity and problem-solving.
+
+### References
+##### VAEs
+- Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
+##### GANs
+- Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in Neural Information Processing Systems, pp. 2672-2680. 2014.
+##### Flow-based Models
+- Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. "Density estimation using Real NVP." arXiv preprint arXiv:1605.08803 (2016).
+- Rezende, Danilo Jimenez, and Shakir Mohamed. "Variational inference with normalizing flows." International Conference on Machine Learning (ICML). 2015.
+##### Diffusion Models
+- Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." arXiv preprint arXiv:2006.11239 (2020).
"Denoising diffusion probabilistic models." arXiv preprint arXiv:2006.11239 (2020). +##### Autoregressive Generation +- van den Oord, Aaron, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." International Conference on Machine Learning (ICML). 2016. +- Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. "Improving language understanding by generative pre-training." OpenAI preprint (2018). (For GPT and autoregressive text models.) + +### Related +- [Lilian Weng's blog on VAEs](https://lilianweng.github.io/posts/2018-08-12-vae/) +- [Lilian Weng's blog on Flow-based models](https://lilianweng.github.io/posts/2018-10-13-flow-models/) +- [Lilian Weng's blog on GANs](https://lilianweng.github.io/posts/2017-08-20-gan/) +- [Lilian Weng's blog on Diffusion models](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/) diff --git a/wiki/machine-learning/offline-rl.md b/wiki/machine-learning/offline-rl.md new file mode 100644 index 00000000..ef4fec2f --- /dev/null +++ b/wiki/machine-learning/offline-rl.md @@ -0,0 +1,60 @@ +--- +# Jekyll 'Front Matter' goes here. Most are set by default, and should NOT be +# overwritten except in special circumstances. +# You should set the date the article was last updated like this: +date: 2024-04-30 # YYYY-MM-DD +# This will be displayed at the bottom of the article +# You should set the article's title: +title: Offline reinforcement learning +# The 'title' is automatically displayed at the top of the page +# and used in other parts of the site. + +--- +This blog is supposed to be a junction that connects some of the important concepts in offline reinforcement learning (RL). It provides high-level information about offline RL and its importance, limitations of online learning algorithms, and popular methods in offline RL and readers can find more detailed information in the references provided. +### Introduction +Offline RL refers to a learning paradigm where an agent learns from a fixed dataset of pre-collected experiences, without the need for real-time interaction with the environment. Unlike online RL, the agent's policy is trained solely on historical data, making it more sample-efficient and suitable for scenarios where data collection is expensive or time-consuming. Offline RL methods aim to optimize the policy using the given dataset, addressing challenges like sample efficiency and stability in learning. This approach has practical applications in fields such as robotics, finance, and healthcare. + +#### Reinforcement learning +Classically refers to a machine learning paradigm where an agent learns to make sequential decisions by interacting with an environment. The agent receives feedback in the form of rewards based on its actions, allowing it to learn an optimal policy that maximizes long-term rewards. +![alt text](assets/offline_rl_image-2.png) +Figure: Shows the interaction between the agent and the environment and how the agent only learns for the latest data in traditional online RL setup. + +#### Why offline setting is important? +Online RL algorithms heavily rely on interacting with the environment and cannot effectively utilize existing data. For many real-world applications, collecting data can be expensive (e.g., robotics), time-consuming, or even dangerous (e.g., healtcare). Besides, there are problem settings with massive amounts of existing data, that could be leveraged. 
+#### Limitations of online learning algorithms
+
+__On-policy__ algorithms like PPO, TRPO, and REINFORCE generally require real-time interaction with the environment to update the policy. Techniques like importance sampling [1] can be used to learn from a fixed dataset, but they are often unstable and inefficient in practice.
+__Off-policy__ algorithms like DQN, DDPG, and SAC are designed to utilize a data buffer (known as a replay buffer) of interactions. However, there are still limitations when it comes to learning only from a fixed dataset. One problem is that exploration cannot be improved: exploration is outside the scope of the algorithm, so if the dataset does not contain transitions that illustrate high-reward regions of the state space, it may be impossible to discover those regions. Another problem is distributional shift: a function approximator (policy, value function, or model) trained under one distribution will be evaluated on a different distribution, due both to the change in visited states under the new policy and, more subtly, to the act of maximizing the expected return. Once the policy enters an out-of-distribution state, it will keep making mistakes and may remain out-of-distribution for the remainder of the trial.
+![alt text](assets/offline_rl_image-1.png)
+Figure: The interaction between the agent and the environment in off-policy RL, where the agent learns from a buffer of saved data.
+
+
+## Offline reinforcement learning
+Offline RL involves training an agent using a __fixed dataset__ of historical experiences. The agent learns from this dataset without interacting with the environment in real time, making it more sample-efficient and suitable for scenarios where data collection is expensive or impractical.
+
+![alt text](assets/offline_rl_image.png)
+Figure: In offline RL the learning process is isolated from environment interaction; the agent learns from a fixed dataset of historical experiences.
+
+### Popular methods
+#### Conservative Q-learning
+One of the major issues with directly using off-policy value-based methods like DQN, DDPG, and SAC on a fixed dataset is that they tend to overestimate the values of out-of-distribution actions and perform poorly on unseen states. Conservative Q-learning [2] is a simple yet effective offline RL method that learns a conservative Q-function by penalizing the values of actions not supported by the dataset, so that the learned policy maximizes expected return without exploiting erroneously high value estimates far from the behavior policy that generated the data. This conservatism helps prevent overfitting to the dataset and improves generalization to unseen states (a minimal code sketch of this penalty is given below).
+
+#### Behavior regularized offline reinforcement learning
+Behavior regularized offline RL [3] is another approach that aims to improve the stability and generalization of offline RL algorithms. This method introduces a behavior regularizer that encourages the learned policy to stay close to the behavior policy that generated the dataset. By incorporating this regularizer into the optimization objective, the algorithm can learn a policy that performs well on unseen states while maintaining stability during training.
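+As an illustration of how the conservative penalty from CQL [2] can be implemented, here is a minimal PyTorch sketch for a discrete action space (written for this wiki page, not taken from the papers' code); the `q_net` interface, batch layout, and the `cql_alpha` coefficient are assumptions chosen for clarity:
+
+```python
+import torch
+import torch.nn.functional as F
+
+def conservative_q_loss(q_net, target_q_net, batch, gamma=0.99, cql_alpha=1.0):
+    """Standard TD error plus a CQL-style conservative penalty (discrete actions)."""
+    s, a, r, s_next, done = batch            # tensors sampled from the fixed offline dataset
+    q_all = q_net(s)                          # per-action Q-values, shape (batch, num_actions)
+    q_taken = q_all.gather(1, a.unsqueeze(1)).squeeze(1)
+
+    with torch.no_grad():                     # bootstrapped TD target from the target network
+        q_next = target_q_net(s_next).max(dim=1).values
+        td_target = r + gamma * (1.0 - done) * q_next
+
+    td_loss = F.mse_loss(q_taken, td_target)
+
+    # Conservative term: push down Q-values over all actions (logsumexp)
+    # while pushing up the Q-values of actions actually present in the dataset.
+    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
+
+    return td_loss + cql_alpha * conservative_penalty
+```
+
+Setting `cql_alpha` to zero recovers ordinary off-policy Q-learning on the fixed buffer, which makes the effect of the conservative term easy to ablate.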
+
+#### Implicit Q-learning
+Implicit Q-learning [4] is a recent method that improves the stability of offline RL by never evaluating the Q-function on actions outside the dataset. It estimates the value of the best actions supported by the data using expectile regression and then extracts a policy via advantage-weighted behavioral cloning. By avoiding queries to out-of-distribution actions, the algorithm learns a more robust value function that generalizes well to unseen states and improves performance on challenging tasks.
+
+### Open problems
+#### Offline model-based RL
+Model-based reinforcement learning holds promise for offline RL challenges, but it relies heavily on accurate uncertainty estimation for models to address distributional shift. Current methods often use bootstrap ensembles, yet they fall short of ideal performance. Furthermore, modeling certain Markov decision processes (MDPs), especially those with high-dimensional image observations and long horizons, remains a significant hurdle. Hybrid approaches, combining model-based and model-free learning, show potential in overcoming these obstacles. Still, the fundamental question persists: can model-based RL surpass model-free dynamic programming algorithms? Both aim to predict future outcomes, with model-free methods offering flexibility in predicting various quantities. In linear function approximation scenarios, model-based updates and value iteration updates yield identical results, but this equivalence isn't guaranteed in nonlinear cases. Thus, exploring the theoretical boundaries of offline model-based RL against dynamic programming methods remains an open challenge.
+
+### References
+[1] Levine, Sergey, Aviral Kumar, George Tucker, and Justin Fu. "Offline reinforcement learning: Tutorial, review, and perspectives on open problems." arXiv preprint arXiv:2005.01643 (2020).
+[2] Kumar, Aviral, Aurick Zhou, George Tucker, and Sergey Levine. "Conservative Q-learning for offline reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 1179-1191.
+[3] Wu, Yifan, George Tucker, and Ofir Nachum. "Behavior regularized offline reinforcement learning." arXiv preprint arXiv:1911.11361 (2019).
+[4] Kostrikov, Ilya, Ashvin Nair, and Sergey Levine. "Offline reinforcement learning with implicit Q-learning." arXiv preprint arXiv:2110.06169 (2021).
+
+### Related
+- [Lilian Weng's blog on RL](https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html)
+- [Introduction to RL](intro-to-rl.md)