getting-started-gym.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unity ML-Agents Toolkit\n",
    "## Gym Wrapper Basics\n",
    "This notebook contains a walkthrough of the basic functions of the Python Gym Wrapper for the Unity ML-Agents toolkit. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single-Agent Environments\n",
    "\n",
    "The first five steps show how to use the `UnityEnv` wrapper with single-agent environments. See the Multi-Agent Environments section below step five for how to use the wrapper with multi-agent environments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Load dependencies\n",
    "\n",
    "The following loads the necessary dependencies and checks the Python version (at runtime). ML-Agents Toolkit (v0.3 onwards) requires Python 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import sys\n",
    "\n",
    "from gym_unity.envs import UnityEnv\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "print(\"Python version:\")\n",
    "print(sys.version)\n",
    "\n",
    "# check Python version\n",
    "if (sys.version_info[0] < 3):\n",
    "    raise Exception(\"ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Start the environment\n",
    "`UnityEnv` launches and begins communication with the environment when instantiated. We will be using the `GridWorld` environment. You will need to create an `envs` directory within the `/python` subfolder of the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "env_name = \"../envs/GridWorld\" # Name of the Unity environment binary to launch\n",
    "env = UnityEnv(env_name, worker_id=0, use_visual=True)\n",
    "\n",
    "# Examine environment parameters\n",
    "print(str(env))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Examine the observation and state spaces\n",
    "We can reset the environment to be provided with an initial observation of the environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reset the environment\n",
    "initial_observation = env.reset()\n",
    "\n",
    "if len(env.observation_space.shape) == 1:\n",
    "    # Examine the initial vector observation\n",
    "    print(\"Agent state looks like: \\n{}\".format(initial_observation))\n",
    "else:\n",
    "    # Examine the initial visual observation\n",
    "    print(\"Agent observations look like:\")\n",
    "    if env.observation_space.shape[2] == 3:\n",
    "        plt.imshow(initial_observation[:,:,:])\n",
    "    else:\n",
    "        plt.imshow(initial_observation[:,:,0])"
   ]
  },
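  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional check, we can also look at the action space. This is a minimal sketch that relies only on the standard gym `Space` attributes exposed by the wrapper; the sampled action below is purely illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sketch: inspect the action space of the single-agent environment\n",
    "# and draw one illustrative random action from it.\n",
    "print(\"Action space: {}\".format(env.action_space))\n",
    "print(\"A sampled action looks like: \\n{}\".format(env.action_space.sample()))"
   ]
  },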
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Take random actions in the environment\n",
    "Once we have reset the environment, we can step it forward and provide actions to all of the agents within the environment. Here we simply choose random actions using the `env.action_space.sample()` function.\n",
    "\n",
    "Once this cell is executed, 10 messages will be printed, one per episode, detailing how much reward was accumulated during that episode. The Unity environment will then pause, waiting for further signals telling it what to do next. Thus, not seeing any animation is expected when running this cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for episode in range(10):\n",
    "    initial_observation = env.reset()\n",
    "    done = False\n",
    "    episode_rewards = 0\n",
    "    while not done:\n",
    "        observation, reward, done, info = env.step(env.action_space.sample())\n",
    "        episode_rewards += reward\n",
    "    print(\"Total reward this episode: {}\".format(episode_rewards))"
   ]
  },
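  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see exactly what a single call to `env.step()` returns, the short sketch below resets the environment, takes one random action, and prints the pieces of the returned tuple. It assumes the GridWorld environment opened above is still running."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: reset, take a single random action, and inspect the\n",
    "# (observation, reward, done, info) tuple returned by env.step().\n",
    "observation = env.reset()\n",
    "observation, reward, done, info = env.step(env.action_space.sample())\n",
    "print(\"Observation shape: {}\".format(np.array(observation).shape))\n",
    "print(\"Reward: {}\".format(reward))\n",
    "print(\"Done: {}\".format(done))"
   ]
  },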
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Close the environment when finished\n",
    "When we are finished using an environment, we can close it with the function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "env.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-Agent Environments\n",
    "\n",
    "It is also possible to use the gym wrapper with multi-agent environments. For these environments, observations, rewards, and done flags will be provided in a list. Likewise, the environment will expect a list of actions when calling `step(action)`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Start the environment\n",
    "\n",
    "We will use the `3DBall` environment for this walkthrough. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). We will launch it from the `python/envs` sub-directory of the repo. Please create an `envs` folder if one does not already exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Name of the Unity environment binary to launch\n",
    "multi_env_name = \"../envs/3DBall\"\n",
    "multi_env = UnityEnv(multi_env_name, worker_id=1,\n",
    "                     use_visual=False, multiagent=True)\n",
    "\n",
    "# Examine environment parameters\n",
    "print(str(multi_env))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Examine the observation space"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reset the environment\n",
    "initial_observations = multi_env.reset()\n",
    "\n",
    "if len(multi_env.observation_space.shape) == 1:\n",
    "    # Examine the initial vector observation\n",
    "    print(\"Agent observations look like: \\n{}\".format(initial_observations[0]))\n",
    "else:\n",
    "    # Examine the initial visual observation\n",
    "    print(\"Agent observations look like:\")\n",
    "    if multi_env.observation_space.shape[2] == 3:\n",
    "        plt.imshow(initial_observations[0][:,:,:])\n",
    "    else:\n",
    "        plt.imshow(initial_observations[0][:,:,0])"
   ]
  },
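  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before stepping the environment, it can also help to check how many agents the wrapper exposes and what the per-agent action space looks like. This is an optional sketch: `multi_env.number_agents` is the same attribute used in the stepping loop below, and `multi_env.action_space` is the standard gym attribute describing each agent's action space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sketch: number of agents and the per-agent action space.\n",
    "print(\"Number of agents: {}\".format(multi_env.number_agents))\n",
    "print(\"Action space for each agent: {}\".format(multi_env.action_space))"
   ]
  },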
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Take random steps in the environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for episode in range(10):\n",
    "    initial_observation = multi_env.reset()\n",
    "    done = False\n",
    "    episode_rewards = 0\n",
    "    while not done:\n",
    "        actions = [multi_env.action_space.sample() for agent in range(multi_env.number_agents)]\n",
    "        observations, rewards, dones, info = multi_env.step(actions)\n",
    "        episode_rewards += np.mean(rewards)\n",
    "        done = dones[0]\n",
    "    print(\"Total reward this episode: {}\".format(episode_rewards))"
   ]
  },
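  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the list-based interface concrete, the optional sketch below resets the environment, takes a single step with one sampled action per agent, and prints the length of each returned list (one entry per agent). It assumes the `3DBall` environment opened above is still running."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: a single multi-agent step returns one entry per agent\n",
    "# for observations, rewards, and done flags.\n",
    "initial_observations = multi_env.reset()\n",
    "actions = [multi_env.action_space.sample() for agent in range(multi_env.number_agents)]\n",
    "observations, rewards, dones, info = multi_env.step(actions)\n",
    "print(\"Number of observations: {}\".format(len(observations)))\n",
    "print(\"Number of rewards: {}\".format(len(rewards)))\n",
    "print(\"Number of done flags: {}\".format(len(dones)))"
   ]
  },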
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Close the environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "multi_env.close()"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}