Tsun-Hsuan (Johnson) Wang
tsunw_at_mit_dot_edu

| CV (Oct. 23) | Google Scholar |
| Github | Twitter |

Currently, I am a 4th-year PhD student at MIT CSAIL, under the supervision of Prof. Daniela Rus. I received my M.Sc. from National Tsing Hua University (NTHU), advised by Prof. Min Sun, and my B.Sc. from the same fantastic place. During the last half year of my master's studies, I had a wonderful time working with Prof. Raquel Urtasun at Uber ATG. Prior to that, I was lucky to collaborate with Prof. Wei-Chen Chiu, Dr. Yi-Hsuan Tsai, and Prof. Hwann-Tzong Chen.

My research interests lie at the intersection of robotics, simulation, and machine learning. In particular, I am fascinated with building the entire life cycle of robots, spanning design, data, modeling, learning, and finally decision-making. Along this direction, my goal is to bridge higher-level cognition with lower-level physics, control, and even embodiment and morphology.

My Chinese name is 尊玄, which is pronounced just like Johnson.


MIT
PhD in EECS
Sep 20 - Present


MIT-IBM
Research Intern
Jun 22 - Sep 22


Uber ATG
Research Intern
Jun 19 - Mar 20


NTHU
M.Sc. in EE
Sep 17 - Mar 20


IIS, Academia Sinica
Research Intern
Jun 17 - Aug 17


NTHU
B.Sc. in EE
Sep 13 - Jun 17

  News
  • [11/2023] We are organizing a workshop on the topic of "Towards Generalist Robots" at CoRL 2023 [white paper].
  • [09/2023] 2 papers accepted by NeurIPS'23; one (DiffuseBot) as oral and the other (Gigastep) in the Datasets & Benchmarks track!
  • [08/2023] 1 paper (interpretability by disentanglement) accepted by CoRL'23 as oral!
  • [06/2023] 2 papers (ML for soft robots, cooperative flight) accepted by IROS'23!
  • [05/2023] Our soft robot co-design work SoftZoo featured in MIT News!
  • [04/2023] 1 paper (Invariance-ODE) accepted by ICML'23!
  • [03/2023] 1 paper (Att-CLF) accepted by L4DC'23!
  • [01/2023] 2 papers (SoftZoo, LiquidS4) accepted by ICLR'23!
  • [12/2022] 1 paper (BarrierNet) accepted by T-RO!
  Publications

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Tsun-Hsuan Wang, Alaa Maalouf, Wei Xiao, Yutong Ban, Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus
Under Submission,
CoRL OOD Workshop 2023, Atlanta,
NeurIPS Robot Learning Workshop 2023, New Orleans

webpage | abstract | bibtex | arxiv

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video at this https URL and to view the code and demos on our project webpage at this https URL.
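
The core mechanism, patch-aligned features that can be queried by text, can be illustrated with a toy sketch (not the actual pipeline): assuming per-patch visual features and a text embedding already live in a shared space (mocked below with random vectors), a hypothetical query such as "construction zone" is localized by per-patch cosine similarity.

# Toy sketch: querying patch-aligned features with text (hypothetical shapes and embeddings).
import numpy as np

rng = np.random.default_rng(0)
H, W, D = 12, 20, 512                     # patch grid and embedding dimension (assumed)
patch_feats = rng.normal(size=(H, W, D))  # stand-in for pixel/patch-aligned visual features
text_embed = rng.normal(size=(D,))        # stand-in for an embedded query, e.g. "construction zone"

def cosine_map(patches, query):
    """Per-patch cosine similarity between visual features and a text embedding."""
    p = patches / np.linalg.norm(patches, axis=-1, keepdims=True)
    q = query / np.linalg.norm(query)
    return p @ q                          # (H, W) relevance map

relevance = cosine_map(patch_feats, text_embed)
print(relevance.shape, relevance.max())   # a policy could consume this map alongside raw features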

@article{wang2023drive,
  title={Drive Anywhere: Generalizable End-to-end 
         Autonomous Driving with Multi-modal 
         Foundation Models},
  author={Wang, Tsun-Hsuan and 
          Maalouf, Alaa and 
          Xiao, Wei and 
          Ban, Yutong and 
          Amini, Alexander and 
          Rosman, Guy and 
          Karaman, Sertac and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2310.17642},
  year={2023}
}

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang*, Zhou Xian*, Feng Chen*, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
Under Submission
(Powered by Genesis; stay tuned!!)

webpage | abstract | bibtex | arxiv | code

We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly using or adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates corresponding simulation environments by populating pertinent objects and assets with proper spatial configurations. Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
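
A rough, non-authoritative sketch of the propose-generate-learn cycle described above, with hypothetical stub functions standing in for the foundation-model and simulator calls:

# Minimal sketch of a propose-generate-learn loop (stubs stand in for LLM / simulator calls).
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    subtasks: list

def propose_task() -> Task:                      # hypothetical: would query a foundation model
    return Task("put the mug on the shelf", ["grasp mug", "move to shelf", "place mug"])

def generate_scene(task: Task) -> dict:          # hypothetical: would populate assets in simulation
    return {"objects": ["mug", "shelf"], "task": task.description}

def choose_learner(subtask: str) -> str:         # RL, motion planning, or trajectory optimization
    return "motion_planning" if "move" in subtask else "reinforcement_learning"

def learn_skill(scene: dict, subtask: str, learner: str) -> str:
    return f"policy({subtask}, via {learner})"   # placeholder for actual training

skills = []
for _ in range(3):                               # the cycle can be queried repeatedly
    task = propose_task()
    scene = generate_scene(task)
    for sub in task.subtasks:
        skills.append(learn_skill(scene, sub, choose_learner(sub)))
print(skills)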

@article{wang2023robogen,
  title={RoboGen: Towards Unleashing Infinite 
         Data for Automated Robot Learning 
         via Generative Simulation},
  author={Wang, Yufei and 
          Xian, Zhou and 
          Chen, Feng and 
          Wang, Tsun-Hsuan and 
          Wang, Yian and 
          Fragkiadaki, Katerina and 
          Erickson, Zackory and 
          Held, David and 
          Gan, Chuang},
  journal={arXiv preprint arXiv:2311.01455},
  year={2023}
}

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus
Under Submission

webpage | abstract | bibtex | arxiv

Diffusion model-based approaches have shown promise in data-driven planning. Although these planners are typically used in decision-critical applications, there are yet no known safety guarantees established for them. In this paper, we address this limitation by introducing SafeDiffuser, a method to equip probabilistic diffusion models with safety guarantees via control barrier functions. The key idea of our approach is to embed finite-time diffusion invariance, i.e., a form of specification mainly consisting of safety constraints, into the denoising diffusion procedure. This way we enable data generation under safety constraints. We show that SafeDiffusers maintain the generative performance of diffusion models while also providing robustness in safe data generation. We finally test our method on a series of planning tasks, including maze path generation, legged robot locomotion, and 3D space manipulation, and demonstrate the advantages of robustness over vanilla diffusion models.
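
The gist is to interleave a safety correction with every denoising step so that intermediate and final samples remain in a safe set. The toy sketch below is not the paper's control-barrier-function formulation; it simply applies a minimal projection out of a circular obstacle after a generic reverse-step update.

# Toy sketch: enforcing a safety constraint at every denoising step (not the actual CBF machinery).
import numpy as np

rng = np.random.default_rng(1)
goal = np.array([1.0, 1.0])
obstacle_c, obstacle_r = np.array([0.5, 0.5]), 0.2   # safe set: ||x - c|| >= r

def project_to_safe(x):
    d = x - obstacle_c
    dist = np.linalg.norm(d)
    if dist >= obstacle_r:
        return x
    return obstacle_c + d / (dist + 1e-8) * obstacle_r   # minimal push to the obstacle boundary

x = rng.normal(size=2)                                   # start from noise
for t in range(50):
    noise_scale = 0.3 * (1 - t / 50)
    x = x + 0.1 * (goal - x) + noise_scale * rng.normal(size=2)  # stand-in reverse-diffusion update
    x = project_to_safe(x)                               # safety enforced throughout denoising
print("final sample:", x, "safe:", np.linalg.norm(x - obstacle_c) >= obstacle_r - 1e-9)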

@article{xiao2023safediffuser,
  title={SafeDiffuser: Safe Planning with 
         Diffusion Probabilistic Models},
  author={Xiao, Wei and 
          Wang, Tsun-Hsuan and 
          Gan, Chuang and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2306.00148},
  year={2023}
}

DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
Tsun-Hsuan Wang, Juntian Zheng, Pingchuan Ma, Yilun Du, Byungchul Kim, Andrew Everett Spielberg, Joshua B Tenenbaum, Chuang Gan, Daniela Rus
NeurIPS 2023 (oral), New Orleans
NeurIPS ML4CD Workshop 2023, New Orleans

webpage | abstract | bibtex | openreview | code (coming soon)

Nature evolves creatures with a high complexity of morphological and behavioral intelligence, meanwhile computational methods lag in approaching that diversity and efficacy. Co-optimization of artificial creatures' morphology and control in silico shows promise for applications in physical soft robotics and virtual character creation; such approaches, however, require developing new learning algorithms that can reason about function atop pure structure. In this paper, we present DiffuseBot, a physics-augmented diffusion model that generates soft robot morphologies capable of excelling in a wide spectrum of tasks. DiffuseBot bridges the gap between virtually generated content and physical utility by (i) augmenting the diffusion process with a physical dynamical simulation which provides a certificate of performance, and (ii) introducing a co-design procedure that jointly optimizes physical design and control by leveraging information about physical sensitivities from differentiable simulation. We showcase a range of simulated and fabricated robots along with their capabilities.

@inproceedings{
  wang2023diffusebot,
  title={DiffuseBot: Breeding Soft Robots With 
         Physics-Augmented Generative Diffusion Models},
  author={Tsun-Hsuan Wang and 
          Juntian Zheng and 
          Pingchuan Ma and 
          Yilun Du and 
          Byungchul Kim and 
          Andrew Everett Spielberg and 
          Joshua B. Tenenbaum and 
          Chuang Gan and 
          Daniela Rus},
  booktitle={Thirty-seventh Conference on Neural 
             Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=1zo4iioUEs}
  }

Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning
Mathias Lechner, Lianhao Yin, Tim Seyde, Tsun-Hsuan Wang, Wei Xiao, Ramin Hasani, Joshua Rountree, Daniela Rus
NeurIPS Datasets & Benchmarks 2023, New Orleans

abstract | bibtex | openreview | code

Multi-agent reinforcement learning (MARL) research is faced with a trade-off: it either uses complex environments requiring large compute resources, which makes it inaccessible to researchers with limited resources, or relies on simpler dynamics for faster execution, which makes the transferability of the results to more realistic tasks challenging. Motivated by these challenges, we present Gigastep, a fully vectorizable, MARL environment implemented in JAX, capable of executing up to one billion environment steps per second on consumer-grade hardware. Its design allows for comprehensive MARL experimentation, including a complex, high-dimensional space defined by 3D dynamics, stochasticity, and partial observations. Gigastep supports both collaborative and adversarial tasks, continuous and discrete action spaces, and provides RGB image and feature vector observations, allowing the evaluation of a wide range of MARL algorithms. We validate Gigastep's usability through an extensive set of experiments, underscoring its role in widening participation and promoting inclusivity in the MARL research community. MIT licensed code is available at https://github.com/mlech26l/gigastep.
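
The throughput comes from writing the whole environment as pure JAX functions so that thousands of environments can be stepped in parallel under jit and vmap. A toy point-mass analogue (not Gigastep's actual dynamics) illustrates the pattern:

# Toy illustration of a fully vectorized environment step in JAX (not Gigastep's real dynamics).
import jax
import jax.numpy as jnp

def step(state, action):
    """One environment step for a 2D point-mass agent; pure function, so it vmaps/jits cleanly."""
    pos, vel = state
    vel = 0.9 * vel + 0.1 * action
    pos = pos + 0.05 * vel
    reward = -jnp.sum(pos ** 2)           # stay near the origin
    return (pos, vel), reward

batched_step = jax.jit(jax.vmap(step))    # step many environments at once

n_envs = 4096
key, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 3)
pos = jax.random.normal(k1, (n_envs, 2))
vel = jnp.zeros((n_envs, 2))
actions = jax.random.normal(k2, (n_envs, 2))

(pos, vel), rewards = batched_step((pos, vel), actions)
print(rewards.shape)                      # (4096,) rewards from one vectorized step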

@inproceedings{lechner2023gigastep,
  author={Mathias Lechner and 
          Lianhao Yin and 
          Tim Seyde and 
          Tsun-Hsuan Wang and 
          Wei Xiao and 
          Ramin Hasani and 
          Joshua Rountree and 
          Daniela Rus},
  title={Gigastep - One Billion Steps per Second 
         Multi-agent Reinforcement Learning},
  booktitle={Advances in Neural Information 
             Processing Systems (NeurIPS)},
  year={2023},
  url={https://openreview.net/forum?id=UgPAaEugH3}
}

Measuring Interpretability of Neural Policies of Robots with Disentangled Representation
Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus
CoRL 2023 (oral), Atlanta

abstract | bibtex | openreview | code (coming soon)

The advancement of robots, particularly those functioning in complex human-centric environments, relies on control solutions that are driven by machine learning. Understanding how learning-based controllers make decisions is crucial since robots are mostly safety-critical systems. This urges a formal and quantitative understanding of the explanatory factors in the interpretability of robot learning. In this paper, we aim to study interpretability of compact neural policies through the lens of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement in robot learning; these encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure disentanglement of learned neural dynamics from a concentration of decisions, mutual information and modularity perspective. We showcase the effectiveness of the connection between interpretability and disentanglement consistently across extensive experimental analysis.
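
As a rough illustration of the kind of measurement involved, the sketch below computes mutual information between discretized neuron activations and a factor label on synthetic data; it is a simplified stand-in, not the paper's estimator.

# Sketch: mutual information between binned neuron activations and decision factors
# (synthetic data; not the paper's exact metric).
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n_samples, n_neurons = 2000, 8
factors = rng.integers(0, 3, size=n_samples)              # e.g. decision-tree branch / behavior label
activations = rng.normal(size=(n_samples, n_neurons))
activations[:, 0] += 2.0 * factors                        # neuron 0 is (by construction) factor-aligned

def neuron_factor_mi(acts, labels, bins=10):
    """MI between each neuron's binned activation and the factor labels."""
    mi = []
    for j in range(acts.shape[1]):
        binned = np.digitize(acts[:, j], np.histogram_bin_edges(acts[:, j], bins=bins))
        mi.append(mutual_info_score(labels, binned))
    return np.array(mi)

print(neuron_factor_mi(activations, factors).round(3))     # neuron 0 should stand out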

@inproceedings{
  wang2023measuring,
  title={Measuring Interpretability of Neural Policies 
         of Robots with Disentangled Representation},
  author={Tsun-Hsuan Wang and 
          Wei Xiao and 
          Tim Seyde and 
          Ramin Hasani and 
          Daniela Rus},
  booktitle={7th Annual Conference on Robot Learning},
  year={2023},
  url={https://openreview.net/forum?id=6kSohKYYTn0}
}

Machine Learning Best Practices for Soft Robot Proprioception
Annan Zhang*, Tsun-Hsuan Wang*, Ryan L Truby, Lillian Chin, Daniela Rus
(* indicates equal contribution)
IROS 2023, Detroit

abstract | bibtex | paper

Machine learning-based approaches for soft robot proprioception have recently gained popularity, in part due to the difficulties in modeling the relationship between sensor signals and robot shape. However, to date, there exists no systematic analysis of the required design choices to set up a machine learning pipeline for soft robot proprioception. Here, we present the first study examining how design choices on different levels of the machine learning pipeline affect the performance of a neural network for predicting the state of a soft robot. We address the most frequent questions researchers face, such as how to choose the appropriate sensor and actuator signals, process input and output data, deal with time series, and pick the best neural network architecture. By testing our hypotheses on data collected from two vastly different systems - an electrically actuated robotic platform and a pneumatically actuated soft trunk - we seek conclusions that may generalize beyond one specific type of soft robot and hope to provide insights for researchers to use machine learning for soft robot proprioception.

@inproceedings{zhang2023machine,
  title={Machine Learning Best Practices for 
         Soft Robot Proprioception},
  author={Zhang, Annan and 
          Wang, Tsun-Hsuan and 
          Truby, Ryan L and 
          Chin, Lillian and 
          Rus, Daniela},
  booktitle={2023 IEEE/RSJ International Conference 
             on Intelligent Robots and Systems (IROS)},
  year={2023},
  organization={IEEE}
}

Towards Cooperative Flight Control Using Visual-Attention
Lianhao Yin, Makram Chahine, Tsun-Hsuan Wang, Tim Niklas Seyde, Chao Liu, Mathias Lechner, Ramin Hasani, Daniela Rus
IROS 2023, Detroit

abstract | bibtex | arxiv | MIT News

The cooperation of a human pilot with an autonomous agent during flight control realizes parallel autonomy. A parallel-autonomous system acts as a guardian that significantly enhances the robustness and safety of flight operations in challenging circumstances. Here, we propose an air-guardian concept that facilitates cooperation between an artificial pilot agent and a parallel end-to-end neural control system. Our vision-based air-guardian system combines a causal continuous-depth neural network model with a cooperation layer to enable parallel autonomy between a pilot agent and a control system based on perceived differences in their attention profile. The attention profiles are obtained by computing the networks' saliency maps (feature importance) through the VisualBackProp algorithm. The guardian agent is trained via reinforcement learning in a fixed-wing aircraft simulated environment. When the attention profiles of the pilot and guardian agents align, the pilot makes control decisions. If the attention maps of the pilot and the guardian do not align, the air-guardian makes interventions and takes over the control of the aircraft. We show that our attention-based air-guardian system can balance the trade-off between its level of involvement in the flight and the pilot's expertise and attention. We demonstrate the effectiveness of our methods in simulated flight scenarios with a fixed-wing aircraft and on a real drone platform.
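
The intervention rule can be summarized in a few lines: compare the two attention (saliency) profiles and hand control to the guardian when they disagree beyond a threshold. The snippet below is a toy rendering with random maps and a cosine-similarity test standing in for the learned components.

# Toy sketch of the guardian intervention rule (random saliency maps, cosine-similarity test).
import numpy as np

rng = np.random.default_rng(0)
pilot_attention = rng.random((60, 80))       # stand-in for the pilot network's saliency map
guardian_attention = rng.random((60, 80))    # stand-in for the guardian's saliency map

def aligned(a, b, threshold=0.8):
    a, b = a.ravel(), b.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos >= threshold

pilot_cmd, guardian_cmd = 0.2, -0.1          # e.g. roll commands
command = pilot_cmd if aligned(pilot_attention, guardian_attention) else guardian_cmd
print("executed command:", command)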

@article{yin2022cooperative,
  title={Cooperative Flight Control Using 
         Visual-Attention--Air-Guardian},
  author={Yin, Lianhao and 
          Chahine, Makram and
          Wang, Tsun-Hsuan and 
          Seyde, Tim and 
          Lechner, Mathias and 
          Hasani, Ramin and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2212.11084},
  year={2022}
}

On the Forward Invariance of Neural ODEs
Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Daniela Rus
ICML 2023, Honolulu

webpage | abstract | bibtex | arxiv | proceedings | code

We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation. Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system. This setup allows us to achieve output specification guarantees simply by changing the constrained parameters/inputs both during training and inference. Moreover, we demonstrate that our invariance set propagation through data-controlled neural ODEs not only maintains generalization performance but also creates an additional degree of robustness by enabling causal manipulation of the system's parameters/inputs. We test our method on a series of representation learning tasks, including modeling physical dynamics and convexity portraits, as well as safe collision avoidance for autonomous vehicles.
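
For intuition, here is a scalar toy example of the invariance a control barrier function enforces: given dynamics x' = f(x) + g(x)u and a specification h(x) >= 0, the input is minimally corrected so that dh/dt + alpha*h >= 0 along the trajectory. The sketch uses a hand-written 1D system rather than a neural ODE.

# Toy 1D control-barrier-function filter: keep x <= 1, i.e. h(x) = 1 - x >= 0.
import numpy as np

def f(x): return 0.5 * x          # drift
def g(x): return 1.0              # input gain
def h(x): return 1.0 - x          # barrier: safe set is {x : h(x) >= 0}
def dh_dx(x): return -1.0

def safe_input(x, u_nominal, alpha=2.0):
    """Smallest change to u_nominal so that dh/dt + alpha*h >= 0."""
    lf = dh_dx(x) * f(x)                      # Lie derivative along f
    lg = dh_dx(x) * g(x)                      # Lie derivative along g
    slack = lf + lg * u_nominal + alpha * h(x)
    if slack >= 0:
        return u_nominal                      # nominal input already satisfies the condition
    return u_nominal - slack / lg             # closed-form correction for a single constraint

x, dt = 0.9, 0.01
for _ in range(500):
    u = safe_input(x, u_nominal=1.0)          # nominal input tries to push x past the boundary
    x = x + dt * (f(x) + g(x) * u)
print("final x:", round(x, 4), "(stays <= 1)")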

@article{xiao2022forward,
  title={On the Forward Invariance of Neural ODEs},
  author={Xiao, Wei and 
          Wang, Tsun-Hsuan and 
          Hasani, Ramin and 
          Lechner, Mathias and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2210.04763},
  year={2022}
}

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments
Tsun-Hsuan Wang, Pingchuan Ma, Andrew Spielberg, Zhou Xian, Hao Zhang, Joshua Tenenbaum, Daniela Rus, Chuang Gan
ICLR 2023, Kigali

webpage | abstract | bibtex | openreview | code

While significant research progress has been made in robot learning for control, unique challenges arise when simultaneously co-optimizing morphology. Existing work has typically been tailored for particular environments or representations. In order to more fully understand inherent design and performance tradeoffs and accelerate the development of new breeds of soft robots, a comprehensive virtual platform — with well-established tasks, environments, and evaluation metrics — is needed. In this work, we introduce SoftZoo, a soft robot co-design platform for locomotion in diverse environments. SoftZoo supports an extensive, naturally-inspired material set, including the ability to simulate environments such as flat ground, desert, wetland, clay, ice, snow, shallow water, and ocean. Further, it provides a variety of tasks relevant for soft robotics, including fast locomotion, agile turning, and path following, as well as differentiable design representations for morphology and control. Combined, these elements form a feature-rich platform for analysis and development of soft robot co-design algorithms. We benchmark prevalent representations and co-design algorithms, and shed light on (1) the interplay between environment, morphology, and behavior, (2) the importance of design space representations, (3) the ambiguity in muscle formation and controller synthesis, and (4) the value of differentiable physics. We envision that SoftZoo will serve as a standard platform and template an approach toward the development of novel representations and algorithms for co-designing soft robots' behavioral and morphological intelligence. Demos are available on our project page.

@inproceedings{
  wang2023softzoo,
  title={SoftZoo: A Soft Robot Co-design 
         Benchmark For Locomotion In 
         Diverse Environments},
  author={Tsun-Hsuan Wang and 
          Pingchuan Ma and 
          Andrew Everett Spielberg and 
          Zhou Xian and 
          Hao Zhang and 
          Joshua B. Tenenbaum and 
          Daniela Rus and 
          Chuang Gan},
  booktitle={The Eleventh International Conference 
             on Learning Representations },
  year={2023},
  url={https://openreview.net/forum?id=Xyme9p1rpZw}
  }

Liquid structural state-space models
Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus
ICLR 2023, Kigali

abstract | bibtex | arxiv | openreview | code

A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when the structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves the new state-of-the-art generalization across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Command recognition dataset Liquid-S4 achieves 96.78% accuracy with 30% reduction in parameter counts compared to S4. The additional gain in performance is the direct result of the Liquid-S4's kernel structure that takes into account the similarities of the input sequence samples during training and inference.
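
For readers unfamiliar with liquid time-constant models, the key difference from a fixed linear SSM is that the state transition depends on the current input. A deliberately simplified diagonal recurrence (numpy; not the actual Liquid-S4 kernel or its diagonal-plus-low-rank structure) shows the idea:

# Simplified input-dependent (liquid) diagonal state-space recurrence; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T, N = 100, 16                       # sequence length, state size
A = -np.abs(rng.normal(size=N))      # stable diagonal transition
B = rng.normal(size=N)
C = rng.normal(size=N)
u = rng.normal(size=T)               # 1D input sequence

def liquid_ssm(u, base_dt=0.1):
    x = np.zeros(N)
    y = np.zeros(T)
    for t, u_t in enumerate(u):
        dt = base_dt / (1.0 + abs(u_t))          # input-dependent time constant ("liquid" part)
        x = np.exp(A * dt) * x + dt * B * u_t    # discretized diagonal update
        y[t] = C @ x
    return y

print(liquid_ssm(u)[:5].round(3))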

@article{hasani2022liquid,
  title={Liquid structural state-space models},
  author={Hasani, Ramin and 
          Lechner, Mathias and 
          Wang, Tsun-Hsuan and 
          Chahine, Makram and 
          Amini, Alexander and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2209.12951},
  year={2022}
}

Learning Stability Attention in Vision-based End-to-end Driving Policies
Tsun-Hsuan Wang*, Wei Xiao*, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus
L4DC 2023, Philadelphia

abstract | bibtex | arxiv

Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility. We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control, and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle.

@article{wang2023learning,
  title={Learning Stability Attention in 
         Vision-based End-to-end Driving Policies}, 
  author={Tsun-Hsuan Wang and 
          Wei Xiao and 
          Makram Chahine and 
          Alexander Amini and 
          Ramin Hasani and 
          Daniela Rus},
  journal={arXiv preprint arXiv:2304.02733},
  year={2023},
}

Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap
Mathias Lechner, Ramin Hasani, Alexander Amini, Tsun-Hsuan Wang, Thomas Henzinger, Daniela Rus
NeurIPS ML4AD Workshop, New Orleans

abstract | bibtex | arxiv

There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional to patch-based networks, have been extensively tested on offline image classification and regression tasks. In this paper, we study these vision architectures with respect to the open-loop to closed-loop causality gap, i.e., offline training followed by an online closed-loop deployment. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that under proper training guidelines, all vision models perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, we observe that the causality gap disrupts performance regardless of the choice of the model architecture. Our results imply that the causality gap can be solved in situation 1 with our proposed training guideline with any modern network architecture, whereas achieving out-of-distribution generalization (situation 2) requires further investigations, for instance, on data diversity rather than the model architecture.

  @article{lechner2022all,
    title={Are All Vision Models Created Equal? 
           A Study of the Open-Loop to Closed-Loop 
           Causality Gap},
    author={Lechner, Mathias and 
            Hasani, Ramin and 
            Amini, Alexander and 
            Wang, Tsun-Hsuan and 
            Henzinger, Thomas A and 
            Rus, Daniela},
    journal={arXiv preprint arXiv:2210.04303},
    year={2022}
  }

Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
Wei-Cheng Tseng, Tsun-Hsuan Wang, Yen-Chen Lin, Phillip Isola
NeurIPS 2022, New Orleans

abstract | bibtex | openreview

We introduce an offline multi-agent reinforcement learning (offline MARL) framework that utilizes previously collected data without additional online data collection. Our method reformulates offline MARL as a sequence modeling problem and thus builds on top of the simplicity and scalability of the Transformer architecture. In the fashion of centralized training and decentralized execution, we propose to first train a teacher policy who has the privilege to access every agent's observations, actions, and rewards. After the teacher policy has identified and recombined the "good" behavior in the dataset, we create separate student policies and distill not only the teacher policy's features but also its structural relations among different agents' features to student policies. We show that our framework significantly improves performances on a range of tasks and outperforms state-of-the-art offline MARL baselines. Furthermore, we demonstrate that the proposed method has a better convergence rate, is more sample efficient, and is more robust to various demonstration qualities compared with baselines.
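
The distillation objective pairs a per-agent feature-matching term with a relational term that matches the pairwise similarity structure among agents' features. Below is a compact PyTorch sketch of such a loss; it is my simplified rendering, not the exact objective in the paper.

# Sketch of feature + relational distillation between a centralized teacher and per-agent students.
import torch
import torch.nn.functional as F

def relation_matrix(feats):
    """Pairwise cosine similarities among agents' features, shape (B, A, A)."""
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(1, 2)

def distillation_loss(student_feats, teacher_feats, w_rel=1.0):
    # student_feats, teacher_feats: (batch, n_agents, feat_dim)
    feat_term = F.mse_loss(student_feats, teacher_feats.detach())
    rel_term = F.mse_loss(relation_matrix(student_feats),
                          relation_matrix(teacher_feats).detach())
    return feat_term + w_rel * rel_term

# toy usage with random features for 4 agents
student = torch.randn(8, 4, 64, requires_grad=True)
teacher = torch.randn(8, 4, 64)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))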

@inproceedings{
  tseng2022offline,
  title={Offline Multi-Agent Reinforcement 
         Learning with Knowledge Distillation},
  author={Wei-Cheng Tseng and 
          Tsun-Hsuan Wang and 
          Yen-Chen Lin and 
          Phillip Isola},
  booktitle={Advances in Neural Information 
             Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and 
          Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=yipUuqxveCy}
  }

Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving
Wei Xiao*, Tsun-Hsuan Wang*, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus
(* indicates equal contribution)
T-RO

abstract | bibtex | arxiv | proceeding | code

Guaranteeing safety of perception-based learning systems is challenging due to the absence of ground-truth state information unlike in state-aware control scenarios. In this paper, we introduce a safety guaranteed learning framework for vision-based end-to-end autonomous driving. To this end, we design a learning system equipped with differentiable control barrier functions (dCBFs) that is trained end-to-end by gradient descent. Our models are composed of conventional neural network architectures and dCBFs. They are interpretable at scale, achieve great test performance under limited training data, and are safety guaranteed in a series of autonomous driving scenarios such as lane keeping and obstacle avoidance. We evaluated our framework in a sim-to-real environment, and tested on a real autonomous car, achieving safe lane following and obstacle avoidance via Augmented Reality (AR) and real parked vehicles.
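
At a high level, a differentiable CBF layer takes the nominal control predicted upstream and returns the closest control satisfying a barrier condition; for a single affine constraint this has a closed form, so gradients flow through it end to end. The sketch below implements that closed form for a generic constraint a·u + b >= 0, as an illustration rather than the paper's layer.

# Differentiable safety filter for one affine constraint a.u + b >= 0 (closed-form projection).
import torch

def dcbf_filter(u_nominal, a, b):
    """Return the closest u to u_nominal with a.u + b >= 0; gradients flow through everything."""
    violation = torch.clamp(-(a * u_nominal).sum(-1, keepdim=True) - b, min=0.0)
    correction = violation * a / (a * a).sum(-1, keepdim=True).clamp(min=1e-8)
    return u_nominal + correction

# toy usage: the "network" predicts a steering/throttle pair, the filter enforces the constraint
u_nom = torch.tensor([[0.8, 0.3]], requires_grad=True)
a = torch.tensor([[-1.0, 0.0]])           # constraint: -u_0 + 0.5 >= 0, i.e. u_0 <= 0.5
b = torch.tensor([[0.5]])
u_safe = dcbf_filter(u_nom, a, b)
u_safe.sum().backward()                   # end-to-end differentiable
print(u_safe, u_nom.grad)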

@article{xiao2022differentiable,
  title={Differentiable Control Barrier 
          Functions for Vision-based 
          End-to-End Autonomous Driving},
  author={Xiao, Wei and Wang, Tsun-Hsuan and 
          Chahine, Makram and Amini, Alexander 
          and Hasani, Ramin and Rus, Daniela},
  journal={arXiv preprint arXiv:2203.02401},
  year={2022}
}

VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles
Alexander Amini*, Tsun-Hsuan Wang*, Igor Gilitschenski, Wilko Schwarting, Zhijian Liu, Song Han, Sertac Karaman, Daniela Rus
(* indicates equal contribution)
ICRA 2022, Philadelphia

webpage | abstract | bibtex | arxiv | MIT News

Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles. Using high fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and greater robustness than those trained exclusively on real-world data.

@article{amini2021vista,
  title={VISTA 2.0: An Open, Data-driven Simulator 
         for Multimodal Sensing and Policy Learning
         for Autonomous Vehicles},
  author={Amini, Alexander and 
          Wang, Tsun-Hsuan and 
          Gilitschenski, Igor and 
          Schwarting, Wilko and 
          Liu, Zhijian and 
          Han, Song and 
          Karaman, Sertac and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2111.12083},
  year={2021}
}

Learning Interactive Driving Policies via Data-driven Simulation
Tsun-Hsuan Wang*, Alexander Amini*, Wilko Schwarting, Igor Gilitschenski, Sertac Karaman, Daniela Rus
(* indicates equal contribution)
ICRA 2022, Philadelphia

webpage | abstract | bibtex | arxiv

Data-driven simulators promise high data efficiency for driving policy learning. When used for modelling interactions, this data-efficiency becomes a bottleneck: Small underlying datasets often lack interesting and challenging edge cases for learning interactive driving. We address this challenge by proposing a simulation method that uses in-painted ado vehicles for learning robust driving policies. Thus, our approach can be used to learn policies that involve multi-agent interactions and allows for training via state-of-the-art policy learning methods. We evaluate the approach for learning standard interaction scenarios in driving. In extensive experiments, our work demonstrates that the resulting policies can be directly transferred to a full-scale autonomous vehicle without making use of any traditional sim-to-real transfer techniques such as domain randomization.

@article{wang2021learning,
  title={Learning Interactive Driving Policies 
         via Data-driven Simulation},
  author={Wang, Tsun-Hsuan and 
          Amini, Alexander and 
          Schwarting, Wilko and 
          Gilitschenski, Igor and 
          Karaman, Sertac and 
          Rus, Daniela},
  journal={arXiv preprint arXiv:2111.12137},
  year={2021}
}

Interpretable Autonomous Flight via Compact Visualizable Neural Circuit Policies
Paul Tylkin, Tsun-Hsuan Wang, Kyle Palko, Ross Allen, Ho Chit Siu, Daniel Wrafter, Tim Niklas Seyde, Alexander Amini, Daniela Rus
RAL 2022

abstract | bibtex | InProceedings

We learn interpretable end-to-end controllers based on Neural Circuit Policies (NCPs) to enable goal reaching and dynamic obstacle avoidance in flight domains. In addition to being able to learn high-quality control, NCP networks are designed with a small number of neurons. This property allows for the learned policies to be interpreted at the neuron level and interrogated, leading to more robust understanding of why the artificial agents make the decisions that they do. We also demonstrate transfer of the learned policy to physical flight hardware by deploying a small NCP (200KB of memory) capable of real-time inference on a Raspberry Pi Zero controlling a DJI Tello drone. Designing interpretable artificial agents is crucial for building trustworthy AIs, both as fully autonomous systems and also for parallel autonomy, where humans and AIs work on collaboratively solving problems in the same environment.

@article{tylkin2022interpretable,
  title={Interpretable Autonomous Flight via Compact 
         Visualizable Neural Circuit Policies},
  author={Tylkin, Paul and Wang, Tsun-Hsuan and Palko, 
          Kyle and Allen, Ross and Siu, Ho Chit and 
          Wrafter, Daniel and Seyde, Tim Niklas and 
          Amini, Alexander and Rus, Daniela},
  journal={IEEE Robotics and Automation Letters},
  year={2022},
  publisher={IEEE}
}

Autonomous Flight Arcade Challenge: Single- and Multi-Agent Learning Environments for Aerial Vehicles
Paul Tylkin, Tsun-Hsuan Wang, Tim Seyde, Kyle Palko, Ross Allen, Alexander Amini and Daniela Rus
AAMAS 2022 Extended Abstract, virtual

abstract | bibtex | InProceedings

The Autonomous Flight Arcade (AFA) is a novel suite of single- and multi-agent learning environments for control of aerial vehicles. These environments incorporate realistic physics using the Unity game engine with diverse objectives and levels of decision-making sophistication. In addition to the environments themselves, we introduce an interface for interacting with them, including the ability to vary key parameters, thereby both changing the difficulty and the core challenges. We also introduce a pipeline for collecting human gameplay within the environments. We demonstrate the performance of artificial agents in these environments trained using deep reinforcement learning, and also motivate these environments as a benchmark for designing non-learned classical control policies and agents trained using imitation learning from human demonstrations. Finally, we motivate the use of AFA environments as a testbed for training artificial agents capable of cooperative human-AI decision making, including parallel autonomy.

  @inproceedings{tylkin2022autonomous,
    title={Autonomous Flight Arcade Challenge: 
           Single-and Multi-Agent Learning 
           Environments for Aerial Vehicles},
    author={Tylkin, Paul and 
            Wang, Tsun-Hsuan and 
            Seyde, Tim and 
            Palko, Kyle and 
            Allen, Ross and 
            Amini, Alexander and 
            Rus, Daniela},
    booktitle={Proceedings of the 21st International 
               Conference on Autonomous Agents and 
               Multiagent Systems},
    pages={1744--1746},
    year={2022}
  }

Adversarial Attacks On Multi-Agent Communication
Tsun-Hsuan Wang*, James Tu*, Jingkang Wang, Sivabalan Manivasagam, Mengye Ren, Raquel Urtasun
(* indicates equal contribution)
ICCV 2021, virtual

abstract | bibtex | arxiv

Growing at a fast pace, modern autonomous systems will soon be deployed at scale, opening up the possibility for cooperative multi-agent systems. Sharing information and distributing workloads allow autonomous agents to better perform tasks and increase computation efficiency. However, shared information can be modified to execute adversarial attacks on deep learning models that are widely employed in modern systems. Thus, we aim to study the robustness of such systems and focus on exploring adversarial attacks in a novel multi-agent setting where communication is done through sharing learned intermediate representations of neural networks. We observe that an indistinguishable adversarial message can severely degrade performance, but becomes weaker as the number of benign agents increases. Furthermore, we show that black-box transfer attacks are more difficult in this setting when compared to directly perturbing the inputs, as it is necessary to align the distribution of learned representations with domain adaptation. Our work studies robustness at the neural network level to contribute an additional layer of fault tolerance to modern security protocols for more secure multi-agent systems.
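
The threat model is easiest to see in code: the attacker perturbs the intermediate feature map one agent transmits, within an L-infinity budget, to maximize the receiver's loss. Below is a generic PGD-style sketch with a tiny stand-in receiver, not the detection models or losses from the paper.

# PGD-style attack on a transmitted intermediate representation (toy receiver and loss).
import torch
import torch.nn as nn

receiver = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

message = torch.randn(1, 16, 32, 32)          # benign feature map the attacker intercepts
label = torch.tensor([1])                     # receiver's correct output on the benign message
epsilon, step, n_steps = 0.1, 0.02, 10

delta = torch.zeros_like(message, requires_grad=True)
for _ in range(n_steps):
    loss = loss_fn(receiver(message + delta), label)
    loss.backward()                           # ascend on the receiver's loss
    with torch.no_grad():
        delta += step * delta.grad.sign()
        delta.clamp_(-epsilon, epsilon)       # stay within the L-infinity budget
    delta.grad.zero_()

print("attacked loss:", float(loss_fn(receiver(message + delta), label)))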

@InProceedings{Tu_2021_ICCV,
  author = {Tu, James and 
            Wang, Tsunhsuan and 
            Wang, Jingkang and 
            Manivasagam, Sivabalan and 
            Ren, Mengye and 
            Urtasun, Raquel},
  title = {Adversarial Attacks on 
           Multi-Agent Communication},
  booktitle = {ICCV},
  month = {October},
  year = {2021},
  pages = {7768-7777}
}

V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
Tsun-Hsuan Wang, Sivabalan Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, James Tu, Raquel Urtasun
ECCV 2020 (Oral), Glasgow virtual

abstract | bibtex | arxiv

In this paper, we explore the use of vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles. By intelligently aggregating the information received from multiple nearby vehicles, we can observe the same scene from different viewpoints. This allows us to see through occlusions and detect actors at long range, where the observations are very sparse or non-existent. We also show that our approach of sending compressed deep feature map activations achieves high accuracy while satisfying communication bandwidth requirements.
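
In spirit, each vehicle broadcasts a compressed bird's-eye-view feature map; the receiver warps incoming maps into its own frame and fuses them before detection and prediction. The snippet below is a heavily simplified fusion step (warping assumed already done; the paper learns the aggregation rather than using the plain mean shown here).

# Simplified cross-vehicle feature aggregation (mean fusion of already-warped BEV feature maps).
import torch
import torch.nn as nn

n_vehicles, C, H, W = 4, 64, 128, 128
warped_maps = torch.randn(n_vehicles, C, H, W)   # feature maps from nearby vehicles, ego frame

def fuse(maps):
    """Aggregate messages from all vehicles; the paper learns this step instead of a plain mean."""
    return maps.mean(dim=0, keepdim=True)        # (1, C, H, W)

detector_head = nn.Conv2d(C, 8, kernel_size=1)   # stand-in for the perception/prediction head
fused = fuse(warped_maps)
print(detector_head(fused).shape)                # torch.Size([1, 8, 128, 128])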

@inproceedings{wang2020v2vnet,
  Author = {Wang, Tsun-Hsuan and 
            Manivasagam, Sivabalan and 
            Liang, Ming and 
            Bin, Yang and 
            Zeng, Wenyuan and 
            Tu, James and 
            Urtasun, Raquel},
  Title = {V2VNet: Vehicle-to-Vehicle Communication 
           for Joint Perception and Prediction},
  Booktitle = {ECCV},
  Year = {2020}
}

Point-to-Point Video Generation
Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
(* indicates equal contribution)
ICCV 2019, Seoul

webpage | abstract | bibtex | arxiv | code

While image manipulation achieves tremendous breakthroughs (e.g., generating realistic faces) in recent years, video generation is much less explored and harder to control, which limits its applications in the real world. For instance, video editing requires temporal coherence across multiple clips and thus poses both start and end constraints within a video sequence. We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. The task is challenging since the model not only generates a smooth transition of frames, but also plans ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of various length. We propose to maximize the modified variational lower bound of conditional data likelihood under a skip-frame training strategy. Our model can generate sequences such that their end-frame is consistent with the targeted end-frame without loss of quality and diversity. Extensive experiments are conducted on Stochastic Moving MNIST, Weizmann Human Action, and Human3.6M to evaluate the effectiveness of the proposed method. We demonstrate our method under a series of scenarios (e.g., dynamic length generation) and the qualitative results showcase the potential and merits of point-to-point generation.

@inproceedings{wang2019p2pvg,
  Author = {Wang, Tsun-Hsuan and 
            Cheng, Yen-Chi and 
            Lin, Chieh Hubert and 
            Chen, Hwann-Tzong and 
            Sun, Min},
  Title = {Point-to-Point Video Generation},
  Booktitle = {ICCV},
  Year = {2019}
}

3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization
Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
IROS 2019, Macao

webpage | abstract | bibtex | arxiv | code

The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception. Instead of directly fusing estimated depths across LiDAR and stereo modalities, we take advantage of the stereo matching network with two enhanced techniques: Input Fusion and Conditional Cost Volume Normalization (CCVNorm) on the LiDAR information. The proposed framework is generic and closely integrated with the cost volume component that is commonly utilized in stereo matching neural networks. We experimentally verify the efficacy and robustness of our method on the KITTI Stereo and Depth Completion datasets, obtaining favorable performance against various fusion strategies. Moreover, we demonstrate that, with a hierarchical extension of CCVNorm, the proposed method brings only slight overhead to the stereo matching network in terms of computation time and model size.
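
Roughly, CCVNorm modulates the normalized cost volume with per-pixel scale and shift parameters predicted from the sparse LiDAR input. The sketch below is a simplified FiLM-style rendering of that idea, not the exact (hierarchical) formulation in the paper.

# Simplified conditional cost-volume normalization: LiDAR-conditioned scale/shift on a cost volume.
import torch
import torch.nn as nn

class SimpleCCVNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm3d(channels, affine=False)
        self.to_gamma = nn.Conv2d(1, channels, kernel_size=1)   # predicted from sparse LiDAR
        self.to_beta = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, cost_volume, sparse_lidar):
        # cost_volume: (B, C, D, H, W); sparse_lidar: (B, 1, H, W) with zeros where no points
        gamma = self.to_gamma(sparse_lidar).unsqueeze(2)        # (B, C, 1, H, W)
        beta = self.to_beta(sparse_lidar).unsqueeze(2)
        return self.norm(cost_volume) * (1 + gamma) + beta

layer = SimpleCCVNorm(channels=16)
out = layer(torch.randn(1, 16, 24, 64, 80), torch.randn(1, 1, 64, 80))
print(out.shape)   # torch.Size([1, 16, 24, 64, 80])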

@inproceedings{wang2019ccvnorm,
  Author = {Wang, Tsun-Hsuan and 
            Hu, Hou-Ning and 
            Lin, Chieh Hubert and 
            Tsai, Yi-Hsuan and 
            Chiu, Wei-Chen and 
            Sun, Min},
  Title = {3D LiDAR and Stereo Fusion using Stereo 
           Matching Network with Conditional 
           Cost Volume Normalization},
  Booktitle = {IROS},
  Year = {2019}
}

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation
Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
ICRA 2019, Montreal

webpage | abstract | bibtex | arxiv | code

We propose a novel plug-and-play (PnP) module for improving depth prediction that takes arbitrary patterns of sparse depths as input. Given any pre-trained depth prediction model, our PnP module updates the intermediate feature map such that the model outputs new depths consistent with the given sparse depths. Our method requires no additional training and can be applied to practical applications such as leveraging both RGB and sparse LiDAR points to robustly estimate a dense depth map. Our approach achieves consistent improvements on various state-of-the-art methods on indoor (i.e., NYU-v2) and outdoor (i.e., KITTI) datasets. Various types of LiDARs are also synthesized in our experiments to verify the general applicability of our PnP module in practice.
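
The module is essentially a test-time update: split the pretrained network at an intermediate feature map, measure the error on the given sparse depths, and take a few gradient steps on the feature map itself while the weights stay frozen. A minimal sketch with a tiny stand-in network:

# Sketch of the plug-and-play idea: refine an intermediate feature map with sparse depth supervision.
import torch
import torch.nn as nn

front = nn.Conv2d(3, 16, 3, padding=1)     # stand-in for the pretrained encoder (frozen)
back = nn.Conv2d(16, 1, 3, padding=1)      # stand-in for the pretrained decoder (frozen)

rgb = torch.randn(1, 3, 64, 64)
sparse_depth = torch.rand(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) < 0.05).float()   # ~5% of pixels have LiDAR depth

with torch.no_grad():
    feat = front(rgb)
feat = feat.detach().requires_grad_(True)          # only the feature map gets updated

for _ in range(5):                                 # a few test-time refinement steps
    pred = back(feat)
    loss = ((pred - sparse_depth) ** 2 * mask).sum() / mask.sum()
    grad, = torch.autograd.grad(loss, feat)
    feat = (feat - 0.1 * grad).detach().requires_grad_(True)

refined_depth = back(feat)                         # new depths consistent with the sparse points
print(refined_depth.shape)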

@inproceedings{wang2019pnpdepth,
  Author = {Wang, Tsun-Hsuan and 
            Wang, Fu-En and 
            Lin, Juan-Ting and 
            Tsai, Yi-Hsuan and 
            Chiu, Wei-Chen and 
            Sun, Min},
  Title = {Plug-and-Play: Improve Depth Estimation 
           via Sparse Data Propagation},
  Booktitle = {ICRA},
  Year = {2019}
}

Liquid Pouring Monitoring via Rich Sensory Inputs
Tz-Ying Wu*, Juan-Ting Lin*, Tsun-Hsuan Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun
(* indicates equal contribution)
ECCV 2018, Munich

webpage | abstract | bibtex | arxiv

Humans have the amazing ability to perform very subtle manipulation task using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans’ rich sensories using synchronized observation from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: inferring (1) the initial state of containers and (2) forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representation sensitive to container states and how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline method without auxiliary tasks on unseen containers and unseen users respectively.

@inproceedings{wu2019pouring,
  Author = {Wu, Tz-Ying and 
            Lin, Juan-Ting and 
            Wang, Tsun-Hsuan and 
            Hu, Chan-Wei and 
            Niebles, Juan Carlos and 
            Sun, Min},
  Title = {Liquid Pouring Monitoring via Rich 
           Sensory Inputs},
  Booktitle = {ECCV},
  Year = {2018}
}

Omnidirectional CNN for Visual Place Recognition and Navigation
Tsun-Hsuan Wang*, Hung-Jui Huang*, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun
(* indicates equal contribution)
ICRA 2018, Brisbane

webpage | abstract | bibtex | arxiv | code

Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate the challenge, we consider a place recognition method using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) to handle severe camera pose variation. Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place. With the ability to estimate relative distance, a heuristic policy is proposed to navigate a robot to the retrieved closest place. Note that the network is designed to take advantage of the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for training on a large scale. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed with both the virtual world and real-world datasets.
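
The circular-padding trick for panoramic inputs is easy to reproduce: pad the width (yaw) dimension circularly before each convolution so features wrap around the 360-degree view instead of meeting an artificial seam. A generic illustration in PyTorch, not the O-CNN architecture itself:

# Circular padding along the horizontal (yaw) axis of a panoramic feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircularConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.pad = k // 2
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=0)

    def forward(self, x):
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")   # wrap left/right edges
        x = F.pad(x, (0, 0, self.pad, self.pad), mode="constant")   # ordinary zero pad top/bottom
        return self.conv(x)

panorama = torch.randn(1, 3, 64, 256)         # height x 360-degree width
print(CircularConv2d(3, 16)(panorama).shape)  # torch.Size([1, 16, 64, 256])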

@inproceedings{wang2019omnicnn,
  Author = {Wang, Tsun-Hsuan and 
            Huang, Hung-Jui and 
            Lin, Juan-Ting and 
            Hu, Chan-Wei and 
            Zeng, Kuo-Hao and 
            Sun, Min},
  Title = {Omnidirectional CNN for Visual Place 
           Recognition and Navigation},
  Booktitle = {ICRA},
  Year = {2018}
}
  Awards

Fall 2022, MathWorks Fellowship

Summer 2022, Finalist of Qualcomm Innovation Fellowship

Fall 2020, David S. Y. and Harold Wong Fellowship

Fall 2018, Appier Scholarship

Fall 2017, NTHU Matriculation Scholarship (MS)

Fall 2016, Academic Achievement Award

Summer 2014, Oversea Exchange Scholarship

Fall 2013, NTHU Matriculation Scholarship (BS)

  Teaching

August 2017, TA, Vision for Interaction, AI Summer School, MOST

Fall 2017, Head TA, Computer Vision, NTHU

March 2017, TA, Reinforcement Learning, TSMC


A huge thanks for the website template from this.