Tsun-Hsuan (Johnson) Wang
tsunw_at_mit_dot_edu
|
CV (Oct. 23) |
Google Scholar |
Github |
Twitter |
|
Currently, I am a 4th-year PhD student at MIT CSAIL, under the supervision of Prof. Daniela Rus.
I received my M.Sc. from National Tsing Hua University (NTHU), advised by Prof. Min Sun, and also earned my B.Sc. at this fantastic place.
During the last half year of my master's studies, I had a wonderful time working with Prof. Raquel Urtasun at Uber ATG.
Prior to that, I was lucky to collaborate with Prof. Wei-Chen Chiu, Dr. Yi-Hsuan Tsai, and Prof. Hwann-Tzong Chen.
My research interest lies at the intersection of robotics, simulation, and machine learning.
In particular, I am fascinated by building the entire life cycle of robots, spanning design, data, modeling, learning, and finally decision-making.
Along this direction, my goal is to bridge higher-level cognition with lower-level physics, control, and even embodiment and morphology.
My Chinese name is 尊玄, which is pronounced just like Johnson.
|
MIT PhD in EECS Sep 20 - Present
|
MIT-IBM Research Intern Jun 22 - Sep 22
|
Uber ATG Research Intern Jun 19 - Mar 20
|
NTHU M.Sc. in EE Sep 17 - Mar 20
|
IIS, Academia Sinica Research Intern Jun 17 - Aug 17
|
NTHU B.Sc. in EE Sep 13 - Jun 17
|
News
- [11/2023] We are organizing a workshop on the topic of "Towards Generalist Robots" at CoRL 2023 [white paper].
- [09/2023] 2 papers accepted by NeurIPS'23; one (DiffuseBot) as an oral and the other (Gigastep) in the Datasets & Benchmarks track!
- [08/2023] 1 paper (interpretability by disentanglement) accepted by CoRL'23 as an oral!
- [06/2023] 2 papers (ML for soft robots, cooperative flight) accepted by IROS'23!
- [05/2023] Our soft robot co-design work SoftZoo was featured in MIT News!
- [04/2023] 1 paper (Invariance-ODE) accepted by ICML'23!
- [03/2023] 1 paper (Att-CLF) accepted by L4DC'23!
- [01/2023] 2 papers (SoftZoo, LiquidS4) accepted by ICLR'23!
- [12/2022] 1 paper (BarrierNet) accepted by T-RO!
|
|
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Tsun-Hsuan Wang, Alaa Maalouf, Wei Xiao, Yutong Ban, Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus
Under Submission,
CoRL OOD Workshop 2023, Atlanta,
NeurIPS Robot Learning Workshop 2023, New Orleans
webpage |
abstract |
bibtex |
arxiv
As autonomous driving technology matures, end-to-end
methodologies have emerged as a leading strategy, promising seamless integration from
perception to control via deep learning. However, existing systems grapple with challenges
such as unexpected open set environments and the complexity of black-box models. At the same
time, the evolution of deep learning introduces larger, multimodal foundation models, offering
multi-modal visual and textual understanding. In this paper, we harness these multimodal
foundation models to enhance the robustness and adaptability of autonomous driving systems,
enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically,
we present an approach for end-to-end open-set (any environment/scene) autonomous driving that
is capable of providing driving decisions from representations queryable by image and text. To do so,
we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers
to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates
unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution
situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training
(data augmentation via text) and policy debugging. We encourage the reader to check our explainer
video and to view the code and demos on our project webpage.
@article{wang2023drive,
title={Drive Anywhere: Generalizable End-to-end
Autonomous Driving with Multi-modal
Foundation Models},
author={Wang, Tsun-Hsuan and
Maalouf, Alaa and
Xiao, Wei and
Ban, Yutong and
Amini, Alexander and
Rosman, Guy and
Karaman, Sertac and
Rus, Daniela},
journal={arXiv preprint arXiv:2310.17642},
year={2023}
}
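To give a flavor of what "representations queryable by image and text" means above, here is a minimal, self-contained sketch in plain NumPy with random stand-ins for the real model outputs: a grid of patch-aligned visual features is scored against a text embedding to produce a relevance map that a driving policy could consume. The shapes, the cosine-similarity scoring, and the query are illustrative assumptions, not the paper's actual pipeline.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a and a single vector b."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    return a @ b

# Hypothetical patch-aligned features from a vision foundation model:
# an H x W grid of D-dimensional embeddings (random here for illustration).
H, W, D = 14, 14, 512
patch_features = np.random.randn(H, W, D)

# Hypothetical text embedding for a query such as "pedestrian crossing".
text_embedding = np.random.randn(D)

# Query the spatial feature map with the text embedding to obtain a
# relevance map; a policy could consume such maps instead of raw pixels.
relevance = cosine_sim(patch_features.reshape(-1, D), text_embedding)
relevance_map = relevance.reshape(H, W)
print(relevance_map.shape)  # (14, 14)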
|
|
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang*, Zhou Xian*, Feng Chen*, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
Under Submission
(Powered by Genesis; stay tuned!!)
webpage |
abstract |
bibtex |
arxiv |
code
We present RoboGen, a generative robotic agent that
automatically learns diverse robotic skills at scale via generative simulation. RoboGen
leverages the latest advancements in foundation and generative models. Instead of directly
using or adapting these models to produce policies or low-level actions, we advocate for a
generative scheme, which uses these models to automatically generate diversified tasks, scenes,
and training supervisions, thereby scaling up robotic skill learning with minimal human supervision.
Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent
first proposes interesting tasks and skills to develop, and then generates corresponding simulation
environments by populating pertinent objects and assets with proper spatial configurations.
Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the
optimal learning approach (reinforcement learning, motion planning, or trajectory optimization),
generates required training supervision, and then learns policies to acquire the proposed skill.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models
and transfer it to the field of robotics. Our fully generative pipeline can be queried repeatedly,
producing an endless stream of skill demonstrations associated with diverse tasks and environments.
@article{wang2023robogen,
title={RoboGen: Towards Unleashing Infinite
Data for Automated Robot Learning
via Generative Simulation},
author={Wang, Yufei and
Xian, Zhou and
Chen, Feng and
Wang, Tsun-Hsuan and
Wang, Yian and
Fragkiadaki, Katerina and
Erickson, Zackory and
Held, David and
Gan, Chuang},
journal={arXiv preprint arXiv:2311.01455},
year={2023}
}
|
|
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Wei Xiao, Tsun-Hsuan Wang, Chuang Gan, Daniela Rus
Under Submission
webpage |
abstract |
bibtex |
arxiv
Diffusion model-based approaches have shown
promise in data-driven planning. Although these planners are typically used in decision-critical
applications, there are yet no known safety guarantees established for them. In this paper,
we address this limitation by introducing SafeDiffuser, a method to equip probabilistic diffusion
models with safety guarantees via control barrier functions. The key idea of our approach is to
embed finite-time diffusion invariance, i.e., a form of specification mainly consisting of safety
constraints, into the denoising diffusion procedure. This way we enable data generation under safety
constraints. We show that SafeDiffusers maintain the generative performance of diffusion models while
also providing robustness in safe data generation. We finally test our method on a series of planning
tasks, including maze path generation, legged robot locomotion, and 3D space manipulation, and
demonstrate the advantages of robustness over vanilla diffusion models.
@article{xiao2023safediffuser,
title={SafeDiffuser: Safe Planning with
Diffusion Probabilistic Models},
author={Xiao, Wei and
Wang, Tsun-Hsuan and
Gan, Chuang and
Rus, Daniela},
journal={arXiv preprint arXiv:2306.00148},
year={2023}
}
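As a rough intuition for embedding constraints into the denoising procedure, the toy sketch below interleaves a placeholder denoising step with a projection onto a box-shaped safe set. SafeDiffuser itself enforces finite-time diffusion invariance via control barrier functions rather than naive clipping; the denoiser, the box constraint, and the schedule here are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoise_step(x, t):
    """Placeholder denoiser: nudge the sample toward a goal at the origin.
    In a real diffusion planner this would be a learned score/noise model."""
    return x - 0.1 * x + 0.05 * rng.standard_normal(x.shape) * t

def project_to_safe_set(x, lower=-1.0, upper=1.0):
    """Toy 'safety' step: project the sample into a box constraint.
    SafeDiffuser instead enforces invariance via control barrier functions;
    this clipping stands in for that mechanism purely for illustration."""
    return np.clip(x, lower, upper)

x = rng.standard_normal(2) * 3.0  # start from noise
for t in np.linspace(1.0, 0.0, 50):
    x = toy_denoise_step(x, t)
    x = project_to_safe_set(x)    # keep every intermediate sample feasible

print(x)  # the final sample lies inside the safe box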
|
|
DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
Tsun-Hsuan Wang, Juntian Zheng, Pingchuan Ma, Yilun Du, Byungchul Kim, Andrew Everett Spielberg, Joshua B Tenenbaum, Chuang Gan, Daniela Rus
NeurIPS 2023 (oral), New Orleans
NeurIPS ML4CD Workshop 2023, New Orleans
webpage |
abstract |
bibtex |
openreview |
code (coming soon)
Nature evolves creatures with a high complexity of morphological and behavioral
intelligence, meanwhile computational methods lag in approaching that diversity
and efficacy. Co-optimization of artificial creatures' morphology and control in
silico shows promise for applications in physical soft robotics and virtual character
creation; such approaches, however, require developing new learning algorithms
that can reason about function atop pure structure. In this paper, we present DiffuseBot, a physics-augmented diffusion model that generates soft robot morphologies
capable of excelling in a wide spectrum of tasks. DiffuseBot bridges the gap
between virtually generated content and physical utility by (i) augmenting the
diffusion process with a physical dynamical simulation which provides a certificate
of performance, and (ii) introducing a co-design procedure that jointly optimizes
physical design and control by leveraging information about physical sensitivities
from differentiable simulation. We showcase a range of simulated and fabricated
robots along with their capabilities.
@inproceedings{
wang2023diffusebot,
title={DiffuseBot: Breeding Soft Robots With
Physics-Augmented Generative Diffusion Models},
author={Tsun-Hsuan Wang and
Juntian Zheng and
Pingchuan Ma and
Yilun Du and
Byungchul Kim and
Andrew Everett Spielberg and
Joshua B. Tenenbaum and
Chuang Gan and
Daniela Rus},
booktitle={Thirty-seventh Conference on Neural
Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=1zo4iioUEs}
}
|
|
Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning
Mathias Lechner, Lianhao Yin, Tim Seyde, Tsun-Hsuan Wang, Wei Xiao, Ramin Hasani, Joshua Rountree, Daniela Rus
NeurIPS Datasets & Benchmarks 2023, New Orleans
abstract |
bibtex |
openreview |
code
Multi-agent reinforcement learning (MARL) research is faced with a trade-off: it
either uses complex environments requiring large compute resources, which makes
it inaccessible to researchers with limited resources, or relies on simpler dynamics
for faster execution, which makes the transferability of the results to more realistic
tasks challenging. Motivated by these challenges, we present Gigastep, a fully
vectorizable, MARL environment implemented in JAX, capable of executing up to
one billion environment steps per second on consumer-grade hardware. Its design
allows for comprehensive MARL experimentation, including a complex, high-dimensional space defined by 3D dynamics, stochasticity, and partial observations.
Gigastep supports both collaborative and adversarial tasks, continuous and discrete
action spaces, and provides RGB image and feature vector observations, allowing
the evaluation of a wide range of MARL algorithms. We validate Gigastep's
usability through an extensive set of experiments, underscoring its role in widening
participation and promoting inclusivity in the MARL research community. MIT
licensed code is available at https://github.com/mlech26l/gigastep.
@inproceedings{lechner2023gigastep,
author={Mathias Lechner and
Lianhao Yin and
Tim Seyde and
Tsun-Hsuan Wang and
Wei Xiao and
Ramin Hasani and
Joshua Rountree and
Daniela Rus},
title={Gigastep - One Billion Steps per Second
Multi-agent Reinforcement Learning},
booktitle={Advances in Neural Information
Processing Systems (NeurIPS)},
year={2023},
url={https://openreview.net/forum?id=UgPAaEugH3}
}
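The throughput comes from writing the environment so that every step is a pure, vectorizable function. A minimal sketch of that pattern in JAX, using a toy 2D point-mass environment rather than Gigastep's actual 3D, stochastic, partially observed dynamics:
import jax
import jax.numpy as jnp

def step(state, action):
    """Toy single-environment dynamics: a 2D point mass with velocity control.
    This only illustrates the vectorization pattern, not Gigastep itself."""
    pos, vel = state
    vel = 0.9 * vel + 0.1 * action
    pos = pos + 0.05 * vel
    reward = -jnp.sum(pos ** 2)          # stay near the origin
    return (pos, vel), reward

# Vectorize over a batch of environments and JIT-compile the result.
batched_step = jax.jit(jax.vmap(step))

n_envs = 4096
states = (jnp.zeros((n_envs, 2)), jnp.zeros((n_envs, 2)))
actions = jax.random.normal(jax.random.PRNGKey(0), (n_envs, 2))

states, rewards = batched_step(states, actions)
print(rewards.shape)  # (4096,)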
|
|
Measuring Interpretability of Neural Policies of Robots with Disentangled Representation
Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus
CoRL 2023 (oral), Atlanta
abstract |
bibtex |
openreview |
code (coming soon)
The advancement of robots, particularly those functioning in
complex human-centric environments, relies on control solutions that are driven by machine learning.
Understanding how learning-based controllers make decisions is crucial since robots are mostly safety-critical
systems. This urges a formal and quantitative understanding of the explanatory factors in the interpretability
of robot learning. In this paper, we aim to study interpretability of compact neural policies through the lens
of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement
in robot learning; these encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well
networks uncover the underlying task dynamics, we introduce interpretability metrics that measure disentanglement
of learned neural dynamics from a concentration of decisions, mutual information and modularity perspective.
We showcase the effectiveness of the connection between interpretability and disentanglement consistently
across extensive experimental analysis.
@inproceedings{
wang2023measuring,
title={Measuring Interpretability of Neural Policies
of Robots with Disentangled Representation},
author={Tsun-Hsuan Wang and
Wei Xiao and
Tim Seyde and
Ramin Hasani and
Daniela Rus},
booktitle={7th Annual Conference on Robot Learning},
year={2023},
url={https://openreview.net/forum?id=6kSohKYYTn0}
}
|
|
Machine Learning Best Practices for Soft Robot Proprioception
Annan Zhang*, Tsun-Hsuan Wang*, Ryan L Truby, Lillian Chin, Daniela Rus
(* indicates equal contribution)
IROS 2023, Detroit
|
|
Towards Cooperative Flight Control Using Visual-Attention
Lianhao Yin, Makram Chahine, Tsun-Hsuan Wang, Tim Niklas Seyde, Chao Liu, Mathias Lechner, Ramin Hasani, Daniela Rus
IROS 2023, Detroit
abstract |
bibtex |
arxiv |
MIT News
The cooperation of a human pilot with an autonomous agent
during flight control realizes parallel autonomy. A parallel-autonomous system acts as a guardian
that significantly enhances the robustness and safety of flight operations in challenging circumstances.
Here, we propose an air-guardian concept that facilitates cooperation between an artificial pilot agent
and a parallel end-to-end neural control system. Our vision-based air-guardian system combines a causal
continuous-depth neural network model with a cooperation layer to enable parallel autonomy between a pilot
agent and a control system based on perceived differences in their attention profile. The attention profiles
are obtained by computing the networks' saliency maps (feature importance) through the VisualBackProp algorithm.
The guardian agent is trained via reinforcement learning in a fixed-wing aircraft simulated environment. When
the attention profile of the pilot and guardian agents align, the pilot makes control decisions. If the
attention map of the pilot and the guardian do not align, the air-guardian makes interventions and takes
over the control of the aircraft. We show that our attention-based air-guardian system can balance the trade-off
between its level of involvement in the flight and the pilot's expertise and attention. We demonstrate the
effectiveness of our methods in simulated flight scenarios with a fixed-wing aircraft and on a real drone platform.
@article{yin2022cooperative,
title={Cooperative Flight Control Using
Visual-Attention--Air-Guardian},
author={Yin, Lianhao and
Chahine, Makram and
Wang, Tsun-Hsuan and
Seyde, Tim and
Lechner, Mathias and
Hasani, Ramin and
Rus, Daniela},
journal={arXiv preprint arXiv:2212.11084},
year={2022}
}
|
|
On the Forward Invariance of Neural ODEs
Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Daniela Rus
ICML 2023, Honolulu
webpage |
abstract |
bibtex |
arxiv |
proceedings |
code
We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output
specifications by using invariance set propagation. Our approach uses a class of control barrier
functions to transform output specifications into
constraints on the parameters and inputs of the
learning system. This setup allows us to achieve
output specification guarantees simply by changing the constrained parameters/inputs both during training and inference. Moreover,
we demonstrate that our invariance set propagation through
data-controlled neural ODEs not only maintains
generalization performance but also creates an additional degree of robustness by enabling causal
manipulation of the system's parameters/inputs.
We test our method on a series of representation
learning tasks, including modeling physical dynamics and convexity portraits, as well as safe
collision avoidance for autonomous vehicles.
@article{xiao2022forward,
title={On the Forward Invariance of Neural ODEs},
author={Xiao, Wei and
Wang, Tsun-Hsuan and
Hasani, Ramin and
Lechner, Mathias and
Rus, Daniela},
journal={arXiv preprint arXiv:2210.04763},
year={2022}
}
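For readers unfamiliar with control barrier functions, the snippet below shows the standard single-integrator example: a nominal input is minimally corrected in closed form so that a disk-shaped set stays forward invariant. It illustrates the kind of invariance constraint involved, not the paper's propagation of invariance sets through neural ODE parameters and inputs.
import numpy as np

def cbf_filter(x, u_nom, radius=1.0, alpha=1.0):
    """Minimally modify a nominal input so that the single-integrator system
    x' = u keeps h(x) = radius**2 - ||x||**2 >= 0 (forward invariance).
    The constraint dh/dt >= -alpha * h(x) becomes a^T u <= b with a = 2x,
    and the resulting QP has the closed-form solution below. This is a
    generic CBF example, not the paper's invariance-propagation machinery."""
    h = radius ** 2 - float(x @ x)
    a = 2.0 * x
    b = alpha * h
    violation = float(a @ u_nom) - b
    if violation <= 0.0 or not np.any(a):
        return u_nom
    return u_nom - a * violation / float(a @ a)

x = np.array([0.9, 0.0])          # near the boundary of the safe disk
u_nom = np.array([1.0, 0.0])      # nominal input pushes outward
u_safe = cbf_filter(x, u_nom)
print(u_safe)                     # corrected input keeps the state inside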
|
|
SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments
Tsun-Hsuan Wang, Pingchuan Ma, Andrew Spielberg, Zhou Xian, Hao Zhang, Joshua Tenenbaum, Daniela Rus, Chuang Gan
ICLR 2023, Kigali
webpage |
abstract |
bibtex |
openreview |
code
While significant research progress has been made in robot learning for control,
unique challenges arise when simultaneously co-optimizing morphology. Existing work has typically been tailored for
particular environments or representations. In order to more fully understand inherent design and performance tradeoffs
and accelerate the development of new breeds of soft robots, a comprehensive virtual platform — with well-established tasks,
environments, and evaluation metrics — is needed. In this work, we introduce SoftZoo, a soft robot co-design platform for
locomotion in diverse environments. SoftZoo supports an extensive, naturally-inspired material set, including the ability
to simulate environments such as flat ground, desert, wetland, clay, ice, snow, shallow water, and ocean. Further, it provides
a variety of tasks relevant for soft robotics, including fast locomotion, agile turning, and path following, as well as
differentiable design representations for morphology and control. Combined, these elements form a feature-rich platform
for analysis and development of soft robot co-design algorithms. We benchmark prevalent representations and co-design algorithms,
and shed light on (1) the interplay between environment, morphology, and behavior, (2) the importance of design space representations,
(3) the ambiguity in muscle formation and controller synthesis, and (4) the value of differentiable physics. We envision that SoftZoo
will serve as a standard platform and template an approach toward the development of novel representations and algorithms for
co-designing soft robots' behavioral and morphological intelligence. Demos are available on our project page.
@inproceedings{
wang2023softzoo,
title={SoftZoo: A Soft Robot Co-design
Benchmark For Locomotion In
Diverse Environments},
author={Tsun-Hsuan Wang and
Pingchuan Ma and
Andrew Everett Spielberg and
Zhou Xian and
Hao Zhang and
Joshua B. Tenenbaum and
Daniela Rus and
Chuang Gan},
booktitle={The Eleventh International Conference
on Learning Representations },
year={2023},
url={https://openreview.net/forum?id=Xyme9p1rpZw}
}
|
|
Liquid structural state-space models
Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus
ICLR 2023, Kigali
abstract |
bibtex |
arxiv |
openreview |
code
A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard
nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can
improve further when the structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state
transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a
diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications,
the LTC-based structural state-space model, dubbed Liquid-S4, achieves the new state-of-the-art generalization
across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech
Command recognition dataset Liquid-S4 achieves 96.78% accuracy with 30% reduction in parameter counts
compared to S4. The additional gain in performance is the direct result of the Liquid-S4's kernel structure that
takes into account the similarities of the input sequence samples during training and inference.
@article{hasani2022liquid,
title={Liquid structural state-space models},
author={Hasani, Ramin and
Lechner, Mathias and
Wang, Tsun-Hsuan and
Chahine, Makram and
Amini, Alexander and
Rus, Daniela},
journal={arXiv preprint arXiv:2209.12951},
year={2022}
}
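A very coarse sketch of the "liquid" idea follows: a linear state-space recurrence whose effective step size depends on the input. The actual Liquid-S4 uses the diagonal-plus-low-rank S4 parameterization and a convolutional kernel, so treat this as a conceptual toy with made-up matrices only.
import numpy as np

def liquid_ssm_scan(u, A, B, C, tau_w, tau_b):
    """Toy linear state-space recurrence with an input-dependent step size.
    The effective time constant shrinks or grows with the input, loosely
    mirroring the liquid time-constant idea; the real Liquid-S4 kernel is
    parameterized and computed very differently."""
    state = np.zeros(A.shape[0])
    outputs = []
    for u_t in u:
        dt = 1.0 / (1.0 + np.exp(-(tau_w * u_t + tau_b)))  # input-dependent dt
        state = state + dt * (A @ state + B * u_t)          # Euler update
        outputs.append(C @ state)
    return np.array(outputs)

rng = np.random.default_rng(0)
n = 8
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # roughly stable dynamics
B = rng.standard_normal(n)
C = rng.standard_normal(n)
y = liquid_ssm_scan(np.sin(np.linspace(0, 6, 100)), A, B, C, tau_w=2.0, tau_b=0.0)
print(y.shape)  # (100,)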
|
|
Learning Stability Attention in Vision-based End-to-end Driving Policies
Tsun-Hsuan Wang*, Wei Xiao*, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus
L4DC 2023, Philadelphia
abstract |
bibtex |
arxiv
Modern end-to-end learning systems can learn to explicitly
infer control from perception. However, it is difficult to guarantee stability and robustness for
these systems since they are often exposed to unstructured, high-dimensional, and complex observation
spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov
functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce
stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility.
We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We
demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control,
and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle.
@article{wang2023learning,
title={Learning Stability Attention in
Vision-based End-to-end Driving Policies},
author={Tsun-Hsuan Wang and
Wei Xiao and
Makram Chahine and
Alexander Amini and
Ramin Hasani and
Daniela Rus},
journal={arXiv preprint arXiv:2304.02733},
year={2023},
}
|
|
Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap
Mathias Lechner, Ramin Hasani, Alexander Amini, Tsun-Hsuan Wang, Thomas Henzinger, Daniela Rus
NeurIPS ML4AD Workshop, New Orleans
abstract |
bibtex |
arxiv
There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control
from visual observations. These advanced deep models, ranging from convolutional to patch-based networks,
have been extensively tested on offline image classification and regression tasks. In this paper, we study these
vision architectures with respect to the open-loop to closed-loop causality gap, i.e., offline training followed
by an online closed-loop deployment. This causality gap typically emerges in robotics applications such as
autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two
situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of
offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently
reported results, we show that under proper training guidelines, all vision models perform indistinguishably
well on in-distribution deployment, resolving the causality gap. In situation 2, We observe that the causality gap
disrupts performance regardless of the choice of the model architecture. Our results imply that the causality
gap can be solved in situation one with our proposed training guideline with any modern network architecture,
whereas achieving out-of-distribution generalization (situation two) requires further investigations, for instance,
on data diversity rather than the model architecture.
@article{lechner2022all,
title={Are All Vision Models Created Equal?
A Study of the Open-Loop to Closed-Loop
Causality Gap},
author={Lechner, Mathias and
Hasani, Ramin and
Amini, Alexander and
Wang, Tsun-Hsuan and
Henzinger, Thomas A and
Rus, Daniela},
journal={arXiv preprint arXiv:2210.04303},
year={2022}
}
|
|
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
Wei-Cheng Tseng, Tsun-Hsuan Wang, Yen-Chen Lin, Phillip Isola
NeurIPS 2022, New Orleans
abstract |
bibtex |
openreview
We introduce an offline multi-agent reinforcement learning (offline MARL) framework
that utilizes previously collected data without additional online data collection.
Our method reformulates offline MARL as a sequence modeling problem and thus
builds on top of the simplicity and scalability of the Transformer architecture. In
the fashion of centralized training and decentralized execution, we propose to first
train a teacher policy who has the privilege to access every agent's observations,
actions, and rewards. After the teacher policy has identified and recombined the
"good" behavior in the dataset, we create separate student policies and distill not
only the teacher policy's features but also its structural relations among different
agents' features to student policies. We show that our framework significantly
improves performances on a range of tasks and outperforms state-of-the-art offline
MARL baselines. Furthermore, we demonstrate that the proposed method has a
better convergence rate, is more sample efficient, and is more robust to various
demonstration qualities compared with baselines.
@inproceedings{
tseng2022offline,
title={Offline Multi-Agent Reinforcement
Learning with Knowledge Distillation},
author={Wei-Cheng Tseng and
Tsun-Hsuan Wang and
Yen-Chen Lin and
Phillip Isola},
booktitle={Advances in Neural Information
Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and
Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=yipUuqxveCy}
}
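A minimal sketch of the centralized-teacher-to-decentralized-students feature distillation described above, with made-up toy dimensions; the paper additionally distills structural relations among agents' features and trains action heads on top, both omitted here.
import torch
import torch.nn as nn

n_agents, obs_dim, feat_dim = 3, 16, 32

# Centralized teacher sees all agents' observations; each student only its own.
teacher = nn.Sequential(nn.Linear(n_agents * obs_dim, 64), nn.ReLU(),
                        nn.Linear(64, n_agents * feat_dim))
students = nn.ModuleList(
    nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    for _ in range(n_agents)
)

obs = torch.randn(128, n_agents, obs_dim)       # a batch from the offline dataset
with torch.no_grad():                           # the teacher is trained beforehand
    t_feat = teacher(obs.flatten(1)).view(128, n_agents, feat_dim)

# Distill each agent's slice of the teacher's features into its student encoder.
loss = sum(
    nn.functional.mse_loss(students[i](obs[:, i]), t_feat[:, i])
    for i in range(n_agents)
)
loss.backward()
print(float(loss))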
|
|
Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving
Wei Xiao*, Tsun-Hsuan Wang*, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus
(* indicates equal contribution)
T-RO
abstract |
bibtex |
arxiv |
proceedings |
code
Guaranteeing safety of perception-based learning systems
is challenging due to the absence of ground-truth state information unlike in state-aware control
scenarios. In this paper, we introduce a safety guaranteed learning framework for vision-based
end-to-end autonomous driving. To this end, we design a learning system equipped with differentiable
control barrier functions (dCBFs) that is trained end-to-end by gradient descent. Our models are composed
of conventional neural network architectures and dCBFs. They are interpretable at scale, achieve great
test performance under limited training data, and are safety guaranteed in a series of autonomous
driving scenarios such as lane keeping and obstacle avoidance. We evaluated our framework in a sim-to-real
environment, and tested it on a real autonomous car, achieving safe lane following and obstacle avoidance
via Augmented Reality (AR) and real parked vehicles.
@article{xiao2022differentiable,
title={Differentiable Control Barrier
Functions for Vision-based
End-to-End Autonomous Driving},
author={Xiao, Wei and Wang, Tsun-Hsuan and
Chahine, Makram and Amini, Alexander
and Hasani, Ramin and Rus, Daniela},
journal={arXiv preprint arXiv:2203.02401},
year={2022}
}
|
|
VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles
Alexander Amini*, Tsun-Hsuan Wang*, Igor Gilitschenski, Wilko Schwarting,
Zhijian Liu, Song Han, Sertac Karaman, Daniela Rus
(* indicates equal contribution)
ICRA 2022, Philadelphia
webpage |
abstract |
bibtex |
arxiv |
MIT News
Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in
safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation
engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open source, data-driven
simulator that integrates multiple types of sensors for autonomous vehicles. Using high fidelity, real-world datasets,
VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel
viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are
difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test
perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on
a full scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and
greater robustness than those trained exclusively on real-world data.
@article{amini2021vista,
title={VISTA 2.0: An Open, Data-driven Simulator
for Multimodal Sensing and Policy Learning
for Autonomous Vehicles},
author={Amini, Alexander and
Wang, Tsun-Hsuan and
Gilitschenski, Igor and
Schwarting, Wilko and
Liu, Zhijian and
Han, Song and
Karaman, Sertac and
Rus, Daniela},
journal={arXiv preprint arXiv:2111.12083},
year={2021}
}
|
|
Learning Interactive Driving Policies via Data-driven Simulation
Tsun-Hsuan Wang*, Alexander Amini*, Wilko Schwarting, Igor Gilitschenski,
Sertac Karaman, Daniela Rus
(* indicates equal contribution)
ICRA 2022, Philadelphia
webpage |
abstract |
bibtex |
arxiv
Data-driven simulators promise high
data efficiency for driving policy learning. When used for modelling
interactions, this data-efficiency becomes a bottleneck: Small
underlying datasets often lack interesting and challenging edge
cases for learning interactive driving. We address this challenge
by proposing a simulation method that uses in-painted ado
vehicles for learning robust driving policies. Thus, our approach
can be used to learn policies that involve multi-agent interactions and
allows for training via state-of-the-art policy learning
methods. We evaluate the approach for learning standard
interaction scenarios in driving. In extensive experiments, our
work demonstrates that the resulting policies can be directly
transferred to a full-scale autonomous vehicle without making
use of any traditional sim-to-real transfer techniques such as
domain randomization.
@article{wang2021learning,
title={Learning Interactive Driving Policies
via Data-driven Simulation},
author={Wang, Tsun-Hsuan and
Amini, Alexander and
Schwarting, Wilko and
Gilitschenski, Igor and
Karaman, Sertac and
Rus, Daniela},
journal={arXiv preprint arXiv:2111.12137},
year={2021}
}
|
|
Interpretable Autonomous Flight via Compact Visualizable Neural Circuit Policies
Paul Tylkin, Tsun-Hsuan Wang, Kyle Palko, Ross Allen, Ho Chit Siu,
Daniel Wrafter, Tim Niklas Seyde, Alexander Amini, Daniela Rus
RAL 2022
abstract |
bibtex |
InProceedings
We learn interpretable end-to-end controllers based on Neural Circuit Policies (NCPs) to enable goal reaching and dynamic
obstacle avoidance in flight domains. In addition to being able to learn high-quality control, NCP networks are designed
with a small number of neurons. This property allows for the learned policies to be interpreted at the neuron level and
interrogated, leading to more robust understanding of why the artificial agents make the decisions that they do. We also
demonstrate transfer of the learned policy to physical flight hardware by deploying a small NCP (200KB of memory) capable
of real-time inference on a Raspberry Pi Zero controlling a DJI Tello drone. Designing interpretable artificial agents is
crucial for building trustworthy AIs, both as fully autonomous systems and also for parallel autonomy, where humans and
AIs work on collaboratively solving problems in the same environment.
@article{tylkin2022interpretable,
title={Interpretable Autonomous Flight via Compact
Visualizable Neural Circuit Policies},
author={Tylkin, Paul and Wang, Tsun-Hsuan and Palko,
Kyle and Allen, Ross and Siu, Ho Chit and
Wrafter, Daniel and Seyde, Tim Niklas and
Amini, Alexander and Rus, Daniela},
journal={IEEE Robotics and Automation Letters},
year={2022},
publisher={IEEE}
}
|
|
Autonomous Flight Arcade Challenge: Single- and Multi-Agent Learning Environments for Aerial Vehicles
Paul Tylkin, Tsun-Hsuan Wang, Tim Seyde, Kyle Palko, Ross Allen, Alexander Amini and Daniela Rus
AAMAS 2022 Extended Abstract, virtual
abstract |
bibtex |
InProceedings
The Autonomous Flight Arcade (AFA) is a novel suite of single- and multi-agent learning environments for control of aerial vehicles. These environments incorporate realistic physics using the
Unity game engine with diverse objectives and levels of decision-making sophistication. In addition to the environments themselves,
we introduce an interface for interacting with them, including the
ability to vary key parameters, thereby both changing the difficulty
and the core challenges. We also introduce a pipeline for collecting human gameplay within the environments. We demonstrate
the performance of artificial agents in these environments trained
using deep reinforcement learning, and also motivate these environments as a benchmark for designing non-learned classical control
policies and agents trained using imitation learning from human
demonstrations. Finally, we motivate the use of AFA environments
as a testbed for training artificial agents capable of cooperative
human-AI decision making, including parallel autonomy.
@inproceedings{tylkin2022autonomous,
title={Autonomous Flight Arcade Challenge:
Single-and Multi-Agent Learning
Environments for Aerial Vehicles},
author={Tylkin, Paul and
Wang, Tsun-Hsuan and
Seyde, Tim and
Palko, Kyle and
Allen, Ross and
Amini, Alexander and
Rus, Daniela},
booktitle={Proceedings of the 21st International
Conference on Autonomous Agents and
Multiagent Systems},
pages={1744--1746},
year={2022}
}
|
|
Adversarial Attacks On Multi-Agent Communication
Tsun-Hsuan Wang*, James Tu*, Jingkang Wang, Sivabalan Manivasagam,
Mengye Ren, Raquel Urtasun
(* indicates equal contribution)
ICCV 2021, virtual
abstract |
bibtex |
arxiv
Growing at a fast pace, modern autonomous systems will
soon be deployed at scale, opening up the possibility for cooperative multi-agent systems. Sharing information and
distributing workloads allow autonomous agents to better
perform tasks and increase computation efficiency. However, shared information can be modified to execute adversarial
attacks on deep learning models that are widely employed in modern systems. Thus, we aim to study the robustness of such
systems and focus on exploring adversarial attacks in a novel multi-agent setting where communication is
done through sharing learned intermediate representations of neural networks. We observe that an indistinguishable
adversarial message can severely degrade performance, but becomes weaker as the number of benign agents increases.
Furthermore, we show that black-box transfer attacks are
more difficult in this setting when compared to directly perturbing the inputs, as it is necessary to align the distribution
of learned representations with domain adaptation. Our work studies robustness at the neural network level to contribute an additional layer of fault tolerance to modern
security protocols for more secure multi-agent systems.
@InProceedings{Tu_2021_ICCV,
author = {Tu, James and
Wang, Tsunhsuan and
Wang, Jingkang and
Manivasagam, Sivabalan and
Ren, Mengye and
Urtasun, Raquel},
title = {Adversarial Attacks on
Multi-Agent Communication},
booktitle = {ICCV},
month = {October},
year = {2021},
pages = {7768-7777}
}
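The attack setting can be illustrated with a generic PGD loop that perturbs a transmitted feature map within an L-infinity ball to degrade a toy receiver; the receiver, shapes, and loss below are stand-ins for the actual perception models studied in the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy receiver: fuses its own feature map with a message from another agent.
class Receiver(nn.Module):
    def __init__(self, c=8):
        super().__init__()
        self.head = nn.Conv2d(2 * c, 2, kernel_size=1)

    def forward(self, own_feat, message):
        return self.head(torch.cat([own_feat, message], dim=1))

receiver = Receiver()
own_feat = torch.randn(1, 8, 16, 16)
message = torch.randn(1, 8, 16, 16)          # benign transmitted feature
target = torch.randint(0, 2, (1, 16, 16))    # toy labels the receiver should predict

# PGD on the message: maximize the receiver's loss within an L-inf ball.
eps, step, n_iter = 0.1, 0.02, 20
delta = torch.zeros_like(message, requires_grad=True)
for _ in range(n_iter):
    out = receiver(own_feat, message + delta)
    loss = nn.functional.cross_entropy(out, target)
    loss.backward()
    with torch.no_grad():
        delta += step * delta.grad.sign()    # gradient ascent on the loss
        delta.clamp_(-eps, eps)              # keep the perturbation small
        delta.grad.zero_()

print("adversarial message ready:", (message + delta).shape)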
|
|
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction
Tsun-Hsuan Wang, Sivabalan Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, James Tu, Raquel Urtasun
ECCV 2020 (Oral), Glasgow virtual
abstract |
bibtex |
arxiv
In this paper, we explore the use of vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles. By intelligently aggregating the information received from multiple nearby vehicles, we can observe the same scene from different viewpoints. This allows us to see through occlusions and detect actors at long range, where the observations are very sparse or non-existent. We also show that our approach of sending compressed deep feature map activations achieves high accuracy while satisfying communication bandwidth requirements.
@inproceedings{wang2020v2vnet,
Author = {Wang, Tsun-Hsuan and
Manivasagam, Sivabalan and
Liang, Ming and
Yang, Bin and
Zeng, Wenyuan and
Tu, James and
Urtasun, Raquel},
Title = {V2VNet: Vehicle-to-Vehicle Communication
for Joint Perception and Prediction},
Booktitle = {ECCV},
Year = {2020}
}
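Conceptually, each receiver warps the senders' compressed feature maps into its own frame and aggregates them before the downstream heads. The sketch below uses hand-written affine poses and a simple average purely for illustration; V2VNet aggregates with a learned graph neural network and known relative poses.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical compressed BEV feature maps received from 3 nearby vehicles,
# plus the ego vehicle's own map (batch x channels x H x W).
ego_feat = torch.randn(1, 32, 64, 64)
received = [torch.randn(1, 32, 64, 64) for _ in range(3)]

def warp_to_ego(feat, theta):
    """Spatially warp a sender's feature map into the ego frame with an
    affine transform standing in for the relative pose."""
    grid = F.affine_grid(theta, feat.shape, align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

# Toy relative poses: small translations expressed as 2x3 affine matrices.
thetas = [torch.tensor([[[1.0, 0.0, 0.1 * i], [0.0, 1.0, 0.0]]]) for i in range(1, 4)]
warped = [warp_to_ego(f, th) for f, th in zip(received, thetas)]

fused = torch.stack([ego_feat] + warped, dim=0).mean(dim=0)  # simple average fusion
print(fused.shape)  # torch.Size([1, 32, 64, 64])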
|
|
Point-to-Point Video Generation
Tsun-Hsuan Wang*, Yen-Chi Cheng*, Chieh Hubert Lin, Hwann-Tzong Chen, Min Sun
(* indicates equal contribution)
ICCV 2019, Seoul
webpage |
abstract |
bibtex |
arxiv |
code
While image manipulation achieves tremendous breakthroughs (e.g., generating realistic faces) in recent years, video generation is much less explored and harder to control, which limits its applications in the real world. For instance, video editing requires temporal coherence across multiple clips and thus poses both start and end constraints within a video sequence. We introduce point-to-point video generation that controls the generation process with two control points: the targeted start- and end-frames. The task is challenging since the model not only generates a smooth transition of frames, but also plans ahead to ensure that the generated end-frame conforms to the targeted end-frame for videos of various length. We propose to maximize the modified variational lower bound of conditional data likelihood under a skip-frame training strategy. Our model can generate sequences such that their end-frame is consistent with the targeted end-frame without loss of quality and diversity. Extensive experiments are conducted on Stochastic Moving MNIST, Weizmann Human Action, and Human3.6M to evaluate the effectiveness of the proposed method. We demonstrate our method under a series of scenarios (e.g., dynamic length generation) and the qualitative results showcase the potential and merits of point-to-point generation.
@inproceedings{wang2019p2pvg,
Author = {Wang, Tsun-Hsuan and
Cheng, Yen-Chi and
Lin, Chieh Hubert and
Chen, Hwann-Tzong and
Sun, Min},
Title = {Point-to-Point Video Generation},
Booktitle = {ICCV},
Year = {2019}
}
|
|
3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization
Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
IROS 2019, Macao
webpage |
abstract |
bibtex |
arxiv |
code
The complementary characteristics of active and passive depth sensing techniques motivate the fusion of the LiDAR sensor and stereo camera for improved depth perception. Instead of directly fusing estimated depths across LiDAR and stereo modalities, we take advantage of the stereo matching network with two enhanced techniques: Input Fusion and Conditional Cost Volume Normalization (CCVNorm) on the LiDAR information. The proposed framework is generic and closely integrated with the cost volume component that is commonly utilized in stereo matching neural networks. We experimentally verify the efficacy and robustness of our method on the KITTI Stereo and Depth Completion datasets, obtaining favorable performance against various fusion strategies. Moreover, we demonstrate that, with a hierarchical extension of CCVNorm, the proposed method brings only slight overhead to the stereo matching network in terms of computation time and model size.
@inproceedings{wang2019ccvnorm,
Author = {Wang, Tsun-Hsuan and
Hu, Hou-Ning and
Lin, Chieh Hubert and
Tsai, Yi-Hsuan and
Chiu, Wei-Chen and
Sun, Min},
Title = {3D LiDAR and Stereo Fusion using Stereo
Matching Network with Conditional
Cost Volume Normalization},
Booktitle = {IROS},
Year = {2019}
}
|
|
Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation
Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
ICRA 2019, Montreal
webpage |
abstract |
bibtex |
arxiv |
code
We propose a novel plug-and-play (PnP) module for improving depth prediction, taking arbitrary patterns of sparse depths as input. Given any pre-trained depth prediction model, our PnP module updates the intermediate feature map such that the model outputs new depths consistent with the given sparse depths. Our method requires no additional training and can be applied to practical applications such as leveraging both RGB and sparse LiDAR points to robustly estimate a dense depth map. Our approach achieves consistent improvements on various state-of-the-art methods on indoor (i.e., NYU-v2) and outdoor (i.e., KITTI) datasets. Various types of LiDARs are also synthesized in our experiments to verify the general applicability of our PnP module in practice.
@inproceedings{wang2019pnpdepth,
Author = {Wang, Tsun-Hsuan and
Wang, Fu-En and
Lin, Juan-Ting and
Tsai, Yi-Hsuan and
Chiu, Wei-Chen and
Sun, Min},
Title = {Plug-and-Play: Improve Depth Estimation
via Sparse Data Propagation},
Booktitle = {ICRA},
Year = {2019}
}
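The core PnP idea, updating only an intermediate feature map of a frozen network via a few gradient steps so the output agrees with the sparse depths, can be sketched on a toy two-stage network as follows; the architecture, step size, and sparsity pattern are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy pre-trained depth network, split into a front part and a back part so
# that we can access the intermediate feature map z = front(rgb).
front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
back = nn.Sequential(nn.Conv2d(16, 1, 3, padding=1))
for p in list(front.parameters()) + list(back.parameters()):
    p.requires_grad_(False)                       # the model stays frozen

rgb = torch.randn(1, 3, 32, 32)
sparse_depth = torch.rand(1, 1, 32, 32)
mask = (torch.rand(1, 1, 32, 32) < 0.05).float()  # ~5% of pixels have LiDAR depth

# PnP-style update: a few gradient steps on the intermediate feature map only.
z = front(rgb).detach().requires_grad_(True)
for _ in range(5):
    pred = back(z)
    loss = ((pred - sparse_depth) ** 2 * mask).sum() / mask.sum()
    (grad,) = torch.autograd.grad(loss, z)
    z = (z - 0.1 * grad).detach().requires_grad_(True)

refined_depth = back(z)           # prediction now agrees with the sparse points
print(refined_depth.shape)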
|
|
Liquid Pouring Monitoring via Rich Sensory Inputs
Tz-Ying Wu*, Juan-Ting Lin*, Tsun-Hsuan Wang, Chan-Wei Hu, Juan Carlos Niebles, Min Sun
(* indicates equal contribution)
ECCV 2018, Munich
webpage |
abstract |
bibtex |
arxiv
Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In the closed-loop system, the ability to monitor the state of the task via rich sensory information is important but often less studied. In this work, we take liquid pouring as a concrete example and aim at learning to continuously monitor whether liquid pouring is successful (e.g., no spilling) or not via rich sensory inputs. We mimic humans' rich senses using synchronized observation from a chest-mounted camera and a wrist-mounted IMU sensor. Given many success and failure demonstrations of liquid pouring, we train a hierarchical LSTM with late fusion for monitoring. To improve the robustness of the system, we propose two auxiliary tasks during training: (1) inferring the initial state of containers and (2) forecasting the one-step future 3D trajectory of the hand with an adversarial training procedure. These tasks encourage our method to learn representations sensitive to container states and how objects are manipulated in 3D. With these novel components, our method achieves ~8% and ~11% better monitoring accuracy than the baseline method without auxiliary tasks on unseen containers and unseen users respectively.
@inproceedings{wu2019pouring,
Author = {Wu, Tz-Ying and
Lin, Juan-Ting and
Wang, Tsun-Hsuan and
Hu, Chan-Wei and
Niebles, Juan Carlos and
Sun, Min},
Title = {Liquid Pouring Monitoring via Rich
Sensory Inputs},
Booktitle = {ECCV},
Year = {2018}
}
|
|
Omnidirectional CNN for Visual Place Recognition and Navigation
Tsun-Hsuan Wang*, Hung-Jui Huang*, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun
(* indicates equal contribution)
ICRA 2018, Brisbane
webpage |
abstract |
bibtex |
arxiv |
code
Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate the challenge, we consider a place recognition method using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) to handle severe camera pose variation. Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place. With the ability to estimate relative distance, a heuristic policy is proposed to navigate a robot to the retrieved closest place. Note that the network is designed to take advantage of the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for training on a large scale. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed with both the virtual world and real-world datasets.
@inproceedings{wang2019omnicnn,
Author = {Wang, Tsun-Hsuan and
Huang, Hung-Jui and
Lin, Juan-Ting and
Hu, Chan-Wei and
Zeng, Kuo-Hao and
Sun, Min},
Title = {Omnidirectional CNN for Visual Place
Recognition and Navigation},
Booktitle = {ICRA},
Year = {2018}
}
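The circular padding mentioned above is easy to reproduce: pad only the azimuth (width) dimension of a panoramic feature map in wrap-around fashion before convolving, so the 360-degree boundary is seamless. A toy sketch with assumed shapes:
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 256)   # panoramic image: height 64, azimuth 256

# Pad only the width (azimuth) circularly so the left and right edges wrap
# around, matching the 360-degree geometry; pad the height with zeros.
x = F.pad(x, (1, 1, 0, 0), mode="circular")   # (left, right, top, bottom)
x = F.pad(x, (0, 0, 1, 1), mode="constant", value=0.0)

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=0)
y = conv(x)
print(y.shape)                   # torch.Size([1, 8, 64, 256])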
|
Fall 2022, MathWorks Fellowship
Summer 2022, Finalist of Qualcomm Innovation Fellowship
Fall 2020, David S. Y. and Harold Wong Fellowship
Fall 2018, Appier Scholarship
Fall 2017, NTHU Matriculation Scholarship (MS)
Fall 2016, Academic Achievement Award
Summer 2014, Oversea Exchange Scholarship
Fall 2013, NTHU Matriculation Scholarship (BS)
August 2017, TA, Vision for Interaction, AI Summer School, MOST
Fall 2017, Head TA, Computer Vision, NTHU
March 2017, TA, Reinforcement Learning, TSMC
A huge thanks to the template from this.