This section describes the methods in DJ-GOMP. Underlying DJ-GOMP is a jerk- and time-optimizing constrained motion planner based on an SQP. Because of the complexity of solving this SQP, computation time can far exceed the trajectory’s execution time. DJ-GOMP uses this SQP on a random set of pick-and-place inputs to generate training data (trajectories) to train a neural network. During pick-and-place operation, DJ-GOMP uses the neural network to compute an approximate trajectory for the given pick and place frames, which it then uses to warm start the SQP.
Jerk- and time-optimized trajectory generation
To generate a jerk- and time-optimized trajectory, DJ-GOMP extends the SQP formulated in GOMP (2). The solver for this SQP, following the method in TrajOpt (3) and summarized in Algorithm 1, starts with a discretized estimate of the trajectory τ as a sequence of H waypoints after the starting configuration, in which each waypoint represents the robot’s configuration q, velocity v, acceleration a, and jerk j at a moment in time. The waypoints are sequentially separated by tstep seconds. This discretization is collected into x(0), where the superscript represents a refinement iteration. Thus
The choice of H and tstep is application specific, although in physical experiments, we set tstep to match (an integer multiple of) the control frequency of the robot, and we set H such that H · tstep is an estimate of the upper bound of the minimum trajectory time for the workspace and task input distribution.
The initial value of x(0) seeds (or warm starts) the SQP computation. Without the approximation generated by the neural network (e.g., for training data set generation), this trajectory can be initialized to all zeros. In practice, the SQP can converge faster by first computing a trajectory between inverse kinematic solutions to g0 and gH with only the convex kinematic and dynamic constraints (described below).
In each iteration k = (0,1,2, …, m) of the SQP, DJ-GOMP linearizes the nonconvex constraints of obstacles and pick-and-place locations and solves a QP of the following form
where A defines constraints enforcing the trust region, joint limits, and dynamics, and P is defined such that xTPx is a sum-of-squared jerks. To enforce the linearized nonconvex constraints, DJ-GOMP adds constrained nonnegative slack variables penalized using appropriate coefficients in p. As DJ-GOMP iterates over the SQP, it increases the penalty term exponentially, terminating on the iteration m at which x(m) meets the nonconvex constraints.
Algorithm 1: Jerk-limited Motion Plan
Require: x(0) is an initial guess of the trajectory, h + 1 is the number of waypoints in x(0), tstep is the time between each waypoint, g0 and gH are the pick and place frames, βshrink ∈ (0,1), βgrow > 1, and γ > 1
1: μ ← initial penalty multiple
2: ϵtrust ← initial trust region size
3: k ← 0
4: P, p, A, b ← linearize x(0) as a QP
5: while μ < μmaxdo
6:
/* warm start with x(k) */
7: if sufficient decrease in trajectory cost then
8: k ← k + 1 /*accept trajectory */
9: ϵtrust ← ϵtrustβgrow /* grow trust region */
10: A, b ← update linearization using x(k)
11: else
12: ϵtrust ← ϵtrustβshrink /* shrink trust region */
13: b ← update trust region bounds only
14: if ϵtrust < ϵmin_trustthen
15: μ ← γμ /* increase penalty */
16: ϵtrust ← initial trust region size
17: p ← update penalty in QP to match μ
18: return x(k)
To enforce joint limits and dynamic constraints, Algorithm 1 creates a matrix A and a vector b that enforce the following linear inequalities on joint limits
and the following equalities that enforce dynamic constraints between variables
In addition, Algorithm 1 linearizes nonconvex constraints by adding slack variables to implement L1 penalties. Thus, for a nonconvex constraint gj(x) ≤ c, the algorithm adds the linearization
as a constraint in the form
where μ is the penalty, and the slack variables are constrained such that
and
.
In the QP, obstacle avoidance constraints are linearized on the basis of the waypoints of the current iteration’s trajectory (Algorithm 2). To compute these constraints, the algorithm evaluates the spline
between each pair of waypoints (xt, xt + 1) against a depth map of obstacles to find the time s ∈ [0, tstep) and corresponding configuration
that minimizes signed distance separation from any obstacle. In this evaluation, a negative signed distance means that the configuration is in collision. The algorithm then uses this
to computes a separating hyperplane in the form nTq + d = 0. The hyperplane is either the top plane of the obstacle it is penetrating or the plane that separates
from the nearest obstacle (see Fig. 8). By selecting the top plane of the penetrated obstacle, this pushes the trajectory above the obstacle, which is a specialization of TrajOpt’s more general obstacle avoidance approach that is useful in bin picking. By selecting the hyperplane of the nearest obstacle, the algorithm keeps the trajectory from entering the obstacle. The linearize constraint for this point is
where
is the Jacobian of the robot’s position at
. Because
and
are at an interpolated state between optimization variables at xt and xt + 1, linearizing this constraint requires computing the chain rule for the Jacobian
where
is the Jacobian of the position at configuration qt, and Jq(s) is the Jacobian of the configuration on the spline at s
The constraint linearization process keeps the trajectory away from obstacles by adding constraints based on the Jacobian of the configuration at each waypoint of the accepted trajectory x(k). In this figure, the obstacle is shown from the side, the robot’s path along part of x(k) is shown in blue, and the constraints’ normal projections into Euclidean space are shown in red. For waypoints that are outside the obstacle (A), constraints keep the waypoints from entering the obstacle. For waypoints that are inside the obstacle (B), constraints push the waypoints up out of the obstacle. If the algorithm adds constraints only at waypoints as in (C), the optimization can compute trajectories that collide with obstacles and produce discontinuities between trajectories with small changes to the pick or place frame. These effects are mitigated when obstacles are inflated to account for them, but the discontinuities can lead to poor results when training the neural network. The proposed algorithm adds linearized constraints to account for collision between obstacles, leading to more consistent results shown in (D).
We observe that linearization at each waypoint will safely avoid obstacles with a sufficient buffer around obstacles (e.g., via a Minkowski difference with obstacles); however, slight variations in pick or place frames can shift the alignment of waypoints to obstacles. This shift of alignment (see Fig. 8C) can lead to solutions that vary disproportionately to small changes in input. Although this may be acceptable in operation, it can lead to data that can be difficult for a neural network to learn.
Algorithm 2: Linearize Obstacle-Avoidance Constraint
1: for t ∈ [0, H) do
2: (nmin, dmin) ← linearize obstacle nearest to p(qt)
3: qmin ← qt
4: for all s ∈ [0, tstep) do /* line search s to desired resolution */
5:
6: (ns, ds)← linearize obstacle nearest to p(qs)
7: if
then /* compare signed distance */
8: (nmin, dmin, qmin) ← (ns, ds, qs)
9: Jq ← Jacobian of qs
10: Jp ← Jacobian of position at qmin
11:
12:
/* lower bound with slack
*/
13: Add
and
to set of linearconstraints in QP
As with GOMP, DJ-GOMP allows degrees of freedom in rotation and translation to be added to start and goal grasp frames. Adding this degree of freedom allows DJ-GOMP to take a potentially shorter path when an exact pose of the end effector is unnecessary. For example, a pick point with a parallel-jaw gripper can rotate about the axis defined by antipodal contact points (see Fig. 2), and the pick point with a suction gripper can rotate about the normal of its contact plane. Similarly, a task may allow for a place point anywhere within a bounded box. The degrees of freedom about the pick point can be optionally added as constraints that are linearized as
where
and
are the configuration and Jacobian of the first waypoint in the accepted trajectory,
is one of variables the QP is minimizing, and wmin ≤ wmax defines the twist allowed about the pick point. Applying a similar set of constraints to gH allows degrees of freedom in the place frame as well.
The SQP establishes trust regions to constrain the optimized trajectory to be within a box with extents defined by a shrinking trust region size. Because convex constraints on dynamics enforce the relationship between configuration, velocity, and acceleration of each waypoint, we observe that trust regions only need to be defined as box bounds around one of the three for each waypoint. In experiments, we established trust regions on configurations.
Algorithm 3: Time-optimal Motion Plan
Require: g0 and gH are the start and end frames, γ > 1 is the search bisection ratio
1: Hupper ← fixed or estimated upper limit of maximum time
2: Hlower ← 3
3: vupper ← ∞ /* constraint violation */
4: while vupper> tolerance do /* find upper limit */
5: (xupper, vupper) ← call Alg. 1 with cold-start trajectory for Hupper
6: Hupper ← max(Hupper + 1, ⌈γ Hupper⌉)
7: while Hlower < Hupperdo /* search for shortest H */
8: Hmin ← Hlower + ⌊(Hupper − Hlower)/γ⌋
9: (xmid, vmid) ← call Alg. 1 with warm-start trajectory xupper interpolated to Hmid
10: if vmid≤ tolerance then
11: (Hupper, xupper, vupper) ← (Hmid, xmid, vmid)
12: else
13: Hlower ← Hmid + 1
14: return xupper
To find the minimum time-time trajectory, J-GOMP searches for the shortest jerk-minimized trajectory that solves all constraints. This search, shown in Algorithm 3, starts with a guess of H and then performs an exponential search for the upper bound, followed by a binary search for the shortest H by repeatedly performing the SQP of Algorithm 1. The binary search warm starts each SQP with an interpolation of the trajectory of the current upper bound of H. The search ends when the upper and lower bounds of H are the same.