2005 IEEE/RSJ International Conference on Intelligent Robots and Systems
Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning∗
Steven L. Waslander†, Gabriel M. Hoffmann†,‡
Ph.D. Candidates, Aeronautics and Astronautics, Stanford University
{stevenw, gabeh}@stanford.edu

Jung Soon Jang
Research Associate, Aeronautics and Astronautics, Stanford University
[email protected]

Claire J. Tomlin
Associate Professor, Aeronautics and Astronautics, Stanford University
[email protected]
Abstract—The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently comprised of two quadrotors, also called X4-flyers, with capacity for eight. This paper presents a comparison of control design techniques, specifically for outdoor altitude control, in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airflow induced by the four interacting rotors, classical linear techniques failed to provide sufficient stability. Integral Sliding Mode and Reinforcement Learning control are presented as two design techniques for accommodating the nonlinear disturbances. Both methods result in greatly improved performance over classical control techniques.
I. INTRODUCTION
As first introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and present real-world problems for further investigation. The base vehicle for STARMAC is a four-rotor aircraft with fixed-pitch blades, referred to as a quadrotor, or an X4-flyer. They are capable of 15 minute outdoor flights in a 100 m square area [1].
Fig. 1. One of the STARMAC quadrotors in action.
There have been numerous projects involving quadrotors to date, with the first known hover occurring in October 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote control versions, such as the DraganFlyer IV [3]. Many groups [4]–[7] have seen significant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi-vehicle quadrotor platform capable of autonomous outdoor flight, without tethers or motion guides.

∗Research supported by ONR under the MURI contract N00014-02-1-0720 called "CoMotion: Computational Methods for Collaborative Motion", by the NASA Joint University Program under grant NAG 2-1564, and by NASA grant NCC 2-5536.
†These authors contributed equally to this work.
‡Funding provided by National Defense Science and Engineering Grant.
The first major milestone for STARMAC was autonomous hover control, with closed loop control of attitude, altitude, and position. Using inertial sensing, the attitude of the aircraft is simple to control by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, with careful design in order to ensure spectral separation of the successive loops. Unfortunately, altitude control proves less straightforward. There are many factors that specifically affect the altitude loop and do not lend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of the four interacting rotor downwashes. In our experience, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual flight revealed a noticeable loss in thrust upon descent through the highly turbulent flow field. Similar aerodynamic phenomena have been studied extensively for helicopters [8], but not for the quadrotor, due to its relative obscurity and complexity. Other factors that introduce disturbances into the altitude control loop include blade flex, ground effect, and battery discharge dynamics. Although these effects are also present in generating attitude controlling moments, the differential nature of the control input eliminates much of the absolute thrust disturbance that complicates altitude control. Additional complications arise from the limited choice of low cost, high resolution altitude sensors. An ultrasonic ranging device [9] was used, which suffers from non-Gaussian noise: false echoes and dropouts. The resulting raw data stream includes spikes and echoes that are difficult to mitigate, and are most successfully handled by rejection of infeasible measurements prior to Kalman filtering.
In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] takes the approach that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model based on recorded inputs and responses, without any knowledge of the underlying dynamics, and then seeks an optimal control law using an optimization technique based on the learned model. This paper presents an exposition of both methods and contrasts the techniques from both a design and an implementation point of view.
II. SYSTEM DESCRIPTION
STARMAC consists of a fleet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. The core of each aircraft is a set of microcontroller circuit boards designed and assembled at Stanford for this project. The microcontrollers run real-time control code, interface with sensors and the ground station, and supervise the system. The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band and providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low cost, light weight unit that delivers attitude, attitude rate, and acceleration readings at 76 Hz. The distance from the ground is found using ultrasonic ranging at 12 Hz.

The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual flight, when desired.
III. QUADROTOR DYNAMICS
The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body-fixed coordinates. Let {e_N, e_E, e_D} denote the inertial axes, and {x_B, y_B, z_B} denote the body axes, as defined in Figure 2. Euler angles of the body axes are {φ, θ, ψ} with respect to the e_N, e_E, and e_D axes, respectively, and are referred to as roll, pitch, and yaw. Let r be defined as the position vector from the inertial origin to the vehicle center of gravity (CG), and let ω_B be defined as the angular velocity in the body frame. The current velocity direction is referred to as e_v in inertial coordinates.

Fig. 2. Free body diagram of a quadrotor aircraft.

The rotors, numbered 1-4, are mounted outboard on the x_B, y_B, −x_B, and −y_B axes, respectively, with position vectors r_i with respect to the CG. Each rotor produces an aerodynamic torque, Q_i, and a thrust, T_i, both parallel to the rotor's axis of rotation, and both used for vehicle control. Here, T_i ≈ u_i, where u_i is the voltage applied to the motors, as determined from a load cell test. In flight, T_i can vary greatly from this approximation. The torques, Q_i, are proportional to the rotor thrust, and are given by Q_i = k_r T_i. Rotors 1 and 3 rotate in the opposite direction from rotors 2 and 4, so that counteracting aerodynamic torques can be used independently for yaw control. Horizontal velocity results in a moment on the rotors, R_i, about −e_v, and a drag force, D_i, in the direction −e_v.

The body drag force is defined as D_B, the vehicle mass as m, the acceleration due to gravity as g, and the inertia matrix as I ∈ R^{3×3}. A free body diagram is depicted in Figure 2. The total force, F, and moment, M, can be summed as,

F = -D_B e_v + m g e_D + \sum_{i=1}^{4} \left( -T_i z_B - D_i e_v \right) \quad (1)

M = \sum_{i=1}^{4} \left( Q_i z_B - R_i e_v - D_i (r_i \times e_v) + T_i (r_i \times z_B) \right) \quad (2)

The full nonlinear dynamics can be described as,

m \ddot{r} = F, \qquad I \dot{\omega}_B + \omega_B \times I \omega_B = M \quad (3)

where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions of rolling moment and drag can be neglected in Equations (1) and (2). Define the total thrust as T = \sum_{i=1}^{4} T_i. The translational motion is then defined by,

m \ddot{r} = F = -R_\psi R_\theta R_\phi T z_B + m g e_D \quad (4)
where R_φ, R_θ, and R_ψ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small angle approximation to the rotation matrices,

m \begin{bmatrix} \ddot{r}_x \\ \ddot{r}_y \\ \ddot{r}_z \end{bmatrix} = \begin{bmatrix} 1 & -\psi & \theta \\ \psi & 1 & -\phi \\ -\theta & \phi & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ -T \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} \quad (5)

Finally, assuming that the total thrust approximately counteracts gravity, T ≈ \bar{T} = mg, except in the e_D axis,

m \begin{bmatrix} \ddot{r}_x \\ \ddot{r}_y \\ \ddot{r}_z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} + \begin{bmatrix} 0 & -\bar{T} & 0 \\ \bar{T} & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} \phi \\ \theta \\ T \end{bmatrix} \quad (6)

For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second order term, \omega_B \times I \omega_B, and expanding the thrust into its four constituents. The angular equations become,

\begin{bmatrix} I_x \ddot{\phi} \\ I_y \ddot{\theta} \\ I_z \ddot{\psi} \end{bmatrix} = \begin{bmatrix} 0 & l & 0 & -l \\ l & 0 & -l & 0 \\ K_r & -K_r & K_r & -K_r \end{bmatrix} \begin{bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{bmatrix} \quad (7)
where the moment arm length l = ||r_i × z_B|| is identical for all rotors due to symmetry. The resulting linear models can now be used for control design.
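To make the linearized attitude model concrete, the following sketch shows how Equation (7) maps rotor thrusts to Euler angle accelerations. It is an illustrative Python fragment, not STARMAC flight code; the inertias, arm length, and K_r are invented placeholder values.

```python
import numpy as np

# Placeholder physical parameters (illustrative values, not STARMAC's).
I_x, I_y, I_z = 0.02, 0.02, 0.04   # inertias [kg m^2]
l = 0.3                             # moment arm [m]
K_r = 0.1                           # thrust-to-torque coefficient

# Mixer matrix from Equation (7): maps rotor thrusts to body torques.
MIX = np.array([
    [0.0,  l,    0.0, -l  ],   # roll torque,  I_x * phi_ddot
    [l,    0.0, -l,    0.0],   # pitch torque, I_y * theta_ddot
    [K_r, -K_r,  K_r, -K_r],   # yaw torque,   I_z * psi_ddot
])

def euler_accelerations(thrusts):
    """Angular accelerations [phi_ddot, theta_ddot, psi_ddot] per Eq. (7)."""
    torques = MIX @ thrusts
    return torques / np.array([I_x, I_y, I_z])

# Example: an imbalance between rotors 2 and 4 produces a pure roll response.
print(euler_accelerations(np.array([2.0, 1.9, 2.0, 2.1])))
```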
IV. ESTIMATION AND CONTROL DESIGN
Applying the concept of spectral separation, inner loop control of attitude and altitude is performed by commanding motor voltages, and outer loop position control is performed by commanding attitude requests for the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller design to account for thrust biases. Position estimation is performed using a navigation filter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman filter that includes bias estimates. Integral LQR techniques are applied to the horizontal components of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6.
As described above, altitude control suffers exceedingly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which has frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, rejection of infeasible measurements is used to remove much of the non-Gaussian noise component. This is followed by altitude and altitude rate estimation by Kalman filtering, which adds lag to the estimate. This section proceeds with a derivation of two control techniques that can be used to overcome the unmodeled dynamics and the remaining noise.
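To illustrate the measurement handling just described, here is a minimal sketch of infeasible-measurement rejection followed by a constant-velocity Kalman filter. It is a hypothetical Python example; the gating threshold and noise covariances are assumed values, not those used on STARMAC.

```python
import numpy as np

DT = 1.0 / 12.0        # ultrasonic update period [s] (12 Hz, per Section II)
MAX_JUMP = 0.5         # assumed max plausible altitude change per sample [m]

class AltitudeFilter:
    """Constant-velocity Kalman filter with infeasible-measurement rejection."""

    def __init__(self):
        self.x = np.zeros(2)                  # state: [altitude, altitude rate]
        self.P = np.eye(2)
        self.F = np.array([[1.0, DT], [0.0, 1.0]])
        self.Q = np.diag([1e-4, 1e-2])        # process noise (illustrative)
        self.H = np.array([[1.0, 0.0]])
        self.R = np.array([[0.04]])           # ranger noise (illustrative)
        self.initialized = False

    def step(self, z):
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Reject spikes and false echoes: measurements far from the prediction
        # are deemed infeasible and skipped, leaving the propagated estimate.
        if abs(z - self.x[0]) < MAX_JUMP or not self.initialized:
            y = z - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T / S
            self.x = self.x + (K * y).ravel()
            self.P = (np.eye(2) - K @ self.H) @ self.P
            self.initialized = True
        return self.x
```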
Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying spikes, false echoes and dropouts. Powered flight commences at 185 seconds.

A. Integral Sliding Mode Control

A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by,

\dot{x}_1 = x_2, \qquad \dot{x}_2 = u + \xi(g, x) \quad (8)

where {x_1, x_2} = {(r_{z,des} − r_z), (ṙ_{z,des} − ṙ_z)} are the altitude error states, u = \sum_{i=1}^{4} u_i is the control input, and ξ(·) is a bounded model of disturbances and dynamic uncertainty. It is assumed that ξ(·) satisfies ||ξ|| ≤ γ, where γ is the upper bound on the norm of ξ(·).

In early attempts to stabilize this system, it was observed that LQR control was not able to address the instability and performance degradation due to ξ(g, x). Sliding Mode Control (SMC) was adopted to provide a systematic approach to the problem of maintaining stability and consistent performance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding manifold, such nice properties of SMC are not assured. In order to provide robust control throughout the flight envelope, the Integral Sliding Mode (ISM) technique is applied.

The ISM control is designed in two parts. First, a standard successive loop closure is applied to the linear plant. Second, integral sliding mode techniques are applied to guarantee disturbance rejection. Let

u = u_p + u_d, \qquad u_p = -K_p x_1 - K_d x_2 \quad (9)

where K_p and K_d are proportional and derivative loop gains that stabilize the linear dynamics without disturbances. For disturbance rejection, a sliding surface, s, is designed,

s = s_0(x_1, x_2) + z, \qquad s_0 = \alpha (x_2 + k x_1) \quad (10)

such that state trajectories are forced towards the manifold s = 0. Here, s_0 is a conventional sliding mode design, z is an additional term that enables integral control to be included, and α, k ∈ R are positive constants. Based on the following Lyapunov function candidate, V = s²/2, the condition \dot{V} < 0 guarantees convergence to the sliding manifold,

\dot{V} = s \dot{s} = s \left( \alpha (\dot{x}_2 + k \dot{x}_1) + \dot{z} \right) \quad (11)

Choosing \dot{z} = -\alpha (u_p + k x_2), with z(0) = -s_0(0), and substituting the dynamics of Equation (8), \dot{V} can be guaranteed to satisfy,

s \left( u_d + \xi(g, x) \right) < 0 \quad (12)

Since the disturbances, ξ(g, x), are bounded by γ, define u_d to be u_d = −λs with λ ∈ R. Equation (11) becomes,

\dot{V} = s \alpha \left( -\lambda s + \xi(g, x) \right) \le \alpha \left( -\lambda |s|^2 + \gamma |s| \right) < 0 \quad (13)

As a result, for u_p and u_d as above, the sliding mode condition holds when,

\lambda |s| > \gamma \quad (14)

With the input derived above, the dynamics are guaranteed to evolve such that s decays to within the boundary layer, |s| ≤ γ/λ, of the sliding manifold. Additionally, the system does not suffer from input chatter as conventional sliding mode controllers do, as the control law does not include a switching function along the sliding mode.
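A discrete-time sketch of the ISM law above follows, in Python. The gains and time step are illustrative placeholders, not the flight values; the structure follows Equations (9), (10), and (13), with the integral state z initialized so that s(0) = 0 and propagated as ż = −α(u_p + k x_2), which reduces the Lyapunov condition to Equation (12).

```python
class IntegralSlidingModeAltitude:
    """Sketch of the ISM altitude law of Equations (8)-(14); all gains are
    illustrative placeholders, not the STARMAC flight values."""

    def __init__(self, Kp=2.0, Kd=1.5, alpha=1.0, k=1.0, lam=3.0, dt=0.02):
        self.Kp, self.Kd = Kp, Kd          # Eq. (9) loop-closure gains
        self.alpha, self.k = alpha, k      # Eq. (10) sliding surface constants
        self.lam, self.dt = lam, dt        # lam sized so lam*|s| > gamma (Eq. 14)
        self.z = None                      # integral term of Eq. (10)

    def command(self, x1, x2):
        """x1 = r_z,des - r_z, x2 = rdot_z,des - rdot_z, per Equation (8)."""
        u_p = -self.Kp * x1 - self.Kd * x2        # Eq. (9)
        s0 = self.alpha * (x2 + self.k * x1)      # Eq. (10)
        if self.z is None:
            self.z = -s0                          # start on the manifold: s(0) = 0
        s = s0 + self.z
        u_d = -self.lam * s                       # continuous law, so no chatter
        # Propagate z_dot = -alpha*(u_p + k*x2) so that s_dot = alpha*(u_d + xi).
        self.z += -self.alpha * (u_p + self.k * x2) * self.dt
        return u_p + u_d
```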
V. REINFORCEMENT LEARNING CONTROL
An alternate approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is first constructed using flight data, approximating the system as a stochastic Markov process [15], [16]. Then a model-based reinforcement learning algorithm uses the model in policy iteration to search for an optimal control policy that can be implemented on the embedded microprocessors.

In order to model the aircraft dynamics as a stochastic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, S(t) ∈ R^{n_s}, and input, u(t) ∈ R^{n_u}, onto the subsequent state estimate, Ŝ(t+1). In this application, S = [r_z ṙ_z r̈_z V], where V is the battery level. In the altitude loop, the input, u ∈ R, is the summed motor power, u = \sum_{i=1}^{4} u_i. The subsequent state mapping is the traditional LWLR estimate using the current state and input, with a random vector, v ∈ R^{n_s}, representing unmodeled noise. The value for v is drawn from the distribution of output error as determined by using a maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfectly Gaussian, this model is found to be adequate. The LWLR method [17] is well suited to this problem, as it fits a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning weights to each training data point according to its proximity to the input value for which the output is to be computed. The technique requires a sizable set of training data in order to reflect the full dynamics of the system; the data are captured from flights flown under both automatic and manually controlled thrust, with the attitude states under automatic control.
For m training data points, the input training samples are stored in X ∈ R^{m×(n_s+n_u+1)}, and the outputs corresponding to those inputs are stored in Y ∈ R^{m×n_s}. These matrices are defined as,

X = \begin{bmatrix} 1 & S(t_1)^T & u(t_1)^T \\ \vdots & \vdots & \vdots \\ 1 & S(t_m)^T & u(t_m)^T \end{bmatrix}, \qquad Y = \begin{bmatrix} S(t_1+1)^T \\ \vdots \\ S(t_m+1)^T \end{bmatrix} \quad (15)
The column of ones in X enables the inclusion of a constant offset in the solution, as used in linear regression.
The diagonal weighting matrix W ∈ R^{m×m}, which acts on X, has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the S(t) and u(t) for which Ŝ(t+1) is to be computed. The distance measure used in this work is

W_{i,i} = \exp\left( -\frac{||x^{(i)} - x||^2}{2\tau^2} \right) \quad (16)

where x^{(i)} is the i-th row of X, x is the vector [1 S(t)^T u(t)^T], and the fit parameter τ is used to adjust the range of influence of training points. The value for τ can be tuned by cross validation to prevent over- or under-fitting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm to prevent undue influence of one state on the W matrix.
The subsequent state estimate is computed by summing the LWLR estimate with v,

\hat{S}(t+1) = \left( (X^T W X)^{-1} X^T W Y \right)^T x + v \quad (17)

Because W is a continuous function of x and X, as x is varied the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal matrix W; as each W_{i,i} is computed, it is multiplied by row x^{(i)} and stored in WX.
The matrix being inverted is poorly conditioned, because weakly related data points have little influence, so their contribution cannot be accurately numerically inverted. To compute the inversion more accurately, one can perform a singular value decomposition, X^T W X = U Σ V^T. Numerical error during inversion can then be avoided by using only the n singular values σ_i with values of σ_i > σ_max/C, where C is chosen by cross validation. In this work, C ≈ 10 was found to minimize numerical error, and the condition was typically satisfied by n = 1. The inverse can be directly computed using the n largest singular values in the diagonal matrix Σ_n ∈ R^{n×n} and the corresponding singular vectors, U_n and V_n. Thus, the stochastic Markov model becomes,

\hat{S}(t+1) = \left( V_n \Sigma_n^{-1} U_n^T X^T W Y \right)^T x + v \quad (18)
Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used,

R(S, S_{ref}) = -c_1 (r_z - r_{z,ref})^2 - c_2 \dot{r}_z^2 \quad (19)

where R : R^{2n_s} → R, c_1 > 0 and c_2 > 0 are constants giving reward for accurate tracking and good damping, respectively, and S_ref = [r_{z,ref} ṙ_{z,ref} r̈_{z,ref} V_ref] is the reference state desired for the system.
The control policy maps the observed state S onto the input command u. In this work, the state space has the constraint r_z ≥ 0, and the input command has the constraint 0 ≤ u ≤ u_max. The control policy is chosen to be,

\pi(S, w) = w_1 + w_2 (r_z - r_{z,ref}) + w_3 \dot{r}_z + w_4 \ddot{r}_z \quad (20)

where w ∈ R^{n_c} is the vector of policy coefficients w_1, ..., w_{n_c}. Linear functions were sufficient to achieve good stability and performance. Additional terms, such as battery level and the integral of altitude error, could be included to make the policy more resilient to differing flight conditions.
Policy iteration is performed as explained in Algorithm 1. The algorithm aims to find the value of w that yields the greatest total reward, R_total, as determined by simulating the system over a finite horizon from a set of random initial conditions and summing the values of R(S, S_ref) at each state encountered.
Algorithm 1 Model-Based Reinforcement Learning
1: Generate set S_0 of random initial states
2: Generate set T of random reference trajectories
3: Initialize w to reasonable values
4: R_best ← −∞, w_best ← w
5: repeat
6:   R_total ← 0
7:   for s_0 ∈ S_0, t ∈ T do
8:     S(0) ← s_0
9:     for t = 0 to t_max − 1 do
10:      u(t) ← π(S(t), w)
11:      S(t+1) ← LWLR(S(t), u(t)) + v
12:      R_total ← R_total + R(S(t+1))
13:    end for
14:  end for
15:  if R_total > R_best then
16:    R_best ← R_total, w_best ← w
17:  end if
18:  Add Gaussian random vector to w_best, store as w
19: until w_best converges
In policy iteration, a fixed set of random initial conditions and reference trajectories is used to simulate flights at each iteration, with a given policy parameterized by w. It is necessary to use the same random set at each iteration in order for convergence to be possible [15]. After each iteration, the new value of w is stored as w_best if it outperforms the previous best policy, as determined by comparing R_total to R_best, the previous best reward encountered. Then, a Gaussian random vector is added to w_best. The result is stored as w, and the simulation is performed again. This is iterated until the value of w_best remains fixed for an appropriate number of iterations, as determined by the particular application. The simulation results must be examined to predict the likely performance of the resulting control policy.

By using a Gaussian update rule for the policy weights, w, it is possible to escape local maxima of R_total. The highest probability steps are small, and result in refinement of a solution near a local maximum of R_total. However, if the algorithm is not at the global maximum, and is allowed to continue, there exists a finite probability that a sufficiently large Gaussian step will be performed such that the algorithm can keep ascending.
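As a minimal illustration, Algorithm 1 might be coded as follows. This is a hypothetical Python sketch: the reward constants, horizon, input limit, noise scale, and perturbation size are placeholder values, and `model` stands for the learned LWLR step of Equation (18) (for example, a closure over `lwlr_predict` from the earlier sketch).

```python
import numpy as np
from itertools import product

C1, C2 = 1.0, 0.1       # Eq. (19) reward weights (placeholders)
U_MAX = 1.0             # input constraint 0 <= u <= u_max (placeholder)
T_MAX = 100             # rollout horizon (placeholder)
N_S = 4                 # state dimension: [r_z, rdot_z, rddot_z, V]
STEP_SIGMA = 0.05       # std dev of the Gaussian policy perturbation

def reward(S, S_ref):
    # Eq. (19): penalize altitude tracking error and vertical rate.
    return -C1 * (S[0] - S_ref[0])**2 - C2 * S[1]**2

def policy(S, S_ref, w):
    # Eq. (20): linear policy, clipped to the input constraint.
    u = w[0] + w[1] * (S[0] - S_ref[0]) + w[2] * S[1] + w[3] * S[2]
    return np.clip(u, 0.0, U_MAX)

def policy_iteration(model, initial_states, references, w0, iters=1000):
    """Random-search policy iteration per Algorithm 1. The initial states,
    references, and noise draws v are frozen across iterations so that
    total rewards are comparable between candidate policies [15]."""
    rng = np.random.default_rng(0)
    pairs = list(product(initial_states, references))
    v = rng.normal(scale=0.01, size=(len(pairs), T_MAX, N_S))  # frozen noise
    w, w_best, R_best = w0.copy(), w0.copy(), -np.inf
    for _ in range(iters):
        R_total = 0.0
        for j, (s0, S_ref) in enumerate(pairs):
            S = s0.copy()
            for t in range(T_MAX):
                u = policy(S, S_ref, w)            # Algorithm 1, line 10
                S = model(S, u) + v[j, t]          # line 11: LWLR step plus v
                R_total += reward(S, S_ref)        # line 12
        if R_total > R_best:
            R_best, w_best = R_total, w.copy()     # lines 15-16
        w = w_best + rng.normal(scale=STEP_SIGMA, size=w_best.shape)  # line 18
    return w_best
```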
VI. FLIGHT TEST RESULTS
A. Integral Sliding Mode
The results of an outdoor flight test with ISM control can be seen in Figure 4. The response time is on the order of 1-2 seconds, with a 5 second settling time, and little to no steady state offset. An oscillatory character can also be seen in the response, which is most likely triggered by the nonlinear aerodynamic effects and sensor data spikes described earlier.
Compared to the linear control design techniques implemented on the aircraft, the ISM control proves a significant enhancement. By explicitly incorporating bounds on the unknown disturbance forces in the derivation of the control law, it is possible to maintain stable altitude on a system that has evaded standard approaches.

B. Reinforcement Learning Control
One of the most exciting aspects of RL control design is its ease of implementation. The policy iteration algorithm arrived at the implemented control law after only 3 hours on a Pentium IV computer. Figure 5 presents flight test results for the controller. The high fidelity model of the system, used for RL control design, provides a useful tool for comparison of the RL control law with other controllers. In fact, in simulation with linear controllers that proved unstable on the quadrotor, flight paths with growing oscillations were predicted that closely matched real flight data.
The locally weighted linear regression model revealed many relations that were not reflected in the linear model, but that reflect the physics of the system well. For instance, with all other states held fixed, an upward velocity results in more acceleration at the subsequent time step for a given throttle level, and a downward velocity yields the opposite effect. This is essentially negative damping. The model also shows a strong ground effect: with all other states held fixed, the closer the vehicle is to the ground, the more acceleration it has at the subsequent time step for a given throttle level.
Fig. 5. Reinforcement learning controller response to manually applied step input, in outdoor flight test. Spikes in state estimates are from sensor noise passing through the Kalman filter.
The reinforcement learning control law is susceptible to system disturbances for which it is not trained. In particular, varying battery levels and blade degradation may cause a reduction in stability or a steady state offset. Addition of an integral error term to the control policy may prove an effective means of mitigating steady state disturbances, as was seen in the ISM control law.

Comparison of the step response for ISM and RL control reveals both stable performance and similar response times, although the transient dynamics of the ISM control are more pronounced. RL does, however, have the advantage that it incorporates accelerometer measurements into its control, and as such uses a more direct measurement of the disturbances imposed on the aircraft.

C. Autonomous Hover

Applying ISM altitude control and integral LQR position control techniques, flight tests were performed to achieve the goal of autonomous hover. Position response was maintained within a 3 m circle for the duration of a two minute flight (see Figure 6), which is well within the expected error bound for the L1 band differential GPS used.

Fig. 6. Autonomous hover flight recorded position, with 3 m error circle.

VII. CONCLUSION
This paper summarizes the development of an autonomous quadrotor capable of extended outdoor trajectory tracking control. This is the first demonstration of such capabilities on a quadrotor known to the authors, and represents a critical step in developing a novel, easy to use, multi-vehicle testbed for validation of multi-agent control strategies for autonomous aerial robots. Specifically, two design approaches were presented for the altitude control loop, which proved a challenging hurdle. Both techniques resulted in stable controllers with similar response times, and were a significant improvement over linear controllers that failed to stabilize the system adequately.
Acknowledgments
The authors would like to thank Dev Gorur Rajnarayan and David Dostal for their many contributions to STARMAC development and testing, as well as Prof. Andrew Ng of Stanford University for his advice and guidance in developing the Reinforcement Learning control.
REFERENCES

[1] Hoffmann, G., Rajnarayan, D. G., Waslander, S. L., Dostal, D., Jang, J. S., and Tomlin, C. J., "The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC)," 23rd Digital Avionics Systems Conference, Salt Lake City, UT, November 2004.
[2] Lambermont, P., Helicopters and Autogyros of the World, 1958.
[3] DraganFly-Innovations, "www.rctoys.com," 2003.
[4] Pounds, P., Mahony, R., Hynes, P., and Roberts, J., "Design of a Four-Rotor Aerial Robot," Australian Conference on Robotics and Automation, Auckland, November 2002.
[5] Altug, E., Ostrowski, J. P., and Taylor, C. J., "Quadrotor Control Using Dual Camera Visual Feedback," ICRA, Taipei, September 2003.
[6] Bouabdallah, S., Murrieri, P., and Siegwart, R., "Design and Control of an Indoor Micro Quadrotor," ICRA, New Orleans, April 2004.
[7] Castillo, P., Dzul, A., and Lozano, R., "Real-Time Stabilization and Tracking of a Four-Rotor Mini Rotorcraft," IEEE Transactions on Control Systems Technology, Vol. 12, No. 4, 2004.
[8] Bramwell, A., Done, G., and Blamford, D., Bramwell's Helicopter Dynamics, Butterworth-Heinemann, 2nd ed., 2001.
[9] Devantech, SRF08 Ultrasonic Ranger, http://www.robot-electronics.co.uk/htm/srf08tech.shtml.
[10] Utkin, V., Güldner, J., and Shi, J., Sliding Mode Control in Electromechanical Systems, Taylor & Francis, 1999.
[11] Khalil, H. K., Nonlinear Systems, Prentice Hall, 1996.
[12] Jang, J. S., Nonlinear Control Using Discrete-Time Dynamic Inversion Under Input Saturation: Theory and Experiment on the Stanford DragonFly UAVs, Ph.D. thesis, Stanford University, 2004.
[13] Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[14] Doya, K., Samejima, K., Katagiri, K., and Kawato, M., "Multiple Model-based Reinforcement Learning," Tech. rep. KDB-TR-08, Kawato Dynamic Brain Project, Japan Science and Technology Corporation, June 2000.
[15] Ng, A. Y. and Jordan, M. I., "PEGASUS: A policy search method for large MDPs and POMDPs," Uncertainty in Artificial Intelligence, 2000.
[16] Ng, A. Y., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E., and Liang, E., "Autonomous inverted helicopter flight via reinforcement learning," International Symposium on Experimental Robotics, 2004.
[17] Atkeson, C. G., Moore, A. W., and Schaal, S., "Locally Weighted Learning," Artificial Intelligence Review, Vol. 11, No. 1-5, 1997, pp. 11–73.