"Those who do not remember the past
are condemned to repeat it"
This document presents summary results of a limited "lessons learned" survey conducted within the Engineering Directorate in late May/early June 1989. As such, it is an initial attempt to preserve some of the experiences and practical advice of senior engineering personnel, for future use as an aid to minimizing the repetition of past mistakes on flight programs.
The survey was confined to space flight hardware/software and related test/support equipment, and limited in scope to the development phase from the start of execution (i.e. phase C/D) up to but not including launch. Pre-phase C/D, launch, and in-orbit experience (i.e. problems) were excluded, and the number of survey participants was limited in order to simplify information gathering for an initial report. Thus both a wider scope and broader information base are envisioned for future revisions to this document. However, the number of survey participants was large enough to represent a wide and diverse experience base covering many flight projects.
In this document, emphasis is placed on mistakes and other problems encountered by senior engineering personnel on current/past in-house and out-of-house flight programs. Except for some corresponding items listed under Advice/Recommendations, no effort was made to include the remedies associated with resolving these matters; however, the remedies in most cases are rather obvious. While this may appear to be a negative approach, it is considered a more effective way to teach a lesson learned than, for example, describing a hint, tip, or design requirement without addressing its genesis. Consequently, the lessons learned are to not repeat these mistakes, thereby avoiding potentially serious cost impacts and schedule delays.
For the purpose of this document, phase C/D was divided into three stages: Design; Subsystem Fabrication, Assembly, Integration and Test; and System Integration and Test. The lessons learned that apply to each of these stages have been classified as: most often repeated mistakes/problems; other problems/worst practices observed; and advice/recommendations. In the category of most often repeated mistakes/problems, only those in design were solicited in the survey. Consequently, the paucity of entries in this category in sections 3.1 and 4.1 merely reflects the limitations of the survey. The category of "other problems" primarily includes the worst practices observed by survey respondents, which may also include some of those "most often repeated." The items listed under advice/recommendations could have included the remedy for each problem in the preceding sections; however, many have been omitted to avoid unnecessary repetition. The sample PDR checklist (Attachment A) repeats some, but not all, of the applicable items identified in sections 2, 3, and 4. Such a one-for-one correspondence was not considered necessary for the purpose of this document. Readers who choose to create individual PDR checklists may add these items as they see fit.
Although this document is directed at engineers with limited development experience in hardware/software that has flown in space, it is likely to also be read by senior, experienced engineers. All readers are invited to submit inputs to improve this document, and to keep it current. A format for such inputs is provided in Attachment B.
Although not specifically excluded, no effort was made to include items that are also recorded in formal documentation, such as the Preferred Parts List (PPL), Materials Tips, Malfunction Reports (MR), Spacecraft Orbital Anomaly Reports (SOAR), and design review (i.e., PDR, CDR, etc.) reports for specific projects/activities. Information regarding access to this documentation can be obtained from the Office of Flight Assurance (OFA, i.e., Code 300) representative assigned to cover the reader's activity.
Although some elements of design persist throughout phase C/D, the design stage is generally considered to cover the period from start of the execution phase to start of manufacture of flight components. It encompasses breadboard as well as some engineering model testing at the component, subsystem, and occasionally at system levels. It also includes the Preliminary and Critical Design Reviews (PDR and CDR), and the design freeze. While some of the mistakes/problems that are listed below refer to later phases of development, they are included here since they could have been prevented through proper design.
1. Lack of clear definition of requirements early in phase C/D. This included:
2. Poorly defined technical interfaces between subsystems and other subsystems/spacecraft (i.e., inadequate systems engineering). This included:
3. Inadequate consideration of testability, to assure that the design can be tested to demonstrate specification compliance. This included:
4. Inadequate test planning early in phase C/D, including:
5. Failure to think the design through, to completion of integration. This included:
6. Failure to analytically verify the design prior to fabrication and test. This included:
7. Insufficient attention paid to the effects of mission operations on hardware/software design requirements (including how subsystem is to be operated in space and how data will be reduced and analyzed).
8. Insufficient housekeeping telemetry for evaluation of performance and for failure analysis.
9. Lack of adequate early consideration of zero "g" effects on the design. This included inadequate torque margins on drive mechanisms for operation in ground tests, and other proposed designs that were not testable in 1-g (due to lack of early "g" negation planning/design).
10. Incorrect and/or excessive dimensional tolerances on drawings. Unnecessarily tight tolerances and accumulation of tolerances have resulted in improper fits.
11. Inadequate documentation of the hardware and software designs. This included:
12. Lack of engineering model to develop/refine/verify flight model specifications.
13. Inadequate consideration of "fail-safe" designs
14. Failure to assure that space qualified (or qualifiable) parts are available
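The tolerance-accumulation problem noted in item 10 can be caught early with a simple stack-up calculation. The following is a minimal sketch, with hypothetical dimensions and function names (not from the source), comparing a worst-case stack with a root-sum-square (RSS) stack:

```python
import math

def worst_case_stack(tolerances):
    """Worst-case stack-up: every tolerance at its extreme simultaneously."""
    return sum(abs(t) for t in tolerances)

def rss_stack(tolerances):
    """Root-sum-square stack-up: statistical accumulation of independent tolerances."""
    return math.sqrt(sum(t * t for t in tolerances))

# Hypothetical stack of four mating parts, tolerances in millimetres.
tols = [0.05, 0.05, 0.10, 0.02]
worst = worst_case_stack(tols)   # 0.22 mm if everything lands at its extreme
likely = rss_stack(tols)         # ~0.124 mm statistically
```

The worst-case figure guards against every tolerance landing at its extreme at once; the RSS figure estimates a statistically likely accumulation for independent tolerances. The gap between the two often shows where unnecessarily tight tolerances can be loosened without risking improper fits.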
1. Incorrect/poor material selection including mismatching of materials in applications where low thermal expansion/stress is required
2. Failure to specify proper lubrication
3. Failure to provide adequate radiation protection
4. Failure to analyze fasteners to prevent stress corrosion/cracking, and improper use of fasteners
5. Inadequate consideration of the effects of cryogenic temperatures on stress corrosion and testing.
6. Designing rivet patterns that are impossible to implement on the structure.
1. Inadequate consideration of previous optical designs
2. Attempted use of optical components as load bearing structures
3. Relying on a load bearing structure for critical positioning between optical components
4. Thermally sinking optical components, causing them to act as traps for contaminants
5. Inadequate early (initial) design, including:
1. Inadequate design to prevent corona discharge/high voltage failures
2. Inadequate parts (EEE) selection and screening, including insufficient derating and not selecting parts with the highest radiation tolerance.
3. Unnecessarily complex designs
4. Inadequate consideration and use of EMC techniques. This included:
5. Improper partitioning of functions (i.e., either PC boards or boxes) within a subsystem.
6. Insufficient attention to grounding (i.e., lack of, or inadequate grounding philosophy)
7. Overstressing resistors or power carrying components
8. Incorrectly assuming that a breadboard model can be directly converted into a flight model. Stated differently, incorrectly assuming that a design proven with non-flight parts will perform identically with flight parts.
9. Choosing a design approach that is biased toward the designer's expertise, rather than as a result of a trade-off study.
10. Inadequate specification of plug (pin) to jack (socket) signal characteristics, leading to interface incompatibilities between assemblies and components.
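The derating and overstress concerns in items 2 and 7 reduce to a check of dissipated power against a derated rating. A minimal sketch, assuming a hypothetical 50% derating factor (actual factors come from the project's parts derating guidelines, and all part values here are illustrative):

```python
def power_dissipated(voltage_v, resistance_ohm):
    """Steady-state resistor dissipation, P = V^2 / R."""
    return voltage_v ** 2 / resistance_ohm

def passes_derating(dissipated_w, rated_w, derating_factor=0.5):
    """Compare dissipation against a derated limit; the 50% default is a
    hypothetical factor, for illustration only."""
    return dissipated_w <= rated_w * derating_factor

# Hypothetical case: 12 V across a 1 kOhm, 1/4 W resistor.
p = power_dissipated(12.0, 1000.0)      # 0.144 W
ok = passes_derating(p, rated_w=0.25)   # fails a 50% derating (limit 0.125 W)
```

Running this kind of check across a parts list at design time is far cheaper than discovering an overstressed resistor during subsystem test.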
1. Designing for the "best case" scenario instead of the "worst case" (insufficient or inadequate design margins)
2. Inadequate early requirements definition and design consideration of ground/lab/bench test equipment (including needs for interface simulation)
3. Not factoring ground test requirements into system (especially C&DH) design. Inability to speed up sample rates contributed to excessive checkout time.
4. Not performing simulations or building breadboards to verify the design
5. Lack of adequate computer design tools
6. No provision for venting of components
7. Inadequate system design, including:
8. Designing in pieces before full scope/complete requirements are known.
9. Human failings, including:
1. Inadequate attention to effects of thermal variances and/or hard vacuum in specifying dimensional tolerances.
2. Designs using friction as a load path in a structure.
1. Failure to perform first order performance margin calculations on critical designs.
2. Use of "ideal" optical surfaces in baseline designs that meet specifications but are very difficult to implement.
3. Inadequate contamination control planning
4. Failure to consider, or lack of knowledge of, previous/existing optical designs
1. Planning to use nonexistent parts, and use of the newest/unproven parts.
2. Not allowing adequate space to accommodate wiring.
3. Overestimating circuit efficiency
4. Use of brush type dc motors (that are short-lived in vacuum) to reduce cost
5. Improperly sized PC board traces for power lines (i.e., inadequate current capacity)
6. Gross underestimation of battery heat dissipation
7. Inadequate heat sinking
8. Unclear division of responsibilities on the design of electronics boxes.
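The undersized-trace problem in item 5 can be caught at layout time with the commonly cited IPC-2221 current-capacity approximation, I = k · ΔT^0.44 · A^0.725 (cross-sectional area A in square mils). A sketch under that assumption, with hypothetical line values:

```python
def min_trace_width_mil(current_a, temp_rise_c, thickness_oz=1.0, external=True):
    """Minimum trace width in mils from the IPC-2221 approximation
    I = k * dT**0.44 * A**0.725, where A is cross-section in square mils
    and k is ~0.048 for external layers, ~0.024 for internal layers."""
    k = 0.048 if external else 0.024
    area_sq_mil = (current_a / (k * temp_rise_c ** 0.44)) ** (1.0 / 0.725)
    thickness_mil = thickness_oz * 1.378  # 1 oz/ft^2 copper is ~1.378 mil thick
    return area_sq_mil / thickness_mil

# Hypothetical power line: 2 A with a 10 degC allowable temperature rise.
width = min_trace_width_mil(2.0, 10.0)   # roughly 30 mil on an external 1 oz layer
```

Internal layers, which shed heat less effectively, require substantially wider traces for the same current and temperature rise.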
1. Don't make it complicated when a simple design will suffice
2. More emphasis needs to be placed, early in phase C/D, on tooling and manufacturing, test equipment, simulators and GSE
3. Need to conduct internal and peer design reviews early in phase C/D.
4. Commence software development concurrently with hardware development.
5. Don't accept a "proven" design, without thoroughly analyzing its intended use in the proposed design.
6. Subsystem (especially instrument)/spacecraft interface details should be refined and delivered to the project at set points in the development cycle. One example could be: "best estimate" at subsystem PDR; "preliminary" at subsystem CDR; and "final" at subsystem Pre-Environmental Review (PER).
7. Relative to flight software, system architecture and control systems, the hardware and software designs should be modularized through functional decomposition.
8. For logic design, use a higher voltage (power) input than required, so that a decoupling RC filter can be incorporated later (if required) without unacceptable power loss.
9. Assure that selection of materials for GSE is appropriate for intended use (i.e., some may need to be vacuum compatible to support environmental testing)
10. Design for testing in a one-g, one atmosphere environment as well as for operation in space. Recommendations include:
11. Be aware of concerns related to space vacuum, including:
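The voltage-margin advice in item 8 amounts to budgeting for the IR drop of a series RC decoupling filter that may be added later. A minimal sketch with hypothetical values:

```python
def filter_drop_v(supply_current_a, series_resistance_ohm):
    """DC voltage drop across the series element of a decoupling RC filter."""
    return supply_current_a * series_resistance_ohm

def bus_voltage_needed(logic_v, supply_current_a, series_resistance_ohm):
    """Input voltage required so the logic still sees its nominal rail
    after the filter's IR drop."""
    return logic_v + filter_drop_v(supply_current_a, series_resistance_ohm)

# Hypothetical: 5 V logic drawing 200 mA through a 2-ohm filter resistor.
drop = filter_drop_v(0.2, 2.0)             # 0.4 V lost across the filter
v_in = bus_voltage_needed(5.0, 0.2, 2.0)   # 5.4 V input needed
```

Sizing the input voltage this way up front means the filter can be added late in development, if EMC testing demands it, without an unacceptable power loss or redesign.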
This stage of phase C/D extends from "design freeze" to delivery of protoflight and flight subsystems for system integration. It includes fabrication, assembly, and integration and test (including calibrations and any environmental testing) at the subsystem level, as well as packaging and shipping to the system integrator. Usually, formal reviews during this stage are implemented only for subsystems (e.g., instruments) and consist of the Pre-Environmental Review (PER) and/or the Pre-Shipment Review (PSR). It is within this stage that most redesign occurs.
1. Many unnecessary and/or undocumented changes, including:
2. Inadequate testing at the subsystem level. This is often due to poor early test planning, including lack of test design in parallel with subsystem design.
3. Inadequate contamination control/clean room practices, including unsatisfactory environmental control of the assembly area.
4. Attempts to adjust the position of precise optical components by use of shims.
5. Inadequate relief loops in wiring harnesses.
6. Substitution of parts that are not exactly equivalent.
1. Lack of communications between designer, fabricator and tester. Among other things, this problem includes:
2. Inadequate assembly, component and subsystem test plans and procedures including:
3. Lack of respect/concern for hardware integrity. Some examples include:
4. Inadequate documentation, other than that listed under item #2 above, including:
5. Delay in starting acceptance test procedures because tools to assemble the piece parts were not procured.
6. Failure to use adequately calibrated tools and equipment (torque wrenches, scopes, gages).
7. Lack of adequate computer-assisted tools (CAM)
8. Lack of good process (i.e., fabrication/assembly/integration/test) planning. Problems included:
9. "Quick-fix" approach taken too often
10. Bowing to schedule pressure, including:
11. Maintaining an attitude that certification logs are not important
12. Poor electrical/electronics practices, including:
13. Electronic interface mismatches between subsystems/GSE/spacecraft
14. Proof of design (electronics) model unable to accommodate flight parts
1. Assure that hardware and software configuration management plans are properly implemented. This includes keeping detailed logs to assure configuration traceability.
2. Plan for early testing of subsystem design, at increasingly higher levels of integration.
3. Assure that subsystem test plans address the following:
4. Assure that subsystem test procedures address the following:
5. Assure that test load levels are reasonable and that all environments are covered.
6. When using a ground computer to monitor tests, consider employing an automatic data dump onto hard copy that would be initiated whenever an anomaly occurs.
7. Use automated test sets for subsystem testing to the fullest extent practical.
8. Use CAD/CAM to the fullest extent practical.
9. Electronics proof-of-design model(s) built using commercial parts should be able to later accommodate flight parts for design verification.
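The automatic data dump suggested in item 6 can be prototyped as a rolling telemetry buffer that snapshots its contents whenever a sample goes out of limits. A sketch of the idea (the class name, limits, and interface are hypothetical, not from the source):

```python
from collections import deque

class AnomalyLogger:
    """Rolling window of telemetry samples, dumped whenever a sample goes
    out of limits (names and interface are hypothetical)."""

    def __init__(self, low, high, window=5):
        self.low, self.high = low, high
        self.buffer = deque(maxlen=window)
        self.dumps = []  # stands in for the hard-copy output device

    def record(self, name, value):
        self.buffer.append((name, value))
        if not (self.low <= value <= self.high):
            # Anomaly: snapshot the recent history for failure analysis.
            self.dumps.append(list(self.buffer))

log = AnomalyLogger(low=0.0, high=10.0, window=3)
for v in [1.0, 2.0, 3.0, 12.5, 4.0]:
    log.record("bus_voltage", v)
# log.dumps now holds one snapshot: the three samples leading up to 12.5.
```

Capturing the samples leading up to an anomaly, not just the out-of-limits reading itself, is what makes the dump useful for later failure analysis.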
This stage of phase C/D extends from the delivery of protoflight and flight subsystems (for system integration) up to launch. This stage includes: the integration of the various subsystems into a complete spacecraft/observatory/platform system; system tests and calibrations, including generation of baseline system data for future trend analysis; system environmental tests, including qualification and acceptance testing; pre-shipment preparations; shipment and launch preparations. Formal reviews that usually take place during this stage include: the System Operations Review (SOR); Pre-Environmental Review (PER); Pre-Shipment Review (PSR); Flight Operations Review (FOR), also called the Mission Operations Review (MOR); and the Flight Readiness Review (FRR).
1. Interface verification on actual hardware occurs too late.
2. Inadequate preparation to handle (i.e., reduce, analyze and report on) huge amounts of test data.
1. Inadequate system integration/test plans and/or procedures. Almost half of the survey respondents listed variations of this problem, including the following observations:
2. Inadequate instrumentation, especially in engineering development and qualification testing. This includes having either too much improperly placed instrumentation or too little instrumentation for vibration tests, resulting in inadequate data to verify structural analyses.
3. Poor scheduling of integration activities, including lack of adequate definition of work to be done, who does it, and when.
4. Inadequate time allowed for testing, due to poor scheduling and/or funding/schedule constraints.
5. Lack of communications between project/designer/fabricator/test engineers.
6. Inadequate investigation/resolution of anomalies, including ignoring/waiving away discrepancies and/or functional failures that occur during testing.
7. Inadequate analysis of test results, including:
8. Poor documentation of test data/results
9. Incompatible interfaces
10. Inoperable test equipment
11. Testing flight hardware to full vibration levels without first characterizing responses at lower levels
12. Numerous electrical/electronics problems, including:
1. Assure that the overall system test plan includes the following:
2. Assure that system test procedures address the following:
3. Begin developing test plans early (i.e., at the breadboard test stage)
4. Low-level vibration tests should be performed to determine validity of analytical predictions and to reduce risk of failure at qualification/acceptance levels. The testing and analyses should be complementary, and fixes considered/implemented if analyses indicate probability of failure.
5. Assure that ground cooling is adequate to prevent overheating during testing (heat transfer modes are completely different on the ground, vs. in space)
6. Assure that test configuration management is in place so that there is traceability of performance (at least one project had problems stemming from testing hardware of unknown configuration).
7. Use computer controlled testing as much as possible/practical.
8. Beware of cost-savings proposals to substitute temperature tests for thermal vacuum tests. This is not a recommended practice, due to moisture problems that nearly always arise.
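The low-level vibration testing recommended in item 4 supports a simple linearity extrapolation: scale the measured low-level response to the full test level and compare it with the allowable before committing flight hardware to qualification/acceptance levels. A hedged sketch (the input values and the 1.25 margin are hypothetical):

```python
def predicted_full_level_response(measured_g, test_level_g, full_level_g):
    """Linearly extrapolate a low-level response to the full test level;
    valid only if the structure behaves linearly over that range."""
    return measured_g * (full_level_g / test_level_g)

def within_allowable(predicted_g, allowable_g, margin=1.25):
    """Require the allowable to exceed the prediction by a safety margin
    (the 1.25 factor is hypothetical, for illustration)."""
    return predicted_g * margin <= allowable_g

# Hypothetical: 2 g measured at a 0.25 g input; full test level is 1 g.
resp = predicted_full_level_response(2.0, 0.25, 1.0)   # 8 g predicted
proceed = within_allowable(resp, allowable_g=12.0)     # 10 g <= 12 g allowable
```

If the extrapolated prediction approaches the allowable, that is the signal to reconcile test data with the structural analyses, and to consider fixes, before proceeding to full levels.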
The survey questionnaire asked participants if they used checklists as aids in assessing the completeness of documentation (such as specifications, test plans, etc.) and in raising questions/concerns at different stages of a program (such as at PDR, CDR, etc.). Less than half of the respondents indicated that they used such checklists, and some of them indicated using only "mental" checklists. However, an overwhelming majority (88%) of those responding to the question thought that such lists should be developed to promote consistency/completeness of engineering practices.
Checklists are only tools. They are not substitutes for thinking, common sense, or experience. They can be useful as personal aids, guides, or mind-joggers to reduce the number of items that "slip through the crack" or that are "momentarily" forgotten, or to assure that some aspect of design or test activity is not overlooked. Obviously, except in a very general sense, there is no such thing as a common, or standard, checklist that is universally useful. However, within each technical discipline (i.e., C&DH, power, etc.) there can be "standard" checklists for each unique stage of development and hardware/software item that can be tailored or modified to reflect individual project needs. Although there may be some commonality between disciplines, "standard" checklists will be different for each discipline and function (i.e., flight hardware, software, PDR, etc.). More often than not, items on checklists come from someone's experience. These different "standard" lists should not be viewed as requirements to be followed blindly, but considered for applicability to the user's situation, and modified accordingly. Such "standard" checklists, updated/modified as circumstances warrant, could be issued and controlled at the Section or Branch level if desired. If inclined to prepare a checklist, the reader is advised to seek out experienced, senior personnel to enhance its usefulness.
One of the questions in the survey asked respondents to list the first things they look for when reviewing an early design, such as at PDR. The responses to this question have been compiled into a preliminary (sample) PDR checklist (Attachment A). While written from the viewpoint of a review team member, the list is equally useful to the responsible subsystem and system engineers, who may use it to prepare responses to the items listed. Although incomplete and not entirely applicable to all disciplines, Attachment A may be used as a starting point for an individual PDR checklist that the reader may wish to create.
Checklists, then, can be useful "lessons learned" tools that could prevent recurrence of some past problems. Many items in sections 2, 3, and 4 of this document could be incorporated into a variety of individual checklists.