At CES I had the opportunity to sit down with OCZ's CEO, Ralph Schmitt, to discuss the state of OCZ after Toshiba acquired the company in late 2013. We talked about how the company has changed and evolved under the new ownership and how Toshiba has brought in some much needed NAND supply and expertise. In the article we posted summarizing that discussion I also mentioned that I would be taking a closer look at OCZ's manufacturing and validation in the coming months, and today's article focuses on that.

OCZ flew me out to Taiwan to get an in-person look at the factory. My hosts and guides throughout the trip were Jason Ruppert, Senior Vice President of Operations, and Jim Van Patten, Vice President of World Wide Quality, and a big thanks goes to both of them. A bit of background to show that these were the right people to speak to: Mr. Ruppert has been with OCZ since March 2012, and before joining OCZ he was the Vice President of Manufacturing Operations & Engineering at Harmonic Inc, which focuses on video delivery infrastructure. Mr. Van Patten was actually Mr. Ruppert's first hire and joined the company in May 2012 from Logitech, where he was the Vice President of World Wide Quality Assurance. Mr. Ruppert holds a Master's degree in Systems Engineering from North Carolina State University (for your interest, Anand did his BS in Computer Engineering at the same university), while Mr. Van Patten received his Ph.D. in Instructional Design & Evaluation from Syracuse University. It goes without saying that both have extensive knowledge and experience within their operational areas, so the two were the best people to guide me through manufacturing and validation.

The Development Process of an SSD

Before we move on to the actual factory tour and see how SSDs are made, let's outline the development process first. After all, a product must be designed and developed before it can be manufactured and there are some items that show up in both development and manufacturing processes.

As with any product, the development process starts from an idea, which can be practically anything (completely new model, refresh of existing model with new NAND, higher capacity model, different form factor, new software etc.). In phase zero the idea is shaped to become a concept and usually results in a short (1-2 pages) document that describes the opportunity presented by the product.

Once the concept is clear, the marketing and engineering teams give their initial feedback. Both are very important because a product must be marketable, but at the same time it needs to be viable to execute from an engineering standpoint. Normally phase one takes about three weeks and results in a more in-depth description of the opportunity. If the idea fails at this point, it is either scrapped or moved back to the concept stage.

In phase two, OCZ starts to commit more significant resources to the project. The first two phases (zero and one) merely outline the concept and determine its opportunity, so phase two begins the actual planning of the product. The two key documents that are finalized in this phase are the marketing requirements and engineering response documents, but each function group (e.g. quality and supply chain) will also deliver its support plan. Basically, the purpose of phase two is to construct a comprehensive project plan that includes all aspects and teams involved in the product, including the budget.

Phase two is probably the most critical phase because the project plan and budget are used to decide whether OCZ puts hundreds of thousands (or even millions) of dollars behind the product, so all documents must be carefully made and evaluated in order to make the best decision for the company. The length of phase two depends on the complexity of the product and what teams need to be involved, but it typically takes from one to three months to build the final plan and budget.

If the project is funded, OCZ moves to phase three, which is where most of the engineering work is done. OCZ of course wants to keep the exact details of this phase close to its chest, but the ultimate goal is to build the first working prototypes, so the project can move to testing the prototypes. While the engineers are busy with their work, the remaining teams work on their own functions and prepare to manufacture the pilot samples (this includes tasks such as qualifying suppliers, securing long lead time parts, developing preliminary spec sheets and marketing materials). The length of the design and implementation phase depends greatly on the product, but even a drive that uses an existing controller can spend up to a year in this phase. The development of a totally new controller like the JetExpress is obviously a multi-year project given the sheer amount of engineering work required.

The Validation Phase

Phase four, which is validation and qualification, essentially consists of three main parts: the Engineering Verification Test (EVT), the Design Verification Test (DVT) and the Production Verification Test (PVT). EVT is run on the first engineering prototypes and it tests that the drive works in real life as it was designed to work on paper. The test suite is relatively straightforward and tests aspects such as power levels, signals and interface timings to ensure that the prototype works as it was planned to. There is also some preliminary performance testing in the EVT phase, but because the firmware is usually far from final, the results almost never illustrate the performance of the final product.

DVT is further broken down into two areas: normal DVT and quality/reliability. The normal DVT has a broader set of tests than EVT, and more variables (e.g. power, temperature and host variations) are added to the mix to ensure that the drive operates and performs as it is supposed to in a variety of environments. Each test is also run on at least four samples, whereas the initial EVT testing is usually performed on just one or two samples. I'm not going to list and describe every individual test here because the DVT phase consists of dozens of different tests, but there are product compliance, data retention, power loss and die failure tests, to mention a few, along with thorough performance testing to evaluate the firmware.

DVT is split into two parts because EVT and normal DVT are both performed by the engineering team (who also designed the drive), which can create conflicts of interest during the validation process (in the end, human beings tend to be blind to their own faults and mistakes). Most of the DVT tests are rerun as reliability tests, but in this case the tests are performed by the independent quality team that is led by Mr. Van Patten. The number of samples is also considerably higher and each test is run for a longer duration to verify the reliability of the design. Again, the full list of tests is several pages long, but aspects such as durability against vibration, shocks and low/high temperatures are tested in addition to the normal DVT tests. Basically every spec that is mentioned in the spec sheet is tested in this phase, including all standard JEDEC tests and certifications.

The first level of PVT tests is also run later during phase four and focuses on the reliability and repeatability of the manufacturing process. Basically, the purpose of the PVT tests is to ensure that every drive coming out of the mass production line will be of the same quality. That is done by examining drives from the production line using dye-and-pry and x-ray to inspect the PCB for any defects caused by the soldering process. The other PVT tests evaluate the readiness of the factory's quality system (incoming and in-process quality inspection, final quality control and out of box inspection) to make sure that all quality control phases are capable of separating good drives from bad ones, and that no defective products will get through to customers. Ongoing Reliability Testing (ORT) is also set up to test a few drives from every production run to guarantee that nothing changes over time.

The total length of the validation phase varies greatly. It can be as short as two months if the design is relatively simple and similar to previous ones, but it can easily take over six months for more complex ones. Usually OCZ creates 2-5 sets of engineering samples during validation as issues are found and fixed, but there isn't really any preset duration for validation; it always depends on what is found during the verification and how significant the required modifications are. Ultimately a drive cannot move to the next phase until it passes all quality and reliability tests, so setting a strict deadline would be a bad idea (for the company and for consumers) to begin with.

Entering Production

In phase five the drive moves from engineering and verification to operations (i.e. manufacturing), which usually takes 3-6 weeks to complete. Final PVT tests are conducted to ensure that the manufacturing quality meets the specifications and that necessary tests are in place to spot any changes/errors in the production. Other teams also finish up their actions to be ready for the launch and this is also the point when OCZ contacts us and other media about an upcoming product launch and sends out the review samples (i.e. the samples we get are typically manufacturing pilots as the mass production hasn't begun yet).

When the manufacturing side is ready to start putting out the new drive, a public announcement of the new product is made and the mass production as well as shipments to customers begin. As part of this visit, we had an inside look into the mass production side of the equation.
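To put the quoted phase lengths side by side, here is a rough back-of-the-envelope sketch that adds up the durations mentioned above. Phase zero and phase three are left out because no usable range was given for them, the month-to-week conversion is my own assumption, and the six-month figure for validation is a soft upper bound since more complex drives can take longer.

```python
# Durations quoted in the article, converted to weeks.
# Assumption: one month is treated as 52/12 weeks (about 4.33).
WEEKS_PER_MONTH = 52 / 12

# (phase name, shortest, longest) in weeks. Phase zero (concept) and phase
# three (design and implementation) are omitted because the article gives no
# usable lower and upper bound for them.
PHASES = [
    ("phase 1: initial feedback", 3, 3),
    ("phase 2: planning and budget", 1 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    ("phase 4: validation", 2 * WEEKS_PER_MONTH, 6 * WEEKS_PER_MONTH),
    ("phase 5: transfer to manufacturing", 3, 6),
]


def total_weeks(phases):
    """Return (shortest, longest) total duration in weeks."""
    shortest = sum(low for _, low, _ in phases)
    longest = sum(high for _, _, high in phases)
    return shortest, longest


low, high = total_weeks(PHASES)
print(f"Excluding design work: roughly {low:.0f} to {high:.0f} weeks")
```

Even without the design and implementation phase, the quoted ranges add up to roughly 19 to 48 weeks, so the engineering work comes on top of what is already several months of planning and validation.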

The Factory

OCZ's factory is located in Zhongli, which is a district of Taoyuan City, about a 45-minute drive away from the center of Taipei. Technically the factory is now owned and operated by Powertech Technology Inc (commonly known as PTI) because Toshiba wanted OCZ to sell the factory as a part of the acquisition. PTI is a relatively big name in the manufacturing industry with over 10,000 employees and a number of high value customers (including Apple, for instance). Aside from assembly, PTI also does wafer probing and die packaging, and all Toshiba NAND is actually packaged by PTI (to my knowledge Samsung is the only NAND supplier that does all NAND packaging in-house), so the company has a strong relationship with both OCZ and Toshiba. While visiting the factory, I certainly got the feeling that OCZ and PTI are very well integrated, as the cooperation between the two is effectively frictionless. That is partially explained by the fact that prior to the acquisition, the employees of the factory were paid directly by OCZ.

Two surface-mount technology (SMT) lines in the factory are dedicated to OCZ with a total capacity of approximately 70,000 units per month. OCZ also has burst capacity with other SMT lines and factories that PTI owns in case there's a sudden spike in demand (e.g. a large enterprise order). The facility has room for up to ten SMT lines and OCZ is looking to increase production capacity in the future. However, the factory isn't fully exclusive to OCZ as PTI does manufacture other vendors' drives too and during the tour I spotted some Kingston SSDs.

Assembling an SSD

Assembling an SSD isn't really any different from assembling any other component that is built on a printed circuit board (PCB). The process itself is very straightforward and contains only a handful of steps, which should be the same for every manufacturer.

The process begins by printing the circuit board and coating the chip/resistor sockets with solder paste, which is done in the machine pictured above.

The solder paste must be stored at near-zero temperatures at all times or it will lose its soldering characteristics. It can only withstand room temperature for a couple of hours before becoming waste, which is why the paste is stored in small cans to minimize the loss.

Once the PCB has been printed and solder paste applied, the end result is what's pictured above. Four 2.5" PCBs can be processed at the same time, but obviously a smaller PCB would result in higher throughput since more PCBs would fit in the same area and could be processed simultaneously for better cost efficiency. That's why we've seen some manufacturers adopting smaller PCBs in 2.5" drives.
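The throughput argument is easy to see with a little arithmetic. Below is a minimal sketch that counts how many boards fit on one panel in a simple grid. The panel and board dimensions are hypothetical numbers I picked only so that the full-size case gives four boards, and the function name is my own; real panel layouts also have to account for rails, tooling holes and routing margins.

```python
def boards_per_panel(panel_w, panel_h, board_w, board_h, gap=0.0):
    """Count boards that fit on a panel in a simple grid, trying both orientations.

    All dimensions are in millimetres. `gap` is the spacing left between boards.
    """
    def grid(bw, bh):
        cols = int((panel_w + gap) // (bw + gap))
        rows = int((panel_h + gap) // (bh + gap))
        return cols * rows

    return max(grid(board_w, board_h), grid(board_h, board_w))


# Hypothetical dimensions, chosen only so the full-size case yields four boards.
PANEL = (200.0, 140.0)
FULL_SIZE = (100.0, 70.0)
HALF_SIZE = (50.0, 70.0)

print(boards_per_panel(*PANEL, *FULL_SIZE))  # 4
print(boards_per_panel(*PANEL, *HALF_SIZE))  # 8
```

With these made-up numbers, halving the board length doubles the number of boards per pass through the line, which is the cost benefit of the smaller PCBs.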

The PCB is now ready to take on the actual chips, which are mounted on the PCB by the above mounter. Every individual chip, resistor and capacitor is loaded into the machine on a circular "tape and reel" and the machine then automatically mounts the components in their right places on the PCB.

The controller comes in a slightly different tray and it's the last component to be mounted before the drive moves to the cooking stage.

With all the components in place, the drives enter the reflow oven, which melts the solder paste and secures electrical connectivity between all chips. The whole reflow process takes about five minutes and the temperature increases gradually at first before dropping quickly in the last stage of the reflow. The exact temperature profile is unique and has been achieved through science, but there's also trial and error in the mix.

After the reflow the PCBs go through an automatic optical inspection, which compares the produced PCBs against a picture of a perfect PCB to spot any errors (such as misaligned chips and insufficient solder). The automatic inspection is followed by a manual human eye inspection to ensure that all PCBs passing this point are, at least visually, suitable to go on sale.
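The idea of comparing a board against a picture of a perfect one can be sketched in a few lines. This is only a toy illustration of golden-image comparison, not the actual inspection software used in the factory: the images are small grids of grayscale values, and the function name, the tolerance and the allowed fraction of bad pixels are all assumptions of mine.

```python
def find_defects(golden, captured, tolerance=30, max_bad_fraction=0.01):
    """Compare a captured grayscale image with a golden reference.

    Both images are equal-sized lists of rows holding 0-255 intensities.
    Returns (passed, bad_pixels), where bad_pixels lists (row, col) positions
    whose intensity differs from the reference by more than `tolerance`.
    """
    if len(golden) != len(captured) or any(
        len(g_row) != len(c_row) for g_row, c_row in zip(golden, captured)
    ):
        raise ValueError("images must have identical dimensions")

    bad_pixels = [
        (r, c)
        for r, (g_row, c_row) in enumerate(zip(golden, captured))
        for c, (g, p) in enumerate(zip(g_row, c_row))
        if abs(g - p) > tolerance
    ]
    total = sum(len(row) for row in golden)
    passed = len(bad_pixels) <= max_bad_fraction * total
    return passed, bad_pixels


# A bright 2x2 "chip" in the middle of a dark board, and a copy where the
# chip has shifted one pixel to the right (hypothetical test data).
golden = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0],
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [0, 0, 0, 0],
]

print(find_defects(golden, golden))   # (True, [])
print(find_defects(golden, shifted))  # (False, [(1, 1), (1, 3), (2, 1), (2, 3)])
```

A real inspection system has to deal with lighting, alignment and component tolerances, which is presumably part of why a human still looks at the boards afterwards.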

For double-sided PCBs, the whole procedure from printing the PCB to visual inspection is then repeated because only one side of the PCB can be worked at a time. That's why many smaller capacity SSDs we see are single-sided because it essentially cuts the PCB assembly steps in half, which reduces cost.

The final step of the PCB assembly is to attach the SATA and power connector, which is separate from the rest of the PCB. I've seen a couple of SSDs where the SATA and power connectors are actually integrated into the PCB, which saves one assembly step, and it's also something that OCZ is considering, but for now the connector remains separate. In M.2 the connector is always a part of the PCB, so from a manufacturing perspective M.2 is slightly more cost efficient than 2.5" drives are.

Once the PCB assembly is fully complete, the PCBs are separated and cut from the frame. The PCB is then put inside the final metal chassis and the screws are driven in automatically by the machine above.

Firmware Installation

Now that the hardware side of the drive is ready, it's time to put some intelligence (the firmware) inside.

The firmware download is done by custom PC setups that consist of normal PC hardware (if you look closely, you can see ASUS' logo on a motherboard or two) running some sort of a Linux distro with OCZ's custom firmware download tool. If you zoom into the monitor you can see that in this case the system is applying firmware to 240GB ARC100 drives.

Once the firmware has been loaded, the drives will move to run-in testing. OCZ has developed a custom script that writes and reads all LBAs eight times with the purpose of identifying bad blocks. If a drive has more bad blocks than a preset threshold allows, it will be pulled away and either fixed or destroyed. The scripts also test performance using common benchmarking tools (e.g. AS-SSD and ATTO) to ensure that all drives meet the spec.
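To make the run-in logic concrete, here is a minimal sketch of a write/read-verify loop over every LBA. It is not OCZ's script: the drive is a simulated in-memory object, the class and function names are hypothetical, and the bad block threshold is a number I made up. Only the eight passes come from what OCZ told me.

```python
import random


class SimulatedDrive:
    """Toy stand-in for a drive: a list of blocks, some of which corrupt data."""

    def __init__(self, num_blocks, bad_blocks=()):
        self.num_blocks = num_blocks
        self._bad = set(bad_blocks)
        self._data = [0] * num_blocks

    def write(self, lba, value):
        # A bad block silently stores the wrong value.
        self._data[lba] = value ^ 0xFF if lba in self._bad else value

    def read(self, lba):
        return self._data[lba]


def run_in_test(drive, passes=8, max_bad_blocks=2, seed=0):
    """Write and read back every LBA `passes` times and count failing blocks.

    Returns (passed, failed_lbas). The threshold default is an assumption.
    """
    rng = random.Random(seed)
    failed = set()
    for _ in range(passes):
        pattern = [rng.randrange(256) for _ in range(drive.num_blocks)]
        for lba, value in enumerate(pattern):
            drive.write(lba, value)
        for lba, value in enumerate(pattern):
            if drive.read(lba) != value:
                failed.add(lba)
    return len(failed) <= max_bad_blocks, sorted(failed)


print(run_in_test(SimulatedDrive(1024, bad_blocks={5})))        # (True, [5])
print(run_in_test(SimulatedDrive(1024, bad_blocks={1, 2, 3})))  # (False, [1, 2, 3])
```

In this toy model a drive with one bad block stays under the threshold and passes, while a drive with three is pulled from the line.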

Currently OCZ has two different test setups. One half of the test systems are regular PCs that are very similar to the firmware download systems, whereas the other half are custom racks pictured above. OCZ is looking to move all testing to rack-based cabins since one cabin can simultaneously test 256 drives, which is far more efficient than having dozens of PC setups around that can only test a handful of drives each at a time. The test regime is the same in both cases, so it's purely a matter of space and labor efficiency.

At the moment SATA based drives are tested through the host, which means that the IO commands are sent by the host similar to how we test SSDs. For PCIe drives, however, OCZ is developing a Manufacturing Self Test (MST) that is essentially a custom firmware that is loaded into the drive, which then reads and writes all LBAs to test for bad blocks. The benefit of MST is the fact that it bypasses the host interface (i.e. all IO commands are generated by the controller/firmware), making the test cycle faster as the host overhead is removed.

Additionally, every month a sample of finished drives goes through more rigorous tests called Ongoing Reliability Testing (ORT) to ensure that nothing has changed in production quality. The tests consist of a Thermal Cycle Test (TCT), where the drive is subjected to thermal shocks to validate the quality of manufacturing, and a Reliability Demonstration Test (RDT), where drives are tested at an elevated temperature (~70°C) to demonstrate that the mean time between failures (MTBF) meets the specification.
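For readers wondering how a relatively short test at an elevated temperature can say anything about MTBF, the sketch below shows one common textbook approach: an Arrhenius acceleration factor combined with a zero-failure demonstration bound under an exponential failure model. I don't know which model OCZ actually uses, so treat this purely as an illustration. Only the ~70°C stress temperature comes from the tour; the use temperature, activation energy, sample size, test length and confidence level are all assumptions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_temp_c, stress_temp_c, activation_energy_ev):
    """Acceleration factor of a stress temperature relative to a use temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / use_k - 1 / stress_k)
    )


def demonstrated_mtbf_hours(drives, hours_per_drive, acceleration, confidence=0.6):
    """Lower bound on MTBF from a time-terminated test that saw zero failures.

    With zero failures and an exponential failure model, the one-sided bound is
    total_equivalent_hours / -ln(1 - confidence).
    """
    equivalent_hours = drives * hours_per_drive * acceleration
    return equivalent_hours / -math.log(1 - confidence)


# Assumed values for illustration: 40°C use temperature, 0.7 eV activation
# energy, 100 drives tested for 1,000 hours each at the ~70°C quoted above.
af = arrhenius_acceleration(use_temp_c=40, stress_temp_c=70, activation_energy_ev=0.7)
mtbf = demonstrated_mtbf_hours(drives=100, hours_per_drive=1000, acceleration=af)
print(f"acceleration factor: {af:.1f}")
print(f"demonstrated MTBF lower bound: {mtbf:,.0f} hours")
```

The point is that every hour at the stress temperature counts as several hours at normal operating temperature, so a modest number of drives tested for a few weeks can demonstrate an MTBF far longer than the test itself.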

The run-in testing hasn't changed much since Toshiba took over, but Toshiba did help OCZ to align to its quality standards. All the processes running today have been inspected by Toshiba and meet the strict standards set by the company. Note that the purpose of run-in testing isn't to screen for firmware bugs, but to ensure that the hardware is functional. The firmware development and validation is done before the mass production begins and after Toshiba took over OCZ has modified its development process to increase the quality and reliability of its products.

OCZ's whole philosophy has actually changed since the previous CEO left the company. In the past OCZ always tried to be the first to the market at any cost and tried to cover every possible micro-niche, which resulted in too many product lines for the resources OCZ had. Nowadays OCZ is putting a lot of effort into product qualification and it no longer has a dozen products in development at the same time, meaning that there are now sufficient resources to properly validate every product before it enters mass production.

The run-in testing may seem light with only eight full LBA read/write spans, but honestly I don't think it's necessary to hammer a drive for days because any apparent hardware flaw should surface very quickly. Basically, the hardware either works or it doesn't, and once the drive leaves the factory it's more likely to fail due to a firmware anomaly than a physical hardware failure.

Packaging

Once the drive has exited the validation station without any errors, it's considered to be fully functional and is ready to be packaged.

Before the drive is put inside the retail package, the labels are put on the metal chassis. Currently this is done manually and requires extreme precision from the worker, but this is an area OCZ is looking to automate to reduce costs and increase the throughput of the factory.

The folding of the cardboard retail boxes has already been automated and is done by the machine above.

The drives and accessories are put inside the retail box by hand and the last step in the process is to wrap the complete box in plastic. The wrap is folded over itself, resulting in a pocket that the retail package is pushed into. The package then goes through a heat tunnel, which shrinks the plastic and creates a tight wrap.

The retail boxes are then put inside 10-drive shipping boxes. The shipping boxes and the black holders have actually been designed by OCZ and are something that Mr. Van Patten introduced shortly after joining the company. The new design is much more tolerant of drops and pressure in order to keep the retail packages cosmetically intact, and the 10-drive size is easy to ship around. OCZ does drop tests periodically to ensure that the quality of both the retail package and the shipping box hasn't degraded and that the drives are safe.

The last stop for the drive before it's shipped out is, of course, the warehouse, which also concludes our factory tour.

Final Words

First of all, I'd like to thank OCZ and PTI for giving us the exclusive inside look at their factory. Usually we only get to see the final product, but not the steps that are taken to develop and build the drive. As we saw, the actual assembly process of an SSD is not very complex, but there is a tremendous amount of continuous testing to keep the quality high. Some tests, such as the box drop test, may seem a bit redundant, but ultimately it's these little things that build a high quality product.

That said, I'm not an engineering or manufacturing expert, so it would be wrong for me to "review" OCZ's processes. What I can say, though, is that OCZ's attitude towards quality and reliability has completely changed. In the past it wasn't unheard of that a product would only spend roughly three months in development before entering mass production, but now even the qualification phase of the development process is typically longer.

All in all, the new OCZ is well aware of its questionable quality reputation from the past and is now doing everything it can to build the trust back. It won't happen overnight, but opening up the whole development and manufacturing process is a way of showing that OCZ has nothing to hide when it comes to quality.