Quantum simulation is crucial for developing practical quantum algorithms, because the limitations of current hardware make robust classical methods necessary for testing and refinement. Researchers from the University of Science and Technology of China have developed a scalable approach to simulating quantum circuits within the Q2Chemistry software package, delivering substantial performance gains on both CPUs and GPUs. The work demonstrates a significant leap forward in simulation speed and portability, consistently outperforming existing open-source simulators across a range of quantum circuit designs and paving the way for more complex algorithm development.

Key technologies underpinning these advances include multi-core CPU parallelization, distributed computing, and tensor network methods that represent quantum states efficiently. State vector simulation is used alongside techniques such as matrix product states to balance accuracy against computational cost, enabling the simulation of increasingly complex quantum systems.

Within Q2Chemistry, the team significantly improved the performance of full-amplitude quantum circuit simulation, enabling accurate and efficient simulation of complex quantum circuits. They implemented Batch-Buffered Overlap Processing, a multi-buffering strategy that partitions quantum state amplitudes into smaller batches, and Staggered Multi-Gate Parallelism, a two-dimensional thread block strategy for GPU execution. Together, these optimizations let researchers tackle increasingly complex quantum circuits and explore the potential of quantum chemistry.
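To make the idea of full-amplitude simulation with batched amplitude updates concrete, here is a minimal NumPy sketch. It is not the Q2Chemistry implementation and does not reproduce Batch-Buffered Overlap Processing or Staggered Multi-Gate Parallelism; it only illustrates how a state vector of 2^n amplitudes can be split into independent batches when a single-qubit gate is applied. The function name `apply_single_qubit_gate`, the parameter `batch_rows`, and the convention that qubit 0 is the least significant bit of the amplitude index are all assumptions made for this example.

```python
import numpy as np


def apply_single_qubit_gate(state, gate, target, batch_rows=4):
    """Apply a 2x2 gate to qubit `target` of a full state vector, in place.

    Hypothetical helper for illustration only. Assumes `state` is a
    contiguous 1-D complex array of length 2**n and that qubit 0 is the
    least significant bit of the amplitude index.
    """
    low = 1 << target  # distance between the two amplitudes a gate mixes
    # Axes: (bits above target, target bit, bits below target).
    # Reshaping a contiguous array returns a view, so writes reach `state`.
    view = state.reshape(-1, 2, low)
    # Each slice along the first axis holds complete amplitude pairs, so
    # batches can be processed independently of one another.
    for start in range(0, view.shape[0], batch_rows):
        batch = view[start:start + batch_rows]
        # new[h, a, l] = sum_b gate[a, b] * old[h, b, l]
        batch[...] = np.einsum("ab,hbl->hal", gate, batch)
    return state


# Usage: put three qubits into a uniform superposition with Hadamard gates.
n_qubits = 3
state = np.zeros(2 ** n_qubits, dtype=np.complex128)
state[0] = 1.0  # start in |000>
H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
for q in range(n_qubits):
    apply_single_qubit_gate(state, H, q, batch_rows=2)
print(np.round(np.abs(state) ** 2, 3))
```

After the three Hadamard gates, every basis state should carry probability 1/8. Because each batch contains only complete amplitude pairs, a production simulator could in principle move one batch to an accelerator while another is being computed; that overlap is not modeled in this sketch.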