Objective: The development of communication systems and the increasing demand for low-latency, high-reliability communications have led to a rise in application scenarios that require the exchange of soft information for joint iterative decoding to improve system performance. The soft-output successive cancellation list (SO-SCL) decoding algorithm for polar codes produces relatively accurate soft output at the complexity of traditional successive cancellation list (SCL) decoding by estimating the codebook and posterior probabilities. However, the serial nature of SCL decoding gives the SO-SCL decoding algorithm a high decoding delay, making it difficult to satisfy the 1 Tbps peak throughput requirement of the sixth-generation mobile communication system. To reduce the decoding delay, the existing soft-output fast SCL (SO-FSCL) decoding algorithm achieves fast decoding by identifying four special nodes; however, some nodes still incur a high decoding delay. Therefore, a high-performance soft-output decoding algorithm for polar codes with lower decoding delay is required. Methods: Previous studies identified special nodes to enable fast decoding of individual special nodes and combined decoding across nodes. Building on the SO-FSCL decoding algorithm, this study introduces five additional special nodes, namely REP-2, REP-3, REP-4, SPC-2, and SPC-3, and proposes a faster soft-output SCL decoding algorithm (FS-SCL). The proposed algorithm enables fast decoding of the five new nodes and provides a posterior probability formula for the deleted paths. For the REP-2, REP-3, and REP-4 nodes, only 4, 8, and 16 possible decoding paths need to be considered, respectively, and the sum of the probabilities of the deleted paths is calculated. According to the distribution of the information bits within the node, decoding of the SPC-2 node can be simplified to the parallel decoding of two SPC sub-nodes with Ns/2 bits.
After combination, the SPC-3 node can be regarded as a repetition code with a code rate of 1, thereby simplifying the decoding process. Moreover, whereas SCL decoding requires flipping all bits, the SPC-2 and SPC-3 nodes require flipping only min{L-1, Ns-2} and min{L-1, Ns-3} bits, respectively, during decoding, thereby reducing the decoding delay. The time steps required to decode the different nodes are also analyzed to evaluate the decoding delay. The five newly added nodes require 3, 4, 5, max{log2(Ns)+1, min{L, Ns-1}}, and max{log2(Ns)+3, min{L, Ns-2}} time steps, respectively. Compared with the original SCL decoding algorithm, these nodes significantly reduce the time steps required for decoding. Results: Simulation results over the AWGN channel show that the proposed FS-SCL decoding algorithm maintains a BER performance similar to that of SO-SCL under different modulation schemes, especially 16QAM. Reducing the proportion of high-code-rate nodes reduces the performance loss. Compared with the existing SO-FSCL decoding algorithm, the FS-SCL decoding algorithm further reduces the decoding delay by more than 12.5% (up to 28.7%) and further reduces the decoding complexity. Moreover, by merging shorter nodes, the FS-SCL decoding algorithm reduces the number of nodes by 42% and achieves a minimum node code length of 8, which is conducive to improving decoding parallelism. Conclusions: The proposed FS-SCL decoding algorithm achieves lower delay and complexity with lossless BER performance under different modulation schemes. The results provide an efficient polar-code decoding scheme for low-latency communication scenarios that require soft output, with both theoretical value and application prospects.
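The per-node figures stated in the Methods can be tabulated with a short script. The following is a minimal sketch, not the authors' implementation: it assumes that Ns denotes the node length (a power of two) and L the list size, which the abstract does not define explicitly, and the function and variable names are hypothetical.

```python
# Sketch of the per-node counts quoted in the abstract.
# Assumptions (not stated in the source): ns is the node length and a power
# of two; list_size is the SCL list size L. All names here are illustrative.

# Candidate decoding paths for the REP-type nodes, as given in the abstract.
REP_CANDIDATE_PATHS = {"REP-2": 4, "REP-3": 8, "REP-4": 16}

# Fixed time-step counts for the REP-type nodes, as given in the abstract.
REP_TIME_STEPS = {"REP-2": 3, "REP-3": 4, "REP-4": 5}


def log2_exact(n: int) -> int:
    """Return log2(n) for a power-of-two n, otherwise raise ValueError."""
    if n < 1 or n & (n - 1):
        raise ValueError("node length must be a power of two")
    return n.bit_length() - 1


def time_steps(node: str, ns: int, list_size: int) -> int:
    """Time steps needed to decode one special node."""
    if node in REP_TIME_STEPS:
        return REP_TIME_STEPS[node]
    if node == "SPC-2":
        return max(log2_exact(ns) + 1, min(list_size, ns - 1))
    if node == "SPC-3":
        return max(log2_exact(ns) + 3, min(list_size, ns - 2))
    raise ValueError(f"unknown node type: {node}")


def bit_flips(node: str, ns: int, list_size: int) -> int:
    """Number of bits that need flipping for the SPC-type nodes."""
    if node == "SPC-2":
        return min(list_size - 1, ns - 2)
    if node == "SPC-3":
        return min(list_size - 1, ns - 3)
    raise ValueError(f"bit-flip count is only given for SPC-2 and SPC-3, got {node}")


if __name__ == "__main__":
    ns, list_size = 16, 8  # example values chosen only for illustration
    for node in ("REP-2", "REP-3", "REP-4", "SPC-2", "SPC-3"):
        print(node, time_steps(node, ns, list_size))
```

For the illustrative values Ns = 16 and L = 8, the SPC-2 count is max{5, 8} = 8 and the SPC-3 count is max{7, 8} = 8, so in this case the list size rather than the node length dominates the delay of both nodes.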