Objective: Image-based smoke detection is a vital component of early fire warning systems. However, existing methods are unreliable in environments with complex backgrounds, high noise levels, and low image contrast. In the early stages of a fire in particular, smoke is often small, thin, blurred, and irregular in shape, which further complicates detection. To address these challenges, this study proposes a smoke detection method that integrates spatial perception and saliency modeling. The aim is to improve the robustness, adaptability, and accuracy of smoke detection and to provide a reliable solution for real-world fire surveillance across diverse environments.

Methods: The proposed method consists of three key components: the multi-kernel parallel convolution module (MKPCM), the dynamic histogram axial interaction module (DHAIM), and the spatial decay residual block (SDRB). The MKPCM applies convolution kernels of different sizes in parallel, enabling the network to capture features at multiple spatial scales simultaneously and thus to represent the variable dispersion scales of smoke. Its embedded context anchor mechanism assigns differentiated spatial weights, strengthening the focus on relevant visual regions while suppressing background noise and irrelevant features. The DHAIM uses dynamic histogram-based segmentation to partition feature maps into high- and low-contrast areas and then applies a hybrid attention mechanism tailored to each partition, improving semantic differentiation and the extraction of subtle smoke cues in low-contrast zones. The SDRB generates spatial attention based on Manhattan distance: attention weights decay as spatial distance increases, which reduces interference from distant pixels and improves feature consistency in regions with blurred boundaries. These components are jointly optimized in an end-to-end learning framework to enhance the model's sensitivity to the complex spatial patterns and ambiguous edge transitions of smoke plumes.

Results: To evaluate the proposed method, a multi-scene smoke detection dataset is constructed that covers indoor and outdoor scenarios with backgrounds of varying complexity. Experimental results show that the proposed method achieves an average precision of 94.0%, outperforming the baseline real-time detection transformer model by 5.5%. The method delivers consistently high detection accuracy across different environmental conditions and remains robust to low contrast, occlusion, and scale variation. Ablation studies confirm the individual and combined contributions of the MKPCM, DHAIM, and SDRB to precision, recall, and F1 score. In addition, the method is efficient in inference and computation, making it suitable for real-time deployment in intelligent surveillance, early fire warning systems, and automated safety platforms.

Conclusions: This study presents a robust and efficient smoke detection method that integrates multi-scale spatial perception and contrast-adaptive saliency modeling. The experimental findings validate the method's ability to address key challenges in early fire smoke detection, especially in visually complex environments. With its strong detection performance and practical adaptability, the proposed method has significant potential for integration into real-world fire prevention infrastructure, where it can strengthen early warning capabilities and contribute to public safety and emergency response.
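
Illustrative sketch: The distance-decay idea behind the SDRB can be illustrated with a minimal Python example. It is a sketch under stated assumptions, not the implementation used in this study: the geometric decay form gamma ** d, the parameter gamma, the row renormalization, and the function names manhattan_decay_mask and apply_decay are all hypothetical choices made for illustration. The text above states only that attention weights decay as Manhattan distance increases.

```python
def manhattan_decay_mask(height, width, gamma=0.9):
    """Build an (H*W) x (H*W) matrix whose entry [p][q] is gamma ** d(p, q),
    where d is the Manhattan distance between grid positions p and q.
    The geometric form gamma ** d is an assumption, not taken from the paper."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def apply_decay(attention, mask):
    """Scale each attention row by the decay mask, then renormalize the row
    so that it sums to 1 (the renormalization is also an assumption)."""
    out = []
    for att_row, mask_row in zip(attention, mask):
        scaled = [a * m for a, m in zip(att_row, mask_row)]
        total = sum(scaled)
        out.append([s / total for s in scaled])
    return out


# Uniform attention over a 2 x 3 feature map (6 flattened positions).
h, w = 2, 3
n = h * w
uniform = [[1.0 / n] * n for _ in range(n)]

mask = manhattan_decay_mask(h, w, gamma=0.5)
decayed = apply_decay(uniform, mask)

# Position 0 is (0, 0) and position 5 is (1, 2): Manhattan distance 3.
print(mask[0][5])
print(decayed[0][0] > decayed[0][5])
```

Starting from uniform attention, each position ends up attending most strongly to itself and its near neighbors, while contributions from distant positions are damped. A smaller gamma makes the decay steeper.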