Hanbat National University Microsite


College of Information Technology, Department of Intelligent Media Engineering

Visual Media Lab.

Department of Intelligent Media Engineering, Professor Haechul Choi (최해철)

Research result

International Papers

Probabilistic Principal Component Analysis and Channel Attention for End-to-End Image Compression Optimization

In recent years, deep learning has shown significant progress for image compression compared to traditional image compression methods.
Although conventional standard-based methods are still used, they are limited in handling repetitive patterns and complex calculations, which can lead to image reconstruction issues.
In this study, we propose a novel learning-based image compression method that integrates both channel attention (CA) and probabilistic principal component analysis (PPCA) blocks as core components to enhance encoding efficiency.
PPCA is used to focus on essential features and manage noise.
Unlike traditional PCA, PPCA’s probabilistic approach better preserves meaningful data structure, enhancing compression and robustness.
The CA mechanism in our model emphasizes significant image features by prioritizing dominant pixel values, allowing the compression process to retain essential details while minimizing less relevant information.
Furthermore, a foveated image quality assessment metric is proposed, prioritizing visually significant regions to enhance the evaluation of dominant information guided by attention mechanisms and to assess the impact of the CA and PPCA blocks on image reconstruction.
Experimental results demonstrate that the proposed method obtained significant coding efficiency across various metrics on the Kodak and Tecnick datasets compared with state-of-the-art methods.

2025-08-11 10:47
FR-IBC: Flipping and Rotation Intra Block Copy for Versatile Video Coding

Screen content has become increasingly important in multimedia applications owing to the growth of remote desktops, Wi-Fi displays, and cloud computing.
However, these applications generate large amounts of data, and their limited bandwidth necessitates efficient video coding.
While existing video coding standards have been optimized for natural videos originally captured by cameras, screen content has unique characteristics such as large homogeneous areas and repeated patterns.
In this paper, we propose an enhanced intra block copy (IBC) method for screen content coding (SCC) in versatile video coding (VVC) named flipping and rotation intra block copy (FR-IBC).
The proposed method improves the prediction accuracy by using flipped and rotated versions of the reference blocks as additional references.
To reduce the computational complexity, hash maps of these blocks are constructed on a 4 × 4 block size basis.
Moreover, we modify the block vectors and block vector predictor candidates of IBC merge and IBC advanced motion vector prediction to indicate the locations within the available reference area at all times.
The experimental results show that our FR-IBC method outperforms existing SCC tools in VVC.
Bjøntegaard-Delta rate gains of 0.66% and 2.30% were achieved for Class F under the All Intra and Random Access conditions, respectively, and gains of 0.40% and 2.46% were achieved for Class SCC under the same conditions.
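
The idea of using flipped and rotated reference blocks can be illustrated with a minimal sketch. The set of transforms, the SAD matching cost, and all function names below are assumptions made for illustration; the abstract does not specify them, and this is not the VVC reference software. The sketch uses square blocks so that rotation keeps the block shape.

```python
def flip_horizontal(block):
    """Mirror each row left to right."""
    return [row[::-1] for row in block]


def flip_vertical(block):
    """Mirror the block top to bottom."""
    return [row[:] for row in block[::-1]]


def rotate_90(block):
    """Rotate a block 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def candidate_variants(block):
    """Return the reference block plus hypothetical flipped and rotated versions."""
    r90 = rotate_90(block)
    r180 = rotate_90(r90)
    r270 = rotate_90(r180)
    return {
        "identity": [row[:] for row in block],
        "flip_h": flip_horizontal(block),
        "flip_v": flip_vertical(block),
        "rot90": r90,
        "rot180": r180,
        "rot270": r270,
    }


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_variant(current, reference):
    """Pick the variant of the reference block that best predicts the current block."""
    variants = candidate_variants(reference)
    return min(((name, sad(current, v)) for name, v in variants.items()),
               key=lambda item: item[1])


reference = [[1, 2], [3, 4]]
current = [[2, 1], [4, 3]]  # a horizontally mirrored copy of the reference
print(best_variant(current, reference))
```

In this toy case a plain block copy leaves a residual, while the mirrored reference predicts the current block exactly, which is the kind of gain the abstract attributes to the additional references.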

2025-02-20 19:39
Tackling Dual Gaps in Remote Sensing Segmentation: Task-Oriented Super-Resolution for Domain Adaptation

Semantic segmentation of remote sensing images plays a crucial role in various applications, such as land cover mapping and urban planning.
However, the performance of semantic segmentation models often degrades when applied to images from different domains or with varying spatial resolutions.
In this paper, we propose a novel task-oriented super-resolution method for domain adaptation in remote sensing semantic segmentation.
Our approach aims to adapt a segmentation model trained on high-resolution images from a source domain to perform accurately on low-resolution images from a target domain.
We introduce a super-resolution network that learns to enhance the spatial resolution of the target domain images while simultaneously optimizing the segmentation performance of a pre-trained and fixed segmentation model.
The super-resolution network is trained using a combination of losses, including a segmentation loss, a perceptual loss, and a contrastive loss,
which together ensure that the adapted images are both visually similar to the source domain images and semantically consistent with the ground-truth segmentation masks.
We evaluate our method on two challenging remote sensing datasets, ISPRS Potsdam and Vaihingen, and demonstrate significant improvements in segmentation accuracy compared to state-of-the-art domain adaptation techniques.
Our approach achieves mean Intersection over Union (mIoU) scores of 0.523 and 0.567 on the Potsdam and Vaihingen datasets, respectively.
The proposed task-oriented super-resolution method offers a promising solution for adapting semantic segmentation models to new domains and resolutions in remote sensing applications.
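
For readers unfamiliar with the reported metric, the following is a minimal sketch of mean Intersection over Union on flattened label lists. The function name and the choice to skip classes absent from both prediction and ground truth are assumptions; evaluation protocols for Potsdam and Vaihingen may differ in detail.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
print(mean_iou(pred, target, num_classes=2))
```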

2025-02-20 19:07
Visual Quality Assessment of Point Clouds Compared to Natural Reference Images

This paper proposes a point cloud (PC) visual quality assessment (VQA) framework that reflects the human visual system (HVS). The proposed framework compares natural images acquired using a digital camera with PC images generated via 2D projection, using appropriate objective quality evaluation metrics. Humans primarily consume natural images, so human visual knowledge is typically formed from them; natural images can therefore be more reliable reference data than PC data. To use the natural images as a reference, the framework performs an image alignment process based on feature matching and image warping, which enhances the similarity between the acquired natural images and the corresponding PC images. The framework facilitates identifying which objective VQA metrics reflect the HVS effectively. We constructed a database of natural images and PC images at three quality levels, and conducted objective and subjective VQAs. The experimental results demonstrate that metrics comparing the global structural similarity of images are acceptably consistent across the different PC qualities. We found that SSIM, MAD, and GMSD achieved Spearman rank-order correlation coefficient scores of 0.882, 0.871, and 0.930, respectively. Thus, the proposed framework can reflect the HVS by comparing the global structural similarity between PC and natural reference images.
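
The Spearman rank-order correlation coefficient used above to compare objective and subjective scores can be sketched as the Pearson correlation of ranks. The helper names are hypothetical and the sample scores are invented for illustration; they are not data from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


objective_scores = [0.91, 0.85, 0.60]   # illustrative metric outputs
subjective_scores = [4.5, 3.1, 3.9]     # illustrative mean opinion scores
print(spearman(objective_scores, subjective_scores))
```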

2024-01-29 17:21
Online Learning-Based Hybrid Tracking Method for Unmanned Aerial Vehicles

Tracking unmanned aerial vehicles (UAVs) in outdoor scenes poses significant challenges due to their dynamic motion, diverse sizes, and changes in appearance. This paper proposes an efficient hybrid tracking method for UAVs, comprising a detector, tracker, and integrator. The integrator combines detection and tracking, and updates the target’s features online while tracking, thereby addressing the aforementioned challenges. The online update mechanism ensures robust tracking by handling object deformation, diverse types of UAVs, and changes in background. We conducted experiments on custom and public UAV datasets to train the deep learning-based detector and evaluate the tracking methods, including the commonly used UAV123 and UAVL datasets, to demonstrate generalizability. The experimental results show the effectiveness and robustness of our proposed method under challenging conditions, such as out-of-view and low-resolution scenarios, and demonstrate its performance in UAV detection tasks.

2024-01-29 17:18
Adaptive block tree structure for video coding

The Joint Video Exploration Team (JVET) has studied future video coding (FVC) technologies with a potential compression capacity that significantly exceeds that of the high-efficiency video coding (HEVC) standard. The joint exploration test model (JEM), a common platform for the exploration of FVC technologies in the JVET, employs quadtree plus binary tree block partitioning, which enhances the flexibility of coding unit partitioning. Although separating the luminance and chrominance tree structures in I slices significantly improves coding efficiency for chrominance, this approach has intrinsic drawbacks that result in redundant block partitioning data. In this paper, an adaptive tree structure correlating luminance and chrominance of single and dual trees is presented. Our proposed method resulted in an average Y Bjøntegaard-Delta rate of −0.24% relative to JEM 6.0 under the intra-coding common test conditions.

2024-01-29 17:15
Microstrip antenna using H-slotted ground structure for orthogonally polarized dual-band operation

This article presents a novel dual-band orthogonally polarized square microstrip antenna for vehicle-to-nomadic-device communication systems. The proposed antenna consists of a perpendicular feed for utilizing orthogonal linear polarizations and an H-shaped slotted ground structure for obtaining dual-band operation. Because of the geometrically axis-symmetric H-slot loading effect, orthogonal polarization can be achieved at each resonant frequency. The measured reflection coefficients, antenna gains, and radiation patterns of the proposed antenna are in good agreement with the simulations. © 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:136–139, 2016

2024-01-29 17:12
Alternative Intra Prediction for Screen Content Coding in HEVC

Screen content generally consists of text, images, and videos variously generated or captured by computers and other electronic devices. For the purpose of coding such screen content, we introduce alternative intra prediction (AIP) modes based on the emerging high efficiency video coding (HEVC) standard. In text and graphics, edges are much sharper and a large number of corners exist. These properties make it difficult to predict blocks using a one-directional intra prediction mode. The proposed method provides two-directional prediction by combining the existing vertical and horizontal prediction modes. Experiments show that our AIP modes provide an average BD-rate reduction of 2.8% relative to HEVC for general screen content, and a 0.04% reduction for natural content.
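
As a rough illustration of combining vertical and horizontal prediction, the sketch below averages the two predictors per sample with rounding. The averaging rule and the function names are assumptions; the abstract does not state how the AIP modes actually combine the two directions.

```python
def vertical_prediction(top, size):
    """Copy the row of reconstructed samples above the block downwards."""
    return [list(top[:size]) for _ in range(size)]


def horizontal_prediction(left, size):
    """Copy the column of reconstructed samples left of the block rightwards."""
    return [[left[y]] * size for y in range(size)]


def combined_prediction(top, left, size):
    """Hypothetical two-directional predictor: rounded average of both modes."""
    v = vertical_prediction(top, size)
    h = horizontal_prediction(left, size)
    return [[(v[y][x] + h[y][x] + 1) >> 1 for x in range(size)]
            for y in range(size)]


top = [10, 20]    # samples above the block
left = [30, 50]   # samples to the left of the block
print(combined_prediction(top, left, 2))
```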

2024-01-29 17:10
Fast transform unit decision for HEVC

For the High Efficiency Video Coding (HEVC) standard, a fast transform unit (TU) decision method is proposed. HEVC defines the TU as a region sharing the same transformation, and it supports transform sizes from 4×4 to 32×32 by using a quadtree of TUs. The various TU sizes can provide good coding efficiency, but they may dramatically increase encoding complexity. Assuming that a TU with highly compacted energy is unlikely to be split, the proposed method determines an appropriate TU size according to the position of the last non-zero transform coefficient and the number of zero transform coefficients. Experimental results show that the method reduces encoding run time by 14% with a negligible coding loss of 0.38% BD-rate for the Random_access_main case.
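
The early-termination idea can be sketched as follows. This is only an illustrative heuristic: the decision rule, both thresholds, and the function names are hypothetical, since the abstract names the two features (last non-zero position and zero count) but not how they are combined.

```python
def last_nonzero_position(coeffs):
    """Index of the last non-zero coefficient in scan order, or -1 if all are zero."""
    for i in range(len(coeffs) - 1, -1, -1):
        if coeffs[i] != 0:
            return i
    return -1


def should_skip_split(coeffs, max_last_pos, min_zero_ratio):
    """Hypothetical rule: treat the TU as energy-compact and do not split it
    when the last non-zero coefficient comes early in scan order and most
    coefficients are zero."""
    last = last_nonzero_position(coeffs)
    zeros = sum(1 for c in coeffs if c == 0)
    return last <= max_last_pos and zeros / len(coeffs) >= min_zero_ratio


# Quantized coefficients of a 4x4 TU listed in scan order (illustrative values).
compact = [9, 3, 0, 1] + [0] * 12
spread = [5] + [0] * 14 + [2]
print(should_skip_split(compact, max_last_pos=5, min_zero_ratio=0.75))
print(should_skip_split(spread, max_last_pos=5, min_zero_ratio=0.75))
```

In an encoder, a positive decision would let the rate-distortion search skip the smaller TU sizes, which is where the run-time saving would come from.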

2024-01-29 17:07
Fast HEVC Intra Mode Decision Based on Bayesian Classification Framework with Relative SATD

HEVC (high efficiency video coding) achieves much higher coding efficiency than previous video coding standards at the cost of significant computational complexity. This paper proposes a fast intra mode decision scheme, where a Bayesian classification framework using the relative sum of absolute Hadamard transformed differences (SATD) is introduced and combined with conventional fast encoding methods. Experimental results show that this scheme reduces encoding run time by about 30% with a negligible coding loss of 0.9% BD-rate for the all intra coding scenario.

2024-01-29 17:04
Pixel-domain Wyner-Ziv residual video coder with adaptive binary-to-Gray code converting process

A pixel-domain Wyner-Ziv residual video coding scheme is presented. In this scheme, based on the statistical distribution characteristics of the residual signal, an adaptive binary-to-Gray code converting process is designed so that virtual channel noises can be lowered over bit-planes. Through simulations, it is shown that the best case performs better than the DISCOVER scheme as well as the worst case.
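
For context, the standard (non-adaptive) binary-to-Gray conversion is sketched below; the adaptive selection described in the abstract is not reproduced here. The function names are illustrative. The property of interest is that neighbouring values differ in exactly one bit after conversion, which limits how many bit-planes a small residual error disturbs.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert binary_to_gray by folding in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# 7 (0111) and 8 (1000) differ in four bits in natural binary,
# but their Gray codes differ in only one bit.
print(hamming_distance(7, 8))
print(hamming_distance(binary_to_gray(7), binary_to_gray(8)))
```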

2024-01-29 16:59
Interactive-based Distributed Video Coding for Low-power Video Surveillance System

This paper presents a novel interactive Wyner-Ziv video coding system that is applicable to low-power video surveillance systems. To improve the performance of conventional DVC (Distributed Video Coding) systems for these applications, the proposed system first evaluates the quality of the previously reconstructed Wyner-Ziv frame block by block, and then estimates the unreliable blocks of the current Wyner-Ziv frame by exploiting the temporal correlation between the previously reconstructed Wyner-Ziv frame and the generated side information. The locations of the unreliable blocks are provided to the encoder side, which enables the encoder to selectively encode the unreliable blocks of the Wyner-Ziv frame. Through several simulations, it is shown that the coding efficiency of the proposed scheme is greatly improved compared to the conventional DVC scheme.

2024-01-29 16:48
Scalable video coding with large block for UHD video

Ultra-high definition (UHD), which has 4 to 16 times as many pixels as existing high definition (HD), is expected to be a next-generation video format. To deliver UHD and HD videos simultaneously in the communications-broadcasting convergence environment, scalable video coding (SVC) is a highly attractive solution. We propose an improved scalable video coding method to achieve high coding efficiency, particularly for UHD and HD videos. The basic idea is to allow block sizes larger than the 16×16 pixels of H.264/AVC SVC, which results in more efficient inter-layer prediction and syntax element coding. The experimental results show that it achieves an average 5.34% reduction in BD-rate relative to H.264/AVC SVC.

2024-01-29 16:45
Adaptive Pre-/Post-Filters for NRT-Based Stereoscopic Video Coding

Non-real-time delivery of stereoscopic video has been considered as a service scenario for 3DTV to overcome the limited bandwidth in the terrestrial digital television system. A hybrid codec combining MPEG-2 and H.264/AVC has been suggested for the compression of stereoscopic video for 3DTV. In this paper, we propose a stereoscopic video coding scheme using adaptive pre-/post-filters (APPF) to improve the quality of 3D video while retaining compatibility with legacy video coding standards. The APPF are applied adaptively to blocks of various sizes determined by the macroblock coding mode and reference frame index. Experimental results show that the proposed method achieves up to 24.86% bit rate savings relative to a hybrid codec of MPEG-2 and H.264/AVC that includes inter-view prediction.

2024-01-29 16:41
Highly Efficient Video Codec for Entertainment-Quality

We present a novel video codec for supporting entertainment-quality video. It has new coding tools such as an intra prediction with offset, integer sine transform, and enhanced block-based adaptive loop filter. These tools are used adaptively in the processing of intra prediction, transform, and loop filtering. In our experiments, the proposed codec achieved an average reduction of 13.35% in BD-rate relative to H.264/AVC for 720p sequences.

2024-01-29 16:37
Loss-aware rate-distortion optimization for redundant picture allocation in H.264/AVC

A redundant picture is one of the H.264/AVC tools for increasing error resiliency when video is delivered over error-prone environments. We present a loss-aware redundant picture allocation method that determines, for each primary coded picture, whether a redundant picture is inserted. The determination is based on the error rate of the transmission network and the distortion of the decoded picture caused by the error. Simulation results showed that the proposed method alleviates the distortion and thereby achieves higher decoded picture quality than conventional methods, including the hierarchical redundant picture. In particular, the proposed method produces outstanding results at low bit rates; thus, the method is highly applicable to low bit-rate wireless video transmission.

2024-01-29 16:30
Tiny Drone Tracking Framework Using Multiple Trackers and Kalman-based Predictor

Unmanned aerial vehicles such as drones are one of the key developing technologies with many beneficial applications. As they have made great progress, security and privacy issues are also growing. Drone tracking with a moving camera is one of the important methods to address these issues. Drone tracking poses various challenges. First, drones move quickly and are usually tiny. Second, images captured by a moving camera have illumination changes. Moreover, the tracking should be performed in real time for surveillance applications. For fast and accurate drone tracking, this paper proposes a tracking framework utilizing two trackers, a predictor, and a refinement process. One tracker finds a moving target based on motion flow and the other tracker locates the region of interest (ROI) employing histogram features. The predictor estimates the trajectory of the target by using a Kalman filter, which helps keep track of the target even if the trackers fail. Lastly, the refinement process decides the location of the target by taking advantage of the ROIs from the trackers and the predictor. In experiments on our dataset containing tiny flying drones, the proposed method achieved an average success rate 1.134 times higher than that of conventional tracking methods and ran at an average speed of 21.08 frames per second.
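
The role of the Kalman-based predictor can be sketched with a constant-velocity filter for one image axis. The motion model, the noise variances, and the class name are assumptions made for illustration; the abstract states only that a Kalman filter estimates the trajectory.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over the state [position, velocity] for one image axis."""

    def __init__(self, position, velocity=0.0, process_var=1e-2, meas_var=1.0):
        self.p = position
        self.v = velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q = process_var               # assumed process noise variance
        self.r = meas_var                  # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Advance the state by dt; returns the predicted position."""
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, measured_position):
        """Correct the state with a position measurement (H = [1, 0])."""
        (a, b), (c, d) = self.P
        innovation = measured_position - self.p
        s = a + self.r
        k0, k1 = a / s, c / s  # Kalman gain
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]


kf = ConstantVelocityKalman1D(position=0.0)
for x in range(1, 20):          # target moves one pixel per frame
    kf.predict()
    kf.update(float(x))
# If both trackers fail for the next two frames, prediction alone carries on.
print(kf.predict(), kf.predict())
```

Calling `predict` without `update` is what lets a tracker of this kind bridge frames in which no measurement is available.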

2022-02-10 18:14
Enhanced Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Recently, the demand for high-quality video content has been increasing rapidly, led by the development of network technology and the growth in video streaming platforms. In particular, displays with a high refresh rate, such as 120 Hz, have become popular. However, the visual quality is only enhanced if the video stream is produced at the same high frame rate. To benefit from such displays, conventional videos with a low frame rate should be converted to a high frame rate in real time. This paper introduces a bidirectional intermediate flow estimation method for real-time video frame interpolation. A bidirectional intermediate optical flow is directly estimated to predict an accurate intermediate frame. For real-time processing, multiple frames are interpolated with a single intermediate optical flow and parts of the network are implemented in 16-bit floating-point precision. Perceptual loss is also applied to improve the cognitive performance of the interpolated frames. The experimental results showed a high prediction accuracy of 35.54 dB on the Vimeo90K triplet benchmark dataset. An interpolation speed of 84 fps was achieved at 480p resolution.
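
A prediction accuracy quoted in dB, as above, is normally a peak signal-to-noise ratio (PSNR), which the abstract does not name explicitly. A minimal sketch for 8-bit samples stored as flat lists follows; the function name and sample values are illustrative.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


ground_truth = [100, 100, 100, 100]
interpolated = [101, 99, 101, 99]   # every sample is off by one level
print(psnr(ground_truth, interpolated))
```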

2022-02-10 18:13
Novel video coding methods for versatile video coding

Versatile video coding (VVC), which is the next generation video coding standard, is being developed to provide greater coding efficiency than existing video coding standards. In VVC, various coding tools related to intra and inter prediction modes have been adopted. This paper introduces several methods that improve coding efficiency or reduce computational complexity on top of the tools adopted in VVC. The first method enhances the most probable mode list derivation with the statistics of the intra modes of neighbouring blocks. The second method reduces the number of contexts of the merge with motion vector difference mode. The third method excludes invalid block vector predictors early for the intra block copy mode to improve block vector coding. The experimental results show that the three proposed methods achieve BD-rates of –0.05% for all intra coding, –0.02% for random access, and –0.14% for random access coding scenarios, respectively.

2022-02-10 18:11
A high-quality frame rate up-conversion technique for Super SloMo

In this paper, we propose several methods to improve Super SloMo, a deep learning-based frame rate up-conversion technique for the temporal quality improvement of video. In the proposed methods, the training dataset and hyper-parameters are changed and the network is retrained to obtain optimal results while maintaining the existing network structure of Super SloMo. The first method improves the cognition of images when the network is trained with a training set whose characteristics are similar to those of the validation set. The second method reduces video loss in all validation sets by adjusting the hyper-parameters of the error function. The experimental results show that the two proposed methods improved the peak signal-to-noise ratio and the mean of the structural similarity index by 0.11 dB and 0.033% with the specialised training set and by 0.37 dB and 0.077% via adjusting the reconstruction and warping loss parameters, respectively.

2022-02-10 18:08
Relative SATD-based Minimum Risk Bayesian Framework for Fast Intra Decision of HEVC

High Efficiency Video Coding (HEVC) enables significantly improved compression performance relative to existing standards. However, the advance also requires high computational complexity. To accelerate the intra prediction mode decision, a minimum risk Bayesian classification framework is introduced. The classifier selects a small number of candidate modes to be evaluated by a rate-distortion optimization process using the sum of absolute Hadamard transformed difference (SATD). Moreover, the proposed method provides a loss factor that is a good trade-off model between computational complexity and coding efficiency. Experimental results show that the proposed method achieves a 31.54% average reduction in the encoding run time with a negligible coding loss of 0.93% BD-rate relative to HEVC test model 16.6 for the Intra_Main common test condition.
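
The minimum-risk Bayesian decision can be sketched generically: pick the action whose expected loss under the class posterior is smallest. The two classes, the likelihood values, and the loss matrices below are hypothetical; the abstract does not give the paper's actual likelihood model based on relative SATD or its loss factor.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: normalise likelihood * prior over the classes."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def minimum_risk_action(post, loss):
    """Return (best action, risks), where risks[a] = sum_c loss[a][c] * post[c]."""
    risks = [sum(loss[a][c] * post[c] for c in range(len(post)))
             for a in range(len(loss))]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# Class 0: the candidate mode is the best mode; class 1: it is not.
# Action 0: keep the candidate for full RDO; action 1: prune it.
post = posterior(likelihoods=[0.2, 0.6], priors=[0.5, 0.5])

zero_one_loss = [[0, 1], [1, 0]]
cautious_loss = [[0, 1], [5, 0]]  # pruning a truly best mode costs five times more

print(minimum_risk_action(post, zero_one_loss)[0])
print(minimum_risk_action(post, cautious_loss)[0])
```

Raising the cost of wrongly pruning a mode flips the decision from pruning to keeping, which illustrates how a loss factor can trade encoding time against coding efficiency.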

2022-02-10 18:06
Fast Spatial and Temporal Correlation-Based Reference Picture Selection

Multiple reference pictures and variable prediction block sizes in motion estimation/compensation (ME/MC), adopted in video coding standards such as H.264/AVC and H.265/HEVC, achieve high coding efficiency, but these tools incur heavy encoding complexity. This paper introduces a reference picture selection method based on the spatial correlation between neighboring coded blocks and the temporal correlation between the reference pictures of a reference picture list. This method can reduce the number of reference pictures to be searched in the ME process. This reduction provides competitive performance with reduced computational complexity. Experimental results show that the proposed method reduces encoding run time by 47%, with a negligible degradation of coding efficiency.

2022-02-10 18:04
Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, real-time processing on mobile devices is still a challenge because of resource constraints and demands for higher resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate image sequence processing. This paper deals with various optimization techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer objects, and double buffering for parallel and/or distributed tasks. Using the optimization techniques both individually and in combination, several heterogeneous computing structures were implemented and their effectiveness was analyzed. The experimental results show that heterogeneous computing enables execution up to 3.5 times faster than CPU-only processing.

2022-02-10 18:01
