Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
This is a page not in the main menu.
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in CVPR Oral, 2022
This paper is about repetitive action counting (RAC). Previous studies focused on RAC in short videos and struggle with longer videos in more realistic scenarios, such as interruptions during actions or inconsistent action cycles. In our data-driven deep learning approach, we introduce a new repetitive action counting dataset with fine-grained annotations. To address RAC in these more realistic scenarios, we propose encoding multi-scale temporal correlations with transformers, considering both performance and efficiency. Inspired by crowd counting, we design a density map regression method that predicts action periods with the assistance of the fine-grained annotations. Our approach yields better performance with sufficient interpretability and achieves state-of-the-art (SoTA) results. The paper was accepted to CVPR 2022 as an oral presentation.
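For intuition, here is a minimal PyTorch sketch of the density-map regression idea. It is not the paper's architecture: the head layers, feature shapes, and the plain MSE objective are assumptions, standing in for whatever the multi-scale temporal correlation encoder actually feeds into.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityHead(nn.Module):
    """Hypothetical per-frame density regression head; the input features
    would come from a temporal encoder such as the paper's multi-scale
    correlation transformer."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2),
            nn.ReLU(),
            nn.Linear(dim // 2, 1),
            nn.Softplus(),  # densities must be non-negative
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, dim) -> density: (batch, frames)
        return self.net(feats).squeeze(-1)

def density_counting_loss(pred_density: torch.Tensor, gt_density: torch.Tensor):
    """Regress the density map with MSE; the repetition count is simply
    the sum of the density over time, as in crowd counting."""
    loss = F.mse_loss(pred_density, gt_density)
    pred_count = pred_density.sum(dim=1)  # differentiable count estimate
    return loss, pred_count
```

Predicting a density rather than a single scalar count is what makes the output interpretable: peaks in the map indicate where individual action cycles occur.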
Published in CVPR, 2023
This paper is about weakly supervised video representation learning, where accurate timestamp-level text-video alignment is not provided. Borrowing ideas from CLIP, we aggregate frame-level features for the video representation and encode the texts corresponding to each action and to the whole video, respectively. We design a multiple-granularity contrastive learning loss that exploits the fact that actions in a video happen sequentially in the temporal domain to generate pseudo frame-sentence correspondences. Extensive experiments on video sequence verification and text-to-video matching show the effectiveness of our proposed approach. The paper was accepted to CVPR 2023.
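A hedged sketch of what a two-granularity contrastive objective could look like. The equal-chunk pseudo-assignment below is an assumption for illustration, not necessarily the paper's exact correspondence rule.

```python
import torch
import torch.nn.functional as F

def clip_style_nce(video_emb, text_emb, temperature=0.07):
    """Coarse granularity: symmetric InfoNCE between pooled video
    embeddings and whole-video text embeddings, each of shape (B, D)."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature
    labels = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def frame_sentence_nce(frame_emb, sent_emb, temperature=0.07):
    """Fine granularity: with no timestamps, assign sentence k to an
    equal-length chunk of frames purely from temporal order (an assumed
    pseudo-correspondence rule)."""
    f = F.normalize(frame_emb, dim=-1)   # (T, D) frames
    s = F.normalize(sent_emb, dim=-1)    # (K, D) action sentences
    logits = f @ s.t() / temperature     # (T, K) similarities
    T, K = logits.shape
    pseudo = torch.arange(T, device=f.device) * K // T  # frame -> sentence
    return F.cross_entropy(logits, pseudo)
```

The coarse term ties the whole video to its full description, CLIP-style; the fine term supervises individual frames using only the sequential order of the sentences, which is the weak signal the paper exploits.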
Published in 3DV, 2024
This paper is about style-consistent and shape-compatible indoor scene generation, which aims at creating shape-compatible, style-consistent furniture arrangements within a spatially reasonable layout. However, most existing approaches primarily focus on generating plausible furniture layouts without incorporating specific details of the individual furniture pieces. To address this limitation, we propose a two-stage model that integrates shape priors into indoor scene generation by encoding furniture as anchor latent representations.
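To make the two-stage idea concrete, here is a minimal PyTorch sketch under assumed interfaces: the point-cloud shape representation, latent size, and per-object placement parameterization are all placeholders, not the paper's actual design.

```python
import torch
import torch.nn as nn

class ShapeAnchorEncoder(nn.Module):
    """Stage 1 (sketch): embed each furniture piece into an anchor latent.
    Input here is a point cloud of shape (N, P, 3); the paper's shape
    representation may differ."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # (N, P, 3) -> (N, latent_dim), order-invariant max pooling
        return self.mlp(points).max(dim=1).values

class LayoutGenerator(nn.Module):
    """Stage 2 (sketch): predict a per-object placement conditioned on the
    anchor latents, so the layout stays compatible with the concrete shapes.
    A small transformer over objects is one plausible choice."""
    def __init__(self, latent_dim: int = 128, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(latent_dim, 7)  # x, y, z, w, h, d, yaw

    def forward(self, anchors: torch.Tensor) -> torch.Tensor:
        # anchors: (B, N, latent_dim) -> placements: (B, N, 7)
        return self.head(self.encoder(anchors))
```

Because the layout stage sees the anchor latents of the concrete shapes it must place, arrangements can remain compatible with those shapes rather than being generated for generic category boxes.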
Published in Preprint, 2024
This paper is about tool agent learning based on a large multi-modal model. The astonishing performance of large language models (LLMs) in natural language comprehension and generation has triggered extensive exploration of using them as central controllers for building agent systems. Multiple studies focus on bridging LLMs to external tools to extend their application scenarios. To remedy the limitation that previous works accept only single text instructions, we introduce a novel system, Tool-LMM, which integrates multi-modal encoders with open-source LLMs to synthesize multi-modal information for correct external tool identification.
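A minimal sketch of the bridging pattern described above, with assumed module names and dimensions; the real Tool-LMM generates the tool invocation with the LLM itself rather than using a fixed classification head.

```python
import torch
import torch.nn as nn

class ToolSelector(nn.Module):
    """Hypothetical sketch: project features from a frozen multi-modal
    encoder into an LLM's embedding space, fuse them with the text
    features, and score candidate tools. The classification head is an
    illustrative stand-in for LLM-generated tool calls."""
    def __init__(self, modality_dim: int, llm_dim: int, num_tools: int):
        super().__init__()
        self.proj = nn.Linear(modality_dim, llm_dim)    # modality -> LLM space
        self.tool_head = nn.Linear(llm_dim, num_tools)  # one score per tool

    def forward(self, modality_feats: torch.Tensor,
                text_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (B, modality_dim) from an image/audio/video encoder
        # text_feats: (B, llm_dim) from the LLM's text embedding
        fused = self.proj(modality_feats) + text_feats
        return self.tool_head(fused)  # (B, num_tools) tool logits
```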
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different entry in the type field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.