Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Future Blog Post

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

publications

Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Published in CVPR Oral, 2022

This paper is about repetitive action counting (RAC). Previous studies have focused on RAC in short videos, and their methods struggle with longer videos in more realistic scenarios, such as those with interruptions during actions or inconsistent action cycles. As part of our data-driven deep learning approach, we introduce a new repetitive action counting dataset with fine-grained annotations. To address repetitive action counting in these more realistic scenarios, we propose encoding multi-scale temporal correlations with transformers, balancing performance and efficiency. Inspired by crowd counting, we design a method based on density map regression that predicts the action periods with the assistance of the fine-grained annotations. Our approach yields better performance with sufficient interpretability and achieves state-of-the-art results. The paper was accepted to CVPR 2022 as an oral presentation.

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

Published in CVPR, 2023

This paper is about weakly supervised video representation learning, where accurate timestamp-level text-video alignment is not provided. Borrowing ideas from CLIP, we aggregate frame-level features into a video representation and encode the texts corresponding to each action and to the whole video, respectively. We design a multiple-granularity contrastive learning loss that uses the fact that video actions happen sequentially in the temporal domain to generate pseudo frame-sentence correspondences. Extensive experiments on video sequence verification and text-to-video matching show the effectiveness of our proposed approach. The paper was accepted to CVPR 2023.

RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

Published in 3DV, 2024

This paper is about style-consistent and shape-compatible indoor scene generation. Indoor scene generation aims at creating shape-compatible, style-consistent furniture arrangements within a spatially reasonable layout. However, most existing approaches focus primarily on generating plausible furniture layouts without incorporating specific details of the individual furniture pieces. To address this limitation, we propose a two-stage model that integrates shape priors into indoor scene generation by encoding furniture as anchor latent representations.

Tool-LMM: A Large Multi-Modal Model for Tool Agent Learning

Published in Preprint, 2024

This paper is about tool agent learning based on a large multi-modal model. The astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks has triggered extensive exploration of using them as central controllers to build agent systems. Multiple studies focus on connecting LLMs to external tools to extend their application scenarios. Because previous works accept only text instructions, in this paper we introduce a novel system, Tool-LMM, which integrates multi-modal encoders with open-source LLMs to synthesize multi-modal information for correct external tool identification.

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.