
Online Data Organizer: Micro-video Categorization by Structure-guided Multimodal Dictionary Learning

Abstract

Micro-videos have quickly become one of the most dominant trends in social media. Accordingly, how to organize them draws our attention. Unlike traditional long videos, which may span multiple scenes and tolerate delay (hysteresis), a micro-video: 1) usually records content at one specific venue within a few seconds, and venues are structured hierarchically according to their category granularity. This geographic nature of micro-videos makes it possible to organize them via their venue structure. 2) demands timely propagation over social circles. The timeliness of micro-videos therefore calls for effective online processing. However, only 1.22% of micro-videos are labeled with venue information when uploaded from the mobile end. To address this problem, we present a framework to organize micro-videos online. In particular, we first build a structure-guided multi-modal dictionary learning model to learn concept-level micro-video representations by jointly considering their venue structure and modality relatedness. We then develop an online learning algorithm to incrementally and efficiently strengthen our model, as well as categorize the micro-videos into a tree structure. Experiments on a real-world dataset validate our model. In addition, we release the code to facilitate other researchers.

Pipeline

framework2.png

Algorithm

In this part, we present the INTIMATE algorithm. Algorithm 1 describes the main pipeline of our method in detail, and Algorithm 2 shows the proposed tree-guided multi-modal dictionary learning. All the equations used here can be found in our paper.

QQ截图20170419104025.png

Dataset

In our work, we use the dataset introduced in the paper "Shorter-is-Better: Venue Category Estimation from Micro-Video". Its authors crawled the micro-videos from Vine through its public API (https://github.com/davoclavo/vinepy). In particular, they first manually chose a small set of active Vine users as seed users. They then adopted a breadth-first strategy to expand the user set by gathering the followers of these users, and terminated the expansion after three layers. For each collected user, they crawled his/her published videos, video descriptions, and venue information if available. In this way, they harvested 2 million micro-videos, of which only about 24,000 contain Foursquare check-in information. After removing duplicate venue IDs, they further expanded the video set by crawling all videos for each venue ID with the help of the vinepy API. This eventually yielded a dataset of 276,264 videos distributed over 442 Foursquare venue categories. Each venue ID was mapped to a venue category via the Foursquare API (https://developer.foursquare.com/categorytree), which serves as the ground truth. In addition, 99.8% of the videos are shorter than 7 seconds.
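
The breadth-first user expansion described above can be sketched as follows. This is only an illustrative sketch, not the crawler used by the dataset authors: `expand_users` and the `get_followers` callback are hypothetical names, and the sketch assumes that a "layer" means one hop of followers beyond the seed users.

```python
def expand_users(seed_users, get_followers, max_layers=3):
    """Breadth-first expansion of a user set.

    seed_users    -- iterable of manually chosen seed user IDs
    get_followers -- hypothetical callback returning the followers of a user
    max_layers    -- number of follower hops to expand (assumed meaning of
                     "three layers" in the text)
    """
    seen = set(seed_users)
    frontier = list(seen)
    for _ in range(max_layers):
        next_frontier = []
        for user in frontier:
            for follower in get_followers(user):
                if follower not in seen:
                    seen.add(follower)
                    next_frontier.append(follower)
        if not next_frontier:
            break  # nothing new to expand
        frontier = next_frontier
    return seen
```

In a real crawl, `get_followers` would wrap whatever API call returns a user's followers; here it is left abstract so the sketch stays independent of any particular client library.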

We adopt 10-fold validation to test the effectiveness of our proposed model. For each fold, we randomly select 5,396 videos as offline data for the tree-guided multi-modal dictionary learning, 10,807 videos as online learning data for the online dictionary update, and 2,170 videos as the test set.
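
As a rough illustration of the per-fold sampling, the sketch below draws the three subsets at random for one fold. It is not the released Matlab code: the function name `split_fold`, the use of a per-fold random seed, and the assumption that the three subsets are disjoint within a fold are ours.

```python
import random

# Subset sizes stated in the text.
N_OFFLINE, N_ONLINE, N_TEST = 5396, 10807, 2170


def split_fold(video_ids, seed):
    """Randomly draw disjoint offline, online and test subsets for one fold.

    video_ids -- iterable of video identifiers to sample from
    seed      -- per-fold random seed (an assumption, for reproducibility)
    """
    ids = list(video_ids)
    needed = N_OFFLINE + N_ONLINE + N_TEST
    if len(ids) < needed:
        raise ValueError(f"need at least {needed} videos, got {len(ids)}")
    sample = random.Random(seed).sample(ids, needed)  # sampling without replacement
    offline = sample[:N_OFFLINE]
    online = sample[N_OFFLINE:N_OFFLINE + N_ONLINE]
    test = sample[N_OFFLINE + N_ONLINE:]
    return offline, online, test
```

Calling `split_fold(ids, seed=k)` for `k` in `range(10)` would give ten independent random splits under these assumptions.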

Code

Download

Our code is available here:

    Matlab version:  INTIMATE.rar

    Dataset:  Compelted_dataset.rar (we have filled in the missing data in the raw features)

The code for the baselines is available here:

Contact

Copyright (C) 2017 Shandong University

 

This program is licensed under the GNU General Public License 3.0 (https://www.gnu.org/licenses/gpl-3.0.html). Any derivative work obtained under this license must be licensed under the GNU General Public License as published by the Free Software Foundation, either Version 3 of the License, or (at your option) any later version, if this derivative work is distributed to a third party.

 

The copyright for the program is owned by Shandong University. For commercial projects that require the ability to distribute the code of this program as part of a program that cannot be distributed under the GNU General Public License, please contact <mengliu.sdu@gmail.com> to purchase a commercial license.
