@snuffles 2016-12-15T20:06:49.000000Z
Good afternoon.
My name is Jiayao,
and my advisor is Professor Liu.
I am glad to present my master's thesis proposal:
using semantic features in monocular SLAM
for indoor environments.


The talk contains four parts.
The first part is background and significance.


From the last century until now,
robots have moved off the assembly line
and into the real world.
Some have named this huge improvement "the spring of robotics",
and SLAM is an essential component of it for autonomous robots.
The picture on the right shows a Boston Dynamics robot:
using SLAM technology,
it can explore the environment and go to a specific place.


What is SLAM?
SLAM is short for Simultaneous Localization and Mapping.
It answers the questions
"Where am I?"
and "What is around me?"

Solving both problems simultaneously minimizes the localization and mapping errors jointly.
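The "jointly" above can be made concrete with a toy example. Below is a minimal sketch (my own illustration, not part of the proposal) of SLAM as one least-squares problem over both robot poses and a landmark, using made-up 1-D measurements:

```python
import numpy as np

# Toy 1-D SLAM: jointly estimate robot poses x1, x2 and a landmark l
# from odometry (relative motion) and range measurements.
# Unknowns: [x1, x2, l]; the first pose x0 = 0 is fixed as the reference.
# Measurements (noiseless here, so the solution is exact):
#   odometry: x1 - x0 = 1.0, x2 - x1 = 1.0
#   ranges:   l - x0 = 3.0, l - x1 = 2.0, l - x2 = 1.0
A = np.array([
    [1.0, 0.0, 0.0],   # x1 - x0
    [-1.0, 1.0, 0.0],  # x2 - x1
    [0.0, 0.0, 1.0],   # l - x0
    [-1.0, 0.0, 1.0],  # l - x1
    [0.0, -1.0, 1.0],  # l - x2
])
b = np.array([1.0, 1.0, 3.0, 2.0, 1.0])

# One least-squares solve minimizes the localization error (x1, x2)
# and the mapping error (l) at the same time.
est, *_ = np.linalg.lstsq(A, b, rcond=None)
print(est)  # → approximately [1. 2. 3.]
```

Solving for the poses alone from odometry, or the landmark alone from one range, would ignore the constraints that tie them together; the joint solve uses all of them at once.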

The SLAM system
sits in an important position
in the robot's see-and-act cycle.

The input of SLAM is the sensor data from the real world.

The output of SLAM is the map and the robot's localization.

As mentioned,
there are many applications of SLAM,
not only in different kinds of robots
but also in virtual and augmented reality, which
need the 3D structure of the surrounding scene and the pose of the device estimated from sensing data, sequentially and in real time.

--

SLAM can be divided into different kinds by
environment and sensor.

In our research we focus on an
indoor ground robot which uses only a monocular camera.

An indoor environment means the scene contains many man-made structures.

Unlike a laser, a camera is cheaper and
more practical for the robot,
though we must also consider the computational cost of SLAM.


Visual SLAM can be divided into three kinds:
filter-based methods, keyframe bundle adjustment (BA) methods,
and direct tracking methods.

Filter-based methods

Visual SLAM solutions are either filter-based
(e.g., Kalman filter, particle filter)
or non-filter-based
(i.e., posing it as an optimization problem).
Figure 1a shows the data links between different components of filter-type systems; the camera
pose Tn and the entire state of all landmarks in the map
are tightly joined and need to be updated at every processed
frame. In contrast, in non-filter-based systems (shown in Fig.
1b) the data connections between different components
allow the pose of the camera at Tn to be estimated
using a subset of the entire map, without the need to
update the map's data at every processed frame.

As a consequence
of these differences, Strasdat et al. in 2010 showed
that non-filter-based methods outperform filter-based ones.

It is therefore not surprising that since then, most new releases
of Visual SLAM systems are non-filter-based (see Table 1).
In this paper we will focus on analyzing only non-filter-based
techniques; for filter-based ones we will suffice
with listing them.
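The tight coupling described above can be seen in a few lines of a Kalman-filter update. This is a minimal illustration with invented numbers (not a full SLAM system): observing a single landmark still forces an update of the whole joint covariance, including landmarks that were not observed:

```python
import numpy as np

# Minimal illustration of why filter-based SLAM couples the pose with
# *every* landmark: the joint covariance is updated on each measurement,
# so observing one landmark changes the uncertainty of all correlated
# state entries. State (all 1-D): [pose, l1, l2, l3].
dim = 4
x = np.array([0.0, 1.0, 2.0, 3.0])       # prior mean
P = 0.5 * np.eye(dim)                    # prior covariance
P[0, 1] = P[1, 0] = 0.1                  # pose–landmark correlations
P[0, 2] = P[2, 0] = 0.2                  # built up by earlier motion
P[0, 3] = P[3, 0] = 0.3                  # and observations

# Measure the range from the pose to landmark 1 only: z = l1 - pose.
H = np.array([[-1.0, 1.0, 0.0, 0.0]])    # measurement Jacobian
R = np.array([[0.04]])                   # measurement noise
z = np.array([1.1])

# Standard Kalman filter update equations.
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
x = x + (K @ (z - H @ x)).ravel()
P = (np.eye(dim) - K @ H) @ P

# Even the variances of the *unobserved* landmarks l2, l3 shrank,
# because they are correlated with the pose: the full O(n^2) covariance
# is touched at every frame, unlike keyframe BA over a map subset.
print(np.diag(P))
```

With thousands of landmarks this per-frame full-state update is exactly the cost that keyframe-based systems avoid.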

http://qiqitek.com/blog/?p=93


One of the hardest challenges in monocular SLAM is the
estimation of a fully dense map of the imaged scene.
Pixels in textureless areas cannot be reliably matched across views
and standard 3D reconstructions from monocular SLAM are
limited to areas of high photometric gradients.

Feature-based SLAM relies on the scene texture and is very sensitive to the salient points in the scene.
Especially in indoor scenes, there are lots of walls with few features to track, so traditional SLAM often fails at tracking and mapping.
Dense SLAM can obtain the map, but it has a high computational cost, is usually computed on a GPU, and is not real-time on a robot.

Most man-made environments, such as urban and indoor scenes, consist of a set of parallel and orthogonal planar structures. These structures can be approximated by the Manhattan world assumption and are referred to as the Manhattan Frame (MF). Given a set of inputs such as surface normals or vanishing points, we pose the MF estimation problem as a consensus set maximization that maximizes the number of inliers over the rotation search space. Conventionally this problem can be solved by a branch-and-bound framework which mathematically guarantees global optimality. However, the computational time of the conventional branch-and-bound algorithms is rather far from real-time performance. In this paper, we propose a novel bound computation method on an efficient measurement domain for MF estimation, i.e., the extended Gaussian image (EGI). By relaxing the original problem, we can compute the bounds in real time while preserving global optimality. Furthermore, we quantitatively and qualitatively demonstrate the performance of the proposed method on various synthetic and real-world data. We also show the versatility of our approach through three different applications: extension to multiple MF estimation, video stabilization, and line clustering.
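The consensus-set idea above can be sketched in a few lines. This toy (entirely my own construction, with synthetic normals and a brute-force 1-D sweep standing in for the paper's branch-and-bound search over rotations) recovers a Manhattan Frame from noisy surface normals by maximizing the inlier count:

```python
import numpy as np

# Generate surface normals aligned with a Manhattan frame rotated by
# 30 degrees about z, then recover the rotation whose three orthogonal
# axes collect the most inlier normals (consensus set maximization).
rng = np.random.default_rng(0)
true_yaw = np.deg2rad(30.0)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

axes = np.eye(3)
# 300 noisy unit normals, 100 per Manhattan axis.
normals = np.repeat(axes, 100, axis=0) + 0.05 * rng.standard_normal((300, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
normals = normals @ rot_z(true_yaw).T          # rotate into the true MF

def inliers(yaw, thresh_deg=10.0):
    # Count normals within thresh of *some* axis of the candidate frame.
    frame = rot_z(yaw)                          # columns = candidate axes
    cosines = np.abs(normals @ frame)           # |cos| to each axis
    return int(np.sum(cosines.max(axis=1) > np.cos(np.deg2rad(thresh_deg))))

grid = np.deg2rad(np.arange(0.0, 90.0, 1.0))    # yaw is 90°-periodic
best_yaw = grid[np.argmax([inliers(a) for a in grid])]
print(np.rad2deg(best_yaw))  # → close to 30
```

The real method searches the full 3-D rotation space with provable bounds; the 1-D sweep here only conveys why inlier counting over rotations identifies the frame.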


Select 100-1000 representative points (or lines, planes) and discard the others.
Estimate the motion from the key-points.
Track the key-points using descriptors.
Sparse
Robust to outliers

Estimate the motion directly from pixels.
Use all information from images.
Dense
Slower
Difficult to remove the outliers
Needs good initialization
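To make the feature-based column concrete, here is a deliberately simplified sketch (my own toy, not a real pipeline): motion is reduced to a 2-D translation, matches are synthetic, and a coordinate-wise median stands in for RANSAC as the robust estimator — illustrating why sparse matching tolerates outliers:

```python
import numpy as np

# Sparse feature matching toy: estimate a 2-D translation from matched
# key-points, 10% of which are wrong matches (outliers).
rng = np.random.default_rng(1)
true_t = np.array([4.0, -2.0])

pts1 = rng.uniform(0, 100, size=(50, 2))            # key-points, frame 1
pts2 = pts1 + true_t + 0.1 * rng.standard_normal((50, 2))
pts2[:5] = rng.uniform(0, 100, size=(5, 2))         # 5 wrong matches

disp = pts2 - pts1
t_mean = disp.mean(axis=0)      # least-squares answer, pulled by outliers
t_med = np.median(disp, axis=0)  # robust answer (RANSAC stand-in)

print(t_med)  # → close to [4, -2]
```

A direct method has no discrete "matches" to reject, which is one reason outlier handling is harder there, as the second list notes.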


Large-Scale Direct Monocular SLAM (LSD-SLAM) is another type of monocular SLAM that has been developed to allow the building of large-scale consistent maps of the environment
(1). Instead of using keypoints to create an abstraction of the input images, it works directly on the images, using their intensities to track and map
(2). This method is extremely powerful: it can map a large area, does not require any specialised hardware, and can run on a typical modern smartphone in real time.


Estimate the ego-motion between frames.
Basically two-view geometry.
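The two-view geometry behind ego-motion estimation fits in one identity. Below is a self-contained numeric check (the rotation, translation, and 3-D point are made-up values) of the epipolar constraint that every monocular front-end ultimately exploits:

```python
import numpy as np

# For relative rotation R and translation t between two camera frames,
# the essential matrix E = [t]_x R satisfies the epipolar constraint
# x2^T E x1 = 0 for normalized image points of the same 3-D point.
def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rot_y(a):  # the camera yaws by angle a between the two frames
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

R = rot_y(0.1)
t = np.array([1.0, 0.0, 0.2])
E = skew(t) @ R

# A 3-D point observed in both frames; frame-2 coords are X2 = R X + t.
X = np.array([0.5, -0.3, 4.0])
x1 = X / X[2]                    # normalized image point, frame 1
X2 = R @ X + t
x2 = X2 / X2[2]                  # normalized image point, frame 2

print(x2 @ E @ x1)  # → ~0: the epipolar constraint holds
```

In a real system E is estimated from point matches (e.g., the five-point algorithm) and then decomposed back into R and t.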

Minimize the photometric error, i.e., the difference in grayscale values of pixels.
Assume the camera moves slowly and smoothly and the lighting conditions do not change much.
Reconstruct dense results instead of sparse feature points.
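The photometric minimization above can be demonstrated in one dimension. This sketch (a made-up 1-D "image" and a plain Gauss-Newton loop) recovers the motion directly from pixel intensities, with no features at all — relying, as the assumptions say, on small motion and constant brightness:

```python
import numpy as np

# Direct method in 1-D: find the motion d minimizing the photometric
# error sum_x (I2(x) - I1(x + d))^2 over *all* pixels.
xs = np.linspace(0.0, 2 * np.pi, 200)
I1 = np.sin(xs) + 0.5 * np.sin(3 * xs)     # reference "image"
true_d = 0.15                               # true camera motion
I2 = np.sin(xs + true_d) + 0.5 * np.sin(3 * (xs + true_d))

d = 0.0
for _ in range(50):                         # Gauss-Newton iterations
    warped = np.interp(xs + d, xs, I1)      # I1 resampled at x + d
    grad = np.gradient(warped, xs)          # image gradient dI/dx
    r = I2 - warped                         # photometric residuals
    # Normal-equation step for the single parameter d:
    d += (grad @ r) / (grad @ grad)

print(d)  # → close to 0.15
```

Note the role of the image gradient in the update: pixels in textureless regions (gradient near zero) contribute nothing, which is exactly the weakness the dense and semantic approaches later address.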

Take advantage of contour, planar, and object information to remove the reliance on point features and improve the algorithm's robustness.
The approach is based on semi-dense SLAM, a direct method that implicitly uses edge information.
Incorporate superpixel segmentation, assume the color within each superpixel is homogeneous, and improve on the traditional superpixel tracking method.
Use layout information (a room is usually a 3D box), object recognition (e.g., furniture), and other mid-level semantic features for tracking and mapping.

Dense SLAM: differently from the above, dense
visual SLAM methods aim to estimate a depth for every pixel,
both high- and low-gradient ones. [7], [15] were the first
to present dense results in real time using a monocular
camera. They not only minimize the difference between
image intensities, but also include a regularization term enforcing
smooth solutions. This latter term is crucial for reconstructing low-gradient pixels. GPU processing is usually required
to achieve real time.
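Why the regularization term is "crucial for low-gradient pixels" can be shown with a 1-D toy (all numbers invented; a first-difference smoothness penalty stands in for the regularizers used in the dense systems cited above). Only a few pixels get reliable depth measurements; the regularizer propagates depth into the rest:

```python
import numpy as np

# Minimize  sum_x w(x)*(d(x) - d_meas(x))^2 + lam*sum_x (d(x+1)-d(x))^2.
# w(x) is the per-pixel confidence: only "high-gradient" pixels get a
# usable depth measurement; low-gradient pixels (w = 0) are filled in
# purely by the smoothness term.
n = 100
true_depth = np.linspace(2.0, 4.0, n)            # smooth ground truth
rng = np.random.default_rng(2)
d_meas = true_depth + rng.standard_normal(n)     # very noisy per-pixel
w = np.zeros(n)
w[::5] = 1.0                                     # 1 in 5 pixels reliable

lam = 5.0
# Normal equations: (W + lam * L^T L) d = W d_meas,
# with L the first-difference operator.
L = np.diff(np.eye(n), axis=0)                   # shape (n-1, n)
A = np.diag(w) + lam * L.T @ L
d = np.linalg.solve(A, w * d_meas)

print(np.abs(d - true_depth).mean())  # well below the raw noise level
```

In the real systems the same structure appears per pixel of a full image, which is why the optimization is usually run on a GPU.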

What is a superpixel?

We assume that superpixels correspond to planar surfaces, use a homography model, and minimize the distance between the contours.
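The homography model for a planar superpixel can be verified numerically. This sketch (with invented R, t, plane, and point) checks the standard plane-induced homography in normalized camera coordinates:

```python
import numpy as np

# If a superpixel lies on the plane n^T X = d, its two views are related
# by the plane-induced homography H = R + t n^T / d: every point on the
# plane maps exactly, with no per-pixel depth needed inside the region.
R = np.eye(3)                      # pure translation between the views
t = np.array([0.2, 0.0, 0.0])
n = np.array([0.0, 0.0, 1.0])      # fronto-parallel plane
d = 5.0                            # plane equation: n^T X = d
H = R + np.outer(t, n) / d

X = np.array([1.0, -0.5, 5.0])     # a 3-D point on the plane
x1 = X / X[2]                      # normalized projection in view 1
X2 = R @ X + t
x2 = X2 / X2[2]                    # normalized projection in view 2

x2_pred = H @ x1
x2_pred /= x2_pred[2]
print(np.allclose(x2_pred, x2))   # → True
```

This is why a single homography per superpixel, rather than a depth per pixel, is enough to track textureless planar regions such as walls.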
