Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) is a Java-based distributed file system that provides scalable, reliable data storage and is designed to span large clusters of commodity servers. It was built as a fault-tolerant storage layer that works closely with MapReduce.

HDFS has a master/slave architecture. An HDFS cluster consists of a single Name Node, a master server that manages the file system namespace and regulates clients' access to files, and a number of Data Nodes, usually one per node in the cluster. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks (64 MB by default in Hadoop 1.x, 128 MB from Hadoop 2 onward), and these blocks are stored on a set of Data Nodes.
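To make the block arithmetic concrete, here is a minimal Java sketch, assuming a 64 MB block size and a hypothetical 200 MB file. The class and method names (BlockSplit, blockCount, lastBlockSize) are illustrative only and are not part of the Hadoop API. A 200 MB file needs four blocks: three full 64 MB blocks and a final 8 MB block.

```java
/**
 * Minimal sketch of how a file is divided into fixed-size blocks.
 * Assumes a 64 MB block size; a real cluster takes this from its configuration.
 * Hypothetical helper, not Hadoop code.
 */
public class BlockSplit {
    public static final long MB = 1024L * 1024L;

    /** Number of blocks needed for fileSize bytes: ceiling(fileSize / blockSize). */
    public static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Bytes held by the last block; equals blockSize only when the file divides evenly. */
    public static long lastBlockSize(long fileSize, long blockSize) {
        long blocks = blockCount(fileSize, blockSize);
        return blocks == 0 ? 0 : fileSize - (blocks - 1) * blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;  // assumed block size
        long fileSize = 200 * MB;  // hypothetical 200 MB file
        System.out.println("blocks = " + blockCount(fileSize, blockSize));                      // blocks = 4
        System.out.println("last block = " + lastBlockSize(fileSize, blockSize) / MB + " MB"); // last block = 8 MB
    }
}
```

In HDFS the final, partially filled block only takes up as much disk space as the data it holds, so a small trailing block does not waste a full block's worth of storage.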

The Name Node executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to Data Nodes. The Data Nodes are responsible for serving read and write requests from the file system’s clients. The Data Nodes also perform block creation, deletion, and replication upon instruction from the Name Node.
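The Name Node's role as keeper of the block-to-Data-Node mapping can be illustrated with a toy in-memory model. This is a hypothetical sketch, not Hadoop code: ToyNameNode, allocateBlock and locate are invented names, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy. The replication factor of 3 mirrors the HDFS default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the Name Node's block map: which Data Nodes hold each block.
 * Placement is simple round-robin, NOT the rack-aware policy real HDFS uses.
 */
public class ToyNameNode {
    private final List<String> dataNodes;
    private final int replication;
    private final Map<String, List<String>> blockMap = new LinkedHashMap<>();
    private int next = 0; // round-robin cursor over dataNodes

    public ToyNameNode(List<String> dataNodes, int replication) {
        if (replication < 1 || replication > dataNodes.size()) {
            throw new IllegalArgumentException("replication must be between 1 and the number of Data Nodes");
        }
        this.dataNodes = List.copyOf(dataNodes);
        this.replication = replication;
    }

    /** Picks `replication` distinct Data Nodes for a new block and records the mapping. */
    public List<String> allocateBlock(String blockId) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            // distinct because replication <= dataNodes.size()
            targets.add(dataNodes.get((next + i) % dataNodes.size()));
        }
        next = (next + 1) % dataNodes.size();
        List<String> replicas = List.copyOf(targets);
        blockMap.put(blockId, replicas);
        return replicas;
    }

    /** Returns the Data Nodes holding a block, or an empty list if the block is unknown. */
    public List<String> locate(String blockId) {
        return blockMap.getOrDefault(blockId, List.of());
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(List.of("dn1", "dn2", "dn3", "dn4"), 3);
        nn.allocateBlock("blk_1");
        nn.allocateBlock("blk_2");
        System.out.println("blk_1 -> " + nn.locate("blk_1")); // blk_1 -> [dn1, dn2, dn3]
        System.out.println("blk_2 -> " + nn.locate("blk_2")); // blk_2 -> [dn2, dn3, dn4]
    }
}
```

A client first asks the Name Node for a block's locations and then reads the data directly from one of the listed Data Nodes; the file data itself never passes through the Name Node.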

HDFS can be managed from the command line through the ‘hadoop fs’ shell, which provides a set of Unix-like commands:
hadoop fs -mkdir <paths>
    Creates directories at the given paths.
hadoop fs -ls <args>
    Lists the files and directories at the given path.
hadoop fs -lsr <args>
    Recursive version of ls, similar to Unix ls -R (deprecated in Hadoop 2 in favor of -ls -R).
hadoop fs -touchz <path[filename]>
    Creates a zero-length file at the given path.
hadoop fs -cat <path[filename]>
    Copies source paths to STDOUT.
hadoop fs -cp <source> <dest>
    Copies files from source to destination. With multiple sources, the destination must be a directory.
hadoop fs -put <source:localFile> <destination>
    Copies a single item or multiple items from the local file system to the destination file system. Also reads input from STDIN (use - as the source) and writes it to the destination file system.
hadoop fs -get <source> <dest:localFileSystem>
    Copies files to the local file system.
hadoop fs -copyFromLocal <src:localFileSystem> <dest:Hdfs>
    Similar to put, except that the source is limited to local files.
hadoop fs -copyToLocal <src:Hdfs> <dest:localFileSystem>
    Similar to get, except that the destination is limited to local files.
hadoop fs -mv <src> <dest>
    Moves files from source to destination. Moving files across file systems is not permitted.
hadoop fs -rm <arg>
    Removes the files specified as arguments; directories are removed with -rmr.
hadoop fs -rmr <arg>
    Recursive version of rm (deprecated in Hadoop 2 in favor of -rm -r).
hadoop fs -stat <path>
    Returns stat information on the path, e.g. hadoop fs -stat /user/akceptor/dir1
hadoop fs -tail <path[filename]>
    Displays the last kilobyte of the file, similar to Unix tail.
hadoop fs -test -[ezd] <path>
    -e: checks whether the path exists; returns 0 if true.
    -z: checks whether the file is zero length; returns 0 if true.
    -d: checks whether the path is a directory; returns 0 if true.
hadoop fs -text <src>
    Outputs the file in text format. The allowed formats are zip and TextRecordInputStream.
hadoop fs -du <path>
    Displays the sizes of the files and directories in the given directory, or the length of a file if the path is a file.
hadoop fs -dus <args>
    Displays a summary of file lengths, i.e. one total per path (deprecated in Hadoop 2 in favor of -du -s).
hadoop fs -expunge
    Empties the trash.
hadoop fs -chgrp [-R] GROUP <path>
    Changes the group association of files. With -R, makes the change recursively. The user must be the owner of the files or a super-user.
hadoop fs -chmod [-R] <MODE[,MODE] | OCTALMODE> <path>
    Changes the permissions of files. With -R, makes the change recursively. The user must be the owner of the file or a super-user.
hadoop fs -chown [-R] [OWNER][:[GROUP]] <path>
    Changes the owner of files. With -R, makes the change recursively. The user must be a super-user.
hadoop fs -getmerge <src> <localdst> [addnl]
    Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. Optionally, addnl can be set to add a newline character at the end of each file.
hadoop fs -setrep [-R] <rep> <path>
    Changes the replication factor of a file to <rep>. With -R, makes the change recursively.
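The OCTALMODE form accepted by -chmod follows the familiar POSIX convention: each octal digit encodes the read (4), write (2) and execute (1) bits for the owner, the group and others, in that order. The Java sketch below uses a hypothetical OctalMode.toRwx helper (not part of Hadoop) that handles only plain three-digit modes, with no sticky bit and no symbolic MODE strings, and expands a mode into the nine permission characters that hadoop fs -ls prints after the file-type flag.

```java
/**
 * Sketch: expands a three-digit octal mode (as passed to hadoop fs -chmod)
 * into an rwx permission string. Hypothetical helper; sticky bit and
 * symbolic modes are deliberately not handled.
 */
public class OctalMode {
    public static String toRwx(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits, got: " + octal);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) { // digits in order: owner, group, others
            int bits = c - '0';
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("755 -> " + toRwx("755")); // 755 -> rwxr-xr-x
        System.out.println("640 -> " + toRwx("640")); // 640 -> rw-r-----
    }
}
```

For example, a mode of 640 gives the owner read and write access and the group read-only access. Because HDFS has no notion of executable files, the x bit matters only on directories, where it is required to access their children.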

