I recently invested in my home setup and backups with a Synology NAS. I did this because I have many digital items that are extremely valuable to me, including home movies from when I was just a toddler and a large set of pictures from my gap year in South America. The NAS lets me keep these items on RAID 1 at my house and synced to my Dropbox in case the physical hardware is destroyed.
In addition to the items mentioned above, I have maintained my music collection since early high school. Backing it up once would be as easy as dragging and dropping onto the NAS GUI. The problem lies with the constant revisions: I buy new albums and songs, and I make playlists. I don't want to recopy the files every time; that would be BORING! I was looking for an excuse to do a little Python scripting, and this was a perfect scenario.
Part of my Home Office
I knew these would be the general steps after copying the library to the NAS for the first time:
1. get a list of files/directories on computer
2. get a list of files/directories on NAS
3. find the differences
4. copy/update/delete on the NAS to reflect the same file structure as computer
The idea is simple enough, right? Let's get started.
macOS and iTunes create some files that I don't want to save. I avoid these with an ignored files list, which is always checked before adding a file to the index. We will also use the constants shown below as dictionary keys later. In addition, let's define a local path for the Mac index, the shared folder path on the NAS, and a nested folder within the shared folder. The server path can only be a single shared folder, hence the second constant to complete the path.
FILES = 'files'
SUBDIRS = 'subdirs'
INDEX = 'index'
DELETED_DIRS = 'deleted_dirs'
DELETED = 'deleted'
CREATED = 'created'
UPDATED = 'updated'
start_path = 'itunes_path'
server_path = 'my_shared_folder'
additional_global_nested = 'iTunes/'
ignored_files = ['.DS_Store', '.ini', '.iTunes Preferences.plist']
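One thing to watch in the list above: a membership test such as `name in ignored_files` only matches whole names, so `'.ini'` skips a file literally called `.ini`, not every file ending in `.ini`. If the intent is to skip by extension as well, a small helper along these lines could work. This is only a sketch; the name `is_ignored` and the split into `IGNORED_NAMES` and `IGNORED_SUFFIXES` are my own assumptions, not part of the original script.

```python
# Hypothetical helper: the names and the suffix tuple are illustrative assumptions.
IGNORED_NAMES = {'.DS_Store', '.iTunes Preferences.plist'}
IGNORED_SUFFIXES = ('.ini',)

def is_ignored(name):
    """Return True for exact-name matches or for names ending in an ignored suffix."""
    # str.endswith accepts a tuple, so several suffixes can be checked at once.
    return name in IGNORED_NAMES or name.endswith(IGNORED_SUFFIXES)
```

With this, `is_ignored('desktop.ini')` is true while a plain list lookup for `'desktop.ini'` would not be.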
I started writing my own directory index before I realized somebody else had probably already done it, and I found a helpful blog post. To compute a directory index, we can use the os.walk(path) method, which yields the root, directories and filenames under the path. I used basically the same function to create a map of the files and subdirectories. In addition, I recorded each file's modification time with os.path.getmtime(path), which is used later when computing which files were updated. See the function below, followed by a sample index.
from os import walk, path

def compute_dir_index(p):
    files = []
    subdirs = []
    # Collect every file and subdirectory, relative to the starting path
    for root, dirs, filenames in walk(p):
        for subdir in dirs:
            if subdir not in ignored_files:
                subdirs.append(path.relpath(path.join(root, subdir), p))
        for f in filenames:
            if f not in ignored_files:
                files.append(path.relpath(path.join(root, f), p))
    # Map each file to its modification time, rounded to whole seconds
    index = {}
    for f in files:
        index[f] = int(round(path.getmtime(path.join(p, f))))
    return { FILES: files, SUBDIRS: subdirs, INDEX: index }
##############
{'files': ['068 Radio_Video.m4a', '071 Question_.m4a', '066 Revenga.m4a', 'tottenham/sonn-1.png', 'tottenham/test.txt'], 'subdirs': ['tottenham'], 'index': {'068 Radio_Video.m4a': 1550444133, '071 Question_.m4a': 1550444131, '066 Revenga.m4a': 1550444130, '067 Cigaro.m4a': 1550444134, 'tottenham/sonn-1.png': 1550443813, 'tottenham/test.txt': 1550444133}}
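A detail worth knowing about `os.walk`: leaving a directory out of the `subdirs` list does not stop the walk from descending into it, so files inside an ignored folder would still be indexed. In top-down mode the `dirs` list can be edited in place to prune the walk. The sketch below shows that idea in isolation; `walk_relative_files` and its `ignored` parameter are names I made up for illustration.

```python
import os

def walk_relative_files(base, ignored=('.DS_Store',)):
    """Yield file paths relative to base, skipping ignored names and ignored directories."""
    for root, dirs, filenames in os.walk(base):
        # Assigning to dirs[:] prunes the walk in place (top-down mode only),
        # so os.walk never enters an ignored directory.
        dirs[:] = [d for d in dirs if d not in ignored]
        for name in filenames:
            if name not in ignored:
                yield os.path.relpath(os.path.join(root, name), base)
```

It is a generator, so `list(walk_relative_files(start_path))` would give a list comparable to the `files` entry of the index.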
Computing the index on the NAS is the same idea but a little more complicated, because we first have to connect to it and it behaves differently than macOS. After a little research, I discovered the pysmb package, installable with pip for Python 3. At first I wasn't sure why it connects on port 445 when my NAS runs on 5000, but Stack Overflow was my savior once again: 445 is the standard port for SMB over TCP, while 5000 serves the NAS web interface. listPath is similar to walk and does the following:
listPath(service_name, path, search=65591, pattern='*', timeout=30): Retrieve a directory listing of files/folders at path
The only caveat is that it only lists the immediate directory, so for each folder it's necessary to call the function recursively. Therefore we define the initial empty index before calling the function and add to it as we go. As mentioned previously, the server path has to be a single shared folder, which is the reason for the additional_global_nested and additionalSlash variables.
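from smb.SMBConnection import SMBConnection

# Arguments: username, password, client machine name, server name, domain
conn = SMBConnection(my_user_name, my_password, 'andrew', 'nas', '', is_direct_tcp=True)
# Direct TCP connections use the SMB port, 445
conn.connect('192.168.100.3', 445)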
server_index = { FILES: [], SUBDIRS: [], INDEX: {}}
...
def compute_server_index(p):
    server_items = conn.listPath(server_path, additional_global_nested + p)
    additionalSlash = '/' if p else ''
    for item in server_items:
        # Skip ignored names and hidden entries (including '.' and '..')
        if item.filename in ignored_files or item.filename[0] == '.':
            continue
        if item.isDirectory:
            newPath = p + additionalSlash + item.filename
            server_index[SUBDIRS].append(newPath)
            # listPath is not recursive, so descend manually
            compute_server_index(newPath)
        else:
            filePath = p + additionalSlash + item.filename
            server_index[FILES].append(filePath)
            server_index[INDEX][filePath] = int(round(item.last_write_time))
######
{'files': ['tottenham/sonn-1.png', 'tottenham/test.txt', '071 Question_.m4a', '064 Soldier Side _Intro_.m4a', '067 Cigaro.m4a', '066 Revenga.m4a', '068 Radio_Video.m4a'], 'subdirs': ['tottenham'], 'index': {'tottenham/sonn-1.png': 1550443813, 'tottenham/test.txt': 1550444133, '071 Question_.m4a': 1550444132, '064 Soldier Side _Intro_.m4a': 1550444134, '067 Cigaro.m4a': 1550444135, '066 Revenga.m4a': 1550444131, '068 Radio_Video.m4a': 1550444134}}
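The same recursion can also return its result instead of filling a module-level dictionary, which makes it easier to try out without a NAS. Here is a rough sketch run against a fake in-memory listing. `FakeEntry`, `build_index` and the `list_dir` callable are stand-ins I invented for illustration; in the real script the listing would come from `conn.listPath`, and I am only assuming the three attributes the code above already uses.

```python
from collections import namedtuple

# Stand-in for the entries a directory listing returns; only the attributes used here.
FakeEntry = namedtuple('FakeEntry', 'filename isDirectory last_write_time')

def build_index(list_dir, p=''):
    """Recursively collect files, subdirs and mtimes using a list_dir(path) callable."""
    index = {'files': [], 'subdirs': [], 'index': {}}
    for item in list_dir(p):
        if item.filename.startswith('.'):  # skips '.', '..' and hidden files
            continue
        child = p + '/' + item.filename if p else item.filename
        if item.isDirectory:
            index['subdirs'].append(child)
            # Merge the subtree's results into this level's index.
            sub = build_index(list_dir, child)
            index['files'] += sub['files']
            index['subdirs'] += sub['subdirs']
            index['index'].update(sub['index'])
        else:
            index['files'].append(child)
            index['index'][child] = int(round(item.last_write_time))
    return index

# Fake share: one file at the root and one inside 'tottenham'.
tree = {
    '': [FakeEntry('.', True, 0), FakeEntry('tottenham', True, 0),
         FakeEntry('066 Revenga.m4a', False, 1550444131.2)],
    'tottenham': [FakeEntry('test.txt', False, 1550444133.0)],
}
result = build_index(lambda p: tree[p])
```

Passing the listing function in as an argument is just one design choice; it keeps the traversal logic separate from the connection so it can be checked with fake data.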
We now have the information for each directory and can compute the difference. Once again, this function is basically the same as the one provided by Mr. Sileo in his blog post. I modified the update check slightly to allow a margin in the modification time, because my library includes over 20,000 files and the sync takes a couple of minutes to run.
def compute_diff(dir_start, dir_dest):
    data = {}
    # Files only on the destination were deleted locally; files only at the start are new
    data[DELETED] = list(set(dir_dest[FILES]) - set(dir_start[FILES]))
    data[CREATED] = list(set(dir_start[FILES]) - set(dir_dest[FILES]))
    data[UPDATED] = []
    data[DELETED_DIRS] = list(set(dir_dest[SUBDIRS]) - set(dir_start[SUBDIRS]))
    for f in set(dir_dest[FILES]).intersection(set(dir_start[FILES])):
        # round to nearest hundred seconds to avoid small margin while copying
        if int(round(dir_start[INDEX][f], -2)) != int(round(dir_dest[INDEX][f], -2)):
            data[UPDATED].append(f)
    return data
#####
{'deleted': [], 'created': ['064 Soldier Side _Intro_.m4a'], 'updated': [], 'deleted_dirs': []}
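Rounding both timestamps to the nearest hundred seconds has an edge case: two times only a couple of seconds apart can fall on either side of a rounding boundary (1550444149 rounds to 1550444100, while 1550444151 rounds to 1550444200) and be reported as an update. One possible alternative is to compare the absolute difference against a tolerance. The function name `needs_update` and the 100-second default are assumptions for the sake of the example.

```python
def needs_update(local_mtime, server_mtime, tolerance=100):
    """Treat a file as changed only when the two mtimes differ by more than tolerance seconds."""
    return abs(local_mtime - server_mtime) > tolerance
```

The trade-off is the same as with rounding: a genuine edit made within the tolerance window of the last upload would go unnoticed until the next change.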
Now we have arrived at the juiciest part because we know which files need to be copied, overwritten or deleted. We will use the storeFile, deleteFiles, deleteDirectory, and createDirectory methods from pysmb. If a file was completely new in the root directory, we could simply copy it. However, we haven't defined any information on new directories, so we'll check if the directory exists before storing, and if not create the directory.
def get_dir_path(file):
    f_array = file.split('/')
    # Drop the last component (the filename) and rejoin the rest
    return '/'.join(f_array[0:len(f_array)-1])
The above function simply returns the directory portion of a file's path.
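For relative, '/'-separated paths like the ones in the index, the standard library's `posixpath.dirname` should give the same result as the manual split and join. A minimal sketch, assuming the paths never start with a leading slash:

```python
import posixpath

def get_dir_path(file):
    """Return the directory part of a '/'-separated relative path ('' for top-level files)."""
    return posixpath.dirname(file)
```

Using `posixpath` rather than `os.path` keeps the separator fixed at '/', which is what the server paths in this script use.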
def make_dirs(dir, prev_dir):
    if not len(dir):
        return
    current_location_dirs = []
    dir_list = dir.split('/')
    additionalSlash = '/' if len(prev_dir) else ''
    dir_location = prev_dir + additionalSlash + dir_list[0]
    # List the directories that already exist at this level on the server
    currentIndex = conn.listPath(server_path, additional_global_nested + prev_dir)
    for item in currentIndex:
        if item.isDirectory and item.filename[0] != '.':
            current_location_dirs.append(prev_dir + additionalSlash + item.filename)
    if dir_location not in current_location_dirs:
        try:
            conn.createDirectory(server_path, additional_global_nested + dir_location)
        except Exception as e:
            print(e)
    # Recurse into the remaining path components
    make_dirs('/'.join(dir_list[1:]), dir_location)
Then we can proceed with the copying.
def sync_dirs(diff):
    for f in diff[CREATED]:
        # Make sure the target directory exists before uploading
        file_location = get_dir_path(f)
        make_dirs(file_location, '')
        try:
            # The with block closes the file automatically
            with open(start_path + f, 'rb', buffering=0) as file_obj:
                conn.storeFile(server_path, additional_global_nested + f, file_obj)
        except IOError as e:
            print(e)
Afterwards, we can delete the files listed in the diff.
def sync_dirs(diff):
    ...
    for f in diff[DELETED]:
        try:
            conn.deleteFiles(server_path, additional_global_nested + f)
        except Exception as e:
            print(e)
To keep the timestamps aligned after an update, we set the local file's modification time to the current time just before uploading, so that it matches the write time the server records.
def sync_dirs(diff):
    ...
    for f in diff[UPDATED]:
        # set start folder with modified time of current time in seconds to match server write time
        current_time = int(round(time()))
        utime(start_path + f, (current_time, current_time))
        try:
            file_location = additional_global_nested + f
            with open(start_path + f, 'rb', buffering=0) as file_obj:
                conn.storeFile(server_path, file_location, file_obj)
        except IOError as e:
            print(e)
    ...
Now the only thing left is deleting the directories. When deleting directories, we may encounter errors when the diff states that both foo and foo/bar have been deleted, because the order of deletion could break things. For this reason, let's get the current list of directories on the server before each delete to avoid any errors.
def creater_server_dir_index(p):
    server_items = conn.listPath(server_path, additional_global_nested + p)
    additionalSlash = '/' if p else ''
    for item in server_items:
        # Skip ignored names and hidden entries (including '.' and '..')
        if item.filename in ignored_files or item.filename[0] == '.':
            continue
        if item.isDirectory:
            newPath = p + additionalSlash + item.filename
            current_dir_index.append(newPath)
            creater_server_dir_index(newPath)
Then we do the directory delete without throwing any errors.
def sync_dirs(diff):
    ...
    for d in diff[DELETED_DIRS]:
        # Refresh the server's directory list so we skip directories already removed
        global current_dir_index
        current_dir_index = []
        creater_server_dir_index('')
        if d in current_dir_index and d:
            try:
                conn.deleteDirectory(server_path, additional_global_nested + d)
            except Exception as e:
                print(e)
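Another way to approach the ordering problem, instead of re-listing the server before every delete, might be to sort the directories so that the deepest paths come first; then foo/bar is always handled before foo. This is only a sketch of that idea, and `deepest_first` is a name I made up. It also assumes each directory is already empty by the time it is deleted, which may not hold if hidden files were skipped by the index.

```python
def deepest_first(dirs):
    """Order '/'-separated directory paths so children come before their parents."""
    # A path's depth is the number of separators it contains.
    return sorted(dirs, key=lambda d: d.count('/'), reverse=True)

ordered = deepest_first(['foo', 'foo/bar', 'baz', 'foo/bar/qux'])
```

Looping over `deepest_first(diff[DELETED_DIRS])` would then visit every child directory before its parent.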
All of the functions are now defined. We simply need to execute them.
compute_server_index('')
start_index = compute_dir_index(start_path)
diff = compute_diff(start_index, server_index)
sync_dirs(diff)
The final step is automating the script on my MacBook so that it runs once a week. That's next on my agenda.
In order to run the script, we first need to install pysmb.
pip3 install pysmb
python3 copy_itunes.py
The final script looks like this:
from os import walk, path, utime
from time import time
from io import BytesIO
from smb.SMBConnection import SMBConnection
conn = SMBConnection(my_user_name, my_password, 'andrew', 'nas', '', is_direct_tcp=True)
conn.connect('192.168.100.3', 445)
FILES = 'files'
SUBDIRS = 'subdirs'
INDEX = 'index'
DELETED_DIRS = 'deleted_dirs'
DELETED = 'deleted'
CREATED = 'created'
UPDATED = 'updated'
ignored_files = ['.DS_Store', '.ini', '.iTunes Preferences.plist']
current_dir_index = []
server_index = { FILES: [], SUBDIRS: [], INDEX: {}}
start_path = '../../../Music/iTunes/'
server_path = 'dropbox'
additional_global_nested = 'iTunes/'
def get_dir_path(file):
    f_array = file.split('/')
    return '/'.join(f_array[0:len(f_array)-1])
def make_dirs(dir, prev_dir):
    if not len(dir):
        return
    current_location_dirs = []
    dir_list = dir.split('/')
    additionalSlash = '/' if len(prev_dir) else ''
    dir_location = prev_dir + additionalSlash + dir_list[0]
    currentIndex = conn.listPath(server_path, additional_global_nested + prev_dir)
    for item in currentIndex:
        if item.isDirectory and item.filename[0] != '.':
            current_location_dirs.append(prev_dir + additionalSlash + item.filename)
    if dir_location not in current_location_dirs:
        try:
            conn.createDirectory(server_path, additional_global_nested + dir_location)
        except Exception as e:
            print(e)
    make_dirs('/'.join(dir_list[1:]), dir_location)
def creater_server_dir_index(p):
    server_items = conn.listPath(server_path, additional_global_nested + p)
    additionalSlash = '/' if p else ''
    for item in server_items:
        if item.filename in ignored_files or item.filename[0] == '.':
            continue
        if item.isDirectory:
            newPath = p + additionalSlash + item.filename
            current_dir_index.append(newPath)
            creater_server_dir_index(newPath)
def compute_server_index(p):
    server_items = conn.listPath(server_path, additional_global_nested + p)
    additionalSlash = '/' if p else ''
    for item in server_items:
        if item.filename in ignored_files or item.filename[0] == '.':
            continue
        if item.isDirectory:
            newPath = p + additionalSlash + item.filename
            server_index[SUBDIRS].append(newPath)
            compute_server_index(newPath)
        else:
            filePath = p + additionalSlash + item.filename
            server_index[FILES].append(filePath)
            server_index[INDEX][filePath] = int(round(item.last_write_time))
def compute_dir_index(p):
    files = []
    subdirs = []
    for root, dirs, filenames in walk(p):
        for subdir in dirs:
            if subdir not in ignored_files:
                subdirs.append(path.relpath(path.join(root, subdir), p))
        for f in filenames:
            if f not in ignored_files:
                files.append(path.relpath(path.join(root, f), p))
    index = {}
    for f in files:
        index[f] = int(round(path.getmtime(path.join(p, f))))
    return { FILES: files, SUBDIRS: subdirs, INDEX: index }
def compute_diff(dir_start, dir_dest):
    data = {}
    data[DELETED] = list(set(dir_dest[FILES]) - set(dir_start[FILES]))
    data[CREATED] = list(set(dir_start[FILES]) - set(dir_dest[FILES]))
    data[UPDATED] = []
    data[DELETED_DIRS] = list(set(dir_dest[SUBDIRS]) - set(dir_start[SUBDIRS]))
    for f in set(dir_dest[FILES]).intersection(set(dir_start[FILES])):
        # round to nearest hundred seconds to avoid small margin while copying
        if int(round(dir_start[INDEX][f], -2)) != int(round(dir_dest[INDEX][f], -2)):
            data[UPDATED].append(f)
    return data
def sync_dirs(diff):
    print('Connection is starting sync...')
    # Upload new files, creating any missing directories first
    for f in diff[CREATED]:
        file_location = get_dir_path(f)
        make_dirs(file_location, '')
        try:
            with open(start_path + f, 'rb', buffering=0) as file_obj:
                conn.storeFile(server_path, additional_global_nested + f, file_obj)
        except IOError as e:
            print(e)
    # Remove files that no longer exist locally
    for f in diff[DELETED]:
        try:
            conn.deleteFiles(server_path, additional_global_nested + f)
        except Exception as e:
            print(e)
    # Overwrite files whose modification time changed
    for f in diff[UPDATED]:
        # set start folder with modified time of current time in seconds to match server write time
        current_time = int(round(time()))
        utime(start_path + f, (current_time, current_time))
        try:
            file_location = additional_global_nested + f
            with open(start_path + f, 'rb', buffering=0) as file_obj:
                conn.storeFile(server_path, file_location, file_obj)
        except IOError as e:
            print(e)
    # Remove directories that no longer exist locally
    for d in diff[DELETED_DIRS]:
        global current_dir_index
        current_dir_index = []
        creater_server_dir_index('')
        if d in current_dir_index:
            try:
                conn.deleteDirectory(server_path, additional_global_nested + d)
            except Exception as e:
                print(e)
    print('Folders are now synced. Have a good day')
compute_server_index('')
start_index = compute_dir_index(start_path)
diff = compute_diff(start_index, server_index)
sync_dirs(diff)
Tags: Python, Sync, Synology