ios – Getting Python using os.walk() to generate a textfile containing the directory structure for a given path


A shot in the dark without more information, but I’m guessing that your error looks like this:

Traceback (most recent call last):
  File "C:\Users\BoppreH\tmp\dirs.py", line 16, in <module>
    record_directory_structure("path", "DirectoryStructure.txt")
  File "C:\Users\BoppreH\tmp\dirs.py", line 14, in record_directory_structure
    file.write('{}{}\n'.format(subindent, f))
  File "C:\Python312\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-11: character maps to <undefined>

The issue is at this line:

    with open(file_name, "w") as file:

When you open a text file, Python needs to know the encoding for that file. An encoding is a way to map logical characters like “letter ‘a’ with acute accent” into actual bytes that you can write to a file. Python strings are Unicode, which covers practically every writing system, but many Unicode characters simply cannot be represented in older encodings such as cp1252.
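To make the character-to-bytes mapping concrete, here is a small sketch. The sample strings are my own picks for illustration, not something taken from your directory:

```python
# Sample text chosen purely for illustration.
accented = "á"        # letter 'a' with acute accent
cyrillic = "Привет"

# The same character becomes different bytes under different encodings.
print(accented.encode("cp1252"))  # b'\xe1'
print(accented.encode("utf-8"))   # b'\xc3\xa1'

# UTF-8 has a byte sequence for every Unicode character...
print(len(cyrillic.encode("utf-8")))  # 12 (two bytes per Cyrillic letter)

# ...but cp1252 has no byte at all for Cyrillic letters.
try:
    cyrillic.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc.reason)
```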

The default encoding used by open depends on the environment, mainly the operating system and its locale (language) settings. You can check your default encoding with this snippet:

import locale
print(locale.getpreferredencoding())
# Prints "cp1252" on my Windows machine, *not* UTF-8.
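If you'd rather see what open actually picked, the file object reports it through its encoding attribute. A minimal sketch, where the temporary path is just a stand-in for your own file:

```python
import os
import tempfile

# Throwaway path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "probe.txt")

# No encoding argument: Python falls back to the environment's default.
with open(path, "w") as f:
    default_encoding = f.encoding

# Explicit encoding: the same result on every machine.
with open(path, "w", encoding="utf-8") as f:
    explicit_encoding = f.encoding

print(default_encoding)   # varies by machine
print(explicit_encoding)  # utf-8
```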

So what happens is that Python saw a file name with Cyrillic/Greek characters and created a string from it just fine (Unicode). But when writing to the text file, the default encoding has no way to represent those characters. Hence the error.
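You can reproduce the failure on any machine by forcing cp1252 explicitly. This is only a sketch: the file name below is hypothetical and stands in for whatever os.walk() returned in your case.

```python
import os
import tempfile

# Hypothetical file name standing in for one returned by os.walk().
name = "Привет.txt"
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Force cp1252 so the failure shows up regardless of this machine's default.
try:
    with open(path, "w", encoding="cp1252") as f:
        f.write("    {}\n".format(name))
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```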

The solution is to simply specify the file encoding to be UTF-8 too:

import os

def record_directory_structure(directory_path: str, file_name: str):
    #       Add this:         vvvvvvvvvvvvvvvv
    with open(file_name, "w", encoding='utf-8') as file:
        for root, dirs, files in os.walk(directory_path):
            # Depth of this directory relative to the starting path.
            level = root.replace(directory_path, '').count(os.sep)
            indent = " " * 4 * level
            file.write('{}{}/\n'.format(indent, os.path.basename(root)))
            subindent = " " * 4 * (level + 1)
            for f in files:
                file.write('{}{}\n'.format(subindent, f))

Now you’re free to write any string to the file, because UTF-8 can encode every Unicode character.
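One thing to keep in mind: whatever reads the file later should use the same encoding. A quick round-trip check, again with a made-up entry name and a temporary path rather than your real output file:

```python
import os
import tempfile

# Hypothetical entry; any text outside cp1252 would do.
line = "    Привет.txt\n"
path = os.path.join(tempfile.mkdtemp(), "DirectoryStructure.txt")

# Write with an explicit encoding...
with open(path, "w", encoding="utf-8") as f:
    f.write(line)

# ...and read back with the same one, so the bytes decode to the same text.
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == line)  # True
```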
