Registered Member
|
I've wondered for years why KDE still depends (mainly) on the file extension to determine a file's MIME type.
Not only is it possible that some files are named wrongly and could either not work properly -- at least not out of the box -- or even hide malicious code; it's also not unusual for a single extension to be used for several MIME/file types. The interesting thing is that a small CLI utility called file, which should come with every GNU/Linux system, already exists and does an amazing job at guessing the MIME type. For example, it can tell apart plain TeX and LaTeX. It does so by analysing the contents of the file and matching them against a collection of patterns that are typical of certain MIME/file types. Implementing this or a similar, improved method of detecting the right MIME/file type -- at least in places where it matters more -- would be great! It should also be possible to include patterns that would e.g. tell it that a package is a plasmoid or an emoticon archive and not just a generic tarball.
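To make the idea of content-based matching concrete, here is a minimal Python sketch. It is not how file itself is implemented; the table `MAGIC_RULES` and the functions `sniff_bytes` and `sniff_file` are hypothetical names made up for this illustration, and the real pattern database is far larger.

```python
# Hypothetical, tiny signature table: (offset, leading bytes, MIME type).
# The pattern collection used by the real `file` utility is much larger.
MAGIC_RULES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"\x1f\x8b", "application/gzip"),
]


def sniff_bytes(head):
    """Return the MIME type whose signature matches `head`, or None."""
    for offset, signature, mime in MAGIC_RULES:
        if head[offset:offset + len(signature)] == signature:
            return mime
    return None


def sniff_file(path, limit=64):
    """Read only the first `limit` bytes of a file and match them."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(limit))
```

The file name is never consulted here, so a PNG saved as window.pn3 would still be reported as image/png.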
It's time to prod some serious buttock!
|
Registered Member
|
I had the opposite impression: KDE seems very good at detecting the proper file type regardless of the extension. For example, I accidentally set KSnapshot to name each file window.pnx, where x is a number, and Dolphin properly identifies the window.pnx files as PNGs.
Proudly dual-booting openSUSE 11.1 with KDE 4.3 and Windows Vista on a Toshiba A205-S4577 since July 2007.
|
KDE Developer
|
KDE has had that for years. I am not sure when it was introduced, i.e. whether it was already in KDE2 or whether it was added in KDE3.
KDE4 switched to a different implementation that is also used by other Free Software desktop projects, e.g. GNOME: the Shared MIME Info specification and database http://www.freedesktop.org/wiki/Specifi ... -info-spec
It contains rules on how to match "globs" (file name patterns), what to do when an extension can have different meanings (e.g. rpm), how certain types are related (a vCard contact file is a text file, etc.), which part of the content can be used to identify a type, and so on.
Which of the available methods is used depends a lot on the context. An application might access data from a source which already provides a MIME type, e.g. a web server sending a content type.
File name extensions are primarily used for speed on local files, since that information is already available when listing the files. Inspecting each file would result in lots of "seek" operations on the medium (e.g. a hard disk) and would slow down listing operations considerably. It is basically a tradeoff based on the most common cases, i.e. extensions are mostly correct and the ambiguous extensions are known (so only those few files need to be inspected).
Ideally this information would be available along with the files, i.e. in extended attributes of the file system. However, while most modern file systems are actually capable of that, most data has been transferred from older ones, or from systems not capable of doing that, etc.
Applications actually reading the files can always do an inspection if they want to handle different types differently.
The situation might improve in the future through the use of "semantic data", e.g. keeping information about files around so it can be retrieved again later without the performance penalty currently associated with more advanced ways of detection.
Cheers, _
anda_skoa, proud to be a member of KDE forums since 2008-Oct.
|
KDE Developer
|
When I have files without an extension, they are often displayed as folders. And when I want to open one, I sometimes get the message "Directory Expected" and it doesn't work.
|
Registered Member
|
I don't know - I have never really had problems with files with no extensions, and most of my music/video just ends in .ogg and KDE can tell the difference easily.
For example, word/text documents and images are all recognized even without an extension. Nepomuk was also what I had in mind here: if almost all the rest of a file's information is already stored and retrieved (when I tried Dolphin's search, it would match dates, content, comments, tags, names and even image dimensions), then why not the file type? Of course, KDE would fall back to the old system if Nepomuk didn't work for some reason, or ran but didn't report the type properly... I also noticed that Nepomuk DOES lump together files of a bunch of different file types as "Music" quite reliably. Perhaps this functionality could also be used in one of my previous ideas (This one). If it did, it would further blur the line between "files" and "content", which is always nice. Hey, that's actually pretty good. Should I make that a new idea?
Madman, proud to be a member of KDE forums since 2008-Oct.
|
Registered Member
|
@anda_skoa:
The fact that KDE has joined forces with GNOME, LXDE and others on the common FreeDesktop.org MIME specification is great, but as you said yourself, there's still room for improvement. I skimmed through the specification and from what I can tell, the "glob" part is basically just a more elaborate way to tell file extensions apart, while the real pattern matching is done by the "magic" part of the specification. I do realise that inspecting each and every file's content to match it against a specific MIME "magic" would waste both CPU and the user's time. Currently one of the bigger problems, though, is that the user cannot (as far as I know) specify a preferred action for a file based on its "magic" properties.
You mention that a file's MIME type should (and can) be stored in the extended attributes of modern file systems. Would this mean that KDE4 (and/or other FD.org-compatible DEs) can already store MIME types in, and read them from, the extended attributes of e.g. Ext4 or BtrFS?
I've already seen Alessandro's SmartSave and it looks very practical, but this would still mean that KDE (and/or the apps themselves) would need to figure out the right MIME type on save for NEPOMUK to catch on later. Either that, or the user would have to change it by hand every time. I do have high hopes for semantic data in KDE in the foreseeable future; all I'm saying is that this needs to be tackled the right way, and I hope people smarter than me find it. :]
It's time to prod some serious buttock!
|
Registered Member
|
It's this bit I disagree with. Changing it once would change the entry in the database, which would then be pulled up in the future by KDE.
Madman, proud to be a member of KDE forums since 2008-Oct.
|
Registered Member
|
But unless the app adds to the database the pattern for telling which MIME type it is, or writes the MIME type into the file system's extended attributes, I don't see how it would work reliably. (...I could be terribly wrong though.)
It's time to prod some serious buttock!
|
Registered Member
|
To clarify some of the previous comments, KDE will open files without an extension correctly, but it chokes on files with an incorrect extension (e.g. a text file named foo.png will still open in Gwenview).
I'm not sure how GNOME handles this now, but the last time I used it, it behaved differently. In particular, I had some fake "GIF"s that were part of a drive-by attack my university was experiencing. They were actually full of malicious JavaScript code. If I tried to open them by double-clicking in Nautilus, it would detect that the file was not, in fact, a GIF, and pop up a message warning the user, with the option to open it manually as normal. If the exploit had been for Lin/FF rather than Win/IE, that would have saved me some misery. I think your DE letting you know that Something Is Wrong is a good thing, as long as it's done right (e.g. with usability taken into account). Are there design/technical issues that prevent KDE from implementing a similar system?
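The kind of check described above could look roughly like the following Python sketch. It is only an illustration of comparing the extension against the content, not Nautilus's actual logic; `EXPECTED_BY_EXTENSION`, `SIGNATURES` and `check_mismatch` are hypothetical names, and the tables cover just two image types.

```python
import os

# Hypothetical lookup tables, limited to two image types for illustration.
EXPECTED_BY_EXTENSION = {".gif": "image/gif", ".png": "image/png"}
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def check_mismatch(filename, head):
    """Return a warning string if the content contradicts the extension.

    `head` holds the first bytes of the file. Returns None when the
    extension is unknown to the table or the content agrees with it.
    """
    extension = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_BY_EXTENSION.get(extension)
    if expected is None:
        return None  # nothing to compare against
    detected = next(
        (mime for signature, mime in SIGNATURES if head.startswith(signature)),
        None,
    )
    if detected == expected:
        return None
    seen = detected or "something else"
    return f"{filename}: extension suggests {expected}, but the content looks like {seen}"
```

A caller would show the returned message to the user and still offer to open the file anyway, which keeps the warning from becoming an obstacle.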
- Jeffery MacEachern
http://ffejery.creativemisconfiguration.com/ |
KDE Developer
|
The case is this: KDE relies on the file extension first. If there is none, or an unknown one, it actually skims through the file and detects its type by magic numbers. That is how I often end up discovering that game files are just zip files with a different extension.
The file command, though, always lists the file correctly, no matter what its file name “extension” is. The problem is that even KDE itself often relies on file extensions. For example, the file chooser for a Plasma Desktop background image states “image files (*.jpg, *.png)” and so on. Also, some plasmoids for displaying files only allow you to filter by file name. Fortunately, the folder view allows you to check each file type instead of having to add a wildcard filter. |
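The order of checks described above can be sketched in Python as follows. This is a simplified illustration and not KDE's code; `GLOBS`, `MAGIC` and `detect` are hypothetical names, and the content is passed in as a callable so that it is only read when the extension does not settle the question.

```python
import os

# Hypothetical tables standing in for the real glob and magic databases.
GLOBS = {".png": "image/png", ".txt": "text/plain", ".zip": "application/zip"}
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]


def detect(filename, read_head):
    """Guess a MIME type: extension first, content only as a fallback.

    `read_head` is a zero-argument callable returning the first bytes of
    the file; it is not called when the extension is already known.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension in GLOBS:
        return GLOBS[extension]  # cheap path, no file access needed
    head = read_head()
    for signature, mime in MAGIC:
        if head.startswith(signature):
            return mime
    return "application/octet-stream"
```

With these tables, a zip archive named level.pak is identified by its content, while a text file named foo.png is still reported as image/png because the known extension wins before the content is ever read.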
Registered Member
|
I don't think the question is whether it can be done; the question is whether it's practical to scan every file in every folder of the user's home directory. Have you tried doing that with the "file" command?
Either way, a good way to start would be for KDE applications to save a file's MIME information somewhere separate from the extension, such as Nepomuk or a supported file system's metadata storage. For example, opening/saving an attachment from KMail might result in KMail scanning the file to figure out the MIME type and then saving that information in Nepomuk. At least then KDE wouldn't need to scan every single file every time you e.g. opened a directory, and would instead refer to the appropriate database of files and associated MIME types. KDE could then prefer whichever database it uses over file extensions, and fall back to the current methods for any file that isn't listed in the database.
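As a rough sketch of that idea, the following Python class remembers a detected type per file and only rescans when the file has changed. An in-memory dictionary stands in for Nepomuk or file system metadata here, which is purely an assumption for illustration; `MimeCache` and its `scans` counter are hypothetical names.

```python
import os


class MimeCache:
    """Remember detected MIME types so each file is scanned at most once
    per modification. A plain dict stands in for a real metadata store."""

    def __init__(self, sniff):
        self._sniff = sniff   # function taking a path, returning a MIME type
        self._entries = {}    # path -> ((mtime_ns, size), mime)
        self.scans = 0        # number of times the content was inspected

    def lookup(self, path):
        info = os.stat(path)
        stamp = (info.st_mtime_ns, info.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # file unchanged: reuse the stored type
        self.scans += 1
        mime = self._sniff(path)
        self._entries[path] = (stamp, mime)
        return mime
```

Keying the entry on modification time and size is one simple way to notice that a file changed; a real store would also have to deal with files being renamed or moved.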
Madman, proud to be a member of KDE forums since 2008-Oct.
|