This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Dolphin: Weird behaviour on text-files with html-code.

SysGhost
Greetings.

Got a problem with Dolphin that has been bugging the **** out of me.
I have various text-only files, for instance "textfile.tpl".

I tell Dolphin/KDE to open these files with Kate. No problems.
But then, something weird happens:
Whenever this .tpl file (or any other text-only file) happens to contain the beginning of an HTML document (<html>, <head>, <meta>, <title>), its association changes to "html-document" and it opens in my default web browser instead. Keep in mind that I never renamed the file. It can be named pretty much anything and will still be seen as an HTML document for as long as it contains these HTML tags.

My question:
How do I force Dolphin to ignore the content of the file, and look ONLY at the filename ending?
To be honest, I don't want Dolphin to care about file contents at all.
So turning that "feature" off would be the best. How do I do that?
google01103

Wed Feb 12, 2014 8:58 pm
In System Settings -> File Associations, create a new file type, place it in the text group, and give it the file pattern *.tpl

You might need to restart Dolphin or reload the folder.


OpenSuse Leap 42.1 x64, Plasma 5.x

SysGhost
Wed Feb 12, 2014 11:04 pm
google01103 wrote:In System Settings -> File Associations, create a new file type, place it in the text group, and give it the file pattern *.tpl

You might need to restart Dolphin or reload the folder.


Well, it kind of works, but it doesn't solve my problem: any kind of file, no matter the file extension, is treated as an HTML document if it contains the mentioned tags.
With the suggested method, I have to list every extension I don't want treated as an HTML document.

I'll use this as a temporary makeshift solution for now.

EDIT:
No. It didn't work. Even with the new file type, the files that contain the mentioned HTML tags are still treated as HTML documents. If I try to force a change through the "Open With" menu option, it changes the application for all HTML documents, and even changes the default browser.
bcooksley
This is due to a system called "magic bytes", which is the primary mechanism used to determine the type of a file. I'm not sure how hard it is to make extensions preferred, but it is done for *.odt files, so it should be possible.


KDE Sysadmin
SysGhost
bcooksley wrote:This is due to a system called "magic bytes", which is the primary mechanism used to determine the type of a file. I'm not sure how hard it is to make extensions preferred, but it is done for *.odt files, so it should be possible.


Yes, I know about magic bytes, but they usually sit at the beginning of the file and are often 2 bytes long. (Sometimes longer, but the initial detection is done on the first two bytes.)
Hashbangs ( #! ) are a good example of text/script files that use such magic bytes.

The problem here is different. It doesn't matter where these HTML tags sit. They can be anywhere in the file and still trigger the "I'm an HTML document" detection.
If I remove these HTML tags (<html>, <head>, <meta>, <title>), the file falls back to its "original" association based on the file name extension.

There is something behind the scenes that I cannot figure out.
google01103
text and therefore html files don't use magic bytes (numbers) so it's a different process doing the association

maybe this will help (maybe not) from /etc/htdig/HtFileType-magic.mime
note: I replaced tabs with "@"s for readability
# Magic data for for file(1) command
#
# The format is 4-5 columns:
# Column #1: byte number to begin checking from, ">" indicates continuation
# Column #2: type of data to match
# Column #3: contents of data to match
# Column #4: MIME type of result
# Column #5: MIME encoding of result (optional)
#
# Modified by <mailto:lha@users.sourceforge.net> for compatibility with
# different versions of file(1):
# - Columns are separated by TABs (for traditional versions)
# - spaces and '<'s within a column are escaped by '\' (for new versions)
# - Hex numbers in strings are given as '\0x' (traditional) and '\x' (new)
# - Null characters (\000) traditionally terminate strings, but now don't
<snip>
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# html: file(1) magic for HTML (HyperText Markup Language) docs
#
# from Daniel Quinlan <quinlan@yggdrasil.com>
# modified by Lachlan Andrew <lha@users.sourceforge.net> to
# match leading whitespace, but still work with old versions
# of file(1) which don't recognise the /cb options
#
0 @ string @ @ \<HEAD @ @ @ text/html
0 @ string @ @ \<head @ @ @ text/html
0 @ string @ @ \<TITLE @ @ @ text/html
0 @ string @ @ \<title @ @ @ text/html
0 @ string @ @ \<HTML @ @ @ text/html
0 @ string @ @ \<html @ @ @ text/html
0 @ string @ @ \<!-- @ @ @ text/html
0 @ string @ @ \<H1 @ @ @ text/html
0 @ string @ @ \<h1 @ @ @ text/html
0 @ string @ @ \<!DOCTYPE\ HTML @ text/html
0 @ string @ @ \<!doctype\ HTML @ text/html
0 @ string @ @ \<!doctype\ html @ text/html
0 @ string @ @ \<!DOCTYPE\ NETSCAPE-Bookmark @ text/html
0 @ string/cb @ \ <head @ @ @ text/html
0 @ string/cb @ \ <html @ @ @ text/html
0 @ string/cb @ \ <title @ @ text/html
0 @ string/cb @ \ <!doctype\ html @ text/html
0 @ string @ @ \<!\ @ @ @ text/html


bcooksley
From what I understand, the freedesktop.org mime association system matches magic strings using a number of different rules. It appears that the HTML rule covers the whole file - which is why you are observing this behaviour.

To confirm this is the case, please run "file --mime-type <filename>" against one of the *.tpl files in question.


SysGhost
bcooksley wrote:From what I understand, the freedesktop.org mime association system matches magic strings using a number of different rules. It appears that the HTML rule covers the whole file - which is why you are observing this behaviour.

To confirm this is the case, please run "file --mime-type <filename>" against one of the *.tpl files in question.


These are the files:
"index.tpl"
Code: Select all
{* Smarty *}

<!DOCTYPE html>
<html lang="{$lang}">
    <head>
        <meta charset="utf-8">
        <title>{$title}</title>

* snip * (Don't need to post my whole website for this.)


"test.tpl"
Code: Select all
{* Smarty *}
<div class="no-thickness">
    Testing, 1 2 ... 3.... testing ....
</div>


Here's the results:
Code: Select all
$ file --mime-type index.tpl
index.tpl: text/html

$ file --mime-type test.tpl
test.tpl: text/plain


It is confirmed. If the file contains the earlier mentioned HTML tags, it is forcibly treated as an HTML document, despite the other/manual associations I have set up.

I wish this detection only applied to files that have no other association. Or at least that there was a way to change association precedence. Perhaps even a way to turn all content detection off.
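
The two results above fit a rule that looks for the HTML markers anywhere within a range of leading bytes, rather than only at offset 0. Below is a minimal Python sketch of that difference. It is not the actual libmagic or shared-mime-info implementation: the marker list, the function names and the 256-byte window are all assumptions made for illustration.

```python
# Hypothetical marker list; real magic databases contain more entries.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<title")

# Shortened copies of the two files posted in this thread.
INDEX_TPL = b'{* Smarty *}\n\n<!DOCTYPE html>\n<html lang="{$lang}">\n    <head>\n'
TEST_TPL = b'{* Smarty *}\n<div class="no-thickness">\n    Testing\n</div>\n'


def sniff_at_start(data: bytes) -> bool:
    """Fixed-offset check: a marker must be the very first bytes of the file."""
    return data.lower().startswith(HTML_MARKERS)


def sniff_in_window(data: bytes, window: int = 256) -> bool:
    """Range check: a marker may begin at any offset below `window` (assumed size)."""
    lowered = data.lower()
    return any(0 <= lowered.find(marker) < window for marker in HTML_MARKERS)
```

With these samples, only the index.tpl content matches, and only under the window check, which is the same split that file --mime-type reported.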
SysGhost
google01103 wrote:text and therefore html files don't use magic bytes (numbers) so it's a different process doing the association

maybe this will help (maybe not) from /etc/htdig/HtFileType-magic.mime

htdig ?
I don't have that package installed (Arch Linux: https://www.archlinux.org/packages/extr ... dig/files/)

This is my ~/.local/share/applications/mimeapps.list :
Code: Select all
[Added Associations]
application/x-php=geany.desktop;
text/css=geany.desktop;
text/html=google-chrome.desktop;geany.desktop;
text/plain=geany.desktop;
text/tpl=geany.desktop;
video/x-matroska=smplayer.desktop;
video/x-msvideo=smplayer.desktop;

[Default Applications]
text/html=google-chrome.desktop


I also have a separate association set up for *.tpl : ~/.local/share/mime/packages/text-tpl.xml :
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
    <mime-type type="text/tpl">
        <comment>HTML Templates</comment>
        <icon name="text-plain"/>
        <glob-deleteall/>
        <glob pattern="*.TPL"/>
        <glob pattern="*.tpl"/>
    </mime-type>
</mime-info>


What other mimelists could override my own local settings?

(Some more information on Arch Linux "Default Applications": https://wiki.archlinux.org/index.php/De ... plications )
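
One way to read the situation: the application lookup in mimeapps.list is keyed by the MIME type that detection has already chosen, so the text/tpl entry is never consulted once a file has been classified as text/html. A rough Python sketch of that lookup, using a shortened copy of the list above. The function name `candidates` and the lookup order (Default Applications first, then Added Associations) are simplifying assumptions; a real desktop consults several mimeapps.list files and caches.

```python
import configparser

# Shortened copy of the mimeapps.list posted above.
MIMEAPPS = """\
[Added Associations]
text/html=google-chrome.desktop;geany.desktop;
text/plain=geany.desktop;
text/tpl=geany.desktop;

[Default Applications]
text/html=google-chrome.desktop
"""


def candidates(text: str, mime: str) -> list[str]:
    """Return the desktop files listed for `mime`, defaults first (assumed order)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep MIME type keys exactly as written
    parser.read_string(text)
    found: list[str] = []
    for section in ("Default Applications", "Added Associations"):
        if not parser.has_option(section, mime):
            continue
        for app in parser.get(section, mime).split(";"):
            if app and app not in found:
                found.append(app)
    return found
```

Calling `candidates(MIMEAPPS, "text/html")` puts google-chrome.desktop first, while `candidates(MIMEAPPS, "text/tpl")` yields only geany.desktop. So the browser opens whenever detection says text/html, regardless of the file name.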
bcooksley
From what I am aware, the magic files are checked first, as magic recognition is seen as more reliable than file extensions.
The following sequences will trigger recognition as an HTML file (at least on my system), subject to certain restrictions such as being at the start of a new line:
Code: Select all
<!DOCTYPE HTML
<!doctype html
<HEAD
<head
<TITLE
<title
<HTML
<html
<SCRIPT
<script
<BODY
<body
<!--
<h1
<H1


The only special detail I could see with *.odt files was the presence of the following directive:
Code: Select all
<sub-class-of type="application/zip"/>

If you edit your text/tpl association to make it a subclass of text/html, then it will probably override the text/html association.
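
A minimal Python sketch of that edit, assuming the text-tpl.xml posted earlier in this thread (shortened here). The helper name `add_parent` is hypothetical, and whether the extra element actually changes which type wins is untested.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"
ET.register_namespace("", NS)  # serialise with a default xmlns, no prefix

# Shortened copy of ~/.local/share/mime/packages/text-tpl.xml from this thread.
TEXT_TPL_XML = """<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
    <mime-type type="text/tpl">
        <comment>HTML Templates</comment>
        <glob pattern="*.tpl"/>
    </mime-type>
</mime-info>
"""


def add_parent(xml_text: str, mime: str, parent: str) -> str:
    """Add <sub-class-of type=parent/> to the given mime-type, once."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    for node in root.findall(f"{{{NS}}}mime-type"):
        if node.get("type") != mime:
            continue
        existing = node.findall(f"{{{NS}}}sub-class-of")
        if not any(sub.get("type") == parent for sub in existing):
            ET.SubElement(node, f"{{{NS}}}sub-class-of", {"type": parent})
    return ET.tostring(root, encoding="unicode")


updated = add_parent(TEXT_TPL_XML, "text/tpl", "text/html")
```

After writing the result back to ~/.local/share/mime/packages/text-tpl.xml, the database would need rebuilding with update-mime-database ~/.local/share/mime before Dolphin can pick up the change.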

