Day changed to 2022-06-05
[14:29] caai: Good morning! I have completed lesson 9, assignment 3. Please note the following: http://welshcomputing.com/paste/kj6w9adp4k
[14:33] caai: Please let me know if the directory names are advisable or you recommend another naming scheme.
[14:39] caai: I placed the script under /home/user/scripts. I named the file bcbakscript.txt and made it executable (chmod +x). I did not place it in any PATH because I didn't consider that necessary.
Day changed to 2022-06-06
[13:42] caai: please note lesson 9, assignment 4: http://welshcomputing.com/paste/r4fnv6hazd
[14:47] jfw: caai: got 'em and will have a look in a bit.
Day changed to 2022-06-07
[22:57] jfw: caai, dorion: zoom/windows has decided I no longer have a microphone (notwithstanding it's located on the webcam which otherwise is working fine). going to try some futzing.
[22:59] dorion: aok, standing by.
Day changed to 2022-06-10
[23:49] jfw: caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4273 - a reasonable start. (My own bitcoind backup script at this point is a good 80 lines long because it's trying to be as efficient & robust as possible under the circumstances - not that such lengths were necessary here.)
[23:49] sourcerer: 2022-06-05 14:29:33 (#jwrd) caai: Good morning! I have completed lesson 9, assignment 3. Please note the following: http://welshcomputing.com/paste/kj6w9adp4k
[23:51] jfw: note that 'exec' has a particular meaning which usually isn't what you want; in this case it's harmless but kind of sticks out. it's used for instance in daemontools run scripts at the final point where the target program is started, because the invoking shell is no longer necessary and would get in the way of signal handling.
[23:53] jfw: as to advisable directory names, that's up to you but generally for a local backup script the point is to be copying things to a physically separate drive at least, so you'll need your script to agree with the path at which you mount that drive.
[23:56] jfw: and if that's a temporary external drive, I'd use something under /mnt rather than my home dir, for instance because the mount/umount at least need to be done as root so you don't need the extra levels of path overhead.
[23:58] jfw: the final point would be that you'll need to take the node down manually before running your script, otherwise there's a risk that you get an inconsistent snapshot of the database (when it's modified at the same time that you're copying it).
Day changed to 2022-06-11
[00:00] jfw: there *are* sometimes better ways to deal with this but nobody's worked it out yet for poor bitcoind, to my knowledge.
[00:00] jfw: of course you could automate the taking down & putting back up by extending the script.
[00:02] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4275 - works; one convention is to use a ".sh" extension for shell scripts.
[00:02] sourcerer: 2022-06-05 14:39:03 (#jwrd) caai: I placed the script under /home/user/scripts. I named the file bcbakscript.txt and made it executable (chmod +x). I did not place it in any PATH because I didn't consider that necessary.
[00:04] jfw: Putting a script in the search path is like a second level once it's become important or frequently used enough; it's basically defining a new word (command) in your CLI environment.
[00:09] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4276 - this one's not so viable.
[00:09] sourcerer: 2022-06-06 13:42:09 (#jwrd) caai: please note lesson 9, assignment 4: http://welshcomputing.com/paste/r4fnv6hazd
[00:12] jfw: ! and () are shell metacharacters in some contexts but in any case won't do what you want there. for a full system backup, cp -r will make a mess of the result by not preserving metadata (timestamps, permissions, owner/group, possibly other aspects like hard links)
[00:18] jfw: a simple approach here would be to use tar instead of cp, with its -X option to specify a file with the list of exclusions. finally, the file or directory you're writing the backup into will likely need to be excluded too; perhaps that's a stronger argument for using /mnt really.
[00:23] jfw: of course the way to figure most of this out or at least make it stick will be to try it, look closely at the result, and finally test restoring things from the backup.
[00:24] jfw: you might also add -v (for either cp or tar) especially while testing so you can see more easily what it's spending time on.
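The exclusion idea from the tar suggestion above can be sketched in Python, this channel's other working language. This is a minimal illustration using the standard `tarfile` module as a stand-in for the `tar` command with `-X`, not the command jfw names; the function name `make_backup`, the archive prefix `backup` and the pattern list are all hypothetical.

```python
import fnmatch
import tarfile


def make_backup(src_dir, archive_path, exclude_patterns):
    """Archive src_dir into archive_path, skipping names that match a pattern.

    Patterns are fnmatch-style and are compared against the name each member
    would have inside the archive (prefixed with "backup/" here).
    """
    def keep(info):
        # Returning None drops the member; for a directory this also
        # skips everything beneath it.
        for pattern in exclude_patterns:
            if fnmatch.fnmatch(info.name, pattern):
                return None
        return info

    # The tar format records mode, owner/group and mtime for each member,
    # which is the metadata that a plain recursive copy loses.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname="backup", filter=keep)

    # Reopen the archive and list what actually landed in it.
    with tarfile.open(archive_path) as tar:
        return sorted(tar.getnames())
```

As noted above, the archive itself should live outside `src_dir` (or be listed among the exclusions), and the returned listing is only a first check: restoring from the archive is the real test.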
Day changed to 2022-06-12
[00:31] rodbl: Hi, I'm back again!
[00:38] jfw: Welcome back rodbl / rodolfo.
[00:39] rodbl: Thanks man!
[00:39] rodbl: Moving forward with what I left behind a while back
[00:41] jfw: Glad to hear it. I dunno if dorion filled you in but one idea we had was to try to get your existing code running on another VM with more conventional Linux so it's not quite as big of a leap to take all at once.
[00:42] rodbl: Correct
[00:43] rodbl: A couple of hours ago, I sent him some high-level explanation regarding the development and basic library requirements in order to deploy everything as soon as the "more conventional VM" is available
[00:44] rodbl: This will speed up the delivering of the MVP
[00:46] jfw: sounds good, that should help narrow down what OS to use.
[00:46] rodbl: Yes sir
[00:50] jfw: rodbl: it'll be simplest for me to just wipe the existing VM and repurpose its resources, so give a look as to whether there's any data or work you need saved from that environment.
[00:55] rodbl: Well, there are some py files (modules) and csv files (data) but it's OK if you remove them
[01:02] jfw: ok; I can just copy it over to the new machine too if you point me to the path.
[01:04] jfw: do you have my email address to send that document to, or should I have dorion copy it to me?
[12:43] rodbl: Hi @jfw
[12:44] rodbl: Sorry, I didn't see your message earlier
[12:44] rodbl: What's your email address?
[12:44] rodbl: It'll be great to have it
[12:45] rodbl: Hehehe I was telling RD the other day about having a call with you, just to talk randomness and finally "meet you"
[16:39] dorion: rodbl, welcome back. jfw, I'll forward you the email.
[20:17] rodbl: I'm setting up a local VM with Ubuntu, so I can start implementing the "Linux-version" of the MVP. Sounds good?
[20:18] rodbl: This is until the new remote VM is up and running
[20:39] dorion: rodbl, hold up on ubuntu because I don't think we'll be using that on the VM. I think a higher priority task is working on the db schema so we can move from csv files to mysql.
[20:43] rodbl: Roger that. Let me know if there is anything else I can do in order to move forward
[20:46] rodbl: About the importing data to mysql, "as per my last email", the best way I could come up to store scraped data was to save each classified ad as a json dict
[20:48] rodbl: This is because in order to concatenate every scraped ad, a dataframe will ensure that each value will land on a specific column due to the associated key.
[20:48] rodbl: *concatenate them together as a whole
[21:03] jfw: rodbl: what does the structure of that json end up looking like? I'd imagine after a certain number of ads, a fixed set of fields begins to emerge (though each field might not be populated in each listing), and they're pretty much flat i.e. just a key-value list where the values are just strings or numbers or the like.
[21:06] jfw: as a first step, that would translate directly to a single database table. then you'd look at redundancies like perhaps the neighborhood, which could be factored out into a separate table with one-to-many relationship (so a property record just has a neighborhood ID, then you join that to the neighborhoods table to get its full name, and perhaps coordinates or whatever else we want to add later).
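The factoring described above (one flat table first, then the neighborhood split out with a one-to-many relationship) might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere; the target discussed here is MySQL, and every table name, column name and sample value is hypothetical.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not the project's.
SCHEMA = """
CREATE TABLE neighborhoods (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE properties (
    id              INTEGER PRIMARY KEY,  -- e.g. the listing number
    price           REAL,                 -- nullable: an ad may omit any field
    neighborhood_id INTEGER REFERENCES neighborhoods(id)
);
"""


def neighborhood_id(conn, name):
    """Return the id for a neighborhood name, inserting it on first sight."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO neighborhoods (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM neighborhoods WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_property(conn, listing_id, price, neighborhood):
    """Store one listing; a missing neighborhood is stored as NULL."""
    nid = neighborhood_id(conn, neighborhood) if neighborhood else None
    conn.execute(
        "INSERT INTO properties (id, price, neighborhood_id) VALUES (?, ?, ?)",
        (listing_id, price, nid),
    )


def listings_with_names(conn):
    """Join each property back to its neighborhood's full name."""
    return conn.execute(
        "SELECT p.id, p.price, n.name FROM properties p "
        "LEFT JOIN neighborhoods n ON n.id = p.neighborhood_id "
        "ORDER BY p.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_property(conn, 1, 250000.0, "Example Heights")  # made-up sample rows
add_property(conn, 2, 180000.0, "Example Heights")
add_property(conn, 3, None, None)                   # ad with fields missing
```

The `LEFT JOIN` keeps listings that have no neighborhood at all, which matters if fields are not populated in every ad.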
[21:06] rodbl: I can send you a couple of examples. The main reason for saving ads as json dicts is that even though the HTML structure is standard for each ad (for example, the table where all details are shown), that doesn't necessarily mean each ad has a specific piece of information (for example, the HTML table has a <td> for location but the ad might not have a location value)
[21:07] jfw: json is certainly better structured than just stashing the HTML as a string, if that's what you're comparing.
[21:12] rodbl: Correct. I could come up with a new approach for the scraping instead of pointing to XPATH and, if necessary, I could also come up with a method that could ingest the scraped data directly to the db
[21:12] rodbl: Which I think is could be the best practice
[21:12] rodbl: *it could
[21:13] rodbl: I sent you guys examples of these JSON files
[21:18] jfw: rodbl: what's going on in 21087301.json with >> }miento\r", << on line 34 ?
[21:20] rodbl: Malformed, maybe it's supposed to be "Estacionamiento"
[21:20] jfw: but it's your code putting out the json, isn't it?
[21:21] jfw: seeing similar in 21090422.json
[21:21] rodbl: Yeah
[21:21] rodbl: I did have some issues during that run, it was with my old computer.
[21:22] rodbl: I can send you verified JSON files
[21:22] jfw: anyway, besides that it looks to be as I suspected, except that Detalles is broken into an array of separate lines which ought to be joined into one string since it's just free-form text. we'll probably want to map the field names to English.
[21:22] rodbl: So you can perform an adequate test on your side
[21:23] rodbl: Yeah, the keys from origin are hard to handle. New labels are required for sure
[21:23] rodbl: "Detalles" is broken in lines, but it can be treated as a string for ease of use
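The two cleanups mentioned above (joining the `Detalles` lines into one string, and mapping field names to English) could be sketched as follows. Only the `Detalles` key comes from the samples under discussion; the other keys in `FIELD_MAP`, and the choice of a newline as the joiner, are assumptions for illustration.

```python
# Hypothetical mapping; only "Detalles" is known from the sample files.
FIELD_MAP = {
    "Detalles": "details",
    "Precio": "price",
}


def normalize_ad(raw):
    """Return a copy of one scraped ad with English keys and flat values."""
    ad = {}
    for key, value in raw.items():
        name = FIELD_MAP.get(key, key)  # unmapped keys pass through unchanged
        if isinstance(value, list):
            # Free-form text that arrived as an array of lines: strip stray
            # whitespace (including trailing "\r") and join into one string.
            value = "\n".join(str(line).strip() for line in value)
        ad[name] = value
    return ad
```

Passing unknown keys through unchanged, rather than dropping them, makes newly appearing fields visible in the output so the mapping can be extended.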
[21:23] jfw: no need to send me updated JSON unless there's something specific you want to show; I was just pointing it out.
[21:25] rodbl: Yes, that batch (from where those examples are coming from) doesn't look very good. I'm suspecting some issues related to my previous computer.
[21:25] jfw: heh, seems the exercise has been productive already.
[21:38] rodbl: Hahaha well QA is always necessary
[21:44] rodbl: In the meantime, I'm going to be restructuring the scraper in order to assert quality of the data. I had a pending experiment to do: instead of "crawling" over each HTML (ad), I could determine if this web site uses internal API calls.. if it does, then a more efficient method can be achieved..
[21:44] rodbl: *quality of the data and efficiency in the process
[23:21] jfw: rodbl: if this is encuentra24, last I checked it doesn't do anything so civilized; it makes XHR calls just to fetch html strings and splice them into the page. what you can do is exploit that to bypass whatever browser mechanisms and script the download of the full data set. not sure how you were doing it before; looks like someone finally sent me the docs so I can have a look in a bit.
[23:22] jfw: rodbl, dorion: did either of you get my reply to the thread in your gmail accounts?
Day changed to 2022-06-13
[00:35] dorion: jfw, not here and not even in spam, god damn.
[02:04] jfw: http://fixpoint.welshcomputing.com/2022/fixed-width-bit-fiddling-tuneups-for-gbw-signer/
[02:07] jfw: dorion: hm, I see it was actually so polite as to give me an explicit rejection this time: "The IP you're using to send mail is not authorized to send email directly to our servers. Please use the SMTP relay at your service provider instead. Learn more at https://support.google.com/mail/?p=NotAuthorizedError"
[02:10] jfw: "The determination of whether or not an IP address is authorized to send mail is made by the ISP that provides you with the IP address" - fucking typical, no hint as to *how* such authorization is determined, it's certainly not a normal ISP function. most likely it's total nonsense.
[02:20] jfw: http://welshcomputing.com/paste/2juhnqyu5p << my poor censored message, which even knew its own fate in advance.
[04:54] jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[04:55] jfw: for the sample report I understand it, it's supposed to be shiny and appealing to banker types, fine. talking about our internal documentation though.
[04:57] jfw: "selenium, chromedriver, chrome_browser" - aha, that's the mess I had in mind with bypassing
[04:57] sourcerer: 2022-06-12 23:21:55 (#jwrd) jfw: rodbl: if this is encuentra24, last I checked it doesn't do anything so civilized; it makes XHR calls just to fetch html strings and splice them into the page. what you can do is exploit that to bypass whatever browser mechanisms and script the download of the full data set. not sure how you were doing it before; looks like someone finally sent me the docs so I can have a look in a bit.
[04:58] jfw: yet another tradeoff of up-front labor vs. system complexity I suppose.
[05:16] jfw: rodbl: where does the CSV data fit in your current process? I'm not seeing it mentioned in the docs.
[15:49] jfw: rodbl: also, are you familiar with git yet? (the original CLI program, not the "Hub".) we're thinking to set you up with a shared repository on our server to help with the code, docs and communications.
[15:52] jfw: for instance, I could get you my old scraper code, probably not usable directly for your thing but might be good for some ideas.
[21:55] jfw: "Note, the first time you ever run the render() method, it will download Chromium into your home directory" << I see the "convenience at all costs" agenda is progressing
[21:56] jfw: "Only Python 3.6 is supported." << but they're behind the times!11 why not "only python 3.10.5 is supported"?!
[21:57] jfw: "Requests officially supports Python 3.7+." << incompatible with the same dude's other stuff, lolz
[22:43] jfw: https://peps.python.org/pep-0644/ << looks like efforts to murder libressl have been stepping up. "sure openssl stabbed us in the kidneys but so what, it still loves us so all is forgiven!"
[22:56] jfw: and I gather the new python doesn't specify its build dependencies at all
[23:08] jfw: rodbl, here's my attempt to track down your stated dependencies at least to the first level, with some ??s for names that didn't resolve sensibly on PyPI that you might need to point me to more specifically.
[23:09] jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[23:11] jfw: let me know if the list looks sane.
Day changed to 2022-06-14
[00:20] rodolfo: Hi!
[00:20] rodolfo: I think I read everything
[00:21] dorion: in other noose, someone I'm curious to talk to said they only use signal and I should install that... so I go and do it, using anbox android emulator on a public toilet box and shortly after launching it I see a new, peculiarly named process running in my top output. turns out, under da hood, signal calls itself "crime.securems" ... myeah, totally NOT a honeypot, lolz.
[00:21] dorion: heya, rodolfo, back with a new name !
[00:22] rodolfo: Yes, there is a possibility that Encuentra24 is using XHR, so "injecting" a query might be a possibility. Further observation is required in order to understand the async calls
[00:25] rodolfo: What is anbox?
[00:26] rodolfo: Well, regarding the "documentation" I kindly share with you guys, I'll be more thoughtful about your disk resources. A plain file txt could be enough next time
[00:27] dorion: rodolfo, it's an android emulator.. lets you run android on a desk/laptop so, e.g. you can use a real keyboard.
[00:28] dorion: rodolfo, while jfw mentioned the file size, it's more about flat files vs binary than disk space.
[00:28] rodolfo: I was going to export a requirements.txt with each dependency's version, I still can if you needed.
[00:29] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4356 -- unsearchable blob + gui environment are the bigger keys.
[00:29] sourcerer: 2022-06-13 04:54:53 (#jwrd) jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[00:29] rodolfo: Regarding Git CLI, I do have experience with it.
[00:30] dorion: see how that echos rodolfo ? you can paste the link of the line you're replying to rather than preface with, "regarding xyz.."
[00:30] dorion: the link for each line is the timestamp shown in the logger.
[00:31] rodolfo: Hahaha yeah, it's pretty handy actually
[00:32] dorion: yeah, and even more so as the channel gets more active.
[00:33] rodolfo: BTW the "selenium" requirements are not fundamentally required since most of the work is done with requests.
[00:34] rodolfo: Did you have the time to check out the video?
[00:34] dorion: that's good news, should really try to minimize complexity because this is going to be a cat and mouse game with the data providers.
[00:34] dorion: video of what now ?
[00:35] rodolfo: I didn't have the time to show you the functionality
[00:35] dorion: is this new functionality from what you showed me a few months back ?
[00:37] rodolfo: Not the heavy-JS version. The plain HTML one, that can support multiple calcs for a given submit
[00:38] rodolfo: I have an improved version of the "bulk calculation", where the user gets more detailed scenarios and comparisons
[00:38] dorion: sure then, go ahead and link it.
[00:39] rodolfo: https://drive.google.com/file/d/1n9uAghSOwucCLgMZ4H8XzrkzqRth-iSU/view?usp=sharing
[00:40] rodolfo: Feel free to skip every other minute, I had to make a detailed explanation about this back then
[00:43] rodolfo: Most importantly, I would like you to check out the analysis example I attached. This is what I was telling you about the other day that it is parameterizable in order to deliver a professional document for the end user.
[00:46] rodolfo: Analysis:
[00:46] rodolfo:
[00:46] rodolfo: https://drive.google.com/file/d/13oq7y3z3wctwCzEEBo7UYQ0OIOHC60A4/view?usp=drivesdk
[01:18] dorion: rodolfo, the analysis is a solid start.
[01:26] rodolfo: Great!
[02:41] jfw: dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4374 - heh, was this 'top' from a shell inside the emulator or what?
[02:41] sourcerer: 2022-06-14 00:21:26 (#jwrd) dorion: in other noose, someone I'm curious to talk to said they only use signal and I should install that... so I go and do it, using anbox android emulator on a public toilet box and shortly after launching it I see a new, peculiarly named process running in my top output. turns out, under da hood, signal calls itself "crime.securems" ... myeah, totally NOT a honeypot, lolz.
[03:00] dorion: jfw, nah I don't recall all the details, but this anbox via snap thing isn't a pure vm... can see from the host some of the emulator processes.
[03:01] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4378 - apparently I'm not always such a nice guy, despite what first impressions may suggest. rodolfo: think of us like a peculiar order of monks, perhaps; you can learn quite a lot by working with us, if you make the most of it; but you will have to pick up some of our customs, funny hats and robes in order to get along. I'll remind
[03:01] sourcerer: 2022-06-14 00:26:41 (#jwrd) rodolfo: Well, regarding the "documentation" I kindly share with you guys, I'll be more thoughtful about your disk resources. A plain file txt could be enough next time
[03:01] jfw: myself that you're still pretty new to this stuff and perhaps I didn't need to hit you with it quite so early; but this was about the politest way I could come up with to express the situation at the time. There's a lot more substance behind that simple and seemingly annoying request than might first meet the eye.
[03:01] sourcerer: 2022-06-13 04:54:53 (#jwrd) jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[03:01] jfw: dorion: huh, weird.
[03:05] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4381 - certainly couldn't hurt. and I didn't mean to step on your toes there, I was just looking up the things you mentioned to get some notion of what they are & what they require, then realized that since I was looking I might as well be recording it too.
[03:05] sourcerer: 2022-06-14 00:28:41 (#jwrd) rodolfo: I was going to export a requirements.txt with each dependency's version, I still can if you needed.
[03:08] rodolfo: Don't worry, I try to be as emotional numbed as possible in terms of not attaching a specific "tone of voice" to a text message.
[03:12] rodolfo: I'll try to absorb as much wisdom as possible.
[03:13] jfw: especially given the language gap that's probably sensible about not imputing "tone of voice"; not sure about the "numbing" part as such though.
[03:30] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4391 - funnily enough, keeping up with site layout changes or anti-scraping antics is sounding like the easier part compared to keeping up with python ecosystem changes and getting code to run at all, at the moment
[03:30] sourcerer: 2022-06-14 00:34:49 (#jwrd) dorion: that's good news, should really try to minimize complexity because this is going to be a cat and mouse game with the data providers.
[05:15] rodolfo: I do enjoy exercising some literature resources, as euphemisms and sarcasm. I'll try to make my communication as HTML-ish as possible: plain and simple.
[13:30] rodolfo: Good morning friends
[13:31] rodolfo: Trying to stay on the loop
[13:47] rodolfo: I forgot to mention last night about an idea I shared with <dorion> the other day, about carrying out activities based on sprints in a way we all commit to specific action items for specific dates.
[13:56] dorion: good morning rodolfo. see jfw's questions/comments about python packages and version.
[13:56] sourcerer: 2022-06-13 23:08:15 (#jwrd) jfw: rodbl, here's my attempt to track down your stated dependencies at least to the first level, with some ??s for names that didn't resolve sensibly on PyPI that you might need to point me to more specifically.
[13:56] sourcerer: 2022-06-13 23:09:02 (#jwrd) jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[14:39] dorion: rodolfo, clarification re the above will help jfw in picking the os for the new vm. leaning towards cent os 6 atm.
[15:19] rodolfo: Python 3.9 would be fine
[15:19] rodolfo:
[15:19] rodolfo: I can provide the requirements.txt, if needed.
[15:20] dorion: sure, go ahead.
[15:51] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4424 - at minimum, communicating more about our plans or upcoming work (prospective reporting) sounds like an excellent idea. deadlines I'm not so sure, they can be costly (if taken seriously, and if not then what's the point) and I gather we all have pre-existing commitments that would come before this
[15:51] sourcerer: 2022-06-14 13:47:43 (#jwrd) rodolfo: I forgot to mention last night about an idea I shared with <dorion> the other day, about carrying out activities based on sprints in a way we all commit to specific action items for specific dates.
[15:55] rodolfo: The purpose is to focus efforts, either online or offline, in order to make the best use of our time.
[15:56] rodolfo: Deadlines can be flexible, but some kind of progress needs to be achieved within a time window.
[15:58] rodolfo: This is a good practice, even more if there are third-parties potentially interested in adding capital to this initiative.
[15:59] jfw: you mean something like, sharing projected dates of completion for particular tasks?
[16:06] jfw: example: currently I'm waiting on you to provide your requirements.txt and/or feedback on my handmade list, so that I can hammer down the lower-level software versions; once this is decided I can probably get the VM rebuilt by the end of the week (possibly with some software still to be worked out).
[16:14] jfw: my thinking on python 3.9 is that 3.8 emerged as the minimum demanded by the current versions of the first-level dependencies; going a little newer might help it stay workable for more things for longer; but in 3.10 they got more aggressive about breaking older SSLs and I'd rather not have to hand-build that too.
[16:14] sourcerer: 2022-06-13 23:09:02 (#jwrd) jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[16:15] jfw: python 3.6 is the latest I'm seeing for centos 6 even via the "software collections"
[16:22] rodolfo: Yes, I mean like stepping on the gas. I know that is not very monk-alike, but sometimes stress is needed.
[16:22] rodolfo:
[16:22] rodolfo: Let's get specifics when it comes to requesting info, I'm not (neither are you) interested in the granularity of your request. If you need the requirements.txt, you will get by the end of the day.
[16:24] rodolfo: I hope you are also "emotional numbed" (no language barriers at all, pretty straight forward), but let's get straight to the points. Frankly, I don't have time to read about tech philosophy, I like both of you guys so I need you to understand that in this context, my mindset is completely focused on doing something we can all get a monetary benefit.
[16:26] rodolfo: We can leave the didactic aspects for future references. No hard feelings.
[16:26] jfw: rodolfo: as far as stepping on the gas, had you been waiting for something from us? I thought it was you that stepped out for a month or something.
[16:26] rodolfo: You already did: you need the requirements.txt
[16:26] rodolfo: The ball is on my side
[16:27] jfw: before that, I meant.
[16:28] rodolfo: Nahh, a month ago we were wasting time since it made no sense my intervention in setting up a Py environment and modules from scratch when all of this can be easily achieved.
[16:28] dorion: rodolfo there are 2 sides of the monetary aspect. 1 is top line wrt getting something to market that starts to cashflow. 2 is ongoing costs of maintenance and future development.
[16:29] rodolfo: Besides the info I already sent via email (including video, examples, etc.) plus the requirements.txt that I'm going to send today, what else do you need?
[16:29] dorion: so what we're aiming to do is strike a balance so the thing can be sustainably scalable.
[16:30] jfw: also things like file formats are not idle "philosophical" points but are with an object toward working and collaborating most effectively.
[16:32] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4452 - that's your conclusion, I don't necessarily share it but it was not clear to me that you thought this at the time and stepped out because of it, if that's what you're saying.
[16:32] sourcerer: 2022-06-14 16:28:24 (#jwrd) rodolfo: Nahh, a month ago we were wasting time since it made no sense my intervention in setting up a Py environment and modules from scratch when all of this can be easily achieved.
[16:33] jfw: and it's fine, no hard feelings; I'm simply pointing out ways within your own control to achieve the speedup you're looking for.
[16:34] rodolfo: Correct. From a product perspective, let's say that an improved version can be achieved almost anywhere. The infrastructure-related costs (aka running an app from your server) are most likely going to demand testing of resource consumption. However, we can walk and chew gome at the same time. Meaning that, while the CentOS env is being set up, I can get my hands on the "improved version" since, in tech slang, the development is agnostic and I can do it in a
[16:34] rodolfo: Win env.
[16:35] rodolfo: *chew gum
[16:36] jfw: rodolfo: do you mean you're going to set up mysql in windows now? to me that sounds rather like a waste of time. in general, a total change of environment tends to be a major source of "unexpected" costs when going into production
[16:37] jfw: if you can make progress with your code for now in your existing environment before ours is ready, that sounds fine though.
[16:40] jfw: rodolfo: what was your "Correct" in regards to? on the reread I can't quite tell.
[16:47] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4454 - to be explicit, nothing more needed for now; in general, maybe a bit more re-reading of the log so you don't miss things like that ( http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4414 - http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4431 )
[16:47] sourcerer: 2022-06-14 16:29:33 (#jwrd) rodolfo: Besides the info I already sent via email (including video, examples, etc.) plus the requirements.txt that I'm going to send today, what else do you need?
[16:47] sourcerer: 2022-06-14 03:05:46 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4381 - certainly couldn't hurt. and I didn't mean to step on your toes there, I was just looking up the things you mentioned to get some notion of what they are & what they require, then realized that since I was looking I might as well be recording it too.
[16:47] sourcerer: 2022-06-14 15:19:20 (#jwrd) rodolfo: I can provide the requirements.txt, if needed.
[17:03] rodolfo: Not at all, I need parameters in order to focus and move forward.
[17:05] rodolfo: My "lifestyle" is based on caffeine and stress.
[17:06] dorion: eh, I've seen you laugh here and there ;P
[17:06] rodolfo: Hahaahaa
[17:07] rodolfo: Hey man, we need to keep it corporate gangsta
[17:07] rodolfo: No room for feelings
[18:20] dorion: there's room for feelings, as long as you're not an idiot.
[18:46] rodolfo: "Smart feelings"
[23:22] rodolfo: requirements.txt sent
[23:22] rodolfo:
[23:22] rodolfo: Ball is on the monk's side =)
[23:28] jfw: whew, looks like I'm going to need a bigger hammer to wrangle that list of 347 requirements.
[23:31] jfw: rodolfo, do you know if all those come in as downstream dependencies of the data science / machine learning ones you listed earlier, or what?
[23:32] jfw: thanks for the file, good to have the reference for sure.
[23:35] rodolfo: It depends, pip can panic if you try to update or install a package if a dependency is absent or obsolete. In order to avoid that, I usually do "pip install pack_name[all]"
[23:35] rodolfo:
[23:35] rodolfo: The "[all]" arg will take care of every dependency, including those that you might not need right away.
[23:36] rodolfo: So, you could get sklearn without dependencies if you want to
[23:50] jfw: what I'm after is, at the high level, what's causing all those to be brought in & why.
[23:52] jfw: if we actually need to be *this* promiscuous then it might warrant getting a dedicated server for it and nevermind "virtual machines".
Day changed to 2022-06-15
[11:35] caai: jfw: thank you for the feedback. i have created a directory named .sh and moved my scripts into it
[11:37] caai: please note the script for assignment 3 with the corrections: http://welshcomputing.com/paste/kpwd4xhas8
[11:40] caai: in regards to taking down the node before running the script, or extending the script to include such commands. what commands would those be since it is a daemontools service? svc -d /service/bitcoind to take it down, next run the script, then upon completion, svc -u /service/bitcoind to bring it back up?
[11:51] caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[12:27] rodolfo: I understand.. what you are saying is that we should aim for the dependencies that for sure we will need, and forget about those modules/libraries that might require "extra care" but don't really add value at the moment. Right?
[12:44] dorion: right, if we don't need them, they're liabilities.
[13:29] rodolfo: Understood. Is it necessary for me to rectify the requirements.txt?
[13:41] dorion: rodolfo, I think the more practical path forward is setting up a dedicated server for this rather than vm. we can chip away at the dependency list over time. focus first on the db schema and moving to mysql. btw, did you see jfw's q about where the csv files fit ?
[13:41] sourcerer: 2022-06-13 05:16:11 (#jwrd) jfw: rodbl: where does the CSV data fit in your current process? I'm not seeing it mentioned in the docs.
[13:52] rodolfo: Perfect
[13:59] rodolfo: Regarding Jacob's question: inside the ETL&Model PDF, you will see that in the last paragraph I make a reference to "panel data (previously parsed from JSON)".
[13:59] rodolfo:
[13:59] rodolfo: The intention was to point out that those JSON dicts (that resulted from the scraping) are then transformed into a more readable format. The output CSV is a byproduct of this "transformation". Makes sense?
[13:59] rodolfo:
[13:59] rodolfo: Scraper
[13:59] rodolfo: | JSON
[13:59] rodolfo: | Panel Data -> CSV
[13:59] rodolfo: | Input for the model
[15:48] jfw: caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4490 - I meant .sh as a file extension for scripts themselves instead of .txt, not as a directory.
[15:48] sourcerer: 2022-06-15 11:35:34 (#jwrd) caai: jfw: thank you for the feedback. i have created a directory named .sh and moved my scripts into it
[15:49] jfw: caai: updated script looks fine.
[15:50] jfw: to capture timestamps too, which can be nice for historical reference at least, you could add -p to the 'cp' flags.
[15:53] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4492 - that was the idea, yes; though parallel to the svc -t (TERM signal) vs svc -k (KILL signal), I'd use svc -kd rather than svc -d. The latter only requests it to shutdown, which may (and indeed will) take some time, during which your script would proceed to copying, defeating the purpose. with -k it's effectively instant.
[15:53] sourcerer: 2022-06-15 11:40:31 (#jwrd) caai: in regards to taking down the node before running the script, or extending the script to include such commands. what commands would those be since it is a daemontools service? svc -d /service/bitcoind to take it down, next run the script, then upon completion, svc -u /service/bitcoind to bring it back up?
[15:54] jfw: the "fully graceful" way to do it I suppose would be to poll 'svstat' until it shows actually down; but I wouldn't bother.
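A minimal sketch of the take-down, copy, bring-up sequence discussed above, assuming the service directory is /service/bitcoind as in caai's question. The function names `wait_down` and `backup_with_node_down` are made up for illustration, and the polling loop assumes svstat prints a line containing ": down" once the service has stopped; that loop is the optional "fully graceful" step and can be dropped.

```shell
#!/bin/sh
# Sketch only: function names and the example paths below are hypothetical.
SERVICE=/service/bitcoind

# Optional "fully graceful" step: poll svstat until the service shows down.
wait_down() {
    while ! svstat "$SERVICE" | grep -q ': down'; do
        sleep 1
    done
}

# Run the given command (e.g. a cp invocation) while the node is down,
# then bring the node back up whether or not the command succeeded.
backup_with_node_down() {
    svc -kd "$SERVICE"      # KILL signal, and keep it down
    wait_down
    "$@"
    status=$?
    svc -u "$SERVICE"       # bring it back up
    return $status
}

# Example call (placeholder paths; -p preserves timestamps and modes):
#   backup_with_node_down cp -rp /path/to/bitcoin/datadir /mnt/backup/
```

Passing the copy command as arguments keeps the wrapper independent of what is being backed up; the exit status of the copy is returned so a calling script can still detect a failed backup.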
[15:55] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4493 - indeed treating the wallet separately makes sense to me.
[15:55] sourcerer: 2022-06-15 11:51:20 (#jwrd) caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[16:00] jfw: rodolfo: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4493 - I'd say that's the underlying idea, yes, but so far I'm just trying to understand what's there & why - the diagnosis rather than prescription stage, so to speak.
[16:00] sourcerer: 2022-06-15 11:51:20 (#jwrd) caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[16:00] jfw: ah, wrong line, I meant: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4494 - I'd say that's the underlying idea, yes, but so far I'm just trying to understand what's there & why - the diagnosis rather than prescription stage, so to speak.
[16:00] sourcerer: 2022-06-15 12:27:11 (#jwrd) rodolfo: I understand.. what you are saying is that we should aim for the dependencies that for sure we will need, and forget about those modules/libraries that might require "extra care" but don't really add value at the moment. Right?
[16:01] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4495 - worse, they're liabilities in any case whether they're also assets or not!
[16:01] sourcerer: 2022-06-15 12:44:42 (#jwrd) dorion: right, if we don't need them, they're liabilities.
[16:03] jfw: rodolfo: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4496 - I don't know; what process do you have in mind for changing it?
[16:03] sourcerer: 2022-06-15 13:29:26 (#jwrd) rodolfo: Understood. Is it necessary for me to rectify the requirements.txt?
[16:05] rodolfo: Creating a new environment and installing the basic libraries (aka no "pip install xyz[all]")
[16:06] jfw: rodolfo: what do you mean by "panel data" ? looking it up as a statistics term it seems to deal with data sampled over time which I gather yours is not.
[16:06] jfw looks up what that [all] really means
[16:12] jfw: as yet not finding anything on "all" as such, though there can be specifically-named "extras"
[16:13] rodolfo: Actually, it is. What I mean is that data is tabulated (from dict to dataframe) but also organized by chronological order (date of the ad).
[16:16] jfw: are you able to get historical data rather than just a snapshot of the current market from the currently active ads?
[16:16] rodolfo: For future initiatives, yes
[16:16] rodolfo: WebArchive sounds promising
[16:17] rodolfo: However, currently you can find that for a given location, there are older classifieds that are still running
[16:18] jfw: for a given property?
[16:18] jfw: or do you just mean that there's a range of how long the current ads have been listed?
[16:19] rodolfo: Perhaps for that case a closer monitoring is required, but there are good examples of "for a given apartment building"
[16:19] rodolfo: Correct, there are ads posted N amount of time ago that are still visible
[16:21] jfw: ok, but if it's still an active listing, the price (or other fields) may have been modified over its life; so while the age of the listing is certainly one data point to collect, it seems to me that the principal time value associated with all the data is the time it was sampled, which is the time of the scrape i.e. the same everywhere.
[16:22] jfw: if you get it running for a while collecting many snapshots over time, and you track all that, it could indeed become a time series.
[16:22] jfw: anyway, minor point for now.
[16:24] jfw: rodolfo: can you expand on this panic situation? it sounds like possibly a real problem but not quite a fitting solution
[16:24] sourcerer: 2022-06-14 23:35:20 (#jwrd) rodolfo: It depends, pip can panic if you try to update or install a package if a dependency is absent or obsolete. In order to avoid that, I usually do "pip install pack_name[all]"
[16:29] rodolfo: Right, but I'm not specifically pointing to build a time series with this particular exercise.
[16:29] rodolfo:
[16:29] rodolfo: The chronological comparison is relevant for feature engineering.
[16:33] rodolfo: Meaning that pip will return an error if you try to install libraries whose dependencies are missing or out of date.
[16:33] rodolfo:
[16:33] rodolfo: Then, a way to avoid any error related to dependencies is to execute "pip install XYZ[all]".
[16:33] rodolfo:
[16:33] rodolfo: This only works for certain libraries though
[16:35] rodolfo: It's not actually a big deal. Worst case scenario, if there is a dependency-related error, it will just be a matter of installing whatever is needed
[16:38] jfw: Are you quoting from something there (what)?
[16:41] jfw: otherwise it just reads to me as a restatement of the previous incomplete definition...
[16:42] rodolfo: https://usercontent.irccloud-cdn.com/file/WI4220FN/Screenshot_20220615_114225.jpg
[16:43] jfw: let me try this way: when/where do those "dependency-related errors" arise - when doing the initial 'pip install' or later (when running code that imports the module perhaps) ?
[16:44] jfw: that link is a screenshot of the current thread, not sure what you're trying to say by that, lol.
[16:55] rodolfo: It could be either at installation or coding time, because there might be modules within libraries that have specific dependencies.
[16:55] rodolfo:
[16:55] rodolfo: However, this is not a great problem because it can be addressed at any point if the environment that is going to be implemented allows it.
[16:55] rodolfo: I'm talking about regular use of Python, nothing fancy
[16:56] jfw: aren't libraries supposed to declare their dependencies upfront? i.e. if an import fails at runtime it would be a bug in the packaging, no?
[16:58] rodolfo: Again, if this freedom of functionalities represents an obstacle, I can throw parts of the code at a different instance and just call those functions from the new environment that you are setting up.
[16:59] jfw: rodolfo: I'm afraid I don't know what any of that means (freedom of functionalities? throw code at different instance? just call from new environment?)
[16:59] rodolfo: Not at import, but at execution time because the error will show up when a function from that module is called (assuming that the "special dependency" is invoked at that point).
[17:01] jfw: ah, is this like a lazy-loaded dependency then, something like an import statement found inside a function rather than at top-level ?
[17:02] rodolfo: My friend, neither you nor I want to waste time. The cliché of "analysis paralysis" is getting too frequent. I'm sure you have had experiences when trying to make something work in Python and a dependency error occurs.
[17:03] rodolfo: Let's just see what happens when trying to install whatever package in new setup, then the problem will be clear. For now, we are speculating
[17:07] jfw: Just see what happens without trying to figure out what anything actually is or means, that's your entire approach I suppose, it's not surprising then that you get randomly hit by errors you don't understand and then dig up some other band-aid to ...numb the feelings? but no, that's not the only possible way to work and I don't really have the same experience in my python-coding. but sure, let's
[17:07] jfw: try just dropping the [all] and see what comes out.
[17:10] jfw: I'm going afk, and can resume this tomorrow.
[17:24] rodolfo: I understand. Perhaps we are not on the same frequency, I'm sorry
[23:35] dorion: rodolfo, we started this knowing it was a prototype. when we met at krume, you said you were glad to try gales out and were happy that we could help you narrow down the decision making process.
[23:36] dorion: since then, we've agreed gales is a bridge too far at present and are trying to work with you to get the thing going.
[23:36] dorion: on the one hand, you say you want to absorb wisdom. that's encouraging and jacob has a lot to offer, with a decade and a half at least with python on top of other languages in addition to all the sys admin experience.
[23:36] sourcerer: 2022-06-14 03:12:08 (#jwrd) rodolfo: I'll try to absorb as much wisdom as possible.
[23:37] dorion: we all have the same goal of monetizing this thing, but for it to be ours (the 3 of us), we have to understand it. as far as I can tell, that's what his questions are aimed at. however, for each line of questioning, you seem to be reframing as monk mode, paralysis by analysis, etc.
[23:37] dorion: we don't want paralysis by analysis, but help us help you.
[23:38] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4569 -- fine with me, as a starting point.
[23:38] sourcerer: 2022-06-15 17:07:30 (#jwrd) jfw: try just dropping the [all] and see what comes out.
[23:38] dorion: welcome back jwm.
Day changed to 2022-06-16
[00:11] jwm: thanks! finally turned on the machine to check on the node - she's still purring
[13:25] caai: jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4508 - hahaha alright. i have changed the file extension to .sh and renamed the directory to /home/user/scripts
[13:25] sourcerer: 2022-06-15 15:48:05 (#jwrd) jfw: caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4490 - I meant .sh as a file extension for scripts themselves instead of .txt, not as a directory.
[13:28] caai: i have added the -p to the 'cp' flags to capture the timestamps, too
[13:31] caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4512 - noted. i will use svc -kd rather than svc -d
[13:31] sourcerer: 2022-06-15 15:53:06 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4492 - that was the idea, yes; though parallel to the svc -t (TERM signal) vs svc -k (KILL signal), I'd use svc -kd rather than svc -d. The latter only requests it to shutdown, which may (and indeed will) take some time, during which your script would proceed to copying, defeating the purpose. with -k it's effectively instant.
[13:32] caai: in addition, i will treat the wallet separately
[16:17] jfw: caai: what's your next step in the technical exercises?
[18:33] caai: i will send you the corrections for lesson 9, exercise 4 and then start the homework from lesson 10. i know that i need to integrate regex more (one of the assignments from lesson 9), but i will set aside that task for now so that i progress faster
[18:37] jfw: I suppose regex is the sort of thing that comes in tremendously handy but in the relatively few situations that it's actually called for. so, possibly it's good enough to practice the basics a bit and then just know that it's there in the toolkit.
[18:40] jfw: lesson 10 being Gales Bitcoin Wallet operation and a preview of the Gales Linux bootstrap process - sounds good.
Day changed to 2022-06-17
[13:35] caai: jfw: yes! today i am searching for, reading and testing examples of 'tar -X'
[17:43] jfw: caai: I'm finding the details of how that works to be a bit of a mess unfortunately, and poorly documented even for the full-blown GNU tar.
[17:45] jfw: but in short, it's tar -X EXCLUDE_FILENAME, where EXCLUDE_FILENAME is the path to a plain-text file listing the exclusion patterns, one per line. An exclusion pattern can be a simple path or include shell-style wildcards like *.
[17:46] jfw: however, the pattern can apply anywhere within a file's path, for instance if you list 'dev' then any directory named 'dev' will be excluded, not just /dev.
[17:48] jfw: you might think you could give it in absolute form i.e. '/dev' to match it exactly - and this *would* work except that for (misguided) security reasons or something it strips any leading slashes from the filenames when building the archive.
[17:51] jfw: a workaround I found is to start instead from './' ; for instance, 'cd /; tar -cf /mnt/something.tar .' instead of 'cd /mnt; tar -cf something.tar /'
[17:51] jfw: then the exclusions can be 'anchored' to the start of the path as in './dev'.
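The anchoring behaviour described above can be tried on a small scale before touching the real system. The following is a sketch assuming GNU tar; the directory layout and file names are invented for the test and live entirely in throwaway temp directories.

```shell
#!/bin/sh
# Sketch only: temp dirs stand in for / and for the mounted backup drive.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$src/dev" "$src/home/dev"
echo skip > "$src/dev/node"
echo keep > "$src/home/dev/notes"

# The exclude file is a plain list, one pattern per line.
# './dev' is meant to match only the top-level dev, not home/dev.
printf '%s\n' './dev' > "$out/exclude.txt"

# Archive '.' from inside the source so member names start with './'.
(cd "$src" && tar -cf "$out/backup.tar" -X "$out/exclude.txt" .)

# List what actually went into the archive.
listing=$(tar -tf "$out/backup.tar")
echo "$listing"
```

If the pattern were written as plain `dev` instead, the nested home/dev directory would be expected to disappear from the listing as well, which is the pitfall being worked around.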
[17:53] jfw: the whole 'exclude' mechanism looks to be a relatively newer extension to 'tar' functionality though I can't quite imagine why it wasn't there from the start.
[17:54] jfw: (or at least, why the need for it wasn't apparent from the start.)
[18:03] jfw: caai: does that make sufficient sense?
[18:07] jfw: the exercise wasn't quite intended to involve this much digging, but so goes the gap between tidy classroom exercises and making useful & reliable things in real life.
[22:00] jfw: http://fixpoint.welshcomputing.com/2022/fixpoint-security-advisory-freefilesync-misses-host-key-check-allowing-breach-of-data-confidentiality-or-authenticity/
Day changed to 2022-06-18
[18:21] caai: jfw: thank you for the explanation. i am fairly confident about, and have successfully tested, how to exclude a single file. however, i am still a bit unclear about how to write the exclusion patterns for directories. nevertheless, i have written 2 scripts that may work. i have located an external hard drive that i will use for this project, which i will test the sysbackup script on and then
[18:21] caai: report back to you
[18:24] caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4603 - nice! good discovery!
[18:24] sourcerer: 2022-06-17 22:00:56 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/fixpoint-security-advisory-freefilesync-misses-host-key-check-allowing-breach-of-data-confidentiality-or-authenticity/
Day changed to 2022-06-19
[14:06] caai: i have tested the system backup scripts and they do not work. please note the scripts: http://welshcomputing.com/paste/i3bf63sfng
[14:22] caai: on the other hand, i have tested the btc database backup script and it works. i just got around to testing it today because i needed to change the external hard drive's system id to 83 via 'fdisk' and create a new file system 'mkfs -t ext4'
[15:40] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4604 - directories work pretty much the same as files; if a directory is excluded then so is everything inside it
[15:40] sourcerer: 2022-06-18 18:21:24 (#jwrd) caai: jfw: thank you for the explanation. i am fairly confident about, and have successfully tested, how to exclude a single file. however, i am still a bit unclear about how to write the exclusion patterns for directories. nevertheless, i have written 2 scripts that may work. i have located an external hard drive that i will use for this project, which i will test the sysbackup script on and then
[15:42] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4606 - perhaps I missed my calling as a minesweeper.
[15:42] sourcerer: 2022-06-18 18:24:59 (#jwrd) caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4603 - nice! good discovery!
[15:45] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4608 - the first problem with option 1 is that -X works indirectly: it names a file which itself contains the list of excludes. This keeps the main script a bit tidier I'd say. the use of --exclude as in option 2 should work as a more direct alternative there.
[15:45] sourcerer: 2022-06-19 14:06:04 (#jwrd) caai: i have tested the system backup scripts and they do not work. please note the scripts: http://welshcomputing.com/paste/i3bf63sfng
[15:46] jfw: the second problem (which is with both) appears to be confusion about what the basic tar parameters do.
[15:50] jfw: you use -f sysbackup.tar.gz to name the output file - good, but note that's a relative path so where it ends up will depend on where you happened to run the script from! then you give /mnt/flashdrive perhaps as an attempt to fix that, but that's where tar expects to find the *source* file or directory (or multiple thereof)
[15:52] jfw: then maybe give a closer read of this example to see how it gets the paths listed in the archive to start with './' to match the exclusions.
[15:52] sourcerer: 2022-06-17 17:51:05 (#jwrd) jfw: a workaround I found is to start instead from './' ; for instance, 'cd /; tar -cf /mnt/something.tar .' instead of 'cd /mnt; tar -cf something.tar /'
[15:54] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4609 - 83 being the partition type code for Linux, where a listing of codes can be found within the fdisk menus. well done!
[15:54] sourcerer: 2022-06-19 14:22:05 (#jwrd) caai: on the other hand, i have tested the btc database backup script and it works. i just got around to testing it today because i needed to change the external hard drive's system id to 83 via 'fdisk' and create a new file system 'mkfs -t ext4'
Day changed to 2022-06-23
[00:59] jfw: http://fixpoint.welshcomputing.com/2022/freeing-windows-files-with-freefilesync/ and with some next-level alliterative pattern there.
[12:38] caai: jfw: it appears as though tar has syntax that i need to study (and use) more. based on your feedback, and a little bit of reading the documentation from the link you provided (GNU tar), i have produced another script; option 3. please let me know if i am getting closer: http://welshcomputing.com/paste/922uj78cc3
[12:48] caai: i will download FreeFileSync and give it a try. what is the storage limit?
[13:01] caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
Day changed to 2022-06-24
[16:30] jfw: caai: yes, option 3 is getting closer, in that / is used as the source, as intended for a full-system backup, and the destination will end up predictably at /mnt/flashdrive/sysbackup.tar.gz. The excludes still won't work though.
[16:40] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4624 - well, if it was unclear, the point is that it's syncing to an SFTP server running on your own Linux box; thus any limit is entirely up to you / that machine's disk space (perhaps partitioned).
[16:40] sourcerer: 2022-06-23 12:48:37 (#jwrd) caai: i will download FreeFileSync and give it a try. what is the storage limit?
[16:43] jfw: that article is aimed at the end-user (or perhaps a junior admin assisting the end-user) in a scenario where we're administering the server for them, so the details of the server-side setup are omitted. TBD if we'll publish that part too.
[16:45] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4625 works for me, and I'll check if jwm would like to jump on for a bit.
[16:45] sourcerer: 2022-06-23 13:01:36 (#jwrd) caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
[17:05] jfw: ohai jwm. want to join the JWRD board meeting with Chad next Tuesday at 7pm local to discuss sales & strategy?
[17:06] jwm: for sure - sign me up.
[17:07] jfw: probably simplest to just sit at the same Zoom terminal.
[22:13] jwm: seedsigner.com
[22:13] jwm: interesting raspberry pi project
Day changed to 2022-06-25
[16:01] caai: jfw: message received about the 'excludes'. i am working on option #4.
[16:07] caai: in regards to FreeFileSync, understood. i will read more
[20:19] jfw: jwm: to me it just looks like - how to put it - some nail polish artists picked up a gadget from walmart and worked out how to paint "secure" on it in semi-gloss. but I dunno, what do you see interesting about it?
[20:23] jfw: perhaps that they're catching on to "airgap" as a magic keyword, only nine years late to the party
[20:24] jfw: https://seedsigner.com << as a proper link ftr
[20:24] jfw: not that there's anything proper about https
Day changed to 2022-06-26
[17:23] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4625 -- works for me.
[17:23] sourcerer: 2022-06-23 13:01:36 (#jwrd) caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
Day changed to 2022-06-27
[14:23] caai: jfw: good morning. please note script option #4: http://welshcomputing.com/paste/aaeshpxtp8
[14:26] caai: i imagine the path to the exclude file needs to be absolute. therefore, http://welshcomputing.com/paste/gycwnajgux
[14:28] caai: please note the accompanying file '/home/user/scripts/EXCLUDE_sysbak.txt': http://welshcomputing.com/paste/6jkfrd766u
[17:51] jfw: caai: you've changed the form by which the exclude list is provided, but the substance is unchanged. actually it's more broken now because the exclude file isn't itself a shell script, just a list, so the "cd /;" and possibly even the comment will end up taken literally as paths.
[17:52] jfw: on the upside, looks like you got the linkage from the script to the exclude file correct.
[17:53] jfw: I'd say fix that exclude file then just give it a try, perhaps the problem will then become clearer.
[17:54] jfw: pay attention to the 'tar' output as it goes (or redirect to a log for later inspection)
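Pulling the feedback in this thread together, one possible shape for the system backup script is sketched below, assuming GNU tar. The function name `sysbackup`, its argument order, and the example exclude entries are assumptions for illustration and not the course's reference solution; the pasted scripts themselves are not reproduced here.

```shell
#!/bin/sh
# Sketch only: sysbackup and its arguments are hypothetical.
# usage: sysbackup SOURCE_DIR ARCHIVE EXCLUDE_FILE LOG
# ARCHIVE, EXCLUDE_FILE and LOG should be absolute paths, because tar
# runs after changing into SOURCE_DIR.
sysbackup() {
    # cd first and archive '.', so member names start with './'
    # and patterns like './dev' are anchored to the top level.
    (cd "$1" && tar -czvf "$2" -X "$3" .) > "$4" 2>&1
}

# The exclude file is a plain list, not a shell script: one pattern per
# line, no commands and no comments. Example entries (assumptions, to be
# adjusted to the system, and including wherever the backup drive is
# mounted):
#   ./dev
#   ./proc
#   ./sys
#   ./mnt

# Example call for a full-system backup, run as root:
#   sysbackup / /mnt/flashdrive/sysbackup.tar.gz \
#       /home/user/scripts/EXCLUDE_sysbak.txt /tmp/sysbackup.log
```

Sending the verbose output to a log keeps the terminal quiet while still allowing later inspection of which paths were archived and which were skipped.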
Day changed to 2022-06-29
[19:57] jfw: caai, dorion: perhaps an obvious point but it occurred to me, regarding why the training is so hard to sell: we're going for people who've worked hard for their money, i.e. been quite successfully distracted from what mattered in the world by their saltmine overlords and/or own stupidity, or else started out rich or otherwise lucked into riches, yet got themselves distracted just the same; so by
[19:57] jfw: pointing out that they need to get off ass and lift a finger or two of their own to protect that money from a hostile environment, we're basically asking them to accept the reality of their total, abject failure to do or even correctly perceive what they needed to be doing for the past however many years. Thus, "saving them time & money on the path" may be perfectly true in theory, psychologically
[19:57] jfw: we're perceived as the ones imposing a massive cost.
[19:58] jfw: *Thus, while ...
[20:07] jfw: as per ye olde Naggum on lisp advocacy misadventures : " They would not /use/ C if they understood this point, so if you actually cause them to understand it in the course of a discussion, you will only make them miserable and hate their lives. People are pretty good at detecting that this is a likely
[20:07] jfw: outcome of thinking, and it takes conscious effort to brace yourself and get through such experiences. Most people are not willing even to /listen/ to arguments or information that could threaten their comfortable view of their own existence, much less think about it, so when you cannot answer a C programmer's "arguments" that his way of life is just great the way it is, it is a pretty good sign
[20:07] jfw: that you let him set the agenda once he realized that his way of life was under threat."
[23:56] jfw: and if that's a temporary external drive, I'd use something under /mnt rather than my home dir, for instance because the mount/umount at least need to be done as root so you don't need the extra levels of path overhead.
[23:58] jfw: the final point would be that you'll need to take the node down manually before running your script, otherwise there's a risk that you get an inconsistent snapshot of the database (when it's modified at the same time that you're copying it).
Day changed to 2022-06-11
[00:00] jfw: there *are* sometimes better ways to deal with this but nobody's worked it out yet for poor bitcoind, to my knowledge.
[00:00] jfw: of course you could automate the taking down & putting back up by extending the script.
[00:02] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4275 - works; one convention is to use a ".sh" extension for shell scripts.
[00:02] sourcerer: 2022-06-05 14:39:03 (#jwrd) caai: I placed the script under /home/user/scripts. I named the file bcbakscript.txt and made it executable (chmod +x). I did not place it in any PATH because I didn't consider that necessary.
[00:04] jfw: Putting a script in the search path is like a second level once it's become important or frequently used enough; it's basically defining a new word (command) in your CLI environment.
[00:09] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4276 - this one's not so viable.
[00:09] sourcerer: 2022-06-06 13:42:09 (#jwrd) caai: please note lesson 9, assignment 4: http://welshcomputing.com/paste/r4fnv6hazd
[00:12] jfw: ! and () are shell metacharacters in some contexts but in any case won't do what you want there. for a full system backup, cp -r will make a mess of the result by not preserving metadata (timestamps, permissions, owner/group, possibly other aspects like hard links)
[00:18] jfw: a simple approach here would be to use tar instead of cp, with its -X option to specify a file with the list of exclusions. finally, the file or directory you're writing the backup into will likely need to be excluded too; perhaps that's a stronger argument for using /mnt really.
[00:23] jfw: of course the way to figure most of this out or at least make it stick will be to try it, look closely at the result, and finally test restoring things from the backup.
[00:24] jfw: you might also add -v (for either cp or tar) especially while testing so you can see more easily what it's spending time on.
Day changed to 2022-06-12
[00:31] rodbl: Hi, I'm back again!
[00:38] jfw: Welcome back rodbl / rodolfo.
[00:39] rodbl: Thanks man!
[00:39] rodbl: Moving forward with what a left behind a while back
[00:41] jfw: Glad to hear it. I dunno if dorion filled you in but one idea we had was to try to get your existing code running on another VM with more conventional Linux so it's not quite as big of a leap to take all at once.
[00:42] rodbl: Correct
[00:43] rodbl: A couple of hours ago, I sent him some high-level explanation regarding the development and basic library requirements in order to deploy everything as soon as the "more conventional VM" is available
[00:44] rodbl: This will speed up the delivering of the MVP
[00:46] jfw: sounds good, that should help narrow down what OS to use.
[00:46] rodbl: Yes sir
[00:50] jfw: rodbl: it'll be simplest for me to just wipe the existing VM and repurpose its resources, so give a look as to whether there's any data or work you need saved from that environment.
[00:55] rodbl: Well, there are some py files (modules) and csv files (data) but it's 0K if you remove them
[01:02] jfw: ok; I can just copy it over to the new machine too if you point me to the path.
[01:04] jfw: do you have my email address to send that document to, or should I have dorion copy it to me?
[12:43] rodbl: Hi @jfw
[12:44] rodbl: Sorry, I did't see your message earlier
[12:44] rodbl: What's your email address?
[12:44] rodbl: I'll be great to have it
[12:45] rodbl: Hehehe I was telling RD the other day about having a call with you, just to talk randomness and finally "meet you"
[16:39] dorion: rodbl, welcome back. jfw, I'll forward you the email.
[20:17] rodbl: I'm setting up a local VM with Ubuntu, so I can start implementing the "Linux-version" of the MVP. Sounds good?
[20:18] rodbl: This is until the new remote VM is up and running
[20:39] dorion: rodbl, hold up on ubuntu because I don't think we'll be using that on the VM. I think a higher priority task is working on the db schema so we can move from csv files to mysql.
[20:43] rodbl: Roger that. Let me know if there is anything else I can do in order to move forward
[20:46] rodbl: About the importing data to mysql, "as per my last email", the best way I could come up to store scraped data was to save each classified ad as a json dict
[20:48] rodbl: This is because in order to concatenate every scraped ad, a dataframe will ensure that each value will land on a specific column due to the associated key.
[20:48] rodbl: *concatenate them together as a whole
[21:03] jfw: rodbl: what does the structure of that json end up looking like? I'd imagine after a certain number of ads, a fixed set of fields begins to emerge (though each field might not be populated in each listing), and they're pretty much flat i.e. just a key-value list where the values are just strings or numbers or the like.
[21:06] jfw: as a first step, that would translate directly to a single database table. then you'd look at redundancies like perhaps the neighborhood, which could be factored out into a separate table with one-to-many relationship (so a property record just has a neighborhood ID, then you join that to the neighborhoods table to get its full name, and perhaps coordinates or whatever else we want to add later).
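A minimal sketch of the two-table layout described above: a flat listings table, with the neighborhood factored out into its own table in a one-to-many relationship. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and every table, column and value name here is hypothetical rather than taken from the actual data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE neighborhoods (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE listings (
    id              INTEGER PRIMARY KEY,
    price           REAL,
    neighborhood_id INTEGER REFERENCES neighborhoods(id)  -- NULL when the ad has no location
);
""")

conn.execute("INSERT INTO neighborhoods (id, name) VALUES (1, 'Example Heights')")
conn.executemany(
    "INSERT INTO listings (id, price, neighborhood_id) VALUES (?, ?, ?)",
    [(1, 100000.0, 1), (2, 150000.0, 1), (3, 90000.0, None)],
)

# LEFT JOIN keeps listings whose neighborhood field was not populated.
rows = conn.execute("""
    SELECT l.id, l.price, n.name
    FROM listings AS l
    LEFT JOIN neighborhoods AS n ON n.id = l.neighborhood_id
    ORDER BY l.id
""").fetchall()
```

Fields that are not populated in every ad simply become NULL columns, so a sparse key-value record still fits a fixed table.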
[21:06] rodbl: I can send you a couple of examples. The main reason for saving ads as json dicts is that even though the HTML structure is standard for each ad (for example, the table where all details are shown), that doesn't necessarily mean each ad has a specific piece of information (for example, the HTML table has a <td> for location but the ad might not have a location value)
[21:07] jfw: json is certainly better structured than just stashing the HTML as a string, if that's what you're comparing.
[21:12] rodbl: Correct. I could come up with a new approach for the scraping instead of pointing to XPATH and, if necessary, I could also come up with a method that could ingest the scraped data directly to the db
[21:12] rodbl: Which I think is could be the best practice
[21:12] rodbl: *it could
[21:13] rodbl: I sent you guys examples of these JSON files
[21:18] jfw: rodbl: what's going on in 21087301.json with >> }miento\r", << on line 34 ?
[21:20] rodbl: Malformed, maybe it's supposed to be "Estacionamiento"
[21:20] jfw: but it's your code putting out the json, isn't it?
[21:21] jfw: seeing similar in 21090422.json
[21:21] rodbl: Yeah
[21:21] rodbl: I did have some issues during that run, it was with my old computer.
[21:22] rodbl: I can send you verified JSON files
[21:22] jfw: anyway, besides that it looks to be as I suspected, except that Detalles is broken into an array of separate lines which ought to be joined into one string since it's just free-form text. we'll probably want to map the field names to English.
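A rough sketch of the cleanup suggested above, assuming each ad has already been loaded as a flat dict: list-valued fields such as Detalles are joined into one string, stray whitespace (including a trailing "\r") is stripped, and keys are mapped to English. The FIELD_MAP contents and the function name are hypothetical.

```python
# Hypothetical mapping from source field names to English labels.
FIELD_MAP = {"Detalles": "details", "Estacionamiento": "parking"}


def normalize_ad(raw):
    """Return a copy of one scraped ad with joined text fields and English keys."""
    out = {}
    for key, value in raw.items():
        if isinstance(value, list):
            # Free-form text that was split into lines: join it back into one string.
            value = "\n".join(str(part).strip() for part in value)
        elif isinstance(value, str):
            value = value.strip()
        # Keys without a known translation are kept as they are.
        out[FIELD_MAP.get(key, key)] = value
    return out
```

For instance, `normalize_ad({"Detalles": ["a ", "b\r"]})` gives a single `details` string with the two lines joined by a newline.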
[21:22] rodbl: So you can perform an adequate test on your side
[21:23] rodbl: Yeah, the keys from origin are hard to handle. New labels are required for sure
[21:23] rodbl: "Detalles" is broken in lines, but it can be treated as a string for ease of use
[21:23] jfw: no need to send me updated JSON unless there's something specific you want to show; I was just pointing it out.
[21:25] rodbl: Yes, that batch (from where those examples are coming from) doesn't look very good. I'm suspecting some issues related to my previous computer.
[21:25] jfw: heh, seems the exercise has been productive already.
[21:38] rodbl: Hahaha well QA is always necessary
[21:44] rodbl: In the meantime, I'm going to be restructuring the scraper in order to assert quality of the data. I had a pending experiment to do: instead of "crawling" over each HTML (ad), I could determine if this web site uses internal API calls.. if it does, then a more efficient method can be achieved..
[21:44] rodbl: *quality of the data and efficiency in the process
[23:21] jfw: rodbl: if this is encuentra24, last I checked it doesn't do anything so civilized; it makes XHR calls just to fetch html strings and splice them into the page. what you can do is exploit that to bypass whatever browser mechanisms and script the download of the full data set. not sure how you were doing it before; looks like someone finally sent me the docs so I can have a look in a bit.
[23:22] jfw: rodbl, dorion: did either of you get my reply to the thread in your gmail accounts?
Day changed to 2022-06-13
[00:35] dorion: jfw, not here and not even in spam, god damn.
[02:04] jfw: http://fixpoint.welshcomputing.com/2022/fixed-width-bit-fiddling-tuneups-for-gbw-signer/
[02:07] jfw: dorion: hm, I see it was actually so polite as to give me an explicit rejection this time: "The IP you're using to send mail is not authorized to send email directly to our servers. Please use the SMTP relay at your service provider instead. Learn more at https://support.google.com/mail/?p=NotAuthorizedError"
[02:10] jfw: "The determination of whether or not an IP address is authorized to send mail is made by the ISP that provides you with the IP address" - fucking typical, no hint as to *how* such authorization is determined, it's certainly not a normal ISP function. most likely it's total nonsense.
[02:20] jfw: http://welshcomputing.com/paste/2juhnqyu5p << my poor censored message, which even knew its own fate in advance.
[04:54] jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[04:55] jfw: for the sample report I understand it, it's supposed to be shiny and appealing to banker types, fine. talking about our internal documentation though.
[04:57] jfw: "selenium, chromedriver, chrome_browser" - aha, that's the mess I had in mind with bypassing
[04:57] sourcerer: 2022-06-12 23:21:55 (#jwrd) jfw: rodbl: if this is encuentra24, last I checked it doesn't do anything so civilized; it makes XHR calls just to fetch html strings and splice them into the page. what you can do is exploit that to bypass whatever browser mechanisms and script the download of the full data set. not sure how you were doing it before; looks like someone finally sent me the docs so I can have a look in a bit.
[04:58] jfw: yet another tradeoff of up-front labor vs. system complexity I suppose.
[05:16] jfw: rodbl: where does the CSV data fit in your current process? I'm not seeing it mentioned in the docs.
[15:49] jfw: rodbl: also, are you familiar with git yet? (the original CLI program, not the "Hub".) we're thinking to set you up with a shared repository on our server to help with the code, docs and communications.
[15:52] jfw: for instance, I could get you my old scraper code, probably not usable directly for your thing but might be good for some ideas.
[21:55] jfw: "Note, the first time you ever run the render() method, it will download Chromium into your home directory" << I see the "convenience at all costs" agenda is progressing
[21:56] jfw: "Only Python 3.6 is supported." << but they're behind the times!11 why not "only python 3.10.5 is supported"?!
[21:57] jfw: "Requests officially supports Python 3.7+." << incompatible with the same dude's other stuff, lolz
[22:43] jfw: https://peps.python.org/pep-0644/ << looks like efforts to murder libressl have been stepping up. "sure openssl stabbed us in the kidneys but so what, it still loves us so all is forgiven!"
[22:56] jfw: and I gather the new python doesn't specify its build dependencies at all
[23:08] jfw: rodbl, here's my attempt to track down your stated dependencies at least to the first level, with some ??s for names that didn't resolve sensibly on PyPI that you might need to point me to more specifically.
[23:09] jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[23:11] jfw: let me know if the list looks sane.
Day changed to 2022-06-14
[00:20] rodolfo: Hi!
[00:20] rodolfo: I think I read everything
[00:21] dorion: in other noose, someone I'm curious to talk to said they only use signal and I should install that... so I go and do it, using anbox android emulator on a public toilet box and shortly after launching it I see a new, peculiarly named process running in my top output. turns out, under da hood, signal calls itself "crime.securems" ... myeah, totally NOT a honeypot, lolz.
[00:21] dorion: heya, rodolfo, back with a new name !
[00:22] rodolfo: Yes, there is a possibility that Encuentra24 is using XHR, so "injecting" a query might be a possibility. Further observation is required in order to understand the async calls
[00:25] rodolfo: What is anbox?
[00:26] rodolfo: Well, regarding the "documentation" I kindly share with you guys, I'll be more thoughtful about your disk resources. A plain file txt could be enough next time
[00:27] dorion: rodolfo, it's an android emulator.. lets you run android on a desk/laptop so, e.g. you can use a real keyboard.
[00:28] dorion: rodolfo, while jfw mentioned the file size, it's more about flat files vs binary than disk space.
[00:28] rodolfo: I was going to export a requirements.txt with each dependency's version, I still can if you needed.
[00:29] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4356 -- unsearchable blob + gui environment are the bigger keys.
[00:29] sourcerer: 2022-06-13 04:54:53 (#jwrd) jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[00:29] rodolfo: Regarding Git CLI, I do have experience with it.
[00:30] dorion: see how that echos rodolfo ? you can paste the link of the line you're replying to rather than preface with, "regarding xyz.."
[00:30] dorion: the link for each line is the timestamp shown in the logger.
[00:31] rodolfo: Hahaha yeah, it's pretty handy actually
[00:32] dorion: yeah, and even more so as the channel gets more active.
[00:33] rodolfo: BTW the "selenium" requirements are not fundamentally required since most of the work is done with requests.
[00:34] rodolfo: Did you have the time to check out the video?
[00:34] dorion: that's good news, should really try to minimize complexity because this is going to be a cat and mouse game with the data providers.
[00:34] dorion: video of what now ?
[00:35] rodolfo: I didn't have the time to show you the functionality
[00:35] dorion: is this new functionality from what you showed me a few months back ?
[00:37] rodolfo: Not the heavy-JS version. The plain HTML one, that can support multiple calcs for a given submit
[00:38] rodolfo: I have an improved version of the "bulk calculation", where the user gets more detailed scenarios and comparisons
[00:38] dorion: sure then, go ahead and link it.
[00:39] rodolfo: https://drive.google.com/file/d/1n9uAghSOwucCLgMZ4H8XzrkzqRth-iSU/view?usp=sharing
[00:40] rodolfo: Feel free to skip every other minute, I had to make a detailed explanation about this back then
[00:43] rodolfo: Most importantly, I would like you to check out the analysis example I attached. This is what I was telling you about the other day that it is parameterizable in order to deliver a professional document for the end user.
[00:46] rodolfo: Analysis:
[00:46] rodolfo:
[00:46] rodolfo: https://drive.google.com/file/d/13oq7y3z3wctwCzEEBo7UYQ0OIOHC60A4/view?usp=drivesdk
[01:18] dorion: rodolfo, the analysis is a solid start.
[01:26] rodolfo: Great!
[02:41] jfw: dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4374 - heh, was this 'top' from a shell inside the emulator or what?
[02:41] sourcerer: 2022-06-14 00:21:26 (#jwrd) dorion: in other noose, someone I'm curious to talk to said they only use signal and I should install that... so I go and do it, using anbox android emulator on a public toilet box and shortly after launching it I see a new, peculiarly named process running in my top output. turns out, under da hood, signal calls itself "crime.securems" ... myeah, totally NOT a honeypot, lolz.
[03:00] dorion: jfw, nah I don't recall all the details, but this anbox via snap thing isn't a pure vm... can see from the host some of the emulator processes.
[03:01] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4378 - apparently I'm not always such a nice guy, despite what first impressions may suggest. rodolfo: think of us like a peculiar order of monks, perhaps; you can learn quite a lot by working with us, if you make the most of it; but you will have to pick up some of our customs, funny hats and robes in order to get along. I'll remind
[03:01] sourcerer: 2022-06-14 00:26:41 (#jwrd) rodolfo: Well, regarding the "documentation" I kindly share with you guys, I'll be more thoughtful about your disk resources. A plain file txt could be enough next time
[03:01] jfw: myself that you're still pretty new to this stuff and perhaps I didn't need to hit you with it quite so early; but this was about the politest way I could come up with to express the situation at the time. There's a lot more substance behind that simple and seemingly annoying request than might first meet the eye.
[03:01] sourcerer: 2022-06-13 04:54:53 (#jwrd) jfw: rodbl: do me (and yourself) a favor, would you, and keep textual documents in a text-based format when sending to me, even if that just means "save web page as html", rather than printing to pdf (see how it turns a maybe 200 word doc into a 1.6 MB, unsearchable blob demanding a GUI environment)
[03:01] jfw: dorion: huh, weird.
[03:05] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4381 - certainly couldn't hurt. and I didn't mean to step on your toes there, I was just looking up the things you mentioned to get some notion of what they are & what they require, then realized that since I was looking I might as well be recording it too.
[03:05] sourcerer: 2022-06-14 00:28:41 (#jwrd) rodolfo: I was going to export a requirements.txt with each dependency's version, I still can if you needed.
[03:08] rodolfo: Don't worry, I try to be as emotional numbed as possible in terms of not attaching a specific "tone of voice" to a text message.
[03:12] rodolfo: I'll try to absorb as much wisdom as possible.
[03:13] jfw: especially given the language gap that's probably sensible about not imputing "tone of voice"; not sure about the "numbing" part as such though.
[03:30] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4391 - funnily enough, keeping up with site layout changes or anti-scraping antics is sounding like the easier part compared to keeping up with python ecosystem changes and getting code to run at all, at the moment
[03:30] sourcerer: 2022-06-14 00:34:49 (#jwrd) dorion: that's good news, should really try to minimize complexity because this is going to be a cat and mouse game with the data providers.
[05:15] rodolfo: I do enjoy exercising some literature resources, as euphemisms and sarcasm. I'll try to make my communication as HTML-ish as possible: plain and simple.
[13:30] rodolfo: Good morning friends
[13:31] rodolfo: Trying to stay on the loop
[13:47] rodolfo: I forgot to mention last night about an idea I shared with <dorion> the other day, about carrying out activities based on sprints in a way we all commit to specific action items for specific dates.
[13:56] dorion: good morning rodolfo. see jfw's questions/comments about python packages and version.
[13:56] sourcerer: 2022-06-13 23:08:15 (#jwrd) jfw: rodbl, here's my attempt to track down your stated dependencies at least to the first level, with some ??s for names that didn't resolve sensibly on PyPI that you might need to point me to more specifically.
[13:56] sourcerer: 2022-06-13 23:09:02 (#jwrd) jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[14:39] dorion: rodolfo, clarification re the above will help jfw in picking the os for the new vm. leaning towards cent os 6 atm.
[15:19] rodolfo: Python 3.9 would be fine
[15:19] rodolfo:
[15:19] rodolfo: I can provide the requirements.txt, if needed.
[15:20] dorion: sure, go ahead.
[15:51] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4424 - at minimum, communicating more about our plans or upcoming work (prospective reporting) sounds like an excellent idea. deadlines I'm not so sure, they can be costly (if taken seriously, and if not then what's the point) and I gather we all have pre-existing commitments that would come before this
[15:51] sourcerer: 2022-06-14 13:47:43 (#jwrd) rodolfo: I forgot to mention last night about an idea I shared with <dorion> the other day, about carrying out activities based on sprints in a way we all commit to specific action items for specific dates.
[15:55] rodolfo: The purpose is to focus efforts, either online or offline, in order to make the best use of our time.
[15:56] rodolfo: Deadlines can be flexible, but some kind of progress needs to be achieved within a time window.
[15:58] rodolfo: This is a good practice, even more if there are third-parties potentially interested in adding capital to this initiative.
[15:59] jfw: you mean something like, sharing projected dates of completion for particular tasks?
[16:06] jfw: example: currently I'm waiting on you to provide your requirements.txt and/or feedback on my handmade list, so that I can hammer down the lower-level software versions; once this is decided I can probably get the VM rebuilt by the end of the week (possibly with some software still to be worked out).
[16:14] jfw: my thinking on python 3.9 is that 3.8 emerged as the minimum demanded by the current versions of the first-level dependencies; going a little newer might help it stay workable for more things for longer; but in 3.10 they got more aggressive about breaking older SSLs and I'd rather not have to hand-build that too.
[16:14] sourcerer: 2022-06-13 23:09:02 (#jwrd) jfw: generally I'm guessing maybe 3.9 would be the way to go if we need to build a Python.
[16:15] jfw: python 3.6 is the latest I'm seeing for centos 6 even via the "software collections"
[16:22] rodolfo: Yes, I mean like stepping on the gas. I know that is not very monk-alike, but sometimes stress is needed.
[16:22] rodolfo:
[16:22] rodolfo: Let's get specifics when it comes to requesting info, I'm not (neither are you) interested in the granularity of your request. If you need the requirements.txt, you will get by the end of the day.
[16:24] rodolfo: I hope you are also "emotional numbed" (no language barriers at all, pretty straight forward), but let's get straight to the points. Frankly, I don't have time to read about tech philosophy, I like both of you guys so I need you to understand that in this context, my mindset is completely focused on doing something we can all get a monetary benefit.
[16:26] rodolfo: We can leave the didactic aspects for future references. No hard feelings.
[16:26] jfw: rodolfo: as far as stepping on the gas, had you been waiting for something from us? I thought it was you that stepped out for a month or something.
[16:26] rodolfo: You already did: you need the requirements.txt
[16:26] rodolfo: The ball is on my side
[16:27] jfw: before that, I meant.
[16:28] rodolfo: Nahh, a month ago we were wasting time since it made no sense my intervention in setting up a Py environment and modules from scratch when all of this can be easily achieved.
[16:28] dorion: rodolfo there are 2 sides of the monetary aspect. 1 is top line wrt getting something to market that starts to cashflow. 2 is ongoing costs of maintenance and future development.
[16:29] rodolfo: Besides the info I already sent via email (including video, examples, etc.) plus the requirements.txt that I'm going to send today, what else do you need?
[16:29] dorion: so what we're aiming to do is strike a balance so the thing can be sustainably scalable.
[16:30] jfw: also things like file formats are not idle "philosophical" points but are with an object toward working and collaborating most effectively.
[16:32] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4452 - that's your conclusion, I don't necessarily share it but it was not clear to me that you thought this at the time and stepped out because of it, if that's what you're saying.
[16:32] sourcerer: 2022-06-14 16:28:24 (#jwrd) rodolfo: Nahh, a month ago we were wasting time since it made no sense my intervention in setting up a Py environment and modules from scratch when all of this can be easily achieved.
[16:33] jfw: and it's fine, no hard feelings; I'm simply pointing out ways within your own control to achieve the speedup you're looking for.
[16:34] rodolfo: Correct. From a product perspective, let's say that an improved version can be achieved almost anywhere. The infrastructure-related costs (aka running an app from your server) most likely is going to demand testing of resource consumption. However, we can walk and chew gome at the same time. Meaning that, while the CentOS env is being setup, I can get my hands on the "improved version" since, in tech slang, the development is agnostic and I can do it in a
[16:34] rodolfo: Win env.
[16:35] rodolfo: *chew gum
[16:36] jfw: rodolfo: do you mean you're going to set up mysql in windows now? to me that sounds rather like a waste of time. in general, a total change of environment tends to be a major source of "unexpected" costs when going into production
[16:37] jfw: if you can make progress with your code for now in your existing environment before ours is ready, that sounds fine though.
[16:40] jfw: rodolfo: what was your "Correct" in regards to? on the reread I can't quite tell.
[16:47] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4454 - to be explicit, nothing more needed for now; in general, maybe a bit more re-reading of the log so you don't miss things like that ( http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4414 - http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4431 )
[16:47] sourcerer: 2022-06-14 16:29:33 (#jwrd) rodolfo: Besides the info I already sent via email (including video, examples, etc.) plus the requirements.txt that I'm going to send today, what else do you need?
[16:47] sourcerer: 2022-06-14 03:05:46 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4381 - certainly couldn't hurt. and I didn't mean to step on your toes there, I was just looking up the things you mentioned to get some notion of what they are & what they require, then realized that since I was looking I might as well be recording it too.
[16:47] sourcerer: 2022-06-14 15:19:20 (#jwrd) rodolfo: I can provide the requirements.txt, if needed.
[17:03] rodolfo: Not at all, I need parameters in order to focus and move forward.
[17:05] rodolfo: My "lifestyle" is based on caffeine and stress.
[17:06] dorion: eh, I've seen you laugh here and there ;P
[17:06] rodolfo: Hahaahaa
[17:07] rodolfo: Hey man, we need to keep it corporate gangsta
[17:07] rodolfo: No room for feelings
[18:20] dorion: there's room for feelings, as long as you're not an idiot.
[18:46] rodolfo: "Smart feelings"
[23:22] rodolfo: requirements.txt sent
[23:22] rodolfo:
[23:22] rodolfo: Ball is on the monk's side =)
[23:28] jfw: whew, looks like I'm going to need a bigger hammer to wrangle that list of 347 requirements.
[23:31] jfw: rodolfo, do you know if all those come in as downstream dependencies of the data science / machine learning ones you listed earlier, or what?
[23:32] jfw: thanks for the file, good to have the reference for sure.
[23:35] rodolfo: It depends, pip can panick if you try to update or install a package if a dependency is absent or obsolete. In order to avoid that, I usually do "pip install pack_name[all]"
[23:35] rodolfo:
[23:35] rodolfo: The "[all]" arg will take care of every dependency, including those that you might not need right away.
[23:36] rodolfo: So, you could get sklearn without dependencies if you want to
[23:50] jfw: what I'm after is, at the high level, what's causing all those to be brought in & why.
[23:52] jfw: if we actually need to be *this* promiscuous then it might warrant getting a dedicated server for it and nevermind "virtual machines".
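One way to approach the "what's bringing all those in" question is to separate the packages nothing else asks for (the roots) from those pulled in as dependencies. The sketch below uses the standard importlib.metadata module (Python 3.8+); the helper names are hypothetical, and the name parsing is a simplification that ignores version specifiers and environment markers.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name so 'scikit_learn' and 'scikit-learn' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req):
    """Extract the bare project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else None


def installed_requirements():
    """Map each installed distribution to the set of names it declares as requirements.

    Note: this includes requirements that only apply to optional extras,
    so it over-approximates what is strictly needed.
    """
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue
        names = {requirement_name(r) for r in (dist.requires or [])}
        result[normalize(name)] = names - {None}
    return result


def roots(req_map):
    """Packages that no other package in the map lists as a requirement."""
    required = set().union(*req_map.values())
    return sorted(set(req_map) - required)
```

Running `roots(installed_requirements())` inside the environment that produced the requirements.txt should list the packages that were asked for directly, which is usually a much shorter list to reason about.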
Day changed to 2022-06-15
[11:35] caai: jfw: thank you for the feedback. i have created a directory named .sh and moved my scripts into it
[11:37] caai: please note the script for assignment 3 with the corrections: http://welshcomputing.com/paste/kpwd4xhas8
[11:40] caai: in regards to taking down the node before running the script, or extending the script to include such commands. what commands would those be since it is a daemontools service? svc -d /service/bitcoind to take it down, next run the script, then upon completion, svc -u /service/bitcoind to bring it back up?
[11:51] caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[12:27] rodolfo: I understand.. what you are saying is that we should aim for the dependencies that for sure we will need, and forget about those modules/libraries that might require "extra care" but don't really add value at the moment. Right?
[12:44] dorion: right, if we don't need them, they're liabilities.
[13:29] rodolfo: Understood. Is it necessary for me to rectify the requirements.txt?
[13:41] dorion: rodolfo, I think the more practical path forward is setting up a dedicated server for this rather than vm. we can chip away at the dependency list over time. focus first on the db schema and moving to mysql. btw, did you see jfw's q about where the csv files fit ?
[13:41] sourcerer: 2022-06-13 05:16:11 (#jwrd) jfw: rodbl: where does the CSV data fit in your current process? I'm not seeing it mentioned in the docs.
[13:52] rodolfo: Perfect
[13:59] rodolfo: Regarding Jacob's question: inside the ETL&Model PDF, you will see that in the last paragraph I make a reference to "panel data (previously parsed from JSON)".
[13:59] rodolfo:
[13:59] rodolfo: The intention was to point out that those JSON dicts (that resulted from the scraping) are then transformed into a more readable format. The output CSV is a byproduct of this "transformation". Makes sense?
[13:59] rodolfo:
[13:59] rodolfo: Scraper
[13:59] rodolfo: | JSON
[13:59] rodolfo: | Panel Data -> CSV
[13:59] rodolfo: | Input for the model
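The JSON-to-CSV step in the pipeline above could look roughly like this. It is a sketch using only the standard csv module in place of a dataframe, and the function name is hypothetical; the point is that each value lands in the column named by its key, and ads lacking a field get an empty cell.

```python
import csv
import io


def ads_to_csv(ads):
    """Serialize a list of flat ad dicts to CSV text, one column per key seen."""
    # Collect the union of keys, preserving first-seen order.
    fieldnames = []
    for ad in ads:
        for key in ad:
            if key not in fieldnames:
                fieldnames.append(key)

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(ads)  # missing keys are filled with restval
    return buf.getvalue()
```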
[15:48] jfw: caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4490 - I meant .sh as a file extension for scripts themselves instead of .txt, not as a directory.
[15:48] sourcerer: 2022-06-15 11:35:34 (#jwrd) caai: jfw: thank you for the feedback. i have created a directory named .sh and moved my scripts into it
[15:49] jfw: caai: updated script looks fine.
[15:50] jfw: to capture timestamps too, which can be nice for historical reference at least, you could add -p to the 'cp' flags.
[15:53] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4492 - that was the idea, yes; though parallel to the svc -t (TERM signal) vs svc -k (KILL signal), I'd use svc -kd rather than svc -d. The latter only requests it to shutdown, which may (and indeed will) take some time, during which your script would proceed to copying, defeating the purpose. with -k it's effectively instant.
[15:53] sourcerer: 2022-06-15 11:40:31 (#jwrd) caai: in regards to taking down the node before running the script, or extending the script to include such commands. what commands would those be since it is a daemontools service? svc -d /service/bitcoind to take it down, next run the script, then upon completion, svc -u /service/bitcoind to bring it back up?
[15:54] jfw: the "fully graceful" way to do it I suppose would be to poll 'svstat' until it shows actually down; but I wouldn't bother.
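For reference, the "fully graceful" polling variant might be sketched as below. It assumes svstat reports a stopped service with a status beginning "down" after the path and colon (as recalled from daemontools output, not verified here); the function names and the injectable `run` parameter are hypothetical.

```python
import subprocess
import time


def is_down(svstat_line):
    """True if an svstat output line reports the service as down.

    Assumed format: "<path>: up (pid N) S seconds" or "<path>: down S seconds".
    """
    status = svstat_line.split(": ", 1)[-1]
    return status.startswith("down")


def _svstat(service):
    """Run svstat on one service directory and return its output text."""
    return subprocess.run(
        ["svstat", service], capture_output=True, text=True
    ).stdout


def wait_until_down(service, timeout=60.0, interval=1.0, run=_svstat):
    """Poll svstat until the service is down; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if is_down(run(service)):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A backup script would then issue svc -d, call `wait_until_down("/service/bitcoind")`, do the copy, and finish with svc -u.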
[15:55] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4493 - indeed treating the wallet separately makes sense to me.
[15:55] sourcerer: 2022-06-15 11:51:20 (#jwrd) caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[16:00] jfw: rodolfo: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4493 - I'd say that's the underlying idea, yes, but so far I'm just trying to understand what's there & why - the diagnosis rather than prescription stage, so to speak.
[16:00] sourcerer: 2022-06-15 11:51:20 (#jwrd) caai: do you suggest including the wallet in this backup? i have it backed up elsewhere externally
[16:00] jfw: ah, wrong line, I meant: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4494 - I'd say that's the underlying idea, yes, but so far I'm just trying to understand what's there & why - the diagnosis rather than prescription stage, so to speak.
[16:00] sourcerer: 2022-06-15 12:27:11 (#jwrd) rodolfo: I understand.. what you are saying is that we should aim for the dependencies that for sure we will need, and forget about those modules/libraries that might require "extra care" but don't really add value at the moment. Right?
[16:01] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4495 - worse, they're liabilities in any case whether they're also assets or not!
[16:01] sourcerer: 2022-06-15 12:44:42 (#jwrd) dorion: right, if we don't need them, they're liabilities.
[16:03] jfw: rodolfo: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4496 - I don't know; what process do you have in mind for changing it?
[16:03] sourcerer: 2022-06-15 13:29:26 (#jwrd) rodolfo: Understood. Is it necessary for me to rectify the requirements.txt?
[16:05] rodolfo: Creating a new environment and installing the basic libraries (aka no "pip install xyz[all]")
[16:06] jfw: rodolfo: what do you mean by "panel data" ? looking it up as a statistics term it seems to deal with data sampled over time which I gather yours is not.
[16:06] jfw looks up what that [all] really means
[16:12] jfw: as yet not finding anything on "all" as such, though there can be specifically-named "extras"
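To see which extras a given installed package actually declares, its metadata can be inspected directly. This is a small sketch using the standard importlib.metadata module (Python 3.8+); the function name is hypothetical. Extras are named by each project in its "Provides-Extra" metadata, so "all" only does something for projects that happen to define an extra with that name.

```python
from importlib import metadata


def declared_extras(dist_name):
    """Return the extras an installed distribution declares, or [] if none or not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # get_all returns None when the header is absent.
    return sorted(meta.get_all("Provides-Extra") or [])
```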
[16:13] rodolfo: Actually, it is. What I mean is that data is tabulated (from dict to dataframe) but also organized by chronological order (date of the ad).
[16:16] jfw: are you able to get historical data rather than just a snapshot of the current market from the currently active ads?
[16:16] rodolfo: For future initiatives, yes
[16:16] rodolfo: WebArchive sounds promising
[16:17] rodolfo: However, currently you can find that for a given location, there are older classifieds that are still running
[16:18] jfw: for a given property?
[16:18] jfw: or do you just mean that there's a range of how long the current ads have been listed?
[16:19] rodolfo: Perhaps for that case a closer monitoring is required, but there are good examples of "for a given apartment building"
[16:19] rodolfo: Correct, there are ads posted N amount of time ago that are still visible
[16:21] jfw: ok, but if it's still an active listing, the price (or other fields) may have been modified over its life; so while the age of the listing is certainly one data point to collect, it seems to me that the principal time value associated with all the data is the time it was sampled, which is the time of the scrape i.e. the same everywhere.
[16:22] jfw: if you get it running for a while collecting many snapshots over time, and you track all that, it could indeed become a time series.
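A sketch of what tracking snapshots could involve: each scrape stamps its rows with the sampling time, and a history for one ad is then read across snapshots. The field names ("id", "price", "scraped_at") and function names are hypothetical.

```python
from datetime import datetime, timezone


def stamp_snapshot(ads, scraped_at=None):
    """Attach the time of the scrape to every ad in one snapshot."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    stamp = scraped_at.isoformat()
    return [{**ad, "scraped_at": stamp} for ad in ads]


def price_history(snapshots, ad_id):
    """Collect (scrape time, price) pairs for one ad across many snapshots."""
    return [
        (row["scraped_at"], row["price"])
        for snapshot in snapshots
        for row in snapshot
        if row["id"] == ad_id
    ]
```

The listing's own posting date would remain just another field on each row, separate from the time the row was sampled.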
[16:22] jfw: anyway, minor point for now.
[16:24] jfw: rodolfo: can you expand on this panic situation? it sounds like possibly a real problem but not quite a fitting solution
[16:24] sourcerer: 2022-06-14 23:35:20 (#jwrd) rodolfo: It depends, pip can panick if you try to update or install a package if a dependency is absent or obsolete. In order to avoid that, I usually do "pip install pack_name[all]"
[16:29] rodolfo: Right, but I'm not specifically pointing to build a time series with this particular exercise.
[16:29] rodolfo:
[16:29] rodolfo: The chronological comparison is relevant for feature engineering.
[16:33] rodolfo: Meaning that pip will return an error if you try to install libraries whose dependencies are not present or updated.
[16:33] rodolfo:
[16:33] rodolfo: Then, a way to avoid any error related to dependencies is to execute "pip install XYZ[all]".
[16:33] rodolfo:
[16:33] rodolfo: This only works for certain libraries though
[16:35] rodolfo: Is not actually a great deal. Worst case scenario, if there is a dependency-related error, it will just be a matter of installing whatever is needed
[16:38] jfw: Are you quoting from something there (what)?
[16:41] jfw: otherwise it just reads to me as a restatement of the previous incomplete definition...
[16:42] rodolfo: https://usercontent.irccloud-cdn.com/file/WI4220FN/Screenshot_20220615_114225.jpg
[16:43] jfw: let me try this way: when/where do those "dependency-related errors" arise - when doing the initial 'pip install' or later (when running code that imports the module perhaps) ?
[16:44] jfw: that link is a screenshot of the current thread, not sure what you're trying to say by that, lol.
[16:55] rodolfo: It could be either at installation or coding time, because there might be modules within libraries that have specific dependencies.
[16:55] rodolfo: However, this is not a great problem because it can be addressed at any point if the environment that is going to be implemented allows it.
[16:55] rodolfo: I'm talking about regular use of Python, nothing fancy
[16:56] jfw: aren't libraries supposed to declare their dependencies upfront? i.e. if an import fails at runtime it would be a bug in the packaging, no?
[16:58] rodolfo: Again, if this freedom of functionalities represents an obstacle, I can throw parts of the code at a different instance and just call those functions from the new environment that you are setting up.
[16:59] jfw: rodolfo: I'm afraid I don't know what any of that means (freedom of functionalities? throw code at different instance? just call from new environment?)
[16:59] rodolfo: Not at import, but at execution time because the error will show up when a function from that module is called (assuming that the "special dependency" is invoked at that point).
[17:01] jfw: ah, is this like a lazy-loaded dependency then, something like an import statement found inside a function rather than at top-level ?
[17:02] rodolfo: My friend, neither you nor I want to waste time. The cliché of "analysis paralysis" is getting too frequent. I'm sure you have had experiences when trying to make something work in Python and a dependency error occurs.
[17:03] rodolfo: Let's just see what happens when trying to install whatever package in new setup, then the problem will be clear. For now, we are speculating
[17:07] jfw: Just see what happens without trying to figure out what anything actually is or means, that's your entire approach I suppose, it's not surprising then that you get randomly hit by errors you don't understand and then dig up some other band-aid to ...numb the feelings? but no, that's not the only possible way to work and I don't really have the same experience in my python-coding. but sure, let's
[17:07] jfw: try just dropping the [all] and see what comes out.
[17:10] jfw: I'm going afk, and can resume this tomorrow.
[17:24] rodolfo: I understand. Perhaps we are not on the same frequency, I'm sorry
[23:35] dorion: rodolfo, we started this knowing it was a prototype. when we met at krume, you said you were glad to try gales out and were happy that we could help you narrow down the decision making process.
[23:36] dorion: since then, we've agreed gales is a bridge too far at present and are trying to work with you to get the thing going.
[23:36] dorion: on the one hand, you say you want to absorb wisdom. that's encouraging and jacob has a lot to offer, with a decade and a half at least with python on top of other languages in addition to all the sys admin experience.
[23:36] sourcerer: 2022-06-14 03:12:08 (#jwrd) rodolfo: I'll try to absorb as much wisdom as possible.
[23:37] dorion: we all have the same goal of monetizing this thing, but for it to be ours (the 3 of us), we have to understand it. as far as I can tell, that's what his questions are aimed at. however, for each line of questioning, you seem to be reframing as monk mode, paralysis by analysis, etc.
[23:37] dorion: we don't want paralysis by analysis, but help us help you.
[23:38] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4569 -- fine with me, as a starting point.
[23:38] sourcerer: 2022-06-15 17:07:30 (#jwrd) jfw: try just dropping the [all] and see what comes out.
[23:38] dorion: welcome back jwm.
Day changed to 2022-06-16
[00:11] jwm: thanks! finally turned on the machine to check on the node - she's still purring
[13:25] caai: jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4508 - hahaha alright. i have changed the file extension to .sh and renamed the directory to /home/user/scripts
[13:25] sourcerer: 2022-06-15 15:48:05 (#jwrd) jfw: caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4490 - I meant .sh as a file extension for scripts themselves instead of .txt, not as a directory.
[13:28] caai: i have added the -p to the 'cp' flags to capture the timestamps, too
[13:31] caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4512 - noted. i will use svc -kd rather than svc -d
[13:31] sourcerer: 2022-06-15 15:53:06 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4492 - that was the idea, yes; though parallel to the svc -t (TERM signal) vs svc -k (KILL signal), I'd use svc -kd rather than svc -d. The latter only requests it to shutdown, which may (and indeed will) take some time, during which your script would proceed to copying, defeating the purpose. with -k it's effectively instant.
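
The TERM versus KILL distinction quoted above can be seen without daemontools installed. The following is only an illustrative sketch, not the backup script under discussion: it uses plain `kill` on a stand-in process (a hypothetical slow-to-exit shell loop) in place of `svc` on a supervised bitcoind. TERM, which is what `svc -d` sends, is a request the process may take its time honoring; KILL, which the `-k` in `svc -kd` adds, cannot be delayed by the process.

```shell
# Stand-in "service" (hypothetical): on TERM it takes a few more seconds to
# shut down, roughly as a daemon flushing its state might.
start_slow_service() {
    sh -c 'trap "sleep 3; exit 0" TERM; while :; do sleep 1; done' \
        >/dev/null 2>&1 &
    service_pid=$!
    sleep 1   # give it a moment to install its TERM handler
}

# Report whether a process ID still exists.
is_running() {
    if kill -0 "$1" 2>/dev/null; then echo running; else echo stopped; fi
}

# Case 1: TERM only (what 'svc -d' sends). The request returns immediately
# while the process is still shutting down, so a copy started now would
# overlap with the shutdown.
start_slow_service
term_pid=$service_pid
kill -TERM "$term_pid"
after_term=$(is_running "$term_pid")

# Case 2: KILL (the '-k' in 'svc -kd'). The process cannot postpone it.
start_slow_service
kill_pid=$service_pid
kill -KILL "$kill_pid"
wait "$kill_pid" 2>/dev/null || true   # reap it; the status reflects the signal
after_kill=$(is_running "$kill_pid")

echo "after TERM: $after_term"
echo "after KILL: $after_kill"

# Clean up the first stand-in, which may still be winding down.
kill -KILL "$term_pid" 2>/dev/null || true
wait "$term_pid" 2>/dev/null || true
```

In this sketch the stand-in is still alive right after TERM and gone right after KILL; a script that needs the files quiescent before copying would have to either wait for the exit or use the forced form, as suggested above.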
[13:32] caai: in addition, i will treat the wallet separately
[16:17] jfw: caai: what's your next step in the technical exercises?
[18:33] caai: i will send you the corrections for lesson 9, exercise 4 and then start the homework from lesson 10. i know that i need to integrate regex more (one of the assignments from lesson 9), but i will set aside that task for now so that i progress faster
[18:37] jfw: I suppose regex is the sort of thing that comes in tremendously handy but in the relatively few situations that it's actually called for. so, possibly it's good enough to practice the basics a bit and then just know that it's there in the toolkit.
[18:40] jfw: lesson 10 being Gales Bitcoin Wallet operation and a preview of the Gales Linux bootstrap process - sounds good.
Day changed to 2022-06-17
[13:35] caai: jfw: yes! today i am searching for, reading and testing examples of 'tar -X'
[17:43] jfw: caai: I'm finding the details of how that works to be a bit of a mess unfortunately, and poorly documented even for the full-blown GNU tar.
[17:45] jfw: but in short, it's tar -X EXCLUDE_FILENAME, where EXCLUDE_FILENAME is the path to a plain-text file listing the exclusion patterns, one per line. An exclusion pattern can be a simple path or include shell-style wildcards like *.
[17:46] jfw: however, the pattern can apply anywhere within a file's path, for instance if you list 'dev' then any directory named 'dev' will be excluded, not just /dev.
[17:48] jfw: you might think you could give it in absolute form i.e. '/dev' to match it exactly - and this *would* work except that for (misguided) security reasons or something it strips any leading slashes from the filenames when building the archive.
[17:51] jfw: a workaround I found is to start instead from './' ; for instance, 'cd /; tar -cf /mnt/something.tar .' instead of 'cd /mnt; tar -cf something.tar /'
[17:51] jfw: then the exclusions can be 'anchored' to the start of the path as in './dev'.
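
A minimal sketch of the anchoring workaround described above, assuming GNU tar and using a throwaway directory tree in place of the real `/` (all directory and file names here are hypothetical). It builds two archives from the same tree: one with the exclude pattern `./dev`, one with the bare pattern `dev`, so the listings can be compared.

```shell
# Throwaway tree standing in for "/" (names are made up for illustration).
root=$(mktemp -d)
out=$(mktemp -d)    # kept outside the tree so the archive is not archived
mkdir -p "$root/dev" "$root/home/user/dev" "$root/etc"
echo node > "$root/dev/tty0"
echo code > "$root/home/user/dev/project.txt"
echo conf > "$root/etc/hosts"

# An exclude file is a plain list of patterns, one per line.
printf '%s\n' './dev' > "$out/exclude-anchored.txt"
printf '%s\n' 'dev'   > "$out/exclude-loose.txt"

# Archive "." from inside the tree so every member name starts with "./".
(cd "$root" && tar -cf "$out/anchored.tar" -X "$out/exclude-anchored.txt" .)
(cd "$root" && tar -cf "$out/loose.tar"    -X "$out/exclude-loose.txt" .)

anchored_list=$(tar -tf "$out/anchored.tar")
loose_list=$(tar -tf "$out/loose.tar")

echo "--- with ./dev ---"; echo "$anchored_list"
echo "--- with dev ---";   echo "$loose_list"
```

With GNU tar's default matching, the `./dev` pattern should drop only the top-level directory and keep `./home/user/dev`, while the bare `dev` pattern should drop both.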
[17:53] jfw: the whole 'exclude' mechanism looks to be a relatively newer extension to 'tar' functionality though I can't quite imagine why it wasn't there from the start.
[17:54] jfw: (or at least, why the need for it wasn't apparent from the start.)
[18:03] jfw: caai: does that make sufficient sense?
[18:07] jfw: the exercise wasn't quite intended to involve this much digging, but so goes the gap between tidy classroom exercises and making useful & reliable things in real life.
[22:00] jfw: http://fixpoint.welshcomputing.com/2022/fixpoint-security-advisory-freefilesync-misses-host-key-check-allowing-breach-of-data-confidentiality-or-authenticity/
Day changed to 2022-06-18
[18:21] caai: jfw: thank you for the explanation. i am fairly confident about, and have successfully tested, how to exclude a single file. however, i am still a bit unclear about how to write the exclusion patterns for directories. nevertheless, i have written 2 scripts that may work. i have located an external hard drive that i will use for this project, which i will test the sysbackup script on and then
[18:21] caai: report back to you
[18:24] caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4603 - nice! good discovery!
[18:24] sourcerer: 2022-06-17 22:00:56 (#jwrd) jfw: http://fixpoint.welshcomputing.com/2022/fixpoint-security-advisory-freefilesync-misses-host-key-check-allowing-breach-of-data-confidentiality-or-authenticity/
Day changed to 2022-06-19
[14:06] caai: i have tested the system backup scripts and they do not work. please note the scripts: http://welshcomputing.com/paste/i3bf63sfng
[14:22] caai: on the other hand, i have tested the btc database backup script and it works. i just got around to testing it today because i needed to change the external hard drive's system id to 83 via 'fdisk' and create a new file system 'mkfs -t ext4'
[15:40] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4604 - directories work pretty much the same as files; if a directory is excluded then so is everything inside it
[15:40] sourcerer: 2022-06-18 18:21:24 (#jwrd) caai: jfw: thank you for the explanation. i am fairly confident about, and have successfully tested, how to exclude a single file. however, i am still a bit unclear about how to write the exclusion patterns for directories. nevertheless, i have written 2 scripts that may work. i have located an external hard drive that i will use for this project, which i will test the sysbackup script on and then
[15:42] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4606 - perhaps I missed my calling as a minesweeper.
[15:42] sourcerer: 2022-06-18 18:24:59 (#jwrd) caai: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4603 - nice! good discovery!
[15:45] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4608 - the first problem with option 1 is that -X works indirectly: it names a file which itself contains the list of excludes. This keeps the main script a bit tidier I'd say. the use of --exclude as in option 2 should work as a more direct alternative there.
[15:45] sourcerer: 2022-06-19 14:06:04 (#jwrd) caai: i have tested the system backup scripts and they do not work. please note the scripts: http://welshcomputing.com/paste/i3bf63sfng
[15:46] jfw: the second problem (which is with both) appears to be confusion about what the basic tar parameters do.
[15:50] jfw: you use -f sysbackup.tar.gz to name the output file - good, but note that's a relative path so where it ends up will depend on where you happened to run the script from! then you give /mnt/flashdrive perhaps as an attempt to fix that, but that's where tar expects to find the *source* file or directory (or multiple thereof)
[15:52] jfw: then maybe give a closer read of this example to see how it gets the paths listed in the archive to start with './' to match the exclusions.
[15:52] sourcerer: 2022-06-17 17:51:05 (#jwrd) jfw: a workaround I found is to start instead from './' ; for instance, 'cd /; tar -cf /mnt/something.tar .' instead of 'cd /mnt; tar -cf something.tar /'
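
To make the roles of the tar parameters concrete, here is a small sketch under stated assumptions: temporary directories stand in for the source tree and for the mounted drive (the real paths in the discussion are `/` and `/mnt/flashdrive`), and the `-z` flag is assumed since the archive is named `.tar.gz`. The argument to `-f` names the output archive, given as an absolute path so it does not depend on the current directory; the trailing operand names what to read.

```shell
src=$(mktemp -d)     # stands in for the source tree, e.g. /
dest=$(mktemp -d)    # stands in for the mounted backup drive
echo data > "$src/file.txt"

# -f <archive>  : where the archive is WRITTEN (absolute, so predictable)
# trailing "."  : what is READ, relative to the directory we cd'ed into
(cd "$src" && tar -czf "$dest/sysbackup.tar.gz" .)

# List the archive to confirm what went in and how the names are recorded.
members=$(tar -tzf "$dest/sysbackup.tar.gz")
echo "$members"
```

Had `-f sysbackup.tar.gz` been given as a relative path, the archive would have landed in whatever directory tar was started from, which is the pitfall pointed out above.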
[15:54] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4609 - 83 being the partition type code for Linux, where a listing of codes can be found within the fdisk menus. well done!
[15:54] sourcerer: 2022-06-19 14:22:05 (#jwrd) caai: on the other hand, i have tested the btc database backup script and it works. i just got around to testing it today because i needed to change the external hard drive's system id to 83 via 'fdisk' and create a new file system 'mkfs -t ext4'
Day changed to 2022-06-23
[00:59] jfw: http://fixpoint.welshcomputing.com/2022/freeing-windows-files-with-freefilesync/ and with some next-level alliterative pattern there.
[12:38] caai: jfw: it appears as though tar has syntax that i need to study (and use) more. based on your feedback, and a little bit of reading the documentation from the link you provided (GNU tar), i have produced another script; option 3. please let me know if i am getting closer: http://welshcomputing.com/paste/922uj78cc3
[12:48] caai: i will download FreeFileSync and give it a try. what is the storage limit?
[13:01] caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
Day changed to 2022-06-24
[16:30] jfw: caai: yes, option 3 is getting closer, in that / is used as the source, as intended for a full-system backup, and the destination will end up predictably at /mnt/flashdrive/sysbackup.tar.gz. The excludes still won't work though.
[16:40] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4624 - well, if it was unclear, the point is that it's syncing to an SFTP server running on your own Linux box; thus any limit is entirely up to you / that machine's disk space (perhaps partitioned).
[16:40] sourcerer: 2022-06-23 12:48:37 (#jwrd) caai: i will download FreeFileSync and give it a try. what is the storage limit?
[16:43] jfw: that article is aimed at the end-user (or perhaps a junior admin assisting the end-user) in a scenario where we're administering the server for them, so the details of the server-side setup are omitted. TBD if we'll publish that part too.
[16:45] jfw: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4625 works for me, and I'll check if jwm would like to jump on for a bit.
[16:45] sourcerer: 2022-06-23 13:01:36 (#jwrd) caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
[17:05] jfw: ohai jwm. want to join the JWRD board meeting with Chad next Tuesday at 7pm local to discuss sales & strategy?
[17:06] jwm: for sure - sign me up.
[17:07] jfw: probably simplest to just sit at the same Zoom terminal.
[22:13] jwm: seedsigner.com
[22:13] jwm: interesting raspberry pi project
Day changed to 2022-06-25
[16:01] caai: jfw: message received about the 'excludes'. i am working on option #4.
[16:07] caai: in regards to FreeFileSync, understood. i will read more
[20:19] jfw: jwm: to me it just looks like - how to put it - some nail polish artists picked up a gadget from walmart and worked out how to paint "secure" on it in semi-gloss. but I dunno, what do you see interesting about it?
[20:23] jfw: perhaps that they're catching on to "airgap" as a magic keyword, only nine years late to the party
[20:24] jfw: https://seedsigner.com << as a proper link ftr
[20:24] jfw: not that there's anything proper about https
Day changed to 2022-06-26
[17:23] dorion: http://fixpoint.welshcomputing.com/2022/jwrd-logs-for-Jun-2022/#4625 -- works for me.
[17:23] sourcerer: 2022-06-23 13:01:36 (#jwrd) caai: dorion/jfw: shall we have the next management meeting on: June 28 - 23:00 UTC?
Day changed to 2022-06-27
[14:23] caai: jfw: good morning. please note script option #4: http://welshcomputing.com/paste/aaeshpxtp8
[14:26] caai: i imagine the path to the exclude file needs to be absolute. therefore, http://welshcomputing.com/paste/gycwnajgux
[14:28] caai: please note the accompanying file '/home/user/scripts/EXCLUDE_sysbak.txt': http://welshcomputing.com/paste/6jkfrd766u
[17:51] jfw: caai: you've changed the form by which the exclude list is provided, but the substance is unchanged. actually it's more broken now because the exclude file isn't itself a shell script, just a list, so the "cd /;" and possibly even the comment will end up taken literally as paths.
[17:52] jfw: on the upside, looks like you got the linkage from the script to the exclude file correct.
[17:53] jfw: I'd say fix that exclude file then just give it a try, perhaps the problem will then become clearer.
[17:54] jfw: pay attention to the 'tar' output as it goes (or redirect to a log for later inspection)
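
As a hedged sketch of the two suggestions above (an exclude file that is only a list, and a log of tar's output), assuming GNU tar: the tree, the `./proc` and `./sys` patterns and the log name are hypothetical choices for illustration, while the exclude file name follows the one mentioned earlier. Note that shell commands such as `cd /;` belong in the script, not in the exclude file.

```shell
tree=$(mktemp -d)    # throwaway tree standing in for "/"
work=$(mktemp -d)    # holds the exclude file, archive and log
mkdir -p "$tree/proc" "$tree/etc"
echo keep > "$tree/etc/fstab"
echo skip > "$tree/proc/meminfo"

# The exclude file holds patterns only, one per line.
cat > "$work/EXCLUDE_sysbak.txt" <<'EOF'
./proc
./sys
EOF

# -v lists each member as it is archived; capture that listing, plus any
# warnings tar prints on stderr, in a log for later inspection.
(cd "$tree" && tar -cvf "$work/sysbackup.tar" -X "$work/EXCLUDE_sysbak.txt" . \
    > "$work/sysbackup.log" 2>&1)
tar_status=$?

log_text=$(cat "$work/sysbackup.log")
echo "tar exit status: $tar_status"
echo "$log_text"
```

Reading the log afterwards shows directly whether an excluded path was still archived, which is usually quicker than inspecting the finished archive.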
Day changed to 2022-06-29
[19:57] jfw: caai, dorion: perhaps an obvious point but it occurred to me, regarding why the training is so hard to sell: we're going for people who've worked hard for their money, i.e. been quite successfully distracted from what mattered in the world by their saltmine overlords and/or own stupidity, or else started out rich or otherwise lucked into riches, yet got themselves distracted just the same; so by
[19:57] jfw: pointing out that they need to get off ass and lift a finger or two of their own to protect that money from a hostile environment, we're basically asking them to accept the reality of their total, abject failure to do or even correctly perceive what they needed to be doing for the past however many years. Thus, "saving them time & money on the path" may be perfectly true in theory, psychologically
[19:57] jfw: we're perceived as the ones imposing a massive cost.
[19:58] jfw: *Thus, while ...
[20:07] jfw: as per ye olde Naggum on lisp advocacy misadventures : " They would not /use/ C if they understood this point, so if you actually cause them to understand it in the course of a discussion, you will only make them miserable and hate their lives. People are pretty good at detecting that this is a likely
[20:07] jfw: outcome of thinking, and it takes conscious effort to brace yourself and get through such experiences. Most people are not willing even to /listen/ to arguments or information that could threaten their comfortable view of their own existence, much less think about it, so when you cannot answer a C programmer's "arguments" that his way of life is just great the way it is, it is a pretty good sign
[20:07] jfw: that you let him set the agenda once he realized that his way of life was under threat."